-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implements read(buf[], offset, len) #19
Implements read(buf[], offset, len) #19
Conversation
"Premature end of file. Did not expect to read -1 at position %s", | ||
positionInCurrentBuffer)); | ||
} | ||
public void setPositionInBuffer(long pos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is making IOBlock mutable, whereas it was immutable before. I wonder if there is a way to make an IOBlock position-agnostic and just make it return a range of bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need the ability to set positions in the byte buffer. What is your concern with having a mutable IOBlock?
ByteBuffer blockData = ioBlock.getBlockContent(); | ||
|
||
int numBytesToRead = Math.min(blockData.remaining(), numBytesRemaining); | ||
blockData.get(buffer, offset, numBytesToRead); | ||
nextReadPos += numBytesToRead; | ||
numBytesRemaining -= numBytesToRead; | ||
numBytesRead += numBytesToRead; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bit can probably live in the IOBlock itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it belongs here, as it is related to "get me as many IOBlocks I need so I can fill up this buffer". Not really related to IOBlock
try (InputStream inputStream = this.content.join().getStream()) { | ||
while (numBytesRemaining > 0) { | ||
numBytesToRead = Math.min(READ_BUFFER_SIZE, numBytesRemaining); | ||
bytesRead = inputStream.read(buffer, 0, numBytesToRead); | ||
|
||
if (bytesRead < 0) { | ||
String message = | ||
String.format("Unexpected end of stream: numRemainingBytes = %d", numBytesRemaining); | ||
throw new EOFException(message); | ||
} | ||
|
||
if (bytesRead > 0) { | ||
numBytesRemaining -= bytesRead; | ||
blockContent.put(buffer, 0, bytesRead); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our main decision point is whether this should happen in a sync or async way.
I think this works, but you know your implementation best, so as long as the patterns remain forward-compatible and allow for changing this to async I think we are good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline, we are ok with this for now. Ofcourse implementations may change and all this code may disappear :(
int numBytesToRead; | ||
int numBytesRemaining = this.bufferSize; | ||
int bytesRead; | ||
byte[] buffer = new byte[ONE_MB]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ONE_MB the best practice? Or just magic number for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic number, S3A uses 64K. I do want to experiment with that and see if it makes a difference
@@ -130,6 +133,30 @@ public void testReadAndSeek() throws IOException { | |||
} | |||
} | |||
|
|||
@Test | |||
public void testReadWithBuffer() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to see this randomised: seeking to random positions within the object and reading random sizes many times. Just so that we have some more coverage on patterns which we perhaps may not think about normally (I know the proper way to do this would be to implement property-based testing but agility dictates that's not the right focus right now -- but random is low hanging I feel like.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, will do
this.objectClient = objectClient; | ||
this.s3URI = s3URI; | ||
this.metadata = | ||
objectClient.headObject( | ||
HeadRequest.builder().bucket(s3URI.getBucket()).key(s3URI.getKey()).build()); | ||
|
||
if (blockSize > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when blockSize is 0 or negative?
return getBlockForPosition(pos).getByte(pos); | ||
} | ||
|
||
/** | ||
* Reads request data ito the provided buffer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: into
ioBlock.setPositionInBuffer(nextReadPos); | ||
ByteBuffer blockData = ioBlock.getBlockContent(); | ||
|
||
int numBytesToRead = Math.min(blockData.remaining(), numBytesRemaining); | ||
blockData.get(buffer, nextReadOffset, numBytesToRead); | ||
nextReadOffset += numBytesToRead; | ||
nextReadPos += numBytesToRead; | ||
numBytesRemaining -= numBytesToRead; | ||
numBytesRead += numBytesToRead; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We synced offline about this piece of implementation and where it can live in the future.
We are suspecting there might be a more elegant implementation (from testability/maintainability point of view) that naturally visits the relevant IOBlocks and does not need the IOBlocks to carry a position state. This is roughly (in pseudo-code, without corner cases):
while (I can read):
ioBlock = getIOBlockForPosition(pos)
numBytesRead = ioBlock.read(pos, buf)
pos += numBytesRead
For now, we are focusing on agility + performance though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! Great to see this getting us closer to the baseline!
Merging, comments will be addressed in the next PR (which will hopefully get us equal to baseline). |
Issue #, if available:
Implements read(buf[], offset, len) and hopefully avoids any off by one errors :)
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.