Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements read(buf[], offset, len) #19

Merged
merged 3 commits into from
May 7, 2024

Conversation

ahmarsuhail
Copy link
Collaborator

@ahmarsuhail ahmarsuhail commented May 2, 2024

Issue #, if available:

Implements read(buf[], offset, len) and hopefully avoids any off by one errors :)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

"Premature end of file. Did not expect to read -1 at position %s",
positionInCurrentBuffer));
}
public void setPositionInBuffer(long pos) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is making IOBlock mutable, whereas it was immutable before. I wonder if there is a way to make an IOBlock position-agnostic and just make it return a range of bytes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need the ability to set positions in the byte buffer. What is your concern with having a mutable IOBlock?

Comment on lines 76 to 82
ByteBuffer blockData = ioBlock.getBlockContent();

int numBytesToRead = Math.min(blockData.remaining(), numBytesRemaining);
blockData.get(buffer, offset, numBytesToRead);
nextReadPos += numBytesToRead;
numBytesRemaining -= numBytesToRead;
numBytesRead += numBytesToRead;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit can probably live in the IOBlock itself

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it belongs here, as it is related to "get me as many IOBlocks I need so I can fill up this buffer". Not really related to IOBlock

Comment on lines 59 to 75
try (InputStream inputStream = this.content.join().getStream()) {
while (numBytesRemaining > 0) {
numBytesToRead = Math.min(READ_BUFFER_SIZE, numBytesRemaining);
bytesRead = inputStream.read(buffer, 0, numBytesToRead);

if (bytesRead < 0) {
String message =
String.format("Unexpected end of stream: numRemainingBytes = %d", numBytesRemaining);
throw new EOFException(message);
}

if (bytesRead > 0) {
numBytesRemaining -= bytesRead;
blockContent.put(buffer, 0, bytesRead);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our main decision point is whether this should happen in a sync or async way.

I think this works, but you know your implementation best, so as long as the patterns remain forward-compatible and allow for changing this to async I think we are good.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, we are ok with this for now. Ofcourse implementations may change and all this code may disappear :(

int numBytesToRead;
int numBytesRemaining = this.bufferSize;
int bytesRead;
byte[] buffer = new byte[ONE_MB];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ONE_MB the best practice? Or just magic number for now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number, S3A uses 64K. I do want to experiment with that and see if it makes a difference

@ahmarsuhail ahmarsuhail changed the title initial basic implementation of read(x, y, z) [DRAFT] Implements read(buf[], offset, len) May 2, 2024
@@ -130,6 +133,30 @@ public void testReadAndSeek() throws IOException {
}
}

@Test
public void testReadWithBuffer() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to see this randomised: seeking to random positions within the object and reading random sizes many times. Just so that we have some more coverage on patterns which we perhaps may not think about normally (I know the proper way to do this would be to implement property-based testing but agility dictates that's not the right focus right now -- but random is low hanging I feel like.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack, will do

this.objectClient = objectClient;
this.s3URI = s3URI;
this.metadata =
objectClient.headObject(
HeadRequest.builder().bucket(s3URI.getBucket()).key(s3URI.getKey()).build());

if (blockSize > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when blockSize is 0 or negative?

return getBlockForPosition(pos).getByte(pos);
}

/**
* Reads request data ito the provided buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: into

Comment on lines +93 to +101
ioBlock.setPositionInBuffer(nextReadPos);
ByteBuffer blockData = ioBlock.getBlockContent();

int numBytesToRead = Math.min(blockData.remaining(), numBytesRemaining);
blockData.get(buffer, nextReadOffset, numBytesToRead);
nextReadOffset += numBytesToRead;
nextReadPos += numBytesToRead;
numBytesRemaining -= numBytesToRead;
numBytesRead += numBytesToRead;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We synced offline about this piece of implementation and where it can live in the future.

We are suspecting there might be a more elegant implementation (from testability/maintainability point of view) that naturally visits the relevant IOBlocks and does not need the IOBlocks to carry a position state. This is roughly (in pseudo-code, without corner cases):

while (I can read):
    ioBlock = getIOBlockForPosition(pos)
    numBytesRead = ioBlock.read(pos, buf)
    pos += numBytesRead

For now, we are focusing on agility + performance though.

Copy link
Contributor

@CsengerG CsengerG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Great to see this getting us closer to the baseline!

@ahmarsuhail
Copy link
Collaborator Author

Merging, comments will be addressed in the next PR (which will hopefully get us equal to baseline).

@ahmarsuhail ahmarsuhail merged commit 1351ee1 into awslabs:main May 7, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants