Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crc polynomial #43752

Open
wants to merge 13 commits into
base: feature/storage/content-validation
Choose a base branch
from

Conversation

ibrahimrabab
Copy link
Contributor

Description

Implementation of CRC64 Polynomial for Content Validation.

@github-actions github-actions bot added the Storage Storage Service (Queues, Blobs, Files) label Jan 9, 2025
@azure-sdk
Copy link
Collaborator

API change check

API changes are not detected in this pull request.

long b0, b1, b2, b3;

// Load and XOR data with CRC
b0 = ByteBuffer.wrap(src, pData, 8).order(ByteOrder.LITTLE_ENDIAN).getLong() ^ uCrc0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If additional performance is needed converting the 8 bytes to a long could be done manually.

Copy link
Member

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is looking good. Really only had minor comments and questions.

I think it would be also worth to sync up with @jaschrep-msft and @jalauzon-msft at some point to see if they notice anything with the implementation since they implemented it for .NET and Python.


private static final long POLY = 0x9A6C9329AC4BC9B5L;

private static final long[] M_U1 = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over the .NET and Python implementations:

They both represent these tables as unsigned 64-bit integers, but for us we are using longs, which is slightly different because it allows for negative numbers. We should double check if there is a specific reason unsigned integers were used for these implementations and see if we need to do something similar.

private static final long POLY = 0x9A6C9329AC4BC9B5L;

private static final long[] M_U1 = {
0x0000000000000000L,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of a nitpick, but it would be nice if these tables were ordered in columns of four so it is easier to compare these values with the Python and .NET values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, Java defaults it to be a single value on each line whenever we build the package. It might be due to some underlying linting mechanism.

b3 >>>= 8;

// Repeat for remaining bytes
for (int i = 6; i >= 0; i--) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I noticed looking over the .NET implementation: https://github.com/Azure/azure-sdk-for-net/blob/c85cc69742b4a5a325eb2f077cbdaf548e7e6362/sdk/storage/Azure.Storage.Common/src/Shared/StorageCrc64Calculator.cs#L840-L891 was that it did not use any for loops when doing these XORs. I'd prefer what we have for readability but curious if there was a specific reason the .NET implementation was done that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we explicitly unrolled the loop. Whoever built this way back in the day for the service did the perf testing and wanted it that way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's good to know. Thanks for confirming. @ibrahimrabab once we get the checksum working fully and some more performance benchmarking in place, let's try unrolling it to see if it makes a difference performance wise in the implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants