-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a hashlib-esque wrapper class + slight general cleanup which was necessary #50
Conversation
Reviewer's Guide by SourceryThis pull request introduces a new File-Level Changes
Tips
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jonded94 - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟡 Testing: 4 issues found
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
src/crc32c/__init__.py
Outdated
def base64(self): | ||
return base64.b64encode(self.digest()).decode(encoding="ascii") | ||
|
||
def digest(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Document the use of big-endian byte order
The digest method uses big-endian byte order. Consider adding a comment to explicitly document this choice, as it can be important for interoperability and understanding the implementation.
def digest(self): | |
def digest(self) -> bytes: | |
""" | |
Returns the CRC-32C checksum as a 4-byte big-endian byte string. | |
""" | |
return self._checksum.to_bytes(4, "big") |
test/test_crc32c.py
Outdated
|
||
assert crc32c_hash.digest() != crc32c_hash_copy.digest() | ||
|
||
@pytest.mark.parametrize("data,digest,hexdigest,base64", [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider adding more diverse test data for parameterized tests
The parameterized tests currently cover a few specific cases. It would be beneficial to add more diverse test data, such as inputs with special characters, different lengths, and binary data, to ensure the Crc32cHash
class handles a wide range of inputs correctly.
@pytest.mark.parametrize("data,digest,hexdigest,base64", [ | |
@pytest.mark.parametrize("data,digest,hexdigest,base64", [ | |
(b"", b"\x00\x00\x00\x00", "00000000", "AAAAAA=="), | |
(b"23456789", b"\xbf\xe9\x2a\x83", "bfe92a83", "v+kqgw=="), | |
(b"Hello, World!", b"\x6f\xa1\x1e\x6c", "6fa11e6c", "b6EebA=="), | |
(b"\x00\xff\x55\xaa", b"\x51\x66\x29\x6c", "5166296c", "UWYpbA=="), | |
(b"a" * 1000, b"\x20\x4b\xc5\x39", "204bc539", "IEvFOQ=="), | |
]) |
test/test_crc32c.py
Outdated
crc32c_hash.update(x) | ||
self._check_values(crc32c_hash, digest, hexdigest, base64) | ||
|
||
def test_all(self, data: bytes, digest: bytes, hexdigest: str, base64: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider renaming test_all
to a more descriptive name
The method name test_all
is somewhat generic. Consider renaming it to something more descriptive, such as test_single_update
, to better convey the purpose of the test.
def test_single_update(self, data: bytes, digest: bytes, hexdigest: str, base64: str) -> None:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jonded94! Yes, I do like the idea. Even though hashlib
contains only secure hashes, I do recognise the utility of having this class in place.
Could you make sure the README
and CHANGELOG
files are updated accordingly? We also need some small, concise docstrings for the new code.
Okay, so, as you can see, a bit more commits flew in. I took the "bold" decision to really drop support for Python <3.7 because it enables quite a bunch of much cleaner implementation structures. Note that Python 3.6 is EOL since 2021-12-23 (and Python 3.7 is EOL since 2023-06-27, but supporting it is not that hard), so I think this really is not a too bad thing to do. Rocking a Python version this old is bad even just from a security perspective. This enabled me to use stub files only for the shared library part which simply has no other means of type hinting. All the other Python code is type hinted directly. The You will notice that there are some other changes that are more of a general cleanup. I'd really like to see them included and not again be moved out. Separating every little change in separate PRs is just of no value here (IMHO), compared to just do one, slightly larger cleanup in one go. EDIT: Notice that |
I'm a bit unsure about this |
@jonded94 thanks for the further updates. Indeed I would ideally have requested some of changes to come in separate PRs -- at least the general black + cleanup, and the Python 3.7 update. Unlike you, I do see value in doing these separately, as it allows discussions and changesets to be more focused, and helps with better bookkeeping. You are moving fast though, are very eager, and I'm currently on parental leave, so I can't act as fast as I'd like to. I thus prefer to be practical and don't hinder progress. I see nothing wrong with the further changes in principle. The push for 3.7+ is fine, and indeed I did ponder about doing it when dropping 2.7 the other day, but seeing how we still only had the C extension to annotate there was no further gain from restricting 3.x versions. I'll again take the liberty to force-push some small things (again, just trying to be practical). For instance, now that there's formatting, CI should enforce it, otherwise there's no point. I'll also add some text to the CHANGELOG and README, fix some minor things around docs, things like that. I'll also delete the |
We moved from Travis to GHA a long time ago, this was a leftover from old times. Signed-off-by: Rodrigo Tobar <[email protected]>
Signed-off-by: Rodrigo Tobar <[email protected]>
This will allow us more modern features in python code to be added to this package, like type annotations (3.5+) and dataclasses (3.7+). 3.7 is EOL already, but it's the most popular Python version our users are based on. Signed-off-by: Rodrigo Tobar <[email protected]>
Signed-off-by: Rodrigo Tobar <[email protected]>
rtobar: Let's also add it to CI so we enforce its usage. Signed-off-by: Rodrigo Tobar <[email protected]>
This will allow us to then add annotations in __init__ directly Signed-off-by: Rodrigo Tobar <[email protected]>
e8451c4
to
43d0179
Compare
Comment from rtobar: while crc32c is no not strictly a secure hash, it is useful to have a hashlib-like "hash" class for our checksum algorithm, as it allows it to more easily plug into code using hashlib hash objects. Signed-off-by: Rodrigo Tobar <[email protected]>
Signed-off-by: Rodrigo Tobar <[email protected]>
43d0179
to
5cb7d90
Compare
OK, I've addressed the points I mentioned above and force pushed, see diff in https://github.com/jonded94/crc32c/compare/e8451c4..add-hashlib-esque-wrapper-class. I re-arranged commits slightly so they are self-contained, and in a way that changes are introduced in the correct order (declare 3.7+ compatibility first, then use 3.7+ features). Other than what I mentioned before, I also removed the Let me know if you're happy with the changes, or if there's something else you think is missing. If happy, I'll merge and release. |
Hi @rtobar, thank you very much for your very fast responses and much effort that goes into these PRs!
Sorry that you had to do this, I usually push a lot of WIP commits, then let the review happen, implement necessary changes and then squash/split up into self-contained commits. Thank you for cleaning up my work! This all looks great, very happy to have this in. Thank you for the great, productive time doing this! Hope you can concentrate on your probably well deserved parental leave time. |
Now that this project is implemented as a python package (instead of a single-file distribution), I thought about adding some nice pythonic sugar on to it to make using it a bit nicer.
Internally, we're using this project to calculate hashes of large files and wrap it in something which already basically conforms to the interfaces given by the stdlib library
hashlib
.This means that there are some wrapper classes with helper functions such as
digest
,hexdigest
,update
, ..More info can be found here:
Instead of implementing this purely on our side, I asked myself why not to bring it upstream. The tests already are using modern
pytest
syntax and I also included additional type information in the stub file.Let me know what you think about it.
Summary by Sourcery
Introduce a Crc32cHash class with a hashlib-like interface for CRC32C hashing, enhance type annotations, and add comprehensive tests for the new class.
New Features:
Enhancements:
Tests: