Add hashing API for model signing #188

Merged: 18 commits into sigstore:main on May 30, 2024

Conversation

@mihaimaruseac (Collaborator) commented on May 22, 2024

Summary

This is the lowest layer of the model signing API (#172). It only supports computing the digest of a single object, in a flexible way (#140). We add a precomputed hasher to allow benchmarking (it lets us exclude startup and similar overhead from the measurements; this will come up later). We add one memory hasher for now, but we might add others later (#13).
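
To illustrate the two hashers mentioned above, here is a minimal sketch. The names `Digest`, `MemorySHA256` and `PrecomputedDigest` are hypothetical and chosen for this example only; they are not necessarily the names or signatures used in this PR. The sketch assumes that a precomputed hasher simply returns a digest that was obtained elsewhere, so a benchmark can measure the rest of the pipeline without paying for the hashing itself.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Digest:
    """Hypothetical container pairing an algorithm name with raw digest bytes."""

    algorithm: str
    digest_value: bytes


class MemorySHA256:
    """Hypothetical hasher for data that is already in memory."""

    def __init__(self, initial_data: bytes = b""):
        self._hasher = hashlib.sha256(initial_data)

    def update(self, data: bytes) -> None:
        # Feed more bytes into the running SHA256 state.
        self._hasher.update(data)

    def compute(self) -> Digest:
        return Digest("sha256", self._hasher.digest())


class PrecomputedDigest:
    """Hypothetical hasher that returns a digest computed elsewhere.

    It does no hashing work, which is what makes it usable as a
    baseline when benchmarking the code around the hasher.
    """

    def __init__(self, algorithm: str, digest_value: bytes):
        self._digest = Digest(algorithm, digest_value)

    def compute(self) -> Digest:
        return self._digest
```

Both classes expose the same `compute()` call in this sketch, so a benchmark could swap one for the other without changing the calling code.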

The important part of the API is under hashing/file. There are two ways to hash a file: either completely, or by passing a span (a start, end pair) and hashing only the contents within that span. The next-level API will generate the corresponding spans so that a file can be hashed in a distributed fashion. This is also useful for models that are loaded in a distributed way: each host can hash only the part that it accesses.
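
The two modes described above can be sketched with the standard library alone. The function names `hash_file` and `hash_span`, the `chunk_size` parameter, and the half-open `[start, end)` convention are assumptions made for this example; the actual classes under hashing/file may differ.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 8192) -> str:
    """Return the SHA256 hex digest of the whole file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def hash_span(path: str, start: int, end: int, chunk_size: int = 8192) -> str:
    """Return the SHA256 hex digest of the bytes in [start, end) of the file.

    The half-open interval is an assumption of this sketch.
    """
    if start < 0 or end < start:
        raise ValueError(f"invalid span: start={start}, end={end}")
    digest = hashlib.sha256()
    remaining = end - start
    with open(path, "rb") as f:
        f.seek(start)
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # The span extends past the end of the file.
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

Under these assumptions, each host in a distributed setup would call `hash_span` with its own `(start, end)` pair and produce one digest per span; how those per-span digests are later combined is left to the higher-level API.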

Tests are added in this CL; microbenchmarks will follow later.

Drive-by-fixes:

  • add a missing .gitignore
  • add a missing copyright notice and fix some typos in the existing ones (split out into #189, "Fix typo in license stub", for convenience)
  • fix the serialize_test.py import to work with directory packages. We will remove this later, once the API is implemented and the code is migrated, with testing in the proper places
  • increase flake8 max line length to 80 to match Google style

Release Note

NONE

Documentation

NONE

mihaimaruseac and others added 10 commits May 28, 2024 05:00
Signed-off-by: Mihai Maruseac <[email protected]>
mihaimaruseac and others added 5 commits May 28, 2024 10:33
Remove bad design patterns around the hashing engine concerns.

Signed-off-by: Mihai Maruseac <[email protected]>
@mihaimaruseac (Collaborator, Author) commented:

This is now ready for review.

@laurentsimon (Collaborator) left a comment:

Thanks!

model_signing/hashing/file.py (review thread resolved)
@mihaimaruseac merged commit 220d5c7 into sigstore:main on May 30, 2024
18 checks passed
@mihaimaruseac deleted the api-hashing branch on May 30, 2024 00:41
@mihaimaruseac added a commit to mihaimaruseac/model-transparency that referenced this pull request on Jun 3, 2024
Missed this in sigstore#188, but found out I need it when working on sigstore#190. The
`serialize_v0`/`serialize_v1` methods all had headers in front of the
files, so we need to do that too. Will update usage of header on sigstore#190
shortly.

As a benefit, we can simulate hashing a file with a header for the first
portion of the file and a sharded hasher for the remainder of the file.

Signed-off-by: Mihai Maruseac <[email protected]>
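
One way to read the benefit described in the commit message above: if the digest is seeded with a header and then fed the bytes of a span, using the first portion of a file as the header and the remainder as the span gives the same digest as hashing the whole file. The sketch below works on in-memory bytes for brevity; `hash_span_with_header` and its parameters are hypothetical names, not the API added in the referenced commit.

```python
import hashlib


def hash_span_with_header(
    data: bytes, start: int, end: int, header: bytes = b""
) -> str:
    """Return SHA256(header || data[start:end]) as a hex digest.

    Hypothetical helper: the header is hashed first, then the span.
    """
    digest = hashlib.sha256(header)
    digest.update(data[start:end])
    return digest.hexdigest()


# Treat the first 6 bytes as the "header" and hash the rest as a span.
content = b"HEADERpayload bytes of the model file"
split = 6
combined = hash_span_with_header(
    content, split, len(content), header=content[:split]
)
# Same digest as hashing the whole content in one pass.
assert combined == hashlib.sha256(content).hexdigest()
```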