Playbook for hub integration #173
Comments
CC @haydentherapper as we were discussing this too
Thanks for the writeup @laurentsimon. FYI on our side we usually integrate standards/formats/3rd-party tooling on the hub when we observe sufficient usage within the community to justify the integration. Nonetheless, the places where I see this has the most value is in the web UI of the Hub & in The easy part imo is displaying the signer or some metadata contained in the cert in the UI. Re:
I'm not convinced that you must be the owner of a certificate to upload it to your repository; usage of a non-owned model could definitely be legit, e.g. a fork of a repo. And by owned I mean the identity linked with your account must be present in the cert.
Side question, does a certificate support a signing chain? Say Meta releases Llama-4 and I fine-tune that model, will I need to overwrite the cert entirely or can there be proof that I modified the original file?
I agree though that matching a Hub username to their private key / cryptographic identity is a useful feature (sorry if that's incorrect, not very up to speed on how the certs are created). I would be careful though before displaying "safe" or "unsafe" badges all over the UI. My gut feeling is telling me I'd rather rely on an external identity provider to confirm "ownership" than TOFU.
Good point. Maybe this can be viewed as a dependency rather than model ownership / maintainer? Do you have example repos we could look at?
For signing, I'm not sure, because you can't trust what the signer says, i.e., how do you verify that they're not lying about how they transformed the original model?
An identity provider provides unforgeable proof of an entity's identity. TOFU is for mapping an identity to a repository / model. So they go hand-in-hand.
+1 to Laurent. We need SLSA to record the provenance of downstream models.
We need a playbook to explain how a hub would integrate our library and what verification needs to be supported. Here is a list of integration paths (not necessarily in the right priority order, which may depend on the hub):
Hub + UX
The hub verifies the signature and adds a "Signed by X" label in the UX. The hub accepts any signer.
Hub with identity enforcement
The hub enforces that the right signer has signed the model. The "right" signer is not always obvious. A proposal is to use TOFU (trust on first use), i.e., trust the first identity that uploaded the model. Any change in identity needs to be challenged via, e.g., 2FA. Because certain models may be uploaded by different users, we may have to trust the first k signers. This may be a configuration setting that requires a user challenge (2FA) to change. Details TBD.
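To make the "trust the first k signers" idea concrete, here is a minimal sketch of what such a policy could look like on the hub side. Everything in it (`TofuPolicy`, `max_trusted`, `challenge_passed`) is a hypothetical name chosen for illustration, not part of the library or of any hub API, and the 2FA challenge itself is reduced to a boolean.

```python
from dataclasses import dataclass, field


@dataclass
class TofuPolicy:
    """Hypothetical per-repository policy: trust the first k signer identities."""

    max_trusted: int = 1  # the "k" in "trust the first k signers"
    trusted: list[str] = field(default_factory=list)

    def check(self, signer: str, challenge_passed: bool = False) -> bool:
        """Return True if an upload signed by `signer` should be accepted."""
        if signer in self.trusted:
            return True  # known identity, no challenge needed
        if len(self.trusted) < self.max_trusted:
            self.trusted.append(signer)  # trust on first use
            return True
        if challenge_passed:
            # Identity change confirmed through a user challenge (e.g. 2FA).
            self.trusted.append(signer)
            return True
        return False  # unknown identity and no challenge: reject


policy = TofuPolicy(max_trusted=2)
first = policy.check("alice@example.com")
second = policy.check("bob@example.com")
third = policy.check("mallory@example.com")
after_2fa = policy.check("carol@example.com", challenge_passed=True)
```

With `max_trusted=2`, the first two identities are accepted silently, the third is rejected, and a new identity is only added once the challenge has passed.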
Hub provides list of signing history
The hub keeps metadata and evidence (signatures) about who signed the model for different model versions. The list is displayed in the UX and may be available through a REST API. This allows anyone to monitor for signing identity changes.
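As an illustration of what a monitor could do with such a history, the sketch below scans a list of records for changes in signer between consecutive versions. The record layout (`SigningRecord` with `revision`, `signer`, `signature`) is an assumption made for this example; the actual fields a hub would expose are not defined here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SigningRecord:
    revision: str   # model version, e.g. a commit hash
    signer: str     # identity taken from the verified signature
    signature: str  # evidence: the signature itself or a reference to it


def identity_changes(history: list[SigningRecord]) -> list[tuple[str, str, str]]:
    """Return (revision, previous_signer, new_signer) for each change of signer."""
    changes = []
    for prev, cur in zip(history, history[1:]):
        if cur.signer != prev.signer:
            changes.append((cur.revision, prev.signer, cur.signer))
    return changes


history = [
    SigningRecord("rev1", "alice@example.com", "sig-1"),
    SigningRecord("rev2", "alice@example.com", "sig-2"),
    SigningRecord("rev3", "mallory@example.com", "sig-3"),
]
changes = identity_changes(history)
```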
Hub provides immutable list of signing history
Like the previous point, but the history is stored in a transparency log. This reduces the trust that must be placed in the hub. It can also pave the way to binary (model) transparency, given the right entry type / information in the log.
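The property that matters here is that past entries cannot be rewritten without detection. The sketch below shows that property with a plain hash chain, which is a deliberate simplification: production transparency logs such as Sigstore's Rekor use Merkle trees with inclusion and consistency proofs. `HashChainLog` and the record fields are hypothetical.

```python
import hashlib
import json


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


class HashChainLog:
    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        digest = entry_hash(prev, record)
        self.entries.append((record, digest))
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any edited record breaks the chain."""
        prev = self.GENESIS
        for record, digest in self.entries:
            if entry_hash(prev, record) != digest:
                return False
            prev = digest
        return True


log = HashChainLog()
log.append({"revision": "rev1", "signer": "alice@example.com"})
log.append({"revision": "rev2", "signer": "alice@example.com"})
ok_before = log.verify()

# Rewriting an earlier record is detected on the next verification.
log.entries[0][0]["signer"] = "mallory@example.com"
ok_after = log.verify()
```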
Client verifies with expected identity
The hub framework API (e.g., huggingface API) lets users verify a model given a set of identities (PKI, Sigstore identity). This is mostly useful for users who know who signs the model, so likely most useful when a user verifies their own model.
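A rough sketch of the identity check on the client side is shown below. It assumes signature verification has already happened elsewhere and produced a verified signer identity; only the comparison against the caller's expected identities is shown. `Identity`, `check_signer` and `UntrustedSignerError` are hypothetical names, not functions of the huggingface API or of this library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    issuer: str   # e.g. an OIDC issuer URL for a Sigstore identity
    subject: str  # e.g. an email address or workload identity


class UntrustedSignerError(Exception):
    """Raised when the verified signer is not one the caller expects."""


def check_signer(verified: Identity, expected: set[Identity]) -> Identity:
    """Accept the model only if its verified signer is in the expected set."""
    if verified not in expected:
        raise UntrustedSignerError(f"unexpected signer: {verified.subject}")
    return verified


expected = {Identity("https://accounts.example.com", "alice@example.com")}

accepted = check_signer(
    Identity("https://accounts.example.com", "alice@example.com"), expected
)

try:
    check_signer(Identity("https://accounts.example.com", "mallory@example.com"), expected)
    rejected = False
except UntrustedSignerError:
    rejected = True
```

Matching on both issuer and subject, rather than subject alone, avoids accepting the same email address vouched for by a different identity provider.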
Clients trust a third party to verify
In this case, the client trusts a third party for verification and does not provide a signer identity to the framework API. Trusting a third party is often necessary because the identities of the legitimate signers are difficult to determine, and legitimate changes to signer identities are hard to assess. The third party would typically be the hub (which performs verification, ideally with "identity enforcement", point 2 above) and / or the monitors (when a transparency log is used to monitor identities). The hub may attest to verifications by issuing an attestation "I, the hub, have verified model X [with config Z], and the signature for this model can be found at Y". Z could be "identity verified and enforced through 2FA, etc". The attestation may be verified on the client side when loading a model. (Trusting the TLS connection to the hub is a good first step without the need to verify an attestation from a hub / transparency log.)
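One possible shape for such an attestation, and the client-side check that it covers the model being loaded, is sketched below. The field names and the config string are assumptions made for illustration. In practice the attestation would itself be signed by the hub and that signature verified first; that step is left out here.

```python
import hashlib


def make_attestation(model_bytes: bytes, signature_location: str, config: str) -> dict:
    """Hub side: record what was verified (hypothetical format)."""
    return {
        "verifier": "hub",
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),  # model X
        "signature_location": signature_location,                 # Y
        "config": config,                                         # Z
    }


def attestation_covers(
    attestation: dict, model_bytes: bytes, accepted_configs: set[str]
) -> bool:
    """Client side: does the attestation cover these bytes under an accepted config?"""
    digest = hashlib.sha256(model_bytes).hexdigest()
    return (
        attestation["model_sha256"] == digest
        and attestation["config"] in accepted_configs
    )


model = b"model weights"
att = make_attestation(model, "https://hub.example/model.sig", "identity-enforced-2fa")
ok = attestation_covers(att, model, {"identity-enforced-2fa"})
mismatch = attestation_covers(att, b"other weights", {"identity-enforced-2fa"})
```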
@McPatate please feel free to share your thoughts.