Add signing and verification CLI scripts.

Signed-off-by: Martin Sablotny <[email protected]>
sigstore · Jul 16, 2024 · 7e90e2f · 7e90e2f
1 parent 79848fe
Showing 11 changed files with 448 additions and 402 deletions.
diff --git a/model_signing/README.md b/model_signing/README.md
@@ -19,157 +19,75 @@ Model signers should monitor for occurences of their signing identity in the
 log. Sigstore is actively developing a [log
 monitor](https://github.com/sigstore/rekor-monitor) that runs on GitHub Actions.
 
-![Signing models with Sigstore](images/sigstore-model-diagram.png)
+## Model Signing CLI
 
-## Usage
+The `sign.py` and `verify.py` scripts aim to provide the necessary functionality
+to sign and verify ML models. For signing and verification the following methods
+are supported:
 
-You will need to install a few prerequisites to be able to run all of the
-examples below:
+* Sigstore (sigstore.dev)
+* Bring your own key pair
+* Bring your own PKI
+* Skip signing (only hash and create a bundle)
 
-```bash
-sudo apt install git git-lfs python3-venv python3-pip unzip
-git lfs install
-```
+The signing part creates a [sigstore bundle](https://github.com/sigstore/protobuf-specs/blob/main/protos/sigstore_bundle.proto)
+protobuf that is stored in JSON format. The bundle contains the verification
+material necessary to check the payload, and the payload itself as a [DSSE envelope](https://github.com/sigstore/protobuf-specs/blob/main/protos/envelope.proto).
+The DSSE envelope in turn contains an in-toto statement and the signature over
+that statement. The signature format and how the signature is computed are
+described in the [DSSE protocol specification](https://github.com/secure-systems-lab/dsse/blob/v1.0.0/protocol.md).
 
-After this, you can clone the repository, create a Python virtual environment
-and install the dependencies needed by the project:
+Finally, the statement itself contains subjects, which are a list of (file path,
+digest) pairs, a predicate type set to `model_signing/v1/model`, and a dictionary
+of predicates. The idea is to use the predicates to store (and therefore sign) model
+card information in the future.
 
-```bash
-git clone [email protected]:sigstore/model-transparency.git
-cd model-transparency/model_signing
-python3 -m venv test_env
-source test_env/bin/activate
-os=Linux # Supported: Linux, Windows, Darwin.
-python3 -m pip install --require-hashes -r "install/requirements_${os}".txt
-```
+The verification part reads the sigstore bundle file and first verifies that the
+signature is valid, then recomputes the model's file hashes and compares them
+against the signed ones.
 
-After this point, you can use the project to sign and verify models and
-checkpoints. A help message with all arguments can be obtained by passing `-h`
-argument, either to the main driver or to the two subcommands:
+### Usage
 
-```bash
-python3 main.py -h
-python3 main.py sign -h
-python3 main.py verify -h
-```
+There are two scripts: one creates and signs a bundle, and the other
+verifies a bundle. The functionality can also be used directly from other
+Python tools. The `sign.py` and `verify.py` scripts can be used as canonical
+how-to examples.
 
-Signing a model requires passing an argument for the path to the model. This can
-be a path to a file or a directory (for large models, or model formats such as
-`SavedModel` which are stored as a directory of related files):
+The easiest way to use the scripts directly is from a virtual environment:
 
 ```bash
-path=path/to/model
-python3 main.py sign --path "${path}"
+$ python3 -m venv .venv
+$ source .venv/bin/activate
+(.venv) $ pip install -r install/requirements.in
 ```
 
-The sign process will start an OIDC workflow to generate a short lived
-certificate based on an identity provider. This will be relevant when verifying
-the signature, as shown below.
-
-**Note**: The signature is stored as `<file>.sig` for a model serialized as a
-single file, and `<dir>/model.sig` for a model in a folder-based format.
-
-For verification, we need to pass both the path to the model and identity
-related arguments:
+## Sign
 
 ```bash
-python3 main.py verify --path "${path}" \
-    --identity-provider https://accounts.google.com \
-    --identity [email protected]
+(.venv) $ python3 sign.py --model_path ${MODEL_PATH} --method {sigstore, private-key, pki} {additional parameters depending on method}
 ```
 
-For developers signing models, there are three identity providers that can
-be used at the moment:
-
-* Google's provider is `https://accounts.google.com`.
-* GitHub's provider is `https://github.com/login/oauth`.
-* Microsoft's provider is `https://login.microsoftonline.com`.
-
-For automated signing using a workload identity, the following platforms
-are currently supported, shown with their expected identities:
-
-* GitHub Actions
-  (`https://github.com/octo-org/octo-automation/.github/workflows/oidc.yml@refs/heads/main`)
-* GitLab CI
-  (`https://gitlab.com/my-group/my-project//path/to/.gitlab-ci.yml@refs/heads/main`)
-* Google Cloud Platform (`SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com`)
-* Buildkite CI (`https://buildkite.com/ORGANIZATION_SLUG/PIPELINE_SLUG`)
-
-### Supported Models
-
-The library supports multiple models, from multiple training frameworks and
-model hubs.
-
-For example, to sign and verify a Bertseq2seq model, trained with TensorFlow,
-stored in TFHub, run the following commands:
+## Verify
 
 ```bash
-model_path=bertseq2seq
-wget "https://tfhub.dev/google/bertseq2seq/bert24_en_de/1?tf-hub-format=compressed" -O "${model_path}".tgz
-mkdir -p "${model_path}"
-cd "${model_path}" && tar xvzf ../"${model_path}".tgz && rm ../"${model_path}".tgz && cd -
-python3 main.py sign --path "${model_path}"
-python3 main.py verify --path "${model_path}" \
-    --identity-provider https://accounts.google.com \
-    --identity [email protected]
+(.venv) $ python3 verify.py --model_path ${MODEL_PATH} --method {sigstore, private-key, pki} {additional parameters depending on method}
 ```
 
-For models stored in Hugging Face we need the large file support from git, which
-can be obtained via
-
-```bash
-sudo apt install git-lfs
-git lfs install
-```
-
-After this, we can sign and verify a Bert base model:
-
-```bash
-model_name=bert-base-uncased
-model_path="${model_name}"
-git clone --depth=1 "https://huggingface.co/${model_name}" && rm -rf "${model_name}"/.git
-python3 main.py sign --path "${model_path}"
-python3 main.py verify --path "${model_path}" \
-    --identity-provider https://accounts.google.com \
-    --identity [email protected]
-```
-
-Similarly, we can sign and verify a Falcon model:
-
-```bash
-model_name=tiiuae/falcon-7b
-model_path=$(echo "${model_name}" | cut -d/ -f2)
-git clone --depth=1 "https://huggingface.co/${model_name}" && rm -rf "${model_name}"/.git
-python3 main.py sign --path "${model_path}"
-python3 main.py verify --path "${model_path}" \
-    --identity-provider https://accounts.google.com \
-    --identity [email protected]
-```
-
-We can also support models from  the PyTorch Hub:
-
-```bash
-model_name=hustvl/YOLOP
-model_path=$(echo "${model_name}" | cut -d/ -f2)
-wget "https://github.com/${model_name}/archive/main.zip" -O "${model_path}".zip
-mkdir -p "${model_path}"
-cd "${model_path}" && unzip ../"${model_path}".zip && rm ../"${model_path}".zip && shopt -s dotglob && mv YOLOP-main/* . && shopt -u dotglob && rmdir YOLOP-main/ && cd -
-python3 main.py sign --path "${model_path}"
-python3 main.py verify --path "${model_path}" \
-    --identity-provider https://accounts.google.com \
-    --identity [email protected]
-```
+### Examples
 
-We also support ONNX models, for example Roberta:
+#### Bring Your Own Key
 
 ```bash
-model_name=roberta-base-11
-model_path="${model_name}.onnx"
-wget "https://github.com/onnx/models/raw/main/text/machine_comprehension/roberta/model/${model_name}.onnx"
-python3 main.py sign --path "${model_path}"
-python3 main.py verify --path "${model_path}" \
-    --identity-provider https://accounts.google.com \
-    --identity [email protected]
+$ MODEL_PATH='/path/to/your/model'
+$ openssl ecparam -name secp256k1 -genkey -noout -out ec-secp256k1-priv-key.pem
+$ openssl ec -in ec-secp256k1-priv-key.pem -pubout > ec-secp256k1-pub-key.pem
+$ source .venv/bin/activate
+# SIGN
+(.venv) $ python3 sign.py --model_path ${MODEL_PATH} --method private-key --private-key ec-secp256k1-priv-key.pem
+...
+# VERIFY
+(.venv) $ python3 verify.py --model_path ${MODEL_PATH} --method private-key --public-key ec-secp256k1-pub-key.pem
+...
 ```
 
 ## Benchmarking

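To make the README's description of the signed payload concrete, here is a minimal sketch of what ends up under the signature. It is not the code from this commit: `build_statement` and the sample file names are hypothetical, and SHA-256 is assumed as the hash algorithm. The `pae` function follows the pre-authentication encoding defined in the DSSE v1.0.0 protocol, and the statement layout follows the in-toto Statement v1 JSON fields.

```python
import hashlib
import json

# Payload type that DSSE envelopes use for in-toto statements.
PAYLOAD_TYPE = "application/vnd.in-toto+json"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes that get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def build_statement(files: dict[str, bytes]) -> dict:
    """Hypothetical helper: one (file path, digest) subject per model file."""
    subjects = [
        {"name": path, "digest": {"sha256": hashlib.sha256(content).hexdigest()}}
        for path, content in sorted(files.items())
    ]
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": subjects,
        "predicateType": "model_signing/v1/model",
        "predicate": {"signed_model": True},
    }


# Sample in-memory "model" (assumed contents, for illustration only).
statement = build_statement({"weights.bin": b"\x00\x01", "config.json": b"{}"})
payload = json.dumps(statement, sort_keys=True).encode("utf-8")
to_sign = pae(PAYLOAD_TYPE, payload)
```

The signer computes the signature over `to_sign`, not over the raw JSON, so a verifier has to rebuild the same encoding from the envelope's payload type and payload before checking the signature.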
diff --git a/model_signing/main.py b/model_signing/main.py
diff --git a/model_signing/manifest/in_toto.py b/model_signing/manifest/in_toto.py
@@ -0,0 +1,85 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+This module provides functionality to convert file-based manifests
+to in-toto statements. It is necessary because sigstore does not
+support arbitrary payloads in DSSE envelopes.
+"""
+import pathlib
+
+from in_toto_attestation.v1 import statement
+from in_toto_attestation.v1 import resource_descriptor
+
+from model_signing.hashing.hashing import Digest
+from model_signing.manifest.manifest import FileLevelManifest
+from model_signing.manifest.manifest import FileManifestItem
+
+
+_PREDICATE_TYPE = 'model_signing/v1/model'
+
+
+def manifest_to_statement(
+        manifest: FileLevelManifest, algorithm: str
+        ) -> statement.Statement:
+    """
+    Converts a model signing FileLevelManifest to an
+    in-toto statement.
+
+    Args:
+        manifest (FileLevelManifest): the manifest to convert
+        algorithm (str): the used hash algorithm
+
+    Returns:
+        statement.Statement: the in-toto statement representing the manifest
+    """
+    subjects: list[resource_descriptor.ResourceDescriptor] = []
+    for k, d in manifest.digests.items():
+        s = resource_descriptor.ResourceDescriptor(
+            name=str(k),
+            digest={algorithm: d.digest_hex},
+        ).pb
+        subjects.append(s)
+    return statement.Statement(
+        subjects=subjects,
+        predicate_type=_PREDICATE_TYPE,
+        predicate={'signed_model': True},
+        )
+
+
+def statement_to_manifest(
+        statement: statement.Statement, algorithm: str
+        ) -> FileLevelManifest:
+    """
+    Converts an in-toto statement to a FileLevelManifest.
+
+    Args:
+        statement (statement.Statement): the in-toto statement
+        algorithm (str): the hash algorithm used
+
+    Returns:
+        FileLevelManifest: the resulting FileLevelManifest
+    """
+    items: list[FileManifestItem] = []
+    for s in statement.pb.subject:
+        # no support for multiple hashes
+        items.append(
+            FileManifestItem(
+                path=pathlib.Path(s.name),
+                digest=Digest(
+                    algorithm=algorithm,
+                    digest_value=bytes.fromhex(s.digest[algorithm]),
+                )
+            )
+        )
+    return FileLevelManifest(items)
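The two conversions above should round-trip: digests go out as hex strings keyed by algorithm and come back as raw bytes. The sketch below illustrates that property, and the hash comparison that verification relies on, using only the standard library. `Item`, `to_subjects`, `from_subjects` and `mismatched_paths` are hypothetical stand-ins, not the `model_signing` or `in_toto_attestation` APIs, and the subjects are plain dicts rather than protobuf messages.

```python
import dataclasses
import hashlib
import pathlib


@dataclasses.dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for FileManifestItem: a path and a raw digest."""
    path: pathlib.PurePosixPath
    digest: bytes


def to_subjects(items: list[Item], algorithm: str) -> list[dict]:
    # Digests are stored hex-encoded, keyed by the algorithm name.
    return [
        {"name": str(i.path), "digest": {algorithm: i.digest.hex()}}
        for i in items
    ]


def from_subjects(subjects: list[dict], algorithm: str) -> list[Item]:
    # Only one hash per subject is read, matching the code above.
    return [
        Item(pathlib.PurePosixPath(s["name"]), bytes.fromhex(s["digest"][algorithm]))
        for s in subjects
    ]


def mismatched_paths(signed: list[Item], recomputed: list[Item]) -> list[str]:
    """Paths whose digest differs, or that exist on only one side."""
    expected = {i.path: i.digest for i in signed}
    actual = {i.path: i.digest for i in recomputed}
    return sorted(
        str(p)
        for p in expected.keys() | actual.keys()
        if expected.get(p) != actual.get(p)
    )


path = pathlib.PurePosixPath("model/weights.bin")
signed = [Item(path, hashlib.sha256(b"weights").digest())]
round_tripped = from_subjects(to_subjects(signed, "sha256"), "sha256")
tampered = [Item(path, hashlib.sha256(b"tampered").digest())]
```

Comparing `signed` against `round_tripped` yields no mismatches, while comparing it against `tampered` reports `model/weights.bin`. A verifier would run this check only after the signature over the statement has been validated.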