Initial commit (ryanleh, Aug 11, 2021, commit a13e3d0). Showing 152 changed files with 31,357 additions and 0 deletions.
`.gitignore` (10 additions):
# Generated by Cargo
# will have compiled files and executables
target/

# Remove Cargo.lock from gitignore if creating an executable, leave it for libraries
# More information here https://doc.rust-lang.org/cargo/guide/cargo-toml-vs-cargo-lock.html
Cargo.lock

# These are backup files generated by rustfmt
**/*.rs.bk
`README.md` (197 additions):
<h1 align="center">Muse</h1>

___Muse___ is a Python, C++, and Rust library for **Secure Convolutional Neural Network Inference Resilient to Malicious Clients**.

This library was initially developed as part of the paper *"[Muse: Secure Inference Resilient to Malicious Clients][muse]"*, and is released under the MIT License and the Apache v2 License (see [License](#license)).

**WARNING:** This is an academic proof-of-concept prototype, and in particular has not received careful code review. Several components necessary for full security (but which don't affect benchmarks) are not completely implemented. Consequently, this implementation is NOT ready for production use.

## Overview

This library implements the components of a cryptographic system for efficient client-malicious inference on general convolutional neural networks as well as a model-extraction attack against semi-honest secure inference protocols based on additive secret sharing.

These constructions utilize an array of multi-party computation and machine-learning techniques, as described in the [Muse paper][muse].
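As background for the attack setting, here is a minimal sketch of additive secret sharing, the primitive that the targeted semi-honest secure-inference protocols build on. This is illustrative only (the modulus and function names are not Muse's actual code):

```python
import random

P = 2**31 - 1  # illustrative prime modulus, not Muse's actual field


def share(x, p=P):
    """Split x into two additive shares that sum to x mod p."""
    r = random.randrange(p)
    return r, (x - r) % p


def reconstruct(s0, s1, p=P):
    return (s0 + s1) % p


# Each share alone is uniformly random and reveals nothing about x,
# but a malicious client can shift the reconstructed value by
# malleating its own share -- the behavior Muse defends against.
s0, s1 = share(42)
assert reconstruct(s0, s1) == 42
assert reconstruct(s0, (s1 + 5) % P) == (42 + 5) % P
```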

## Directory structure

This repository contains several folders that implement the different building blocks of Muse. The high-level structure of the repository is as follows.

* [`python`](python): Python scripts for the model-extraction attack

* [`rust/algebra`](rust/algebra): Rust crate that provides finite fields

* [`rust/crypto-primitives`](rust/crypto-primitives): Rust crate that implements some useful cryptographic primitives

* [`rust/experiments`](rust/experiments): Rust crate for running latency and communication experiments

* [`rust/neural-network`](rust/neural-network): Rust crate that implements generic neural networks

* [`rust/protocols`](rust/protocols): Rust crate that implements cryptographic protocols

* [`rust/protocols-sys`](rust/protocols-sys): Rust crate that provides the C++ backend for Muse's pre-processing phase and an FFI for the backend

## Build guide

The library compiles on the `nightly` toolchain of the Rust compiler. To install the latest version of Rust, first install `rustup` by following the instructions [here](https://rustup.rs/), or via your platform's package manager. Once `rustup` is installed, install the Rust toolchain by invoking:
```bash
rustup install nightly
```

Additionally, you will need GCC, G++, pkg-config, OpenSSL, CMake, and Clang. On Ubuntu, these can be installed via:
```bash
sudo apt install pkg-config libssl-dev cmake g++ libclang-dev
```

After that, use `cargo`, the standard Rust build tool, to build the library:
```bash
git clone https://github.com/mc2-project/muse
cd muse/rust
cargo +nightly build --release
```

This library comes with unit and integration tests for each of the provided crates. Run these tests with:
```bash
cargo +nightly test
```

### Experiments

The rest of this README will explain how to run experiments on the various components of Muse in order to reproduce the results provided in the paper.

#### Tables 3/4 and Figure 10

##### Authenticated correlations generator (ACG)

To measure the cost of the ACG, first build the relevant binaries:
```bash
cargo +nightly build --bin acg-client --release --all-features;
cargo +nightly build --bin acg-server --release --all-features;
```

Then, execute these commands to run the experiment:
```bash
# On the server instance:
env RAYON_NUM_THREADS=2 cargo +nightly run --bin acg-server --release --all-features -- -m <0/1> 2>/dev/null > "./acg_time.txt"
# On the client instance:
env RAYON_NUM_THREADS=2 cargo +nightly run --bin acg-client --release --all-features -- -m <0/1> -i <server_ip> 2>/dev/null > "./acg_time.txt"
```
This will write out a trace of execution times and bandwidth used to `./acg_time.txt`.

Note that the `-m` flag controls which model architecture is used: MNIST (0) or MiniONN (1).

##### Garbling

To measure the cost of garbling the ReLU circuits, first build the relevant binaries:
```bash
cargo +nightly build --bin garbling-client --release --all-features;
cargo +nightly build --bin garbling-server --release --all-features;
```

Then, execute these commands to run the experiment:
```bash
# On the server instance:
env RAYON_NUM_THREADS=2 cargo +nightly run --bin garbling-server --release --all-features -- -m <0/1> 2>/dev/null > "./garbling_time.txt"
# On the client instance:
env RAYON_NUM_THREADS=2 cargo +nightly run --bin garbling-client --release --all-features -- -m <0/1> -i <server_ip> 2>/dev/null > "./garbling_time.txt"
```
This will write out a trace of execution times and bandwidth used to `./garbling_time.txt`.

##### Triple Generation

To measure the cost of triple generation for the CDS protocol, first build the relevant binaries:
```bash
cargo +nightly build --bin triples-gen-client --release --all-features;
cargo +nightly build --bin triples-gen-server --release --all-features;
```

Then, execute these commands to run the experiment:
```bash
# On the server instance:
env RAYON_NUM_THREADS=6 cargo +nightly run --bin triples-gen-server --release --all-features -- -m <0/1> 2>/dev/null > "./triples_time.txt"
# On the client instance:
env RAYON_NUM_THREADS=6 cargo +nightly run --bin triples-gen-client --release --all-features -- -m <0/1> -i <server_ip> 2>/dev/null > "./triples_time.txt"
```
This will write out a trace to `./triples_time.txt`. Note that the results from Figure 10 can be reproduced by varying the number of threads in the `RAYON_NUM_THREADS` environment variable, and additionally including the `-n 10000000` flag.
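A sweep like the following could drive the client side of the Figure 10 reproduction. This is a sketch: the leading `echo` makes it a dry run that only prints each command; remove it to actually launch the runs, with `<server_ip>` filled in and the server restarted at the matching thread count:

```shell
# Print the client command for each thread count (dry run; remove `echo`
# to execute -- requires a running triples-gen-server and a real server IP).
for t in 2 4 6 8; do
  echo env RAYON_NUM_THREADS=$t cargo +nightly run --bin triples-gen-client \
    --release --all-features -- -m 0 -n 10000000 -i "<server_ip>"
done
```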

##### Input Authentication

To measure the cost of input sharing for the CDS protocol, first build the relevant binaries:
```bash
cargo +nightly build --bin input-auth-client --release --all-features;
cargo +nightly build --bin input-auth-server --release --all-features;
```

Then, execute these commands to run the experiment:
```bash
# On the server instance:
env RAYON_NUM_THREADS=3 cargo +nightly run --bin input-auth-server --release --all-features -- -m <0/1> 2>/dev/null > "./input_auth_time.txt"
# On the client instance:
env RAYON_NUM_THREADS=3 cargo +nightly run --bin input-auth-client --release --all-features -- -m <0/1> -i <server_ip> 2>/dev/null > "./input_auth_time.txt"
```
This will write out a trace to `./input_auth_time.txt`.

##### CDS Evaluation

To measure the cost of evaluating the CDS protocol, first build the relevant binaries:
```bash
cargo +nightly build --bin cds-client --release --all-features;
cargo +nightly build --bin cds-server --release --all-features;
```

Then, execute these commands to run the experiment:
```bash
# On the server instance:
env RAYON_NUM_THREADS=2 cargo +nightly run --bin cds-server --release --all-features -- -m <0/1> 2>/dev/null > "./cds_time.txt"
# On the client instance:
env RAYON_NUM_THREADS=2 cargo +nightly run --bin cds-client --release --all-features -- -m <0/1> -i <server_ip> 2>/dev/null > "./cds_time.txt"
```
This will write out a trace of execution times to `./cds_time.txt`.

##### Online phase

To measure the cost of the online phase, first build the relevant binaries.
(The code examples show how to do this for MNIST; for the MiniONN network, replace `mnist` with `minionn`.)
```bash
cargo +nightly build --bin mnist-client --release --all-features;
cargo +nightly build --bin mnist-server --release --all-features;
```

Then, execute these commands to run the experiment:
```bash
# Start server:
env RAYON_NUM_THREADS=8 cargo +nightly run --bin mnist-server --release --all-features -- 2>/dev/null > "./mnist.txt"
# Start client:
env RAYON_NUM_THREADS=8 cargo +nightly run --bin mnist-client --release --all-features -- -i <server_ip> 2>/dev/null > "./mnist.txt"
```
This will write out a trace to `./mnist.txt`. Note that the pre-processing phase times in this trace will be incorrect.

#### Figures 8 and 9

End-to-end experiments are currently implemented in the `end-to-end` branch; a few remaining bugs are keeping that branch from being merged into `main`, and these should be resolved soon.

To run these experiments, use the same commands described in the `Online phase` section above.

## License

Muse is licensed under either of the following licenses, at your option.

* Apache License Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0)
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT)

Unless you explicitly state otherwise, any contribution submitted for inclusion in Muse by you shall be dual licensed as above (as defined in the Apache v2 License), without any additional terms or conditions.

[muse]: https://www.usenix.org/system/files/sec21fall-lehmkuhl.pdf

## Reference paper

[_Muse: Secure Inference Resilient to Malicious Clients_][muse]
[Ryan Lehmkuhl](https://www.github.com/ryanleh), [Pratyush Mishra](https://www.github.com/pratyush), Akshayaram Srinivasan, and Raluca Ada Popa
*USENIX Security Symposium 2021*

## Acknowledgements

This work was supported by the National Science Foundation, and by donations from the Sloan Foundation, the Bakar and Hellman Fellows Fund, Alibaba, Amazon Web Services, Ant Financial, Arm, Capital One, Ericsson, Facebook, Google, Intel, Microsoft, Scotiabank, Splunk, and VMware.

Some parts of the finite field arithmetic infrastructure in the `algebra` crate have been adapted from code in the [`algebra`](https://github.com/arkworks-rs/snark) crate.
`python/end_to_end.py` (135 additions):
import numpy as np
import sys


def evaluate_linear_layer(layer, state):
    return np.dot(layer, state)


def evaluate_relu(state):
    return (state > 0) * state


def evaluate_malleated_relu(state, shift=20):
    # ReLU(x + shift) - shift: acts as the identity on inputs above -shift
    state = state + shift
    state = (state > 0) * state
    state = state - shift
    return state


def evaluate_masked_relu(state, mask_start, mask_stop, shift=20):
    # Shift rows in [mask_start, mask_stop) up and all other rows down, so
    # that after the ReLU only the selected window of rows survives.
    mask = np.full(state.shape, -shift)
    mask[mask_start:mask_stop] = shift
    state = state + mask
    state = (state > 0) * state
    mask = (mask > 0) * mask
    state = state - mask
    return state


def evaluate_network_upto(linear_layers, state, stop_at):
    layers = linear_layers[:stop_at]
    for (i, layer) in enumerate(layers):
        state = evaluate_linear_layer(layer, state)
        # only perform ReLU if we're not at the last layer
        if i != len(layers) - 1:
            state = evaluate_relu(state)
    return state


def evaluate_network_after_malleation(linear_layers, state, start_at,
                                      shift=20):
    layers = linear_layers[start_at:]
    for (i, layer) in enumerate(layers):
        state = evaluate_linear_layer(layer, state)
        # only perform ReLU if we're not at the last layer
        if i != len(layers) - 1:
            state = evaluate_malleated_relu(state, shift)
    return state


def unit_vector(dim, i):
    vec = np.zeros(dim)
    vec[i] = 1.0
    return vec


def extract_network(linear_layers):
    starting_dim = linear_layers[0].shape[1]
    num_classes = linear_layers[-1].shape[0]
    initial_state = np.zeros((starting_dim, 1))

    extracted_layers = []
    num_queries = 0
    # We iterate in reverse.
    for (i, layer) in list(enumerate(linear_layers))[::-1]:
        (num_rows, num_cols) = layer.shape
        extracted_layer = np.zeros(layer.shape)
        # If we haven't extracted the last layer yet:
        if len(extracted_layers) == 0:
            # this is the simple case
            for col in range(0, num_cols):
                state = initial_state
                num_queries += 1
                last_state = evaluate_network_upto(linear_layers, state, i)
                # At this point, `last_state` should be the all-zero vector.
                # To extract column number `col`, we set the col-th entry
                # of `last_state` to be 1.
                last_state = last_state + unit_vector(last_state.shape, col)

                result = evaluate_linear_layer(layer, last_state)

                # update extracted_layer with results
                for row in range(0, num_rows):
                    extracted_layer[row, col] = result[row, 0]
        else:
            # we are now recovering intermediate layers
            next_matrix = np.identity(linear_layers[i + 1].shape[1])
            for _layer in linear_layers[i + 1:]:
                next_matrix = np.dot(_layer, next_matrix)
            assert next_matrix.shape[0] == num_classes

            for col in range(0, num_cols):
                for row in range(0, num_rows, num_classes):
                    state = initial_state
                    num_queries += 1
                    state = evaluate_network_upto(linear_layers, state, i)
                    # At this point, `state` should be the all-zero vector.
                    #
                    # To extract elements of column `col`, we set the
                    # col-th entry of `state` to be 1.
                    state = state + unit_vector(state.shape, col)

                    state = evaluate_linear_layer(layer, state)

                    # At this point, we have all the rows in column `col`.
                    # However, because eventually we'll only obtain
                    # information about `num_classes` rows at a time, we
                    # mask out the rest.
                    start = (num_rows - num_classes
                             if row + num_classes > num_rows
                             else row)
                    end = min(row + num_classes, num_rows)
                    state = evaluate_masked_relu(state, start, end)

                    # evaluate the rest of the network
                    result = evaluate_network_after_malleation(
                        linear_layers, state, i + 1)
                    sub_matrix = next_matrix[:, start:end]
                    result = np.linalg.solve(sub_matrix, result)
                    extracted_layer[start:end, col] = result.reshape(
                        (num_classes,))
        extracted_layers.append(extracted_layer)
    extracted_layers.reverse()
    print(num_queries)
    return extracted_layers


if __name__ == '__main__':
    # Layer sizes are passed as a dash-separated list, e.g. "10-5-3".
    sizes = list(map(int, sys.argv[1].split("-")))
    layers = []
    for (row, col) in zip(sizes[1:], sizes):
        layers.append(np.random.rand(row, col))

    extracted_layers = extract_network(layers)

    for (layer, extracted_layer) in zip(layers, extracted_layers):
        assert np.allclose(layer, extracted_layer)
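The attack relies on the shifted ReLU acting as the identity whenever its inputs stay above `-shift`, which lets the attacker push a chosen state through the "nonlinear" layers unchanged. A quick self-contained check (the helper here reimplements the same computation with `np.maximum` for brevity):

```python
import numpy as np


def malleated_relu(state, shift=20):
    # Equivalent to evaluate_malleated_relu: ReLU(x + shift) - shift
    return np.maximum(state + shift, 0) - shift


x = np.array([-5.0, 0.0, 3.0])      # all entries above -shift
y = np.array([-50.0, 10.0, -30.0])  # some entries below -shift get clamped

assert np.allclose(malleated_relu(x), x)                     # identity in-range
assert np.allclose(malleated_relu(y), [-20.0, 10.0, -20.0])  # clamped at -shift
```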
Binary file added: `python/model.h5` (not shown).