This repository contains the code necessary to reproduce results in our paper Empirical Methods for Estimating Privacy.
- We use docker to create a reproducible execution environment.
- We use luigi to express our privacy estimation algorithms as a DAG of dependencies. We also use luigi to define our experiments. The advantage of expressing the experiments and the privacy estimation algorithms as a DAG is that the luigi scheduler can avoid re-computing previously computed results, and it can compute independent nodes in parallel.
- Each experiment described in our paper has a corresponding jupyter notebook. Each experiment can be entirely reproduced by selecting "Kernel > Restart & Run All".
We hope that our privacy estimation algorithms and experiment framework can be used to study 'privacy problem settings' that we haven't thought of. :)
- Clone & Build+Run Docker (the build takes 5-10 min):
git clone https://github.com/maksimt/empirical_privacy
cd empirical_privacy
docker-compose up
- Navigate to the jupyter-notebook running inside the docker container.
- Get the jupyter token from the console output.
- Navigate to 127.0.0.1:8888 and enter the token you just got.
- Open Notebooks/Experiment 1 -- Bootstrap Validation.ipynb.ipynb and run the cells in order from top to bottom.
luigi is a Python-based dependency specification framework.
It provides a central scheduler which makes it easy to parallelize the execution of a computation graph
while ensuring that work isn't duplicated and hardware is fully utilized.
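The scheduling idea can be illustrated without luigi itself. The following is a minimal plain-Python sketch, not luigi's API: each node writes its result to a file, a node whose file already exists is skipped, and independent dependencies are computed concurrently. The node names, the `build` helper, and the file layout are hypothetical and chosen only for illustration.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical three-node graph: "total" depends on two independent nodes.
DEPENDENCIES = {"left": [], "right": [], "total": ["left", "right"]}
COMPUTE = {
    "left": lambda inputs: 2,
    "right": lambda inputs: 3,
    "total": lambda inputs: inputs["left"] + inputs["right"],
}
executed = []  # records which nodes actually ran


def build(node, workdir):
    """Return the value of `node`, reusing its output file when it exists."""
    target = Path(workdir) / f"{node}.txt"
    if target.exists():  # already computed: skip the work
        return int(target.read_text())
    deps = DEPENDENCIES[node]
    # Independent dependencies can be computed concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(deps))) as pool:
        values = list(pool.map(lambda d: build(d, workdir), deps))
    result = COMPUTE[node](dict(zip(deps, values)))
    target.write_text(str(result))
    executed.append(node)
    return result


with tempfile.TemporaryDirectory() as workdir:
    first = build("total", workdir)   # runs left, right, then total
    runs_after_first = len(executed)
    second = build("total", workdir)  # finds total.txt and runs nothing
    runs_after_second = len(executed)
```

The second call executes no node because the output file for `total` already exists. luigi applies the same kind of check to a task's declared outputs, which is what lets a partially completed experiment resume where it left off.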
We provide a framework that will orchestrate the experiments needed to measure empirical_privacy. The goal is to minimize the amount of code that needs to be written for a new problem setting, as well as take care of the implementation and testing for the key algorithms.
- The main task is to implement a GenSample subclass that overrides the gen_sample(sample_number) method. See the one-bit-sum example to start out, and then see row_distributed_svd. Problem-specific parameters can be passed in the dataset_settings parameter.
- Once that's done you can use the build_convergence_curve_helper to build an end-to-end pipeline with sensible defaults, or you can customize it by overriding the classes in the framework.
- To compute the targets using luigi they must be passed to luigi.build (see Notebooks for examples). These will typically need to communicate with a luigi scheduler server, which you can run by opening a terminal from Jupyter and running luigid. The scheduler will show you the progress of your computation on localhost:8082.
You may also be interested in my notes on integrating with PyCharm.