See the file build_images.sh.
Update the .dockerignore file if need be, for example if the build context sent to the Docker daemon during the build is too large.
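For reference, .dockerignore uses the same pattern syntax as .gitignore. A minimal sketch follows; the entries are assumptions about what typically bloats a build context, not paths taken from this project:
# Hypothetical .dockerignore entries; adapt to your repository layout.
.git
__pycache__/
*.pyc
results/
data/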
You should have the following docker volumes created:
pip-cache: pip caching directory, to avoid downloading the same dependencies when relaunching a server.
results: contains the metrics collected by the server (and the client, technically).
build-vol: contains the builds of the libraries to use.
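If these volumes do not exist yet, they can be created with the standard Docker CLI:
docker volume create pip-cache
docker volume create results
docker volume create build-vol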
Configuration takes place with environment variables. Look at the template file envs/template.env, which contains these parameters:
BUG_NAME=EXPERIMENT_NAME
CLIENT_MANUAL_DEPENDENCY=COMMIT_SHA
CLIENT_PY_VERSION=3.5.6
EVALUATION_TYPE=
MODEL_LIBRARY=pytorch
CLIENT_LOG_DIR=/results/client
NUMBER_OF_RUNS=2
PY_CACHE_DIR=/pip-cache
DATA_SERVER_ENDPOINT=tcp://IP_ADRESS_OF_SERVER:90002
USE_BUILD_MKL=1
USE_CUDA=True
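As an illustration only, a filled-in copy such as envs/buggy.env could look like the sketch below; every concrete value is hypothetical and should be replaced with your own experiment settings:
# Hypothetical values for illustration; adjust to your experiment.
BUG_NAME=my_experiment
CLIENT_MANUAL_DEPENDENCY=0123456789abcdef0123456789abcdef01234567
CLIENT_PY_VERSION=3.5.6
EVALUATION_TYPE=
MODEL_LIBRARY=pytorch
CLIENT_LOG_DIR=/results/client
NUMBER_OF_RUNS=2
PY_CACHE_DIR=/pip-cache
DATA_SERVER_ENDPOINT=tcp://192.168.1.10:90002
USE_BUILD_MKL=1
USE_CUDA=True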
There are also the following environment variables that need to be added when launching an evaluation:
CHALLENGE: denotes the name of the challenge to use.
NUM_CLASSES: denotes the output size of the network to build, if the network supports it; usually tied to CHALLENGE.
MODEL_NAME: denotes the name of the model to use.
The preferred command to launch a client is:
sudo docker run --rm --mount source=pip-cache,target=/pip-cache --mount source=results,target=/results --mount source=build-vol,target=/builds --gpus all --env-file /path/to/env/buggy.env --env CHALLENGE=NAME_OF_CHALLENGE --env NUM_CLASSES=NUMBER_OF_POSSIBLE_CLASSES --env MODEL_NAME=MODEL_TO_USE -it emiliorivera/ml-frameworks:eval100_client
For the server, you should have the following docker volumes created:
pip-cache: pip caching directory, to avoid downloading the same dependencies when relaunching a server.
results: contains the metrics collected by the server (and the client, technically).
data: contains the data for the challenges (datasets).
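If the data volume does not exist yet, one way to create it and copy local datasets into it is sketched below; /path/to/local/datasets is an assumed host location, not a project convention:
docker volume create data
sudo docker run --rm --mount source=data,target=/data -v /path/to/local/datasets:/src alpine cp -r /src/. /data/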
Here is the preferred command to launch a server:
sudo docker run --name eval100_server --mount source=pip-cache,target=/pip-cache --mount source=results,target=/results --mount source=data,target=/data --gpus all --env DATA_SERVER_DATA_ROOT=/data -it emiliorivera/ml-frameworks:eval100_server
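Because the container is given the name eval100_server and is not started with --rm, the standard Docker commands can be used to follow its output, stop it, and remove it before relaunching under the same name:
sudo docker logs -f eval100_server
sudo docker stop eval100_server
sudo docker rm eval100_server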
The following parameters have default values set:
PY_CACHE_DIR to /pip-cache
METRICS_LOG_DIR to /results/server
SEED_CONTROLLER_FILE to /results/server/seed_control
SERVER_LOG_FILE to $METRICS_LOG_DIR/server.log
SERVER_PY_VERSION to 3.6.8
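Since configuration happens through environment variables, these defaults can presumably be overridden by passing the variable explicitly at launch; the log directory below is an illustrative value only:
sudo docker run --name eval100_server --mount source=pip-cache,target=/pip-cache --mount source=results,target=/results --mount source=data,target=/data --gpus all --env DATA_SERVER_DATA_ROOT=/data --env METRICS_LOG_DIR=/results/server_run2 -it emiliorivera/ml-frameworks:eval100_server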