SRI's implementation of CMA code for DARPA AIE-CriticalMAAS TA3


jataware/sri-ta3

 
 


SRI DARPA AIE - CriticalMAAS TA3

PyTorch Lightning Config: Hydra

Background

Key Tools

PyTorch - an open-source deep learning framework primarily developed by Facebook's AI Research lab (FAIR). It provides a flexible, dynamic computational graph model, making it popular among researchers and developers for building and training deep neural networks.

PyTorch Lightning - a lightweight PyTorch wrapper that simplifies the process of building, training, and deploying complex deep learning models. It provides a high-level interface that hides boilerplate code, making it easier for researchers and practitioners to focus on experimenting with and improving their models rather than dealing with low-level implementation details.

Hydra - a framework for elegantly configuring complex applications. The key feature is the ability to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
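
The compose-then-override idea can be illustrated without Hydra itself. The sketch below is a simplified stand-in written with plain dictionaries, not Hydra's implementation; `deep_merge` and the three config layers are hypothetical names chosen for illustration.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` is layered on top of `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the later layer wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical config groups, standing in for YAML files on disk.
trainer_defaults = {"trainer": {"accelerator": "gpu", "max_epochs": 10}}
data_defaults = {"data": {"batch_size": 128}}
experiment = {"trainer": {"max_epochs": 20}}

config = {}
for layer in (trainer_defaults, data_defaults, experiment):
    config = deep_merge(config, layer)

print(config)
```

Later layers only replace the keys they mention, which is why an experiment config can stay short while still inheriting every default.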

Project Structure

The directory structure looks like this:

├── data                   <- Project data
│   └── raster_libraries      <- Folder holding sets of individual rasters per CMA
│       ├── maniac_mini_raster_library  <- Raster Library for maniac_mini example
│       └── ...
├── docker                 <- Docker scripts to build images / run containers
│
├── logs                   <- Logs generated by hydra and lightning loggers
├── sri_maper              <- Primary source code folder for MAPER
│   ├── ckpts                 <- Optional folder to hold pretrained models (if not in logs)
│   │
│   ├── configs                 <- Hydra configs
│   │   ├── callbacks               <- Callbacks configs
│   │   ├── data                    <- Data configs
│   │   ├── debug                   <- Debugging configs
│   │   ├── experiment              <- Experiment configs
│   │   ├── extras                  <- Extra utilities configs
│   │   ├── hparams_search          <- Hyperparameter search configs
│   │   ├── hydra                   <- Hydra configs
│   │   ├── logger                  <- Logger configs
│   │   ├── model                   <- Model configs
│   │   ├── paths                   <- Project paths configs
│   │   ├── preprocess              <- Preprocessing configs
│   │   ├── trainer                 <- Trainer configs
│   │   │
│   │   ├── __init__.py        <- python module __init__
│   │   ├── test.yaml          <- Main config for testing
│   │   └── train.yaml         <- Main config for training
│   │
│   ├── notebooks              <- Jupyter notebooks
│   │
│   ├── src                    <- Source code
│   │   ├── data                    <- Data code
│   │   ├── models                  <- Model code
│   │   ├── utils                   <- Utility code
│   │   │
│   │   ├── __init__.py         <- python module __init__
│   │   ├── map.py              <- Run mapping via CLI
│   │   ├── pretrain.py         <- Run pretraining via CLI
│   │   ├── test.py             <- Run testing via CLI
│   │   └── train.py            <- Run training via CLI
│   │
│   ├── __init__.py        <- python module __init__
│
├── .gitignore                <- List of files ignored by git
├── LICENSE.txt               <- License for code repo
├── project_vars.sh           <- Project variables for infrastructure
├── setup.py                  <- File for installing project as a package
└── README.md

Installation

This repo can be run locally, in docker locally, or in docker on a Kubernetes cluster. Please follow the corresponding instructions exactly and carefully so that the install goes smoothly. Once you are familiar with the structure, you can make changes.

Local install and run

This setup presents the easiest installation but is more brittle than using docker containers. Please make a virtual environment of your choosing, source the environment, clone the repo, and install the code using setup.py. Below are example commands to do so.

# creates and activates virtual environment
conda create -n [VIRTUAL_ENV_NAME] python=3.10
conda activate [VIRTUAL_ENV_NAME]
# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# installs from source code
python3 -m pip install -e .

If installation succeeded without errors, you should be able to run the code locally. Before we do that, let's prepare the data within the repo used to perform CMAs. Skip to Data Setup.
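
One quick way to confirm the editable install is to check that the expected packages can be located by Python. This is a minimal sketch: the package name `sri_maper` is assumed from the Project Structure (it is not confirmed by setup.py here), and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# "sri_maper" is an assumed package name; the others are the key tools listed above.
for name in ("sri_maper", "torch", "pytorch_lightning", "hydra"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

Run it inside the activated conda environment; a MISSING entry suggests the `pip install -e .` step did not complete in that environment.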

Install with docker container that is run locally

This setup is slightly more involved but provides more robustness across physical devices by using docker. We've written convenience bash scripts to make building and running the docker container much easier. First, clone the repo and edit the JOB_TAG, REPO_HOST, DUSER, and WANDB_API_KEY variables in project_vars.sh for your use case. Then build the docker image. Below are example commands to do so using the convenience scripts.

# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh
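
If the build script fails, a first thing to check is whether the four variables in project_vars.sh are actually set. The sketch below assumes the file uses plain `NAME=value` or `export NAME=value` lines; that format is an assumption, and `missing_vars` is a hypothetical helper.

```python
import re
from pathlib import Path

REQUIRED_VARS = ("JOB_TAG", "REPO_HOST", "DUSER", "WANDB_API_KEY")


def missing_vars(script_text: str, required=REQUIRED_VARS) -> list[str]:
    """Return the required variable names that have no non-empty assignment."""
    assigned = set()
    for line in script_text.splitlines():
        # Matches "NAME=value" and "export NAME=value" (assumed file format).
        match = re.match(r"\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)", line)
        if match and match.group(2).strip().strip("\"'"):
            assigned.add(match.group(1))
    return [name for name in required if name not in assigned]


path = Path("project_vars.sh")
if path.exists():
    print("Unset variables:", missing_vars(path.read_text()) or "none")
```

The check is deliberately simple: it does not evaluate shell expansions, so a value built from other variables is counted as set.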

Optionally, if you would like to replace the default (empty) logs and data folders within this repo with existing ones (e.g. on the datalake) that might already contain logs and data, simply mount the corresponding datalake folders over the empty logs and data folders within this repo. Below are example commands to do so.

sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/logs ./logs
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/existing/data ./data

If installation succeeded without errors, you should be able to run the code locally. Before we do that, let's prepare the data within the repo used to perform CMAs. Skip to Data Setup.

Install with docker container that is run on the SRI International Kubernetes cluster

This setup is slightly more involved but provides more scalability by using docker and Kubernetes to access more compute. First we'll need to prepare some folders on the datalake to contain your data, code, and logs. Under the criticalmaas-ta3 folder (namespace) within the vt-open datalake, make the following directory structure for YOUR use using your employee ID number (i.e. eXXXXX). NOTE: you only need to make the folders marked with the comment CREATE; the others should exist already. Be careful not to corrupt the folders of other users or namespaces.

vt-open
├── ... # other folders for other namespaces - avoid
├── criticalmaas-ta3 # top-level of criticalmaas-ta3 namespace
│   ├── data # contains all criticalmaas-ta3 data - (k8s READ ONLY)
│   └── k8s # contains criticalmaas-ta3 code & logs for ALL users - (k8s READ & WRITE)
│       ├── eXXXXX # folder you should CREATE to contain your code & logs
│       │   ├── code # folder you should CREATE to contain your code
│       │   └── logs # folder you should CREATE to contain your logs
│       └── ... # other folders for other users - avoid
└── ... # other folders for other namespaces - avoid
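
The two CREATE folders can also be made with a short script. This is a sketch under assumptions: `create_user_folders` is a hypothetical helper, and it expects the k8s folder to be reachable as a local path (for example through a mount). It refuses to run if that folder is missing, so it never creates anything outside your own eXXXXX folder.

```python
from pathlib import Path


def create_user_folders(k8s_root: Path, employee_id: str) -> list[Path]:
    """Create <k8s_root>/<employee_id>/code and .../logs, leaving other folders alone."""
    if not k8s_root.is_dir():
        # The k8s folder should already exist; do not create it by accident.
        raise FileNotFoundError(f"Expected an existing folder: {k8s_root}")
    created = []
    for sub in ("code", "logs"):
        folder = k8s_root / employee_id / sub
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Example call (placeholder path and ID, replace with your own):
# create_user_folders(Path("/datalake/path/to/vt-open/criticalmaas-ta3/k8s"), "eXXXXX")
```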

Next you will need to mount the code folder above locally. By mounting the code folder on the datalake locally, your local edits to source code will be reflected in the datalake, and therefore, on the Kubernetes cluster.

# makes a local code folder
mkdir k8s-code
# mount the datalake folder that hosts the code (Kubernetes will have access)
sudo mount.cifs -o username=${USER},domain=sri,uid=$(id -u),gid=$(id -g) /datalake/path/to/vt-open/criticalmaas-ta3/k8s/${USER}/code ./k8s-code

Last, we'll install the repo. We've written convenience bash scripts to make building and running the docker container much easier. Clone the repo and edit the JOB_TAG, REPO_HOST, DUSER, and WANDB_API_KEY variables in project_vars.sh for your use case, then build the docker image. Below are example commands to do so using the convenience scripts.

# clone repo source code locally
git clone https://github.com/DARPA-CRITICALMAAS/sri-ta3.git
cd sri-ta3
# builds docker image (installing source in image) and pushes to docker repo
bash docker/run_docker_build_push.sh

If installation succeeded without errors, you should be able to run the code. Before we do that, let's prepare the data within the repo used to perform CMAs. Skip to Data Setup.

Data Setup

Background

Our models and data cannot be directly hosted on GitHub because GitHub has a strict limit of 100 MB max file size. However, we are hosting the models and data on a Microsoft Sharepoint. To get access, please email Angel Daruna ([email protected]) or Vasily Zadorozhnyy ([email protected]). We are blocked from making the link public, but most email domains work. Domains that have worked include: gmail.com, mitre.org, darpa.mil, usgs.gov, uky.edu.

The raster_libraries folder of this repo holds the sets of individual rasters per CMA (see Project Structure). These individual rasters get stacked together in the preprocessing portion of our code to make a "raster stack" that is used directly for training (etc.) the CMA model (see preprocessing configs).

We provide a small window of the rasters under raster_libraries used to perform the national-scale Magmatic Nickel assessment (i.e. "MaNiAC" from Hackathon 2). This raster library is titled maniac_mini_raster_library. However, NOT ALL DATA is provided within the repo, for the reasons above. Please follow the steps below to download the needed data from the Microsoft Sharepoint.

Setup Data

Within the Microsoft Sharepoint (see Background) you will see a folder titled Input Data. The Input Data folder contains all raster libraries that can be used for various CMAs. To get started with the data for the national CMAs, download and extract the national_scale_raster_library from the Sharepoint into your local raster_libraries folder (i.e. alongside maniac_mini_raster_library). Care must be taken when downloading the data, as Microsoft Sharepoint sometimes misses files. Ultimately, your local copy of national_scale_raster_library and its subdirectories should EXACTLY match the one found on the Sharepoint.
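
One way to check for an incomplete download is to compare a manifest of your local copy against a manifest from a copy known to be complete (for example, a colleague's). The sketch below is only an illustration: the Sharepoint does not provide such a manifest as far as this README states, and `build_manifest` and `diff_manifests` are hypothetical helpers.

```python
from pathlib import Path


def build_manifest(root: Path) -> dict[str, int]:
    """Map each file's path (relative to `root`, POSIX style) to its size in bytes."""
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(expected: dict[str, int], actual: dict[str, int]) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or differ in size."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "extra": sorted(set(actual) - set(expected)),
        "size_mismatch": sorted(k for k in shared if expected[k] != actual[k]),
    }


library = Path("data/raster_libraries/national_scale_raster_library")
if library.is_dir():
    print(f"{len(build_manifest(library))} files found under {library}")
```

Comparing sizes catches truncated files as well as missing ones; it does not detect corruption that leaves the size unchanged.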

Command-Line-Interface (CLI) Tutorial

Verify Installation and Data Setup

First, remember to navigate to the environment you've built to run the code. If you went the route of using docker to install, start your container and run a CLI (e.g. terminal) within it. The conda environment should already be activated. If you went the local install route, start your CLI (e.g. terminal) and activate your conda environment (see Local install and run).

With the environment set up, let's run a single CLI command to make sure your install and data setup worked. The following command will train a ResNet model for the Magmatic Nickel national-scale CMA (i.e. Hackathon 2 MaNiAC):

python sri_maper/src/train.py experiment=exp_maniac_resnet_l22_uscont

If that ran without error, you can skip to Usage below.

Troubleshooting: If you are getting an error, don't panic. There are a few common minor errors that pop up when your local hardware or environment setup differs from the one preferred at SRI.

If you're seeing MisconfigurationException('No supported gpu backend found!'), it is possible you either have a system without a GPU or the driver for that GPU is incompatible. Instead, run the following command:

python sri_maper/src/train.py experiment=exp_maniac_resnet_l22_uscont trainer=cpu

If you're seeing hydra.errors.InstantiationException: Error in call to target 'pytorch_lightning.loggers.wandb.WandbLogger': AuthenticationError("The API key you provided is either invalid or missing. ..., you need to log in with your wandb account. Use the command wandb login --relogin to do so. If you don't have a wandb account or prefer not to make one, you can simply use a different logger. In that case, run the following command:

python sri_maper/src/train.py experiment=exp_maniac_resnet_l22_uscont logger=csv

If you're seeing ... No such file or directory, confirm you followed the steps in Data Setup exactly.

If these troubleshooting steps did not resolve your problem, please contact Angel Daruna ([email protected]) or Vasily Zadorozhnyy ([email protected]) to troubleshoot. You can also create an issue, PR, etc.
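
The two fallbacks above (trainer=cpu and logger=csv) can be combined in one command. The sketch below only assembles the command string; `build_train_command` is a hypothetical helper, and treating the `WANDB_API_KEY` environment variable as the sign of a wandb login is an assumption (a prior `wandb login` may also work without it).

```python
import os
import shlex


def build_train_command(experiment: str, has_gpu: bool, has_wandb_key: bool) -> list[str]:
    """Assemble the train.py command, adding the fallback overrides when needed."""
    cmd = ["python", "sri_maper/src/train.py", f"experiment={experiment}"]
    if not has_gpu:
        cmd.append("trainer=cpu")  # fallback when no supported GPU backend is found
    if not has_wandb_key:
        cmd.append("logger=csv")  # fallback when wandb authentication is unavailable
    return cmd


cmd = build_train_command(
    "exp_maniac_resnet_l22_uscont",
    has_gpu=False,  # set to True if a working GPU backend is available
    has_wandb_key=bool(os.environ.get("WANDB_API_KEY")),  # assumption, see above
)
print(shlex.join(cmd))
```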

Recreate Existing CMA Experiments

Train & Test CMA models using ResNet:

# National Lead-Zinc MVT
python sri_maper/src/train.py experiment=exp_mvt_resnet_l22_uscont
# National Magmatic Nickel
python sri_maper/src/train.py experiment=exp_maniac_resnet_l22_uscont
# National Tungsten-skarn
python sri_maper/src/train.py experiment=exp_w_resnet_l22_uscont
# National Porphyry Copper
python sri_maper/src/train.py experiment=exp_cu_resnet_l22_uscont
# Regional Mafic Magmatic Nickel-Cobalt in Upper-Midwest
python sri_maper/src/train.py experiment=exp_mamanico_resnet_umidwest
# Regional Tungsten-skarn in Yukon-Tanana Upland
python sri_maper/src/train.py experiment=exp_w_resnet_ytu
# Regional MVT Lead-Zinc in "SMidCont"
python sri_maper/src/train.py experiment=exp_mvt_resnet_smidcont

Pretrain, Train, and Test national-scale MVT Lead-Zinc and Tungsten-skarn CMAs using MAE:

# pretrains the MAE checkpoint (77 evidence layers)
python sri_maper/src/pretrain.py experiment=exp_maevit_pretrain_l22_uscont
# trains & tests Lead-Zinc MVT CMA model
python sri_maper/src/train.py experiment=exp_mvt_maevit_classifier_l22_uscont model.net.backbone_ckpt=logs/PATH_TO_PRETRAINED_CHECKPOINT_ABOVE/checkpoint.ckpt
# trains & tests Tungsten-skarn CMA model
python sri_maper/src/train.py experiment=exp_w_maevit_classifier_l22_uscont model.net.backbone_ckpt=logs/PATH_TO_PRETRAINED_CHECKPOINT_ABOVE/checkpoint.ckpt

Pretrain, Train, and Test national-scale Magmatic Nickel CMA using MAE:

# pretrains the MAE checkpoint (14 evidence layers)
python sri_maper/src/pretrain.py experiment=exp_maniac_maevit_pretrain_l22_uscont
# trains & tests Magmatic Nickel CMA model
python sri_maper/src/train.py experiment=exp_maniac_maevit_classifier_l22_uscont model.net.backbone_ckpt=logs/PATH_TO_PRETRAINED_CHECKPOINT_ABOVE/checkpoint.ckpt

Note: the MAE MaNiAC pretraining differs from the MVT Lead-Zinc and Tungsten-skarn pretraining ONLY because it was decided at Hackathon 2 that 14 evidence layers would be used (instead of the 77 available at national scale). One can just as easily modify the preprocessing config of the MAE MaNiAC training to use the same evidence layers as MVT Lead-Zinc and Tungsten-skarn.

Pretrain, Train, and Test National Porphyry Copper CMA using MAE:

# pretrains the MAE checkpoint (22 evidence layers)
python sri_maper/src/pretrain.py experiment=exp_cu_maevit_pretrain_l22_uscont
# trains & tests National Porphyry Copper CMA model
python sri_maper/src/train.py experiment=exp_cu_maevit_classifier_l22_uscont model.net.backbone_ckpt=logs/PATH_TO_PRETRAINED_CHECKPOINT_ABOVE/checkpoint.ckpt

Pretrain, Train, and Test regional-scale Mafic Magmatic Nickel-Cobalt in Upper-Midwest CMA using MAE:

# pretrains the MAE checkpoint (28 evidence layers)
python sri_maper/src/pretrain.py experiment=exp_mamanico_maevit_pretrain_umidwest
# trains & tests Mafic Magmatic Nickel-Cobalt in Upper-Midwest
python sri_maper/src/train.py experiment=exp_mamanico_maevit_classifier_umidwest model.net.backbone_ckpt=logs/PATH_TO_PRETRAINED_CHECKPOINT_ABOVE/checkpoint.ckpt

Pretrain, Train, and Test regional-scale Tungsten-skarn in Yukon-Tanana Upland CMA using MAE:

# pretrains the MAE checkpoint (18 evidence layers)
python sri_maper/src/pretrain.py experiment=exp_w_maevit_pretrain_ytu
# trains & tests Tungsten-skarn in Yukon-Tanana Upland
python sri_maper/src/train.py experiment=exp_w_maevit_classifier_ytu model.net.backbone_ckpt=logs/PATH_TO_PRETRAINED_CHECKPOINT_ABOVE/checkpoint.ckpt

Pretrain, Train, and Test regional-scale MVT Lead-Zinc in "SMidCont" CMA using MAE:

# pretrains the MAE checkpoint (18 evidence layers)
python sri_maper/src/pretrain.py experiment=exp_mvt_maevit_pretrain_smidcont
# trains & tests MVT Lead-Zinc in "SMidCont"
python sri_maper/src/train.py experiment=exp_mvt_maevit_classifier_smidcont model.net.backbone_ckpt=logs/PATH_TO_PRETRAINED_CHECKPOINT_ABOVE/checkpoint.ckpt
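
Each train.py command above needs the path of the checkpoint written by the preceding pretrain.py run. The layout under logs/ is not described here, so the sketch below simply searches for the most recently modified .ckpt file; `newest_checkpoint` is a hypothetical helper, and the newest file may belong to a different experiment, so confirm the path before using it.

```python
from pathlib import Path
from typing import Optional


def newest_checkpoint(logs_dir: Path) -> Optional[Path]:
    """Return the most recently modified *.ckpt under `logs_dir`, or None if there is none."""
    if not logs_dir.is_dir():
        return None
    checkpoints = [p for p in logs_dir.rglob("*.ckpt") if p.is_file()]
    return max(checkpoints, key=lambda p: p.stat().st_mtime, default=None)


ckpt = newest_checkpoint(Path("logs"))
if ckpt is None:
    print("No checkpoint found under logs/; run a pretrain.py command first.")
else:
    # Prints the override to append to a train.py command.
    print(f"model.net.backbone_ckpt={ckpt.as_posix()}")
```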

Build Maps with Trained Models

In the Microsoft Sharepoint folder (see Background under Data Setup) we provide trained classification model checkpoints for all existing experiments. Please download the corresponding model checkpoints you would like to use and place them in the ckpts folder. The commands below show how to build the prospectivity map for each CMA using the model checkpoint.

Pretrained model performance using defined experiment configs:

| F1-score | ResNet | MAE |
| --- | --- | --- |
| Lead-Zinc MVT national | 61.5 | 68.6 |
| Magmatic Nickel national | 85.7 | 86.5 |
| Tungsten-skarn national | 55.8 | 61.1 |
| Porphyry Copper national | 57.6 | 60.3 |
| Mafic Magmatic Nickel-Cobalt in Upper-Midwest | 33.3 | 60.0 |
| Tungsten-skarn in Yukon-Tanana Upland | 35.3 | 40.0 |

Build prospectivity maps using ResNet model checkpoints:

# national Lead-Zinc MVT
python sri_maper/src/map.py experiment=exp_mvt_resnet_l22_uscont data.batch_size=128 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_mvt_resnet.ckpt
# national Magmatic Nickel
python sri_maper/src/map.py experiment=exp_maniac_resnet_l22_uscont data.batch_size=128 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_maniac_resnet.ckpt
# national Tungsten-skarn
python sri_maper/src/map.py experiment=exp_w_resnet_l22_uscont data.batch_size=128 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_w_resnet.ckpt
# national Porphyry Copper
python sri_maper/src/map.py experiment=exp_cu_resnet_l22_uscont data.batch_size=128 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_cu_resnet.ckpt
# regional Mafic Magmatic Nickel-Cobalt in Upper-Midwest
python sri_maper/src/map.py experiment=exp_mamanico_resnet_umidwest data.batch_size=128 enable_attributions=True ckpt_path=sri_maper/ckpts/umidwest_mamanico_resnet.ckpt
# regional Tungsten-skarn in Yukon-Tanana Upland
python sri_maper/src/map.py experiment=exp_w_resnet_ytu data.batch_size=128 enable_attributions=True ckpt_path=sri_maper/ckpts/ytu_w_resnet.ckpt
# regional MVT Lead-Zinc in "SMidCont"
python sri_maper/src/map.py experiment=exp_mvt_resnet_smidcont data.batch_size=128 enable_attributions=True ckpt_path=sri_maper/ckpts/exp_mvt_resnet_smidcont.ckpt

Build prospectivity maps using MAE model checkpoints:

# national Lead-Zinc MVT
python sri_maper/src/map.py experiment=exp_mvt_maevit_classifier_l22_uscont model.net.backbone_ckpt=sri_maper/ckpts/natl_pretrain.ckpt data.batch_size=64 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_mvt_mae.ckpt
# national Magmatic Nickel
python sri_maper/src/map.py experiment=exp_maniac_maevit_classifier_l22_uscont model.net.backbone_ckpt=sri_maper/ckpts/natl_pretrain_maniac.ckpt data.batch_size=64 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_maniac_mae.ckpt
# national Tungsten-skarn
python sri_maper/src/map.py experiment=exp_w_maevit_classifier_l22_uscont model.net.backbone_ckpt=sri_maper/ckpts/natl_pretrain.ckpt data.batch_size=64 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_w_mae.ckpt
# national Porphyry Copper
python sri_maper/src/map.py experiment=exp_cu_maevit_classifier_l22_uscont model.net.backbone_ckpt=sri_maper/ckpts/natl_pretrain_cu.ckpt data.batch_size=64 enable_attributions=True ckpt_path=sri_maper/ckpts/natl_cu_mae.ckpt
# regional Mafic Magmatic Nickel-Cobalt in Upper-Midwest
python sri_maper/src/map.py experiment=exp_mamanico_maevit_classifier_umidwest model.net.backbone_ckpt=sri_maper/ckpts/umidwest_mamanico_pretrain.ckpt data.batch_size=64 enable_attributions=True ckpt_path=sri_maper/ckpts/umidwest_mamanico_mae.ckpt
# regional Tungsten-skarn in Yukon-Tanana Upland
python sri_maper/src/map.py experiment=exp_w_maevit_classifier_ytu model.net.backbone_ckpt=sri_maper/ckpts/ytu_w_pretrain.ckpt data.batch_size=64 enable_attributions=True ckpt_path=sri_maper/ckpts/ytu_w_mae.ckpt
# regional MVT Lead-Zinc in "SMidCont"
python sri_maper/src/map.py experiment=exp_mvt_maevit_classifier_smidcont model.net.backbone_ckpt=sri_maper/ckpts/smidcont_mvt_pretrain.ckpt data.batch_size=64 enable_attributions=True ckpt_path=sri_maper/ckpts/smidcont_mvt_mae.ckpt
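
The map.py commands above all follow one pattern, so they can be generated rather than typed. The sketch below prints the commands and flags checkpoints that are not present locally; `build_map_command` and the `jobs` list are hypothetical, while the experiment names, checkpoint paths, and batch sizes are copied from the commands above.

```python
import shlex
from pathlib import Path
from typing import Optional


def build_map_command(
    experiment: str,
    ckpt_path: str,
    backbone_ckpt: Optional[str] = None,
    batch_size: int = 128,
) -> list[str]:
    """Assemble a map.py command; `backbone_ckpt` is only used by the MAE experiments."""
    cmd = ["python", "sri_maper/src/map.py", f"experiment={experiment}"]
    if backbone_ckpt is not None:
        cmd.append(f"model.net.backbone_ckpt={backbone_ckpt}")
    cmd += [
        f"data.batch_size={batch_size}",
        "enable_attributions=True",
        f"ckpt_path={ckpt_path}",
    ]
    return cmd


jobs = [
    ("exp_mvt_resnet_l22_uscont", "sri_maper/ckpts/natl_mvt_resnet.ckpt", None, 128),
    ("exp_mvt_maevit_classifier_l22_uscont", "sri_maper/ckpts/natl_mvt_mae.ckpt",
     "sri_maper/ckpts/natl_pretrain.ckpt", 64),
]
for experiment, ckpt, backbone, batch in jobs:
    note = "" if Path(ckpt).is_file() else "  # checkpoint not found locally"
    print(shlex.join(build_map_command(experiment, ckpt, backbone, batch)) + note)
```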

General Usage

It is important to remember that this tool IS EXTENSIBLE. The experiments we provide for existing CMAs are examples of how to use MAPER. HOWEVER, MAPER is built to be fully controlled from the experiment configuration. This design supports integration across TAs and offers the domain experts full control over MAPER without modifying source code, notebook files, etc. Below is a more general overview of the commands one can use with MAPER's CLI.

Using the CLI is the suggested method of integration with the MAPER code. As additional documentation, we provide example notebook files that use the CLI internally. However, all actions performed in the Jupyter notebooks can be performed with the CLI (the notebooks just call the CLI functions internally). We suggest viewing the notebook files as is (i.e. without running) to understand the CLI, then experimenting with the CLI directly.

Below we give examples of the train, test, map, and pretrain capabilities through the CLI. The section that follows gives background about the example notebook files.

You can choose your training hardware like this:

# train on CPU
python sri_maper/src/train.py trainer=cpu

# train on GPU
python sri_maper/src/train.py trainer=gpu

# train on multi-GPU
python sri_maper/src/train.py trainer=ddp

You can run a predefined experiment from configs/experiment/ like this:

python sri_maper/src/train.py experiment=[example]

You can override any parameter from the command line like this:

python sri_maper/src/train.py trainer.max_epochs=20 data.batch_size=64
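
To make the dotted-key syntax concrete, the sketch below applies the same two overrides to a nested dictionary. It is a simplified illustration, not Hydra's parser: Hydra's real override grammar is richer (for example, the + prefix seen in the map.py command), and `parse_value` and `apply_override` are hypothetical helpers.

```python
def parse_value(raw: str):
    """Convert a raw string to bool, int, or float when possible; otherwise keep it as text."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_override(config: dict, override: str) -> None:
    """Apply one `dotted.key=value` override to a nested dict in place."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Unlike Hydra, this sketch silently creates missing parent keys.
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw_value)


config = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 128}}
for override in ("trainer.max_epochs=20", "data.batch_size=64"):
    apply_override(config, override)
print(config)
```

Each dot in the key walks one level down the config hierarchy, so `trainer.max_epochs` addresses the `max_epochs` entry of the trainer config group.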

You can pretrain a model like this:

python sri_maper/src/pretrain.py ckpt_path=<PATH_TO_CHECKPOINT/*.ckpt>

You can test an existing checkpoint like this:

python sri_maper/src/test.py ckpt_path=<PATH_TO_CHECKPOINT/*.ckpt>

You can build prospectivity maps using an existing model checkpoint like this:

python sri_maper/src/map.py +experiment=[example] ckpt_path=<PATH_TO_CHECKPOINT/*.ckpt>

How It Works - Background about CLI

All PyTorch Lightning modules are dynamically instantiated from module paths specified in config using Hydra. Example model config:

_target_: src.models.mnist_model.MNISTLitModule
lr: 0.001
net:
  _target_: src.models.components.simple_dense_net.SimpleDenseNet
  input_size: 784
  lin1_size: 256
  lin2_size: 256
  lin3_size: 256
  output_size: 10

Using this config we can instantiate the object with the following line in the source code:

model = hydra.utils.instantiate(config.model)

This allows the user to fully control MAPER without modifying source code! Every parameter within the config provides a direct interface to the source code. Therefore, by modifying the config, you modify the parameters used in the source code. As a result, a GUI can be readily built up around this config.
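
For readers curious about how a `_target_` entry turns into an object, the sketch below imitates the core idea with the standard library only. It is a simplified illustration and not Hydra's implementation (the real `hydra.utils.instantiate` supports more features); the local `instantiate` function is hypothetical, and `types.SimpleNamespace` stands in for the model and network classes.

```python
import importlib


def instantiate(config):
    """Recursively build objects from dicts that carry a `_target_` dotted path."""
    if not isinstance(config, dict):
        return config  # plain values are passed through unchanged
    # Build nested objects first, then pass them as keyword arguments.
    kwargs = {k: instantiate(v) for k, v in config.items() if k != "_target_"}
    if "_target_" not in config:
        return kwargs
    module_path, _, attr = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr)
    return target(**kwargs)


# Same shape as the example model config, with stand-in targets.
model_cfg = {
    "_target_": "types.SimpleNamespace",
    "lr": 0.001,
    "net": {"_target_": "types.SimpleNamespace", "input_size": 784, "output_size": 10},
}
model = instantiate(model_cfg)
print(model.lr, model.net.input_size)
```

Swapping the `_target_` string is enough to build a different class with the same surrounding code, which is what lets the config act as the interface.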

Example pipeline managing the instantiation logic: sri_maper/src/train.py.

Notebook Tutorial

As additional documentation, we provide example notebook files that use the CLI internally. However, all actions performed in the Jupyter notebooks can be performed with the CLI (the notebooks just call the CLI functions internally). We suggest viewing the notebook files as is (i.e. without running) to understand the CLI (above) and what the expected outputs are, then experimenting with the CLI directly.

Depending on your install approach, you will need to take different steps to start jupyter and view the example notebooks. Below are example commands to do so using the convenience scripts.

Local install and run

# make sure your conda environment is activated
jupyter lab

Install with docker container that is run locally

# starts the docker container
bash docker/run_docker_local.sh
##### EXECUTED WITHIN THE DOCKER CONTAINER #####
# begins jupyter notebook
jupyter lab --ip 0.0.0.0 --allow-root --NotebookApp.token='' --no-browser
# now you can access the notebook files by browsing to http://localhost:8888/lab

Install with docker container that is run on the SRI International Kubernetes cluster

# starts the docker container
bash docker/run_docker_k8s.sh
# jupyter lab is already running, browse to http://localhost:8888/lab
# note, you'll want to forward the Kubernetes container port 8888

After following the correct route to start jupyter above, you can view the notebooks and OPTIONALLY run them.
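
If the browser cannot reach the notebooks, it can help to check whether anything is listening on port 8888 (for the Kubernetes route, this also shows whether the port forward is active). This is a minimal sketch; `port_is_open` is a hypothetical helper and the host and port are the ones used in the commands above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("Jupyter reachable:", port_is_open("localhost", 8888))
```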

To Do: DocStrings on entire repo
