This repository hosts the code for *Uncertainty-Aware Pseudo-labeling for Quantum Calculations* by Kexin Huang, Mykola Bordyuh, Vishnu Sresht, and Brajesh Rai.
- Create a conda environment:

  ```bash
  conda create -n Pseudo python=3.8
  conda activate Pseudo
  ```

- Clone the repository:

  ```bash
  git clone https://github.com/PfizerRD/pseudo.git
  cd pseudo
  ```

- Install PyTorch and PyTorch Geometric, following their official installation instructions for your CUDA setup (see the sanity check after this list).

- Install the remaining dependencies:

  ```bash
  pip install -r requirements.txt
  ```
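Once PyTorch and PyTorch Geometric are installed, a quick sanity check can confirm the environment is usable (a minimal snippet, assuming both packages import cleanly in the `Pseudo` environment):

```python
# Verify that the conda environment sees PyTorch and PyTorch Geometric,
# and report whether a CUDA device is visible.
import torch
import torch_geometric

print("torch:", torch.__version__)
print("torch_geometric:", torch_geometric.__version__)
print("CUDA available:", torch.cuda.is_available())
```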
PSEUDO relies on the QM9 and PC9 datasets. Please download the processed dataset through this link, unzip it, and place the resulting "dataset" folder in the repository root.
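To confirm the placement, a minimal check (assuming you run it from the repository root):

```python
# Check that the unzipped "dataset" folder sits in the repository root,
# next to train.py, as described above.
from pathlib import Path

assert Path("dataset").is_dir(), "Unzip the download and place 'dataset' in the repository root."
```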
Training is launched through `train.py`; the inline comments describe each option:

```bash
python train.py --label homo \                 # molecule target
                --setting low_data \           # low-data setting or standard fully supervised setting
                --training_fraction 0.01 \     # fraction of QM9 used as labeled training data; the rest forms the unlabeled pool
                --pseudo_label True \          # whether to use pseudo-labeling or standard supervised training
                --model dimenet \              # model backbone; select from schnet/dimenet
                --iteration 10 \               # outer loop: number of pseudo-labeling episodes
                --epoch 100 \                  # inner loop: training epochs per episode
                --initial_train_epoch 100 \    # training epochs for the first episode on labeled data only
                --batch_size 128 \             # batch size
                --lr 0.001 \                   # learning rate
                --lr_decay_factor 0.5 \        # step LR decay factor
                --lr_decay_step_size 25 \      # steps between LR decays
                --uncertainty_type epistemic \ # uncertainty type; select from epistemic/aleatoric
                --evi_lambda 0.5               # evidential loss regularization coefficient
```
For more options, check out the argument parser in `train.py`. Also check out the demo in the notebook `demo.ipynb`.
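Conceptually, the flags above describe an outer/inner training loop: fit the backbone on the labeled fraction first (`--initial_train_epoch`), then for `--iteration` episodes predict on the unlabeled pool, keep only the low-uncertainty pseudo-labels, and retrain for `--epoch` epochs on the enlarged labeled set. The sketch below only illustrates that loop on a toy 1-D regression problem, with a bootstrap ensemble standing in for the epistemic uncertainty estimate; it is not the repository's implementation, and every name in it is invented for the example.

```python
# Toy illustration of uncertainty-aware pseudo-labeling (NOT the repo's code):
# an ensemble's spread stands in for epistemic uncertainty, and polynomial
# fitting stands in for training SchNet/DimeNet for --epoch epochs.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Toy regression task: y = sin(x) + noise."""
    x = rng.uniform(-3, 3, size=(n, 1))
    return x, np.sin(x).ravel() + 0.1 * rng.normal(size=n)

def bootstrap(x, y):
    """Resample the labeled set so each ensemble member sees different data."""
    idx = rng.integers(0, len(y), size=len(y))
    return x[idx], y[idx]

x_lab, y_lab = make_data(50)    # small labeled fraction (--training_fraction)
x_unl, _ = make_data(500)       # unlabeled pool (labels discarded)

for episode in range(10):       # outer loop (--iteration)
    # "Train" an ensemble on the current labeled set (inner loop, --epoch).
    ensemble = [np.polyfit(*map(np.ravel, bootstrap(x_lab, y_lab)), deg=5)
                for _ in range(5)]
    preds = np.stack([np.polyval(c, x_unl.ravel()) for c in ensemble])
    mean, std = preds.mean(axis=0), preds.std(axis=0)  # prediction + epistemic proxy

    keep = std < np.quantile(std, 0.2)  # accept only the most confident pseudo-labels
    if not keep.any():
        break
    x_lab = np.vstack([x_lab, x_unl[keep]])
    y_lab = np.concatenate([y_lab, mean[keep]])
    x_unl = x_unl[~keep]
    print(f"episode {episode}: labeled set now has {len(y_lab)} points")
```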
To reproduce the full-data setting:
```bash
python train.py --label homo \
                --model dimenet \
                --pseudo_label True \
                --setting standard \
                --pseudo_ensemble True \
                --lr 0.001 \
                --lr_decay_factor 0.5 \
                --lr_decay_step_size 25 \
                --iteration 15 \
                --evi_lambda 0.5 \
                --epoch 75 \
                --batch_size 128 \
                --uncertainty_type epistemic
```
To reproduce the low-data setting, set `--training_fraction` to the labeled fraction you want to use:
```bash
python train.py --label homo \
                --model dimenet \
                --pseudo_label True \
                --setting low_data \
                --training_fraction 0.1 \
                --initial_train_epoch 300 \
                --pseudo_ensemble True \
                --lr 0.001 \
                --lr_decay_factor 0.5 \
                --lr_decay_step_size 15 \
                --iteration 15 \
                --evi_lambda 0.5 \
                --epoch 50 \
                --batch_size 128 \
                --uncertainty_type epistemic
```
If you use this code, please cite:

```bibtex
@inproceedings{huang2022uncertainty,
  title={Uncertainty-Aware Pseudo-labeling for Quantum Calculations},
  author={Huang, Kexin and Sresht, Vishnu and Rai, Brajesh and Bordyuh, Mykola},
  booktitle={The 38th Conference on Uncertainty in Artificial Intelligence},
  year={2022}
}
```
We use the DIG library as the backbone for the SchNet and DimeNet++ implementations.
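For reference, the backbones can be instantiated directly from DIG. The module path and class names below follow the DIG 3D-graph documentation as we recall it; treat them as assumptions and verify against the DIG version pinned in `requirements.txt`:

```python
# Sketch only: instantiating the 3D-GNN backbones from DIG with default
# hyperparameters. Module path and class names are assumptions; check your
# installed DIG version if the import fails.
from dig.threedgraph.method import SchNet, DimeNetPP

schnet = SchNet()
dimenetpp = DimeNetPP()
print(type(schnet).__name__, type(dimenetpp).__name__)
```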