merge release_03 branch into master branch
brettinanl committed Jun 16, 2020
2 parents 5d2c892 + 1031bcb commit d518162
Showing 190 changed files with 15,746 additions and 1,484 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
*.pyc
__pycache__/
Data
133 changes: 133 additions & 0 deletions Pilot1/Attn/README.md
@@ -0,0 +1,133 @@
The Pilot1 Attn Benchmark requires an HDF5 input file specified by the hyperparameter `in`; for the default case this file is named top_21_1fold_001.h5.

The benchmark automatically downloads the file below:
http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/top_21_1fold_001.h5 (~4GB)

Any file of the form `top_21_1fold_<ijk>.h5` can be used as input.
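
A different fold file can be supplied without editing the config via CANDLE's standard command-line overrides. A minimal sketch (the `--in` flag name is an assumption derived from the hyperparameter name above, and top_21_1fold_002.h5 is a hypothetical fold file):

```
python attn_baseline_keras2.py --in top_21_1fold_002.h5
```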

## Sample run:

```
python attn_baseline_keras2.py
Params: {'model_name': 'attn', 'dense': [2000, 600], 'batch_size': 32, 'epochs': 1, 'activation': 'relu', 'loss': 'categorical_crossentropy', 'optimizer': 'sgd', 'drop': 0.2, 'learning_rate': 1e-05, 'momentum': 0.7, 'scaling': 'minmax', 'validation_split': 0.1, 'epsilon_std': 1.0, 'rng_seed': 2017, 'initialization': 'glorot_uniform', 'latent_dim': 2, 'batch_normalization': False, 'in': 'top_21_1fold_001.h5', 'save_path': 'candle_save', 'save_dir': './save/001/', 'use_cp': False, 'early_stop': True, 'reduce_lr': True, 'feature_subsample': 0, 'nb_classes': 2, 'timeout': 3600, 'verbose': None, 'logfile': None, 'train_bool': True, 'experiment_id': 'EXP000', 'run_id': 'RUN000', 'shuffle': False, 'gpus': [], 'profiling': False, 'residual': False, 'warmup_lr': False, 'use_tb': False, 'tsne': False, 'datatype': <class 'numpy.float32'>, 'output_dir': '/nfs2/jain/Benchmarks/Pilot1/Attn/Output/EXP000/RUN000'}
...
...
processing h5 in file top_21_1fold_001.h5
x_train shape: (271915, 6212)
x_test shape: (33989, 6212)
Examples:
Total: 339893
Positive: 12269 (3.61% of total)
X_train shape: (271915, 6212)
X_test shape: (33989, 6212)
Y_train shape: (271915, 2)
Y_test shape: (33989, 2)
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 6212) 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, 1000) 6213000 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 1000) 4000 dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 1000) 1001000 batch_normalization_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 1000) 4000 dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 1000) 1001000 batch_normalization_1[0][0]
__________________________________________________________________________________________________
multiply_1 (Multiply) (None, 1000) 0 batch_normalization_2[0][0]
dense_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense) (None, 500) 500500 multiply_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 500) 2000 dense_4[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 500) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
dense_5 (Dense) (None, 250) 125250 dropout_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 250) 1000 dense_5[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout) (None, 250) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
dense_6 (Dense) (None, 125) 31375 dropout_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 125) 500 dense_6[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 125) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
dense_7 (Dense) (None, 60) 7560 dropout_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 60) 240 dense_7[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout) (None, 60) 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
dense_8 (Dense) (None, 30) 1830 dropout_4[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 30) 120 dense_8[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout) (None, 30) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
dense_9 (Dense) (None, 2) 62 dropout_5[0][0]
==================================================================================================
Total params: 8,893,437
Trainable params: 8,887,507
Non-trainable params: 5,930
..
..
271915/271915 [==============================] - 631s 2ms/step - loss: 0.8681 - acc: 0.5548 - tf_auc: 0.5371 - val_loss: 0.6010 - val_acc: 0.8365 - val_tf_auc: 0.5743
Current time ....631.567
Epoch 00001: val_loss improved from inf to 0.60103, saving model to ./save/001/Agg_attn_bin.autosave.model.h5
creating table of predictions
creating figure 1 at ./save/001/Agg_attn_bin.auroc.pdf
creating figure 2 at ./save/001/Agg_attn_bin.auroc2.pdf
f1=0.234 auroc=0.841 aucpr=0.990
creating figure 3 at ./save/001/Agg_attn_bin.aurpr.pdf
creating figure 4 at ./save/001/Agg_attn_bin.confusion_without_norm.pdf
Confusion matrix, without normalization
[[27591  5190]
 [  360   848]]
Confusion matrix, without normalization
[[27591  5190]
 [  360   848]]
Normalized confusion matrix
[[0.84 0.16]
 [0.3  0.7 ]]
Examples:
Total: 339893
Positive: 12269 (3.61% of total)
0.7718316679565835
0.7718316679565836
              precision    recall  f1-score   support

           0       0.99      0.84      0.91     32781
           1       0.14      0.70      0.23      1208

   micro avg       0.84      0.84      0.84     33989
   macro avg       0.56      0.77      0.57     33989
weighted avg       0.96      0.84      0.88     33989

[[27591  5190]
 [  360   848]]
score
[0.5760348070144456, 0.8367118835449219, 0.5936741828918457]
Test val_loss: 0.5760348070144456
Test accuracy: 0.8367118835449219
Saved model to disk
Loaded json model from disk
json Validation loss: 0.560062773128295
json Validation accuracy: 0.8367118835449219
json accuracy: 83.67%
Loaded yaml model from disk
yaml Validation loss: 0.560062773128295
yaml Validation accuracy: 0.8367118835449219
yaml accuracy: 83.67%
Yaml_train_shape: (271915, 2)
Yaml_test_shape: (33989, 2)
```
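
The training file is a pandas-written HDF5 store. As a quick sanity check, here is a minimal sketch (editor's illustration, assuming the default file above has already been downloaded) of inspecting the keys that `load_data()` in attn.py reads:

```python
import pandas as pd

fname = 'top_21_1fold_001.h5'

# load_data() in attn.py expects the stores x_train_0/1, x_test_0/1,
# x_val_0/1 and y_train, y_test, y_val inside this file.
with pd.HDFStore(fname, mode='r') as store:
    print(store.keys())

# The two x_train blocks are concatenated column-wise, as in load_data().
x_train = pd.concat([pd.read_hdf(fname, 'x_train_0'),
                     pd.read_hdf(fname, 'x_train_1')],
                    axis=1, sort=False)
print('x_train shape:', x_train.shape)  # (271915, 6212) for the default file
```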
214 changes: 214 additions & 0 deletions Pilot1/Attn/attn.py
@@ -0,0 +1,214 @@
from __future__ import print_function

import os
import sys
import logging

import pandas as pd
import numpy as np

from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from scipy.stats.stats import pearsonr

file_path = os.path.dirname(os.path.realpath(__file__))
#lib_path = os.path.abspath(os.path.join(file_path, '..'))
#sys.path.append(lib_path)
lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', 'common'))
sys.path.append(lib_path2)

import candle

logger = logging.getLogger(__name__)
candle.set_parallelism_threads()

additional_definitions = [
    {'name': 'latent_dim',
     'action': 'store',
     'type': int,
     'help': 'latent dimensions'},
    {'name': 'residual',
     'type': candle.str2bool,
     'default': False,
     'help': 'add skip connections to the layers'},
    {'name': 'reduce_lr',
     'type': candle.str2bool,
     'default': False,
     'help': 'reduce learning rate on plateau'},
    {'name': 'warmup_lr',
     'type': candle.str2bool,
     'default': False,
     'help': 'gradually increase learning rate on start'},
    {'name': 'base_lr',
     'type': float,
     'help': 'base learning rate'},
    {'name': 'epsilon_std',
     'type': float,
     'help': 'epsilon std for sampling latent noise'},
    {'name': 'use_cp',
     'type': candle.str2bool,
     'default': False,
     'help': 'checkpoint models with best val_loss'},
    # {'name': 'shuffle',
    #  'type': candle.str2bool,
    #  'default': False,
    #  'help': 'shuffle data'},
    {'name': 'use_tb',
     'type': candle.str2bool,
     'default': False,
     'help': 'use tensorboard'},
    {'name': 'tsne',
     'type': candle.str2bool,
     'default': False,
     'help': 'generate tsne plot of the latent representation'}
]

required = [
    'activation',
    'batch_size',
    'dense',
    'dropout',
    'epochs',
    'initialization',
    'learning_rate',
    'loss',
    'optimizer',
    'rng_seed',
    'scaling',
    'val_split',
    'latent_dim',
    'batch_normalization',
    'epsilon_std',
    'timeout'
]

class BenchmarkAttn(candle.Benchmark):

    def set_locals(self):
        """Functionality to set variables specific for the benchmark.
        - required: set of required parameters for the benchmark.
        - additional_definitions: list of dictionaries describing the
          additional parameters for the benchmark.
        """
        if required is not None:
            self.required = set(required)
        if additional_definitions is not None:
            self.additional_definitions = additional_definitions


def extension_from_parameters(params, framework=''):
    """Construct string for saving model with annotation of parameters."""
    ext = framework
    for i, n in enumerate(params['dense']):
        if n:
            ext += '.D{}={}'.format(i + 1, n)
    ext += '.A={}'.format(params['activation'][0])
    ext += '.B={}'.format(params['batch_size'])
    ext += '.E={}'.format(params['epochs'])
    ext += '.L={}'.format(params['latent_dim'])
    ext += '.LR={}'.format(params['learning_rate'])
    ext += '.S={}'.format(params['scaling'])

    if params['epsilon_std'] != 1.0:
        ext += '.EPS={}'.format(params['epsilon_std'])
    if params['dropout']:
        ext += '.DR={}'.format(params['dropout'])
    if params['batch_normalization']:
        ext += '.BN'
    if params['warmup_lr']:
        ext += '.WU_LR'
    if params['reduce_lr']:
        ext += '.Re_LR'
    if params['residual']:
        ext += '.Res'

    return ext
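

# Illustration (editor's note, not part of the original file): with, for
# example, dense=[1000, 500], activation='relu', batch_size=32, epochs=2,
# latent_dim=2, learning_rate=1e-05, scaling='minmax', epsilon_std=1.0,
# dropout=0.2 and the boolean flags all False, extension_from_parameters()
# returns:
#     '.D1=1000.D2=500.A=r.B=32.E=2.L=2.LR=1e-05.S=minmax.DR=0.2'
# (note that params['activation'][0] yields only the first character when
# 'activation' is a plain string rather than a per-layer list).
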
def load_data(params, seed):

    # start change #
    if params['train_data'].endswith('h5') or params['train_data'].endswith('hdf5'):
        print('processing h5 in file {}'.format(params['train_data']))

        url = params['data_url']
        file_train = params['train_data']
        train_file = candle.get_file(file_train, url + file_train, cache_subdir='Pilot1')

        # Features for each split are stored in two column blocks;
        # concatenate them column-wise.
        df_x_train_0 = pd.read_hdf(train_file, 'x_train_0').astype(np.float32)
        df_x_train_1 = pd.read_hdf(train_file, 'x_train_1').astype(np.float32)
        X_train = pd.concat([df_x_train_0, df_x_train_1], axis=1, sort=False)
        del df_x_train_0, df_x_train_1

        df_x_test_0 = pd.read_hdf(train_file, 'x_test_0').astype(np.float32)
        df_x_test_1 = pd.read_hdf(train_file, 'x_test_1').astype(np.float32)
        X_test = pd.concat([df_x_test_0, df_x_test_1], axis=1, sort=False)
        del df_x_test_0, df_x_test_1

        df_x_val_0 = pd.read_hdf(train_file, 'x_val_0').astype(np.float32)
        df_x_val_1 = pd.read_hdf(train_file, 'x_val_1').astype(np.float32)
        X_val = pd.concat([df_x_val_0, df_x_val_1], axis=1, sort=False)
        del df_x_val_0, df_x_val_1

        Y_train = pd.read_hdf(train_file, 'y_train')
        Y_test = pd.read_hdf(train_file, 'y_test')
        Y_val = pd.read_hdf(train_file, 'y_val')

        # assumes AUC is in the third column at index 2
        # df_y = df['AUC'].astype('int')
        # df_x = df.iloc[:, 3:].astype(np.float32)

        # assumes dataframe has already been scaled
        # scaler = StandardScaler()
        # df_x = scaler.fit_transform(df_x)
    else:
        print('expecting input file with suffix h5 or hdf5')
        sys.exit()

    print('x_train shape:', X_train.shape)
    print('x_test shape:', X_test.shape)

    return X_train, Y_train, X_val, Y_val, X_test, Y_test

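For context, a minimal sketch of how this module is typically driven, in the style of the usual CANDLE baseline scripts (editor's illustration, not part of this commit; the default-model file name is assumed):

```python
import attn
import candle


def initialize_parameters():
    # Build the benchmark object defined above and let CANDLE merge the
    # defaults, the config file, and any command-line overrides.
    attn_bmk = attn.BenchmarkAttn(
        attn.file_path,
        'attn_default_model.txt',   # assumed name of the default config
        'keras',
        prog='attn_baseline',
        desc='Attention benchmark for Pilot 1')
    return candle.finalize_parameters(attn_bmk)


params = initialize_parameters()
X_train, Y_train, X_val, Y_val, X_test, Y_test = attn.load_data(params, params['rng_seed'])
```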
27 changes: 27 additions & 0 deletions Pilot1/Attn/attn_abs_default_model.txt
@@ -0,0 +1,27 @@
[Global_Params]
data_url = 'http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/'
train_data='top_21_1fold_001.h5'
model_name='attn_abs'
dense=[1000, 1000, 1000, 500, 250, 125, 60, 30, 2]
batch_size=32
epochs=2
activation=['relu', 'relu', 'softmax', 'relu', 'relu', 'relu', 'relu', 'relu', 'softmax']
loss='categorical_crossentropy'
optimizer='sgd'
dropout=0.2
learning_rate=0.00001
momentum=0.9
val_split=0.1
rng_seed=2017
use_cp=False
early_stop=True
reduce_lr=True
feature_subsample=0
output_dir='save_abs/EXP01/'
experiment_id='01'
run_id='1'
save_path='save_abs/EXP01/'
target_abs_acc=0.85

[Monitor_Params]
timeout=3600
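
These defaults are consumed through CANDLE's standard config handling; a hypothetical invocation (the script name is illustrative, and `--config_file` is CANDLE's standard flag for pointing at an alternate default-model file):

```
python attn_abs_baseline_keras2.py --config_file attn_abs_default_model.txt
```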