merge release_03 branch into master branch
Showing 190 changed files with 15,746 additions and 1,484 deletions.
@@ -1,2 +1,3 @@
*.pyc
__pycache__/
Data
@@ -0,0 +1,133 @@
The Pilot1 Attn benchmark requires an HDF5 file specified by the hyperparameter "in"; in the default case the file is named top_21_1fold_001.h5.

The benchmark automatically downloads the file below:
http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/top_21_1fold_001.h5 (~4GB)

Any file of the form top_21_1fold_"ijk".h5 can be used as input.
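Once the file is in place, a quick way to confirm it has the layout the benchmark expects is to list its HDF5 keys. This is a minimal sketch, assuming pandas is installed and the default file is already in the working directory (the benchmark itself fetches it via `candle.get_file`):

```python
# Minimal sketch: list the HDF5 keys that the benchmark reads.
# Assumes top_21_1fold_001.h5 has already been downloaded.
import pandas as pd

with pd.HDFStore("top_21_1fold_001.h5", mode="r") as store:
    # Expected keys: x_train_0/1, x_test_0/1, x_val_0/1, y_train, y_test, y_val
    print(store.keys())
```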
## Sample run:

```
python attn_baseline_keras2.py
Params: {'model_name': 'attn', 'dense': [2000, 600], 'batch_size': 32, 'epochs': 1, 'activation': 'relu', 'loss': 'categorical_crossentropy', 'optimizer': 'sgd', 'drop': 0.2, 'learning_rate': 1e-05, 'momentum': 0.7, 'scaling': 'minmax', 'validation_split': 0.1, 'epsilon_std': 1.0, 'rng_seed': 2017, 'initialization': 'glorot_uniform', 'latent_dim': 2, 'batch_normalization': False, 'in': 'top_21_1fold_001.h5', 'save_path': 'candle_save', 'save_dir': './save/001/', 'use_cp': False, 'early_stop': True, 'reduce_lr': True, 'feature_subsample': 0, 'nb_classes': 2, 'timeout': 3600, 'verbose': None, 'logfile': None, 'train_bool': True, 'experiment_id': 'EXP000', 'run_id': 'RUN000', 'shuffle': False, 'gpus': [], 'profiling': False, 'residual': False, 'warmup_lr': False, 'use_tb': False, 'tsne': False, 'datatype': <class 'numpy.float32'>, 'output_dir': '/nfs2/jain/Benchmarks/Pilot1/Attn/Output/EXP000/RUN000'}
...
...
processing h5 in file top_21_1fold_001.h5
x_train shape: (271915, 6212)
x_test shape: (33989, 6212)
Examples:
    Total: 339893
    Positive: 12269 (3.61% of total)
X_train shape: (271915, 6212)
X_test shape: (33989, 6212)
Y_train shape: (271915, 2)
Y_test shape: (33989, 2)
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 6212)         0
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1000)         6213000     input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 1000)         4000        dense_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1000)         1001000     batch_normalization_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 1000)         4000        dense_2[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 1000)         1001000     batch_normalization_1[0][0]
__________________________________________________________________________________________________
multiply_1 (Multiply)           (None, 1000)         0           batch_normalization_2[0][0]
                                                                 dense_3[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 500)          500500      multiply_1[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 500)          2000        dense_4[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 500)          0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 250)          125250      dropout_1[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 250)          1000        dense_5[0][0]
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 250)          0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 125)          31375       dropout_2[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 125)          500         dense_6[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 125)          0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
dense_7 (Dense)                 (None, 60)           7560        dropout_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 60)           240         dense_7[0][0]
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 60)           0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
dense_8 (Dense)                 (None, 30)           1830        dropout_4[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 30)           120         dense_8[0][0]
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 30)           0           batch_normalization_7[0][0]
__________________________________________________________________________________________________
dense_9 (Dense)                 (None, 2)            62          dropout_5[0][0]
==================================================================================================
Total params: 8,893,437
Trainable params: 8,887,507
Non-trainable params: 5,930
..
..
271915/271915 [==============================] - 631s 2ms/step - loss: 0.8681 - acc: 0.5548 - tf_auc: 0.5371 - val_loss: 0.6010 - val_acc: 0.8365 - val_tf_auc: 0.5743
Current time ....631.567
Epoch 00001: val_loss improved from inf to 0.60103, saving model to ./save/001/Agg_attn_bin.autosave.model.h5
creating table of predictions
creating figure 1 at ./save/001/Agg_attn_bin.auroc.pdf
creating figure 2 at ./save/001/Agg_attn_bin.auroc2.pdf
f1=0.234 auroc=0.841 aucpr=0.990
creating figure 3 at ./save/001/Agg_attn_bin.aurpr.pdf
creating figure 4 at ./save/001/Agg_attn_bin.confusion_without_norm.pdf
Confusion matrix, without normalization
[[27591  5190]
 [  360   848]]
Confusion matrix, without normalization
[[27591  5190]
 [  360   848]]
Normalized confusion matrix
[[0.84 0.16]
 [0.3  0.7 ]]
Examples:
    Total: 339893
    Positive: 12269 (3.61% of total)
0.7718316679565835
0.7718316679565836
              precision    recall  f1-score   support

           0       0.99      0.84      0.91     32781
           1       0.14      0.70      0.23      1208

   micro avg       0.84      0.84      0.84     33989
   macro avg       0.56      0.77      0.57     33989
weighted avg       0.96      0.84      0.88     33989

[[27591  5190]
 [  360   848]]
score
[0.5760348070144456, 0.8367118835449219, 0.5936741828918457]
Test val_loss: 0.5760348070144456
Test accuracy: 0.8367118835449219
Saved model to disk
Loaded json model from disk
json Validation loss: 0.560062773128295
json Validation accuracy: 0.8367118835449219
json accuracy: 83.67%
Loaded yaml model from disk
yaml Validation loss: 0.560062773128295
yaml Validation accuracy: 0.8367118835449219
yaml accuracy: 83.67%
Yaml_train_shape: (271915, 2)
Yaml_test_shape: (33989, 2)
```
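The headline f1=0.234 for the positive class follows directly from the confusion matrix in this log, where rows are the true classes and columns the predictions. A quick check in Python:

```python
# Verify the reported class-1 precision/recall/f1 from the confusion matrix
# [[27591 5190] [360 848]] printed in the run above.
tn, fp, fn, tp = 27591, 5190, 360, 848

precision = tp / (tp + fp)                          # 848 / 6038  ~ 0.14
recall = tp / (tp + fn)                             # 848 / 1208  ~ 0.70
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.234
print(round(precision, 2), round(recall, 2), round(f1, 3))  # 0.14 0.7 0.234
```

The same numbers appear in the class-1 row of the classification report above.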
@@ -0,0 +1,214 @@
from __future__ import print_function

import os
import sys
import logging

import pandas as pd
import numpy as np

from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from scipy.stats.stats import pearsonr

file_path = os.path.dirname(os.path.realpath(__file__))
#lib_path = os.path.abspath(os.path.join(file_path, '..'))
#sys.path.append(lib_path)
lib_path2 = os.path.abspath(os.path.join(file_path, '..', '..', 'common'))
sys.path.append(lib_path2)

import candle

logger = logging.getLogger(__name__)
candle.set_parallelism_threads()

additional_definitions = [
    {'name': 'latent_dim',
     'action': 'store',
     'type': int,
     'help': 'latent dimensions'},
    {'name': 'residual',
     'type': candle.str2bool,
     'default': False,
     'help': 'add skip connections to the layers'},
    {'name': 'reduce_lr',
     'type': candle.str2bool,
     'default': False,
     'help': 'reduce learning rate on plateau'},
    {'name': 'warmup_lr',
     'type': candle.str2bool,
     'default': False,
     'help': 'gradually increase learning rate on start'},
    {'name': 'base_lr',
     'type': float,
     'help': 'base learning rate'},
    {'name': 'epsilon_std',
     'type': float,
     'help': 'epsilon std for sampling latent noise'},
    {'name': 'use_cp',
     'type': candle.str2bool,
     'default': False,
     'help': 'checkpoint models with best val_loss'},
    #{'name': 'shuffle',
    # 'type': candle.str2bool,
    # 'default': False,
    # 'help': 'shuffle data'},
    {'name': 'use_tb',
     'type': candle.str2bool,
     'default': False,
     'help': 'use tensorboard'},
    {'name': 'tsne',
     'type': candle.str2bool,
     'default': False,
     'help': 'generate tsne plot of the latent representation'}
]

required = [
    'activation',
    'batch_size',
    'dense',
    'dropout',
    'epochs',
    'initialization',
    'learning_rate',
    'loss',
    'optimizer',
    'rng_seed',
    'scaling',
    'val_split',
    'latent_dim',
    'batch_normalization',
    'epsilon_std',
    'timeout'
]

class BenchmarkAttn(candle.Benchmark):

    def set_locals(self):
        """Functionality to set variables specific for the benchmark
        - required: set of required parameters for the benchmark.
        - additional_definitions: list of dictionaries describing the additional parameters for the
          benchmark.
        """

        if required is not None:
            self.required = set(required)
        if additional_definitions is not None:
            self.additional_definitions = additional_definitions

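For context, this class is consumed by the baseline driver script. Below is a hypothetical instantiation sketch following the usual CANDLE pattern; the default-model file name and the prog/desc strings are assumptions, not taken from this commit:

```python
# Hypothetical driver sketch; 'attn_default_model.txt', prog, and desc are placeholders.
attn_bmk = BenchmarkAttn(file_path, 'attn_default_model.txt', 'keras',
                         prog='attn_baseline', desc='Attn benchmark')
params = candle.finalize_parameters(attn_bmk)  # merge defaults, model file, CLI flags
```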
def extension_from_parameters(params, framework=''):
    """Construct string for saving model with annotation of parameters"""
    ext = framework
    for i, n in enumerate(params['dense']):
        if n:
            ext += '.D{}={}'.format(i+1, n)
    ext += '.A={}'.format(params['activation'][0])
    ext += '.B={}'.format(params['batch_size'])
    ext += '.E={}'.format(params['epochs'])
    ext += '.L={}'.format(params['latent_dim'])
    ext += '.LR={}'.format(params['learning_rate'])
    ext += '.S={}'.format(params['scaling'])

    if params['epsilon_std'] != 1.0:
        ext += '.EPS={}'.format(params['epsilon_std'])
    if params['dropout']:
        ext += '.DR={}'.format(params['dropout'])
    if params['batch_normalization']:
        ext += '.BN'
    if params['warmup_lr']:
        ext += '.WU_LR'
    if params['reduce_lr']:
        ext += '.Re_LR'
    if params['residual']:
        ext += '.Res'

    return ext
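As a quick illustration of the naming scheme, here is a hypothetical call; the parameter values echo the sample run above, and the '.keras' prefix is an arbitrary choice:

```python
# Hypothetical usage of extension_from_parameters(); values echo the sample run.
params = {'dense': [2000, 600], 'activation': 'relu', 'batch_size': 32,
          'epochs': 1, 'latent_dim': 2, 'learning_rate': 1e-05,
          'scaling': 'minmax', 'epsilon_std': 1.0, 'dropout': 0.2,
          'batch_normalization': False, 'warmup_lr': False,
          'reduce_lr': True, 'residual': False}
print(extension_from_parameters(params, '.keras'))
# .keras.D1=2000.D2=600.A=r.B=32.E=1.L=2.LR=1e-05.S=minmax.DR=0.2.Re_LR
```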
def load_data(params, seed):

    # start change #
    if params['train_data'].endswith('h5') or params['train_data'].endswith('hdf5'):
        print('processing h5 in file {}'.format(params['train_data']))

        url = params['data_url']
        file_train = params['train_data']
        train_file = candle.get_file(file_train, url+file_train, cache_subdir='Pilot1')

        df_x_train_0 = pd.read_hdf(train_file, 'x_train_0').astype(np.float32)
        df_x_train_1 = pd.read_hdf(train_file, 'x_train_1').astype(np.float32)
        X_train = pd.concat([df_x_train_0, df_x_train_1], axis=1, sort=False)
        del df_x_train_0, df_x_train_1

        df_x_test_0 = pd.read_hdf(train_file, 'x_test_0').astype(np.float32)
        df_x_test_1 = pd.read_hdf(train_file, 'x_test_1').astype(np.float32)
        X_test = pd.concat([df_x_test_0, df_x_test_1], axis=1, sort=False)
        del df_x_test_0, df_x_test_1

        df_x_val_0 = pd.read_hdf(train_file, 'x_val_0').astype(np.float32)
        df_x_val_1 = pd.read_hdf(train_file, 'x_val_1').astype(np.float32)
        X_val = pd.concat([df_x_val_0, df_x_val_1], axis=1, sort=False)
        del df_x_val_0, df_x_val_1

        Y_train = pd.read_hdf(train_file, 'y_train')
        Y_test = pd.read_hdf(train_file, 'y_test')
        Y_val = pd.read_hdf(train_file, 'y_val')

        # assumes AUC is in the third column at index 2
        # df_y = df['AUC'].astype('int')
        # df_x = df.iloc[:,3:].astype(np.float32)

        # assumes dataframe has already been scaled
        # scaler = StandardScaler()
        # df_x = scaler.fit_transform(df_x)
    else:
        print('expecting input file with suffix .h5')
        sys.exit()

    print('x_train shape:', X_train.shape)
    print('x_test shape:', X_test.shape)

    return X_train, Y_train, X_val, Y_val, X_test, Y_test
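A hypothetical call, wiring in the data_url and train_data values from the default model file shown below:

```python
# Hypothetical usage of load_data(); keys mirror the default model configuration.
params = {'data_url': 'http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/',
          'train_data': 'top_21_1fold_001.h5'}
X_train, Y_train, X_val, Y_val, X_test, Y_test = load_data(params, seed=2017)
```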
@@ -0,0 +1,27 @@
[Global_Params]
data_url = 'http://ftp.mcs.anl.gov/pub/candle/public/benchmarks/Pilot1/uno/'
train_data = 'top_21_1fold_001.h5'
model_name = 'attn_abs'
dense = [1000, 1000, 1000, 500, 250, 125, 60, 30, 2]
batch_size = 32
epochs = 2
activation = ['relu', 'relu', 'softmax', 'relu', 'relu', 'relu', 'relu', 'relu', 'softmax']
loss = 'categorical_crossentropy'
optimizer = 'sgd'
dropout = 0.2
learning_rate = 0.00001
momentum = 0.9
val_split = 0.1
rng_seed = 2017
use_cp = False
early_stop = True
reduce_lr = True
feature_subsample = 0
output_dir = 'save_abs/EXP01/'
experiment_id = '01'
run_id = '1'
save_path = 'save_abs/EXP01/'
target_abs_acc = 0.85

[Monitor_Params]
timeout = 3600
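To sanity-check a default-model file like this before a run, it can be previewed with Python's standard configparser. A minimal sketch; the file name attn_abs_default_model.txt is an assumption based on model_name, not stated in this commit:

```python
# Minimal sketch; the file name is assumed. configparser returns raw strings,
# which is also how the CANDLE parameter machinery first sees these values.
import configparser

config = configparser.ConfigParser()
config.read('attn_abs_default_model.txt')
print(config['Global_Params']['epochs'])    # '2'
print(config['Monitor_Params']['timeout'])  # '3600'
```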