Release candidate v0.2.0 (#10)
* Added some context for EvNN

* AdamW, Simpler Thresholds (#5)

* Switched to AdamW optimizer, simplified threshold parameterization, slight changes to the training of thresholds

* removed wandb from training script

* fixed inference script and updated README.md

* Improved setup and install (#8)

* improve setup and remove makefiles

* remove makefile

---------

authored-by: KhaleelKhan <[email protected]>

* bump up version, update readme

* include required files in distributed archive

* only require nvcc to compile cuda kernels

* cleaned LM code from pruning attempts

* update changelog and prepare merge

---------

Co-authored-by: Anand <[email protected]>
Co-authored-by: Mark Schoene <[email protected]>
Co-authored-by: KhaleelKhan <[email protected]>
4 people authored May 24, 2024
1 parent 1c49161 commit eaf293a
Showing 17 changed files with 339 additions and 506 deletions.
9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,14 @@
# ChangeLog

## 0.2.0-egru (2024-05-24)
### Changed
- Simplified install and removed makefile
- CUDA compute capability is automatically detected
- Updated README with setup instructions
- Updated Dockerfile
- Cleaned LM pruning code


## 0.1.0-egru (2022-03-01)
### Changed
- Project forked from original
1 change: 0 additions & 1 deletion build/MANIFEST.in → MANIFEST.in
@@ -1,4 +1,3 @@
include Makefile
include frameworks/pytorch/*.h
include frameworks/pytorch/*.cc
include lib/*.cc
66 changes: 0 additions & 66 deletions Makefile

This file was deleted.

25 changes: 15 additions & 10 deletions README.md
@@ -30,36 +30,41 @@ Here's what you'll need to get started:
- a [CUDA Compute Capability](https://developer.nvidia.com/cuda-gpus) 3.7+ GPU (required only if using GPU)
- [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) 11.0+ (required only if using GPU)
- [PyTorch](https://pytorch.org) 1.3+ for PyTorch integration (GPU optional)
- [BLAS](https://netlib.org/blas/) or any BLAS-like library for CPU computation.
- [Eigen 3](http://eigen.tuxfamily.org/) to build the C++ examples (optional)
- [OpenBLAS](https://www.openblas.net/) or any BLAS-like library for CPU computation.

Once you have the prerequisites, you can install with pip or by building the source code.

<!-- ### Using pip
### Using pip
```
pip install evnn_pytorch
``` -->
```

### Building from source
> **Note**
>
> Currently supported only on Linux; use Docker to build on Windows.
Build and install it with `pip`:
```bash
make evnn_pytorch # Build PyTorch API
pip install .
```
### Building in Docker

If you built the PyTorch API, install it with `pip`:
Build docker image:
```bash
pip install evnn_pytorch-*.whl
docker build -t evnn -f docker/Dockerfile .
```

If the CUDA Toolkit that you're building against is not in `/usr/local/cuda`, you must specify the
`$CUDA_HOME` environment variable before running make:
Example usage:
```bash
CUDA_HOME=/usr/local/cuda-10.2 make
docker run --rm --gpus=all evnn python -m unittest discover -p "*_test.py" -s /evnn_src/validation -v
```

> **Note**
>
> The build script tries to detect the GPU compute capability automatically. If no GPU is available during compilation, for example when building with Docker or compiling on a compute-cluster login node, use the environment variable `EVNN_CUDA_COMPUTE` to set the required compute capability.
> Example: For CUDA Compute capability 8.0 use ```export EVNN_CUDA_COMPUTE=80```
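
The override described in the note can be sketched in Python as follows. This is a minimal illustration, not the project's actual build script: the helper names `resolve_compute_capability` and `detect_with_torch` and the `detect` callback are hypothetical, and accepting `8.0` as well as `80` is an added convenience. Only the `EVNN_CUDA_COMPUTE` variable and its `80`-style value come from the README.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_compute_capability(
    env: Mapping[str, str] = os.environ,
    detect: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Return a compute capability string such as "80".

    Hypothetical helper: prefers EVNN_CUDA_COMPUTE, then falls back to a
    caller-supplied detection function.
    """
    override = env.get("EVNN_CUDA_COMPUTE", "").strip()
    if override:
        # "80" is the documented form; tolerating "8.0" is an assumption.
        value = override.replace(".", "")
        if not value.isdigit():
            raise ValueError(f"Invalid EVNN_CUDA_COMPUTE value: {override!r}")
        return value

    detected = detect() if detect is not None else None
    if detected is None:
        raise RuntimeError(
            "No GPU detected at build time; set EVNN_CUDA_COMPUTE, e.g. 80"
        )
    return detected


def detect_with_torch() -> Optional[str]:
    """Query the first visible GPU via PyTorch, or return None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return f"{major}{minor}"
```

Calling `resolve_compute_capability(detect=detect_with_torch)` on a login node without a GPU raises an error unless `EVNN_CUDA_COMPUTE` is exported first, which mirrors the situation the note describes.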
## Performance

Code for the experiments and benchmarks presented in the paper is published in the ``benchmarks`` directory.
22 changes: 17 additions & 5 deletions benchmarks/lm/README.md
@@ -4,18 +4,30 @@ To run the language modeling experiments, first download the data

./getdata <data_dir>

Then run Penn Treebank experiments with EGRU (1350 units)
We [provide checkpoints for EGRU](https://cloudstore.zih.tu-dresden.de/index.php/s/NPQ9pLnpZnTsM5X) with 3 layers of hidden size (1350, 1350, 750)

python lm/train.py --data path_to_your_data --scratch ./log --dataset PTB --epochs 2500 --rnn_type egru --layers 3 --hidden_dim 1350 --batch_size=64 --bptt=68 --dropout_connect=0.6788113982442464 --dropout_emb=0.7069992062976298 --dropout_forward=0.2641540030663871 --dropout_words=0.05460274136214911 --emb_dim=788 --learning_rate=0.00044406742918918466 --pseudo_derivative_width=2.179414375864446 --thr_init_mean=-3.76855645544185 --weight_decay=9.005509348932795e-06 --seed 12008
# Penn Treebank
To train EGRU on Penn Treebank word-level language modeling, run

or EGRU (2000 units)
python benchmarks/lm/train.py --data=/path/to/data --scratch=/your/scratch/directory/Experiments --dataset=PTB --epochs=1000 --batch_size=64 --rnn_type=egru --layer=3 --bptt=70 --scheduler=cosine --weight_decay=0.10 --learning_rate=0.0012 --learning_rate_thresholds 0.0 --emb_dim=750 --dropout_emb=0.6 --dropout_words=0.1 --dropout_forward=0.25 --grad_clip=0.1 --thr_init_mean=0.01 --dropout_connect=0.7 --hidden_dim=1350 --pseudo_derivative_width=3.6 --scheduler_start=700 --seed=9612

python lm/train.py --data path_to_your_data --scratch ./log --dataset PTB --epochs 2500 --rnn_type egru --layers 3 --hidden_dim 2000 --batch_size=128 --bptt=67 --dropout_connect=0.621405385527356 --dropout_emb=0.7651296208061924 --dropout_forward=0.24131807369801447 --dropout_words=0.14942681962154375 --emb_dim=786 --learning_rate=0.000494172266064804 --pseudo_derivative_width=2.35216907207571 --thr_init_mean=-3.4957794302256007 --weight_decay=6.6878095661652755e-06 --seed 52798
For inference with the [provided checkpoint](https://cloudstore.zih.tu-dresden.de/index.php/s/NPQ9pLnpZnTsM5X), run

python benchmarks/lm/infer.py --data /path/to/data --dataset PTB --datasplit test --batch_size 1 --directory /path/to/checkpoint

# Wikitext-2
To train EGRU on Wikitext-2, run

python benchmarks/lm/train.py --data=/your/data/directory --scratch=/your/scratch/directory/Experiments --dataset=WT2 --epochs=800 --batch_size=128 --rnn_type=egru --layer=3 --bptt=70 --scheduler=cosine --weight_decay=0.12 --learning_rate=0.001 --learning_rate_thresholds 0.0 --emb_dim=750 --dropout_emb=0.7 --dropout_words=0.1 --dropout_forward=0.25 --grad_clip=0.1 --thr_init_mean=0.01 --dropout_connect=0.7 --hidden_dim=1350 --pseudo_derivative_width=3.6 --scheduler_start=400 --seed=913420

For inference with the [provided checkpoint](https://cloudstore.zih.tu-dresden.de/index.php/s/NPQ9pLnpZnTsM5X), run

python benchmarks/lm/infer.py --data /path/to/data --dataset WT2 --datasplit test --batch_size 1 --directory /path/to/checkpoint

Various flags can be passed to change the default parameters.
See "train.py" for a list of all available arguments.

This code was tested with PyTorch >= 1.9.0
This code was tested with PyTorch >= 1.9.0, CUDA 11.

A large part of the code stems from the Salesforce AWD-LSTM implementation:
https://github.com/salesforce/awd-lstm-lm
5 changes: 2 additions & 3 deletions benchmarks/lm/eval.py
@@ -14,16 +14,15 @@
# ==============================================================================

import torch
import lm.data as d
import data as d


def evaluate(model, eval_data, criterion, batch_size, bptt, ntokens, device, return_hidden=False):
def evaluate(model, eval_data, criterion, batch_size, bptt, ntokens, device, hidden_dims, return_hidden=False):
# turn on evaluation mode
model.eval()

# initialize evaluation metrics
iter_range = range(0, eval_data.size(0) - 1, bptt)
hidden_dims = [rnn.hidden_size for rnn in model.rnns]

total_loss = 0.
mean_activities = torch.zeros(len(iter_range), dtype=torch.float16, device=device)
68 changes: 11 additions & 57 deletions benchmarks/lm/infer.py
@@ -23,9 +23,9 @@
import torch
import torch.nn

import lm.data as d
from lm.models import LanguageModel
from lm.eval import evaluate
import data as d
from models import LanguageModel
from eval import evaluate


def get_args():
@@ -37,7 +37,6 @@ def get_args():
argparser.add_argument('--batch_size', type=int, default=80)
argparser.add_argument('--directory', type=str, required=False, help='model directory for checkpoints and config')
argparser.add_argument('--hidden', action='store_true', help='returns the hidden states of the whole dataset to perform analysis')
argparser.add_argument('--prune', type=float, default=0.0)

return argparser.parse_args()

@@ -85,14 +84,19 @@ def main(args):
model = LanguageModel(**model_args).to(device)
elif config['rnn_type'] == 'egru':
model = LanguageModel(**model_args,
dampening_factor=config['damp_factor'],
dampening_factor=config['pseudo_derivative_width'],
pseudo_derivative_support=config['pseudo_derivative_width']).to(device)
else:
raise RuntimeError("Unknown RNN type: %s" % config['rnn_type'])

best_model_path = os.path.join(args.directory, 'checkpoints', f"{config['rnn_type'].upper()}_best_model.cpt")
model.load_state_dict(torch.load(best_model_path, map_location=device))

if model_args['rnn_type'] == 'egru':
hidden_dims = [rnn.hidden_size for rnn in model.rnns]
else:
hidden_dims = [rnn.module.hidden_size if args.dropout_connect > 0 else rnn.hidden_size for rnn in model.rnns]

criterion = torch.nn.CrossEntropyLoss()

if args.hidden:
@@ -104,6 +108,7 @@ def main(args):
bptt=config['bptt'],
ntokens=vocab_size,
device=device,
hidden_dims=hidden_dims,
return_hidden=True)
save_file = os.path.join(args.directory, f'hidden_states_{args.datasplit}.hdf')
with h5py.File(save_file, 'w') as f:
@@ -121,6 +126,7 @@ def main(args):
bptt=config['bptt'],
ntokens=vocab_size,
device=device,
hidden_dims=hidden_dims,
return_hidden=False)

test_ppl = math.exp(test_loss)
@@ -131,58 +137,6 @@ def main(args):
print(f'Layerwise activity {test_layerwise_activity_mean.tolist()} +- {test_layerwise_activity_std.tolist()}')
print('=' * 89)

if args.prune > 0.0 and args.hidden:
print(f"Model Parameter Count: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")
input_indices = torch.arange(model.rnns[0].input_size).to(device)
for i in range(model.nlayers):
if i < model.nlayers - 1:
# get event frequencies
hid_dim = all_hiddens[i].shape[2]
hid_cells = all_hiddens[i].reshape(-1, hid_dim)
seq_len = hid_cells.shape[0]
spike_frequency = torch.sum(hid_cells != 0, dim=0) / seq_len
print(
f"Layer {i + 1}: "
f"less than 1/100: {torch.sum(spike_frequency < 0.01)} / {spike_frequency.shape} "
f"// never: {torch.sum(hid_cells.sum(dim=0) == 0)} / {spike_frequency.shape}")

# compute remaining indicies from spike frequencies
topk = int(model.rnns[i].hidden_size * (1 - args.prune))
hidden_indices, _ = torch.sort(torch.argsort(spike_frequency, descending=True)[:topk], descending=False)
hidden_indices = hidden_indices.to(device)
else:
hidden_indices = torch.arange(model.rnns[i].hidden_size).to(device)
model.rnns[i].prune_units(input_indices, hidden_indices)
input_indices = hidden_indices

print(f"Model Parameter Count: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")

test_loss, test_activity, test_layerwise_activity_mean, test_layerwise_activity_std, centered_cell_states, all_hiddens = \
evaluate(model=model,
eval_data=test_data,
criterion=criterion,
batch_size=args.batch_size,
bptt=config['bptt'],
ntokens=vocab_size,
device=device,
return_hidden=True)
for i in range(model.nlayers - 1):
# get event frequencies
hid_dim = all_hiddens[i].shape[2]
hid_cells = all_hiddens[i].reshape(-1, hid_dim)
seq_len = hid_cells.shape[0]
spike_frequency = torch.sum(hid_cells != 0, dim=0) / seq_len
print(
f"less than 1/100: {torch.sum(spike_frequency < 0.01)} / {spike_frequency.shape} "
f"// never: {torch.sum(hid_cells.sum(dim=0) == 0)} / {spike_frequency.shape}")
test_ppl = math.exp(test_loss)
print('=' * 89)
print(f'| Inference | test loss {test_loss:5.2f} | '
f'test ppl {test_ppl:8.2f} | '
f'test mean activity {test_activity}')
print(f'Layerwise activity {test_layerwise_activity_mean.tolist()} +- {test_layerwise_activity_std.tolist()}')
print('=' * 89)


if __name__ == "__main__":
args = get_args()
58 changes: 3 additions & 55 deletions benchmarks/lm/models.py
@@ -17,8 +17,8 @@
import torch.nn as nn
import torch.nn.functional as F
import evnn_pytorch as evnn
from lm.modules import VariationalDropout, WeightDrop
from lm.embedding_dropout import embedded_dropout
from modules import VariationalDropout, WeightDrop
from embedding_dropout import embedded_dropout
from typing import Union


@@ -64,9 +64,8 @@ def forward(self, x):
bs, seq_len, ninp = x.shape
if self.project:
x = x.view(-1, ninp)
x = F.relu(self.projection(x))
x = self.projection(x)
x = x.view(bs, seq_len, self.nemb)
x = self.variational_dropout(x, self.dropout)
x = x.view(-1, self.nemb)
x = self.decoder(x)
return x
@@ -155,57 +154,6 @@ def __init__(self,

self.backward_sparsity = torch.zeros(len(self.rnns))

def prune_embeddings(self, index):
device = next(self.parameters()).device
self.embeddings.weight = nn.Parameter(
self.embeddings.weight[:, index]).to(device)
self.emb_dim = self.embeddings.weight.shape[1]
self.decoder = Decoder(ninp=self.hidden_dim if self.projection else self.emb_dim, ntokens=self.vocab_size,
project=self.projection, nemb=self.emb_dim,
dropout=self.dropout_forward).to(device)
self.decoder.decoder.weight = self.embeddings.weight

def prune(self, fractions, hiddens, device):
# calculate new hidden dimensions
indicies = [torch.arange(self.rnns[0].input_size).to(device)]

for i in range(self.nlayers):
if isinstance(fractions, float):
frac = fractions
elif isinstance(fractions, tuple) or isinstance(fractions, list):
frac = fractions[i]
else:
raise NotImplementedError(
f"data type {type(fractions)} not implemented. Use float, tuple or list")

# get event frequencies
hid_dim = hiddens[i].shape[2]
hid_cells = hiddens[i].reshape(-1, hid_dim)
seq_len = hid_cells.shape[0]
spike_frequency = torch.sum(hid_cells != 0, dim=0) / seq_len
print(
f"Layer {i + 1}: "
f"less than 1/100: {torch.sum(spike_frequency < 0.01)} / {spike_frequency.shape} "
f"// never: {torch.sum(hid_cells.sum(dim=0) == 0)} / {spike_frequency.shape}")

# compute remaining indicies from spike frequencies
topk = int(self.rnns[i].hidden_size * (1 - frac))
hidden_indices, _ = torch.sort(torch.argsort(
spike_frequency, descending=True)[:topk], descending=False)
hidden_indices = hidden_indices.to(device)
indicies.append(hidden_indices)

# input dimension equals embedding dimension for tied weights
indicies[0] = indicies[-1]

# prune weights
for i in range(self.nlayers):
self.rnns[i].prune_units(indicies[i], indicies[i+1])

self.prune_embeddings(indicies[-1])
print(
f"Final model hidden size: {[rnn.hidden_size for rnn in self.rnns]}")

def init_embedding(self, initrange):
nn.init.uniform_(self.embeddings.weight, -initrange, initrange)
