Fast Transformer


This repo implements Fastformer: Additive Attention Can Be All You Need by Wu et al. in PyTorch, based on the implementation by Rishit Dagli. Fast Transformer is a Transformer variant built on additive attention that handles long sequences efficiently with linear complexity. Fastformer is much more efficient than many existing Transformer models while achieving comparable or even better long-text modeling performance.
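
For intuition, here is a minimal, single-head sketch of the additive attention idea (illustrative only, not this repo's internal implementation): instead of scoring every query against every key, each position gets a scalar weight and the sequence is pooled into a single global query/key vector, which keeps the cost linear in sequence length.

import torch
from torch import nn

class AdditiveAttentionSketch(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.w_q = nn.Linear(dim, 1, bias=False)  # scores each query vector
        self.w_k = nn.Linear(dim, 1, bias=False)  # scores each query-key product
        self.to_out = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):
        q, k, v = self.to_q(x), self.to_k(x), self.to_v(x)

        # pool the queries into one global query vector -> O(n) instead of O(n^2)
        alpha = (self.w_q(q) * self.scale).softmax(dim=1)   # (b, n, 1)
        global_q = (alpha * q).sum(dim=1, keepdim=True)     # (b, 1, d)

        # mix the global query into every key, then pool into a global key
        p = global_q * k                                     # (b, n, d)
        beta = (self.w_k(p) * self.scale).softmax(dim=1)     # (b, n, 1)
        global_k = (beta * p).sum(dim=1, keepdim=True)       # (b, 1, d)

        # mix the global key into the values, project, and add a query residual
        u = global_k * v                                     # (b, n, d)
        return self.to_out(u) + q

x = torch.randn(2, 128, 512)
out = AdditiveAttentionSketch(512)(x)  # (2, 128, 512)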

Installation

Run the following to install:

pip install fast-transformer-torch

Developing fast-transformer

To install fast-transformer-torch along with the tools you need to develop and test it, run the following in your virtualenv:

git clone https://github.com/talipturkmen/Fast-Transformer-Pytorch.git
# or clone your own fork

cd Fast-Transformer-Pytorch
pip install -e ".[dev]"

Usage

from fast_transformer_torch import FastTransformer
import torch

mask = torch.ones([16, 4096], dtype=torch.bool)  # boolean mask marking which of the 4096 positions are valid
model = FastTransformer(num_tokens = 20000,
                        dim = 512,
                        depth = 2,
                        max_seq_len = 4096,
                        absolute_pos_emb = True, # Absolute positional embeddings
                        mask = mask
                        )
x = torch.randint(0, 20000, (16, 4096))  # batch of 16 random token sequences

logits = model(x) # (16, 4096, 20000)
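
From here, training follows the usual PyTorch pattern. A minimal sketch (the target tensor and loss setup below are illustrative assumptions, not part of this repo's API):

import torch.nn.functional as F

targets = torch.randint(0, 20000, (16, 4096))             # hypothetical next-token targets
loss = F.cross_entropy(logits.transpose(1, 2), targets)   # cross_entropy expects (batch, classes, seq)
loss.backward()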