Block Recurrent Transformer

A PyTorch implementation of the model from Hutchins & Schlag et al. (2022). Owes a great deal to Phil Wang's x-transformers. Very much a work in progress.

Dockerfile, requirements.txt, and environment.yaml because I love chaos.

Differences from the Paper (as of 2022/05/04)

  • Keys and values are not shared between the "vertical" and "horizontal" directions (the standard input -> output information flow and the recurrent state flow, respectively).
  • The state vectors are augmented with Rotary Embeddings for positional encoding, instead of using learned embeddings.
  • The special LSTM gate initialization is not yet implemented.
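To illustrate the first bullet, here is a rough sketch of what "unshared" keys and values mean. It is not this repository's code. It uses separate, hypothetical key/value projections (`KVProjection`) for the vertical direction (tokens as queries) and the horizontal direction (state vectors as queries). It simplifies heavily: single head, no causal mask, and tokens and states each attend to one concatenated context instead of separate self- and cross-attention. All names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Single-head scaled dot-product attention, no masking.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

class KVProjection:
    # Hypothetical key/value projection pair for one direction.
    def __init__(self, dim, rng):
        self.w_k = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.w_v = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def __call__(self, x):
        return x @ self.w_k, x @ self.w_v

def two_direction_step(tokens, state, w_q_vert, w_q_horiz, kv_vert, kv_horiz):
    # Shared context: current block of tokens plus recurrent state vectors.
    context = np.concatenate([tokens, state], axis=0)
    k_vert, v_vert = kv_vert(context)     # keys/values for the vertical direction
    k_horiz, v_horiz = kv_horiz(context)  # separate keys/values for the horizontal direction
    token_out = attend(tokens @ w_q_vert, k_vert, v_vert)   # input -> output flow
    state_out = attend(state @ w_q_horiz, k_horiz, v_horiz)  # recurrent state flow
    return token_out, state_out

rng = np.random.default_rng(0)
dim, n_tok, n_state = 8, 6, 4
tokens = rng.standard_normal((n_tok, dim))
state = rng.standard_normal((n_state, dim))
w_q_vert = rng.standard_normal((dim, dim)) / np.sqrt(dim)
w_q_horiz = rng.standard_normal((dim, dim)) / np.sqrt(dim)
kv_vert, kv_horiz = KVProjection(dim, rng), KVProjection(dim, rng)
token_out, state_out = two_direction_step(tokens, state, w_q_vert, w_q_horiz, kv_vert, kv_horiz)
```

Passing the same `KVProjection` object for both `kv_vert` and `kv_horiz` gives a shared variant, which is closer in spirit to the paper's setup as the bullet describes it. Keeping two objects doubles the key/value parameters in exchange for letting each direction learn its own projections.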
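For the second bullet, the following is a minimal NumPy sketch of rotary embeddings applied to a block of state vectors. It is not the repository's implementation. It assumes state slot i is assigned position index i, and it uses the common "rotate half" pairing with a frequency base of 10000. The function name `rotary_embed` is hypothetical.

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    """Apply rotary position embeddings along the last axis of x.

    x: array of shape (..., seq, dim) with an even dim.
    positions: integer array of shape (seq,).
    Feature i is rotated together with feature i + dim // 2.
    """
    dim = x.shape[-1]
    if dim % 2 != 0:
        raise ValueError("rotary embeddings need an even feature dimension")
    half = dim // 2
    # One rotation frequency per feature pair: base^(-2i/dim).
    inv_freq = base ** (-np.arange(half) * 2.0 / dim)
    angles = positions[:, None] * inv_freq[None, :]       # (seq, half)
    cos = np.concatenate([np.cos(angles)] * 2, axis=-1)  # (seq, dim)
    sin = np.concatenate([np.sin(angles)] * 2, axis=-1)
    x1, x2 = x[..., :half], x[..., half:]
    rotated_half = np.concatenate([-x2, x1], axis=-1)
    # Per pair: (x1 cos - x2 sin, x2 cos + x1 sin), i.e. a 2D rotation.
    return x * cos + rotated_half * sin

# Example: 4 state vectors of width 8, one position per state slot.
rng = np.random.default_rng(0)
state = rng.standard_normal((4, 8))
state_rot = rotary_embed(state, np.arange(4))
```

Because each feature pair is only rotated, vector norms are preserved. The dot product of two rotated vectors also depends only on the difference of their positions. As a result, attention scores between state vectors can reflect relative slot offsets without a learned position table.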
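The third bullet refers to an LSTM-style gate on the recurrent state. The following is a hypothetical sketch of such a gate, using the usual LSTM cell update `new_state = forget * old_state + input * tanh(candidate)`. It is neither this repository's code nor the paper's exact formulation. As a simplifying assumption, the gates are computed only from the horizontal-direction update. The class name `LSTMStyleGate` and the default biases `forget_bias=1.0` and `input_bias=-1.0` are placeholders. They only show how bias initialization can make the gate favor keeping the old state early in training. The paper's special initialization is not described here, so these values should not be read as it.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMStyleGate:
    # Hypothetical LSTM-style state update; bias defaults are placeholders,
    # not the paper's initialization.
    def __init__(self, dim, rng, forget_bias=1.0, input_bias=-1.0):
        scale = 1.0 / np.sqrt(dim)
        self.w_f = rng.standard_normal((dim, dim)) * scale
        self.w_i = rng.standard_normal((dim, dim)) * scale
        self.w_z = rng.standard_normal((dim, dim)) * scale
        self.b_f = np.full(dim, forget_bias)
        self.b_i = np.full(dim, input_bias)

    def __call__(self, old_state, update):
        forget = sigmoid(update @ self.w_f + self.b_f)   # how much old state to keep
        in_gate = sigmoid(update @ self.w_i + self.b_i)  # how much new content to write
        candidate = np.tanh(update @ self.w_z)           # proposed new content
        return forget * old_state + in_gate * candidate

rng = np.random.default_rng(0)
gate = LSTMStyleGate(8, rng)
old_state = rng.standard_normal((4, 8))
update = rng.standard_normal((4, 8))
new_state = gate(old_state, update)
```

A large positive forget bias combined with a large negative input bias makes the gate pass the old state through almost unchanged. This is the kind of behavior a specialized initialization can be used to control.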