BERT

Implementation of BERT (base) Model

Paper: https://arxiv.org/pdf/1810.04805

The implementation contains a BERT model that is intended to be used only as an encoder; using BERT as a decoder isn't supported.

The implementation uses torch's SDPA (scaled dot-product attention), an optimized attention implementation that leverages fused CUDA kernels when the CUDA backend is available.
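As a minimal sketch (not the repo's actual code), the SDPA call inside a multi-head attention layer looks roughly like this; shapes are illustrative:

```python
import torch
import torch.nn.functional as F

batch, n_heads, seq_len, head_dim = 2, 12, 128, 64
q = torch.randn(batch, n_heads, seq_len, head_dim)
k = torch.randn(batch, n_heads, seq_len, head_dim)
v = torch.randn(batch, n_heads, seq_len, head_dim)

# scaled_dot_product_attention dispatches to an optimized kernel
# (e.g. flash / memory-efficient attention) when the CUDA backend
# supports it, and falls back to the plain math path otherwise.
out = F.scaled_dot_product_attention(q, k, v)  # (batch, n_heads, seq_len, head_dim)
```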

The implementation follows the parameter configuration of the BERT base model. The key parameters (defined in config.py) are listed below, with an illustrative sketch after the list:

  • Embedding dimension (n_embed): 768
  • No. of Encoder blocks (n_layers): 12
  • No. of Heads in each Encoder block (n_heads): 12
  • Max Sequence Length: 512
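
For illustration only, a configuration along these lines could be expressed as a small dataclass; the field names here are hypothetical, and the actual definitions live in config.py:

```python
from dataclasses import dataclass

@dataclass
class BertConfig:
    # Hypothetical field names; values match BERT base.
    n_embed: int = 768        # embedding dimension
    n_layers: int = 12        # number of Encoder blocks
    n_heads: int = 12         # attention heads per Encoder block
    max_seq_len: int = 512    # maximum sequence length
    vocab_size: int = 30522   # BERT base (uncased) WordPiece vocabulary

config = BertConfig()
```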

Although the entire architecture is constructed end to end, the implementation does not contain a training loop. Instead, the weights are loaded/transferred from HF's BertModel. A training loop might be added in the future.
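A rough sketch of that weight transfer, assuming the Hugging Face transformers package (the key-mapping helper below is hypothetical; the actual copying logic is in this repo):

```python
from transformers import BertModel

# Download the pre-trained BERT base weights from the Hugging Face hub.
hf_model = BertModel.from_pretrained("bert-base-uncased")
hf_state = hf_model.state_dict()

# Each HF parameter is then copied onto the corresponding tensor of the
# from-scratch model; translate_hf_keys is purely illustrative.
# own_model.load_state_dict(translate_hf_keys(hf_state))
```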

To confirm the implementation is accurate, validate.py compares the output of the BERT model implemented here with the output of HF's BertModel.
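A comparison along those lines (the details in validate.py may differ) can be as simple as:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
hf_model = BertModel.from_pretrained("bert-base-uncased").eval()

inputs = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
with torch.no_grad():
    hf_out = hf_model(**inputs).last_hidden_state
    # own_out = own_model(inputs["input_ids"], inputs["attention_mask"])  # this repo's model

# A small tolerance absorbs kernel / operation-ordering differences.
# assert torch.allclose(own_out, hf_out, atol=1e-5)
```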
