This repository implements a transformer model based on the architecture introduced in the paper Attention Is All You Need. The implementation was adapted from a tutorial by Umar Jamil. More details can be found in Project Report.pdf.
The model is trained on the English-Chinese (en-zh) section of the OPUS-100 dataset on Hugging Face, with the splits listed below (a loading sketch follows the list).
- Training Data: 1 million samples
- Validation Data: 2,000 samples
- Test Data: 2,000 samples
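For reference, the splits above can be loaded with the Hugging Face datasets library. This is a minimal sketch, assuming the en-zh configuration of Helsinki-NLP/opus-100; the actual subsampling logic lives in train.py and may differ:

```python
# Minimal sketch: loading the en-zh configuration of OPUS-100 with the
# Hugging Face `datasets` library. The selection below mirrors the split
# sizes listed above; this repo's train.py may select samples differently.
from datasets import load_dataset

dataset = load_dataset("Helsinki-NLP/opus-100", "en-zh")

train_data = dataset["train"].select(range(1_000_000))
val_data = dataset["validation"]   # 2,000 samples
test_data = dataset["test"]        # 2,000 samples

print(train_data[0])  # {'translation': {'en': '...', 'zh': '...'}}
```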
- Hardware: The model is trained and evaluated on an NVIDIA RTX 3070 GPU. Hardware with lower specifications may still work, but can result in longer training times.
- System: Ubuntu 22.04
- Anaconda: Download and install Anaconda.
Create a new Conda environment named transformer and install all necessary packages within it:
conda create -n transformer python=3.10
conda activate transformer
Clone the repository from GitHub.
git clone https://github.com/QZJGeorge/STATS507-Final-Project.git
Install the required Python packages:
pip install -r requirements.txt
For detailed training parameters, refer to config.py. To start training, run the following command. This will train and save the transformer model.
python3 train.py
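As an illustration only, config.py in this style of implementation typically exposes a dictionary of hyperparameters like the sketch below. The key names and values here are hypothetical; consult the actual file for the real settings.

```python
# Hypothetical sketch of the kind of settings config.py typically holds;
# the actual keys and values in this repository may differ.
def get_config():
    return {
        "batch_size": 8,       # reduce this if you hit CUDA out-of-memory errors
        "num_epochs": 20,
        "lr": 1e-4,
        "seq_len": 350,        # maximum source/target sequence length
        "d_model": 512,        # embedding dimension from the paper
        "lang_src": "en",
        "lang_tgt": "zh",
        "model_folder": "Helsinki-NLP/opus-100_weights",
    }
```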
You can also download our pre-trained model and place it in the Helsinki-NLP/opus-100_weights folder.
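To use the downloaded weights, the checkpoint can be restored with PyTorch along the following lines. The filename and checkpoint layout are assumptions; see the loading code in train.py or evaluate.py for the exact names.

```python
# Hedged sketch: restoring a saved checkpoint with PyTorch. The file name
# and the state-dict layout are assumptions, not taken from this repo.
import torch

checkpoint_path = "Helsinki-NLP/opus-100_weights/model_latest.pt"  # hypothetical filename
state = torch.load(checkpoint_path, map_location="cuda")

# If the checkpoint stores a dict with a 'model_state_dict' key:
# model.load_state_dict(state["model_state_dict"])
# If it stores the state dict directly:
# model.load_state_dict(state)
```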
The validation and test datasets were combined, resulting in a total of 4,000 samples. The model's performance is evaluated using BERTScore.
To run the evaluation, execute the following command:
python3 evaluate.py
Our pre-trained model achieved a BERTScore precision of 0.795, recall of 0.766, and F1 of 0.779.
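For reference, these metrics can be reproduced for any candidate/reference pairs with the bert-score package. The snippet below is a minimal sketch with toy sentences, not the exact evaluation loop in evaluate.py.

```python
# Minimal BERTScore sketch using the `bert-score` package (pip install bert-score).
# The candidate/reference sentences are toy examples; evaluate.py runs this
# over the combined 4,000-sample validation + test set.
from bert_score import score

candidates = ["你好，世界"]    # model translations (hypothetical)
references = ["你好，世界！"]  # ground-truth Chinese references

P, R, F1 = score(candidates, references, lang="zh", verbose=True)
print(f"Precision: {P.mean().item():.3f}, "
      f"Recall: {R.mean().item():.3f}, "
      f"F1: {F1.mean().item():.3f}")
```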
If you encounter the error torch.cuda.OutOfMemoryError, open config.py and reduce the batch size.
Zhijie Qiao [email protected]
Distributed under the MIT License.