microGPT

microGPT is a micro-scaled GPT model with roughly 10 million parameters, implemented from scratch. Coded. From. Scratch. This project demonstrates a comprehensive understanding of the transformer architecture, including the multi-head self-attention mechanism, layer normalization, residual connections, and the overall structure of a GPT-like decoder-only model.

Project Overview

The goal of this project was to recreate and train a miniaturized version of GPT (Generative Pre-trained Transformer) from the ground up. This involved coding the entire model architecture, training loop, and evaluation metrics from scratch, and it reflects an under-the-hood understanding of how transformer models function.

Model Architecture

(Model architecture diagram)

Key Components

  • Self-Attention Mechanism: Used to capture dependencies between different parts of the input sequence (a minimal sketch follows this list).
  • Feedforward Neural Networks: Applied after the attention mechanism to process the representations.
  • Layer Normalization: Implemented to stabilize and accelerate training.
  • Residual Connections: Added to help the gradients flow through the network.
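
As a minimal illustration of the first component, here is a sketch of one causal self-attention head. It assumes a PyTorch implementation in the style of small from-scratch GPTs; class and argument names such as `Head`, `head_size`, and `block_size` are illustrative and not necessarily the ones used in this repository:

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class Head(nn.Module):
    """One head of causal (masked) self-attention."""

    def __init__(self, n_embd, head_size, block_size, dropout):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular mask so a token can only attend to earlier positions.
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)    # (B, T, head_size)
        q = self.query(x)  # (B, T, head_size)
        # Scaled dot-product attention scores, masked to be causal.
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        wei = self.dropout(wei)
        v = self.value(x)  # (B, T, head_size)
        return wei @ v     # (B, T, head_size)
```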

Layers

  • input -> token_embedding + position_embedding -> input_embedding
  • x 6 transformer blocks, each: [self_attention -> dropout -> layer_norm] with a skip connection, then [linear_mlp -> dropout -> layer_norm] with a skip connection (a minimal sketch of one block follows below)
  • layer_norm -> linear_lm_head -> logits
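
Building on the `Head` sketch above, one transformer block following this flow (self-attention, dropout, layer norm, and a skip connection, then the same around the MLP) could look roughly like the following. The post-norm ordering mirrors the list above; module names are again illustrative rather than the repository's own:

```python
class Block(nn.Module):
    """Transformer block: multi-head self-attention, then an MLP, each followed
    by dropout and layer norm, with a residual (skip) connection around each."""

    def __init__(self, n_embd, n_head, block_size, dropout):
        super().__init__()
        head_size = n_embd // n_head
        self.heads = nn.ModuleList(
            [Head(n_embd, head_size, block_size, dropout) for _ in range(n_head)]
        )
        self.proj = nn.Linear(n_embd, n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.ReLU(),
            nn.Linear(4 * n_embd, n_embd),
        )
        self.drop = nn.Dropout(dropout)
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        # self_attention -> dropout -> layer_norm, with a skip connection.
        attn = self.proj(torch.cat([h(x) for h in self.heads], dim=-1))
        x = self.ln1(x + self.drop(attn))
        # linear_mlp -> dropout -> layer_norm, with a skip connection.
        x = self.ln2(x + self.drop(self.mlp(x)))
        return x
```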

Hyperparameters

  • Train Split: 0.9
  • Validation Split: 0.1
  • Batch Size: 64
  • Block Size: 256
  • Embedding Size: 384
  • Number of Attention Heads: 6
  • Number of Layers: 6
  • Loss Evaluation Interval: 500 epochs
  • Loss Evaluation Iterations: 200
  • Learning Rate: 3e-4
  • Epochs: 5000
  • Dropout: 0.2
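
For reference, the hyperparameters above written as a plain Python configuration, as such training scripts typically define them (the variable names are assumptions, not necessarily those used in the repository):

```python
# Hyperparameters from the list above (variable names are illustrative).
batch_size = 64       # sequences per training step
block_size = 256      # context length in tokens
n_embd = 384          # embedding size
n_head = 6            # attention heads per block
n_layer = 6           # transformer blocks
dropout = 0.2
learning_rate = 3e-4
max_iters = 5000      # training iterations ("epochs" above)
eval_interval = 500   # estimate the loss every 500 iterations
eval_iters = 200      # batches averaged per loss estimate
train_split = 0.9     # 90% train / 10% validation
```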

Results

  • Parameters: 10,788,929 (reproducible with the one-line check below)
  • Train Loss: 1.1111
  • Validation Loss: 1.4815
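
The parameter count can be reproduced with a one-line check, assuming the instantiated model is a PyTorch `nn.Module` (the variable name `model` is illustrative):

```python
# Count trainable parameters of the instantiated model.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params:,} parameters")  # expected: 10,788,929
```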

Notes

Detailed notes taken during the development and training of this model can be found in the notes file.

Loss Record

The detailed record of training and validation losses at each phase can be found in the loss records file.

Conclusion

microGPT is a robust miniaturized version of GPT, built and trained from scratch. It serves as a comprehensive demonstration of the inner workings of transformer models and their training processes. This project showcases the depth of understanding and practical implementation skills in the field of neural networks and deep learning.

Reach out

Email - [email protected] | LinkedIn | Dreamer.ai | Portfolio
