This repository contains the code for our project in the course IFT6760B: Neural Scaling Laws and Foundation Models, taught in the Winter 2022 semester at Mila/University of Montreal.
The slides for our presentation, titled "On Layer Normalization for Vision Transformers", can be found here.
The goal of our project was to study the effect of the PreNorm and PostNorm variants of the Vision Transformer (ViT). While the literature contains several studies comparing Pre- and Post-Norm versions of the vanilla transformer on language data, a comparable analysis for vision data using ViTs is lacking. We trained ViT models from scratch on four datasets: CIFAR10, CIFAR100, Imagenette, and Imagewoof.
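For reference, the two variants differ only in where LayerNorm sits relative to the residual connection in each transformer block. Below is a minimal PyTorch sketch of the two block types; it is illustrative only and not necessarily the exact implementation in this repository (the module names and hyperparameters here are our own for the example).

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """PreNorm: LayerNorm is applied *before* each sub-layer,
    so the residual path stays un-normalized."""
    def __init__(self, dim, num_heads, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

class PostNormBlock(nn.Module):
    """PostNorm: LayerNorm is applied *after* the residual addition,
    as in the original "Attention Is All You Need" transformer."""
    def __init__(self, dim, num_heads, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x):
        x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
        x = self.norm2(x + self.mlp(x))
        return x
```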
The idea for this project was primarily inspired by this paper.