This is the code used to optimize the weights of an under-parameterized neural network. Training is done via gradient flow using MLPGradientFlow.jl. In this repo, we release the code to train the network and visualize the training results for the erf activation function and standard Gaussian input data.
- This file
- Simulation file
- Script to see the loss curves and gradient norms
- Script to see the summary of training for all widths
- Script to visualize the weights at convergence
- Helper functions
To visualize the results using Python, as done in this repo, you need to install:
- juliacall
- numpy
- matplotlib
We find that gradient flow converges to one of two minima, depending on the direction of the initialization, when the student width is about one-half of the teacher width.
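To make the setup concrete, here is a minimal, dependency-free sketch of a teacher-student experiment with erf activations and standard Gaussian inputs. It is not the repo's code and does not use MLPGradientFlow.jl: it approximates gradient flow with small explicit Euler steps on a finite sample. The function names, the sizes, the unit teacher output weights, and the choice to train both layers are all assumptions made for illustration.

```python
import math
import random

def erf_prime(z):
    """Derivative of erf: 2/sqrt(pi) * exp(-z^2)."""
    return 2.0 / math.sqrt(math.pi) * math.exp(-z * z)

def forward(W, a, x):
    """One-hidden-layer erf network: sum_j a[j] * erf(<W[j], x>)."""
    z = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return sum(aj * math.erf(zj) for aj, zj in zip(a, z)), z

def loss_and_grad(W, a, X, y):
    """Half mean squared error and its gradients w.r.t. W and a."""
    n, k, d = len(X), len(W), len(X[0])
    gW = [[0.0] * d for _ in range(k)]
    ga = [0.0] * k
    loss = 0.0
    for x, target in zip(X, y):
        pred, z = forward(W, a, x)
        r = pred - target                      # residual on this sample
        loss += 0.5 * r * r / n
        for j in range(k):
            ga[j] += r * math.erf(z[j]) / n
            c = r * a[j] * erf_prime(z[j]) / n
            for i in range(d):
                gW[j][i] += c * x[i]
    return loss, gW, ga

def euler_gradient_flow(W, a, X, y, step=0.05, n_steps=300):
    """Explicit Euler steps on d(theta)/dt = -grad L(theta); returns the loss history."""
    losses = []
    for _ in range(n_steps):
        loss, gW, ga = loss_and_grad(W, a, X, y)
        losses.append(loss)
        W = [[w - step * g for w, g in zip(wr, gr)] for wr, gr in zip(W, gW)]
        a = [aj - step * gj for aj, gj in zip(a, ga)]
    return W, a, losses

# Hypothetical sizes: the student has half as many hidden units as the teacher.
random.seed(0)
d, teacher_width, student_width, n = 3, 4, 2, 200
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
W_teacher = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(teacher_width)]
# Assumption: the teacher uses unit output weights.
y = [forward(W_teacher, [1.0] * teacher_width, x)[0] for x in X]

W0 = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(student_width)]
a0 = [1.0] * student_width
W, a, losses = euler_gradient_flow(W0, a0, X, y)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

Changing the seed changes the direction of the initialization, so rerunning the sketch with several seeds is one way to compare where the student weights end up. A fixed-step Euler scheme is only a rough stand-in for a proper ODE solver, so it should not be expected to reproduce the repo's results.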
We plot the results for

Jan 15, 2024