
It's training

Released by @proger on 11 Mar, 15:58

The default model config is now narrower, and the model now trains stably. Key changes: an epsilon is added inside `sqrt` to avoid NaN gradients, and the range of `forget_base` is adjusted to match the paper more closely.
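A minimal pure-Python sketch of why an epsilon inside `sqrt` prevents NaN gradients. The release does not say which `sqrt` in the model is affected, so this assumes a vector norm `sqrt(sum(v**2))` as a stand-in; the helpers `sqrt_grad` and `norm_grad` and the value `eps=1e-6` are hypothetical and chosen only for illustration. The derivative of `sqrt(u)` is `1 / (2 * sqrt(u))`, which is infinite at `u = 0`. The chain rule then multiplies that infinity by a zero inner derivative, and `inf * 0` is NaN. Adding `eps` inside the root keeps the outer derivative finite.

```python
import math


def sqrt_grad(u: float, eps: float = 0.0) -> float:
    """Derivative of sqrt(u + eps) with respect to u.

    Mimics IEEE floating-point semantics (1/0 -> +inf) instead of
    raising ZeroDivisionError, as tensor libraries do.
    """
    s = math.sqrt(u + eps)
    return math.inf if s == 0.0 else 0.5 / s


def norm_grad(v: list[float], eps: float = 0.0) -> list[float]:
    """Gradient of sqrt(sum(v_i**2) + eps) with respect to each v_i."""
    u = sum(x * x for x in v)
    outer = sqrt_grad(u, eps)                # dy/du, +inf when u + eps == 0
    return [outer * (2.0 * x) for x in v]    # dy/du * du/dv_i


zero = [0.0, 0.0, 0.0]
print(norm_grad(zero))            # [nan, nan, nan]: inf * 0.0 is nan
print(norm_grad(zero, eps=1e-6))  # [0.0, 0.0, 0.0]: finite * 0.0
```

Note that the epsilon has to go inside the root: `sqrt(u) + eps` has the same unbounded derivative at zero as `sqrt(u)`, while `sqrt(u + eps)` caps it at `1 / (2 * sqrt(eps))`.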