Support gradient accumulation #229
Closed
btbujiangjun
started this conversation in
Ideas
Replies: 1 comment
That's something you can definitely do on a case-by-case basis, but we likely won't add it to the core library. We also have plans for gradient checkpointing and a few other levers we are exploring to reduce memory pressure.
Due to limited memory, we have to use a small batch size, so gradient accumulation may be a hard requirement.
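For readers with the same memory constraint, here is a minimal, framework-agnostic sketch of what gradient accumulation in user code could look like. It is not this library's API: the toy model (`y_hat = w * x` with a mean-squared-error loss) and every function name below (`grad_mse`, `accumulated_grad`, `train`) are assumptions made up for illustration. The idea is to compute gradients on small micro-batches that fit in memory, sum them, and take a single optimizer step with the average, which matches the gradient of the full batch.

```python
# Framework-agnostic sketch of gradient accumulation (pure Python, no ML library).
# Toy model: y_hat = w * x, loss = mean squared error. All names are hypothetical.

def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Full-batch average gradient, computed one micro-batch at a time."""
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_batch_size):
        mb_x = xs[start:start + micro_batch_size]
        mb_y = ys[start:start + micro_batch_size]
        # Weight each micro-batch mean by its size so that a shorter
        # final chunk does not skew the overall average.
        total += grad_mse(w, mb_x, mb_y) * len(mb_x)
    return total / n


def train(xs, ys, micro_batch_size, lr=0.01, steps=200):
    """Plain gradient descent with one update per accumulated gradient."""
    w = 0.0
    for _ in range(steps):
        g = accumulated_grad(w, xs, ys, micro_batch_size)
        w -= lr * g
    return w


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # data generated with a true slope of 2

# Accumulating over micro-batches of 3 (chunks of 3, 3, 2) should match
# the gradient computed over all 8 examples at once.
full = grad_mse(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch_size=3)
w_fit = train(xs, ys, micro_batch_size=2)
```

In a real training loop the same pattern applies to every parameter: keep a running sum of gradients, skip the optimizer update until the chosen number of micro-batches has been processed, then apply the averaged gradient and reset the sum. Peak activation memory then depends on the micro-batch size rather than the effective batch size.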