We are trying to train a model with factors, but we are running out of memory:
When running Marian with data shuffling, training uses ~90 GB of RAM, regardless of the shuffle-in-ram setting.
The same model with shuffling disabled peaks at ~40 GB.
The baseline SPM model, which uses exactly the same data but without factors, peaks at ~25 GB with shuffle-in-ram: false.
We are using factors-combine: sum, but we are not sure whether this has a large effect on RAM usage.
It seems Marian uses significantly more RAM when shuffling data for factored models. Maybe it is ignoring shuffle-in-ram: false?
For reference, here are the vocab, factor, and valid-entry stats, which look OK to me:
[2023-02-10 14:56:47] [vocab] Loading vocab spec file ../wd.all2022.en-fr.en-fr/vocab.en.new.fsv
[2023-02-10 14:56:47] [vocab] Factor group '(lemma)' has 32000 members
[2023-02-10 14:56:47] [vocab] Factor group '|d' has 114 members
[2023-02-10 14:56:47] [vocab] Factor group '|s' has 4 members
[2023-02-10 14:56:47] [vocab] Factor group '|c' has 3 members
[2023-02-10 14:56:47] [vocab] Factored-embedding map read with total/unique of 127985/32121 factors from 32000 example words (in space of 73,602,300)
[2023-02-10 14:56:47] [vocab] Expanding all valid vocab entries out of 73,602,300...
[2023-02-10 14:57:11] [vocab] Completed, total 43769165 valid combinations
[2023-02-10 14:57:11] [data] Setting vocabulary size for input 0 to 43,769,165
As a side question (and sorry to mix it with the bug), the size of the expanded space is:
(32000+1) * (114+1) * (4+1) * (3+1) = 73,602,300
To me it seems Marian is reserving an extra vocab entry for UNK in each factor group, but an unknown factor value will never occur in our data. Is there a flag to disable this behaviour?
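To make the arithmetic easy to check, here is a minimal sketch that recomputes the expanded space from the group sizes in the log above. The names `expanded_space` and `extra_per_group` are illustrative only and are not part of Marian; treating the extra slot per group as UNK is my assumption from the numbers, not something the log confirms.

```python
from math import prod

# Factor group sizes copied from the [vocab] log lines above.
group_sizes = {"(lemma)": 32000, "|d": 114, "|s": 4, "|c": 3}

def expanded_space(sizes, extra_per_group=1):
    """Product of (size + extra_per_group) over all factor groups.

    extra_per_group=1 models one additional slot per group (assumed to
    be UNK); extra_per_group=0 models no additional slot.
    """
    return prod(n + extra_per_group for n in sizes.values())

with_extra = expanded_space(group_sizes, extra_per_group=1)
without_extra = expanded_space(group_sizes, extra_per_group=0)
valid = 43_769_165  # "total ... valid combinations" from the log

print(f"space with one extra slot per group: {with_extra:,}")   # 73,602,300
print(f"space without extra slots:           {without_extra:,}")  # 43,776,000
print(f"valid combinations reported:         {valid:,}")
```

With one extra slot per group the product matches the 73,602,300 reported in the log. Without the extra slots it would be 43,776,000, which is close to the 43,769,165 valid combinations that Marian reports.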
Thanks a lot
Context
Marian v1.11.0 f00d062 2022-02-08 08:39:24 -0800
We also observed the same behaviour with rev. 3c2a432