Hey,
What are the memory requirements to train this model? I am providing 187 GB of RAM, and training fails right after this log line:
INFO:tensorflow:Saving checkpoints for 0 into summary/knee_l1/model.ckpt.
At this point memory usage jumps from 4 GB to more than 187 GB, and the job is killed because it runs out of memory.
I am running the model with the command from train_all.sh, where I have only decreased the batch size from 2 to 1 and the number of iteration steps from 10000 to 10:
python3 recon_train.py \
    --shape_y 320 --shape_z 256 \
    --num_channels 8 --num_maps 1 \
    --batch_size 1 \
    --model_dir summary/knee_l1 \
    --loss_l1 1 \
    --max_steps 10 \
    --device $device
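One way to pin down where the memory jump happens is to log the peak resident memory of the training process at a few points, for example before and after the first checkpoint is saved. The snippet below is a minimal sketch using only the Python standard library. It is not part of the repository, and the helper name `peak_rss_gb` is hypothetical. It assumes a Unix-like system, since the `resource` module is not available on Windows.

```python
import resource
import sys


def peak_rss_gb():
    """Return the peak resident set size of this process in GB (hypothetical helper)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1024 ** 3


if __name__ == "__main__":
    print(f"peak RSS before allocation: {peak_rss_gb():.2f} GB")
    buffer = bytearray(200 * 1024 ** 2)  # allocate about 200 MB as a stand-in for real work
    print(f"peak RSS after allocation: {peak_rss_gb():.2f} GB")
```

Calling such a helper from the training script around the checkpoint step would show whether the growth comes from the host process itself (CPU RAM) rather than from GPU memory.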
Can you please help? I am unable to train the model. I am providing one GPU with 16 GB of memory. Is this model designed to run on multiple nodes or on CPU?
Thank you.