memory requirement #3

Open
jaykumar16 opened this issue Apr 19, 2020 · 0 comments

Hey,

What are the memory requirements to train this model? I am providing 187 GB of RAM, and training fails after this log line:

INFO:tensorflow:Saving checkpoints for 0 into summary/knee_l1/model.ckpt.
At this point memory usage jumps from 4 GB to more than 187 GB, and the job is killed because it runs out of memory.

I am running the model with the command from train_all.sh, except that I have decreased the batch size from 2 to 1 and the number of iteration steps from 10000 to just 10.

python3 recon_train.py \
    --shape_y 320 --shape_z 256 \
    --num_channels 8 --num_maps 1 \
    --batch_size 1 \
    --model_dir summary/knee_l1 \
    --loss_l1 1 \
    --max_steps 10 \
    --device $device
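
For scale, here is a rough estimate of how large one input batch should be at these settings. It is only a sketch: the shape values come from the flags above, but the 8-byte sample size (complex64) is my assumption and is not confirmed by the repository, and `batch_bytes` is a helper name made up for this example.

```python
# Values taken from the command-line flags above.
SHAPE_Y, SHAPE_Z = 320, 256
NUM_CHANNELS = 8
BATCH_SIZE = 1

# Assumption (not confirmed by the repository): each sample is complex64,
# i.e. two 4-byte floats, so 8 bytes per sample.
BYTES_PER_SAMPLE = 8


def batch_bytes(batch_size, shape_y, shape_z, num_channels, itemsize):
    """Return the size in bytes of one dense input batch."""
    return batch_size * shape_y * shape_z * num_channels * itemsize


size = batch_bytes(BATCH_SIZE, SHAPE_Y, SHAPE_Z, NUM_CHANNELS, BYTES_PER_SAMPLE)
print(f"{size / 2**20:.2f} MiB per input batch")
```

Under that assumption a single batch is only a few MiB, so the batch size alone does not seem to explain a jump to more than 187 GB.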

Can you please help? I am unable to train the model. I am providing 1 GPU with 16 GB of memory. Is this model designed to run on multiple nodes and CPUs?

Thank you.
