
CUDA out of memory issue #10

Open
danieleghisi opened this issue Jan 19, 2017 · 12 comments
@danieleghisi

Hi, I'm having an "out of memory" issue while running the demo.

Snippet:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 11.91GiB
Free memory: 11.67GiB
(full log below)

I have tried to lower the model parameters, but nothing seems to work. Do you have any advice?
Why does the demo take so much GPU memory?
Thanks a lot,
Daniele

Full log:
python demo.py
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:119] Couldn't open CUDA library libcudnn.so. LD_LIBRARY_PATH:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3459] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:04:00.0
Total memory: 11.91GiB
Free memory: 11.67GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x48c4140
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GeForce GTX 980
major: 5 minor: 2 memoryClockRate (GHz) 1.342
pciBusID 0000:0a:00.0
Total memory: 3.94GiB
Free memory: 487.88MiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x48c0320
E tensorflow/core/common_runtime/direct_session.cc:135] Internal: failed initializing StreamExecutor for CUDA device ordinal 2: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073648275456
Traceback (most recent call last):
File "demo.py", line 16, in <module>
gpu_fraction=gpu_fraction)
File "/home/daniele/fast-wavenet-master/wavenet/models.py", line 54, in __init__
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
File "/home/daniele/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1186, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/home/daniele/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 551, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/home/daniele/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

@ianni67

ianni67 commented Jan 23, 2017

I'm facing the same error on a Titan X with 11.92 GiB of memory available.
It looks like TF is trying to grab as much memory as the card exposes, so it is never satisfied.
There must be a misconfiguration somewhere.

@tomlepaine
Owner

Hi @danieleghisi @ianni67 I have definitely run the demo with less memory. 6 GB I think. Did you have any luck?

@danieleghisi
Author

danieleghisi commented Feb 1, 2017 via email

@ianni67

ianni67 commented Feb 2, 2017

I'm very sorry for my late reply. I eventually solved the issue, and, alas (or luckily?), it was my own fault. I did not manage the GPUs correctly, so two processes were contending for the same GPU, both requesting all of its memory. Confining the processes to separate GPUs (by setting CUDA_VISIBLE_DEVICES appropriately) solved the problem.
Sorry for the wrong issue and the late reply!
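For reference, a minimal sketch of the fix described above: pin each process to its own GPU by setting CUDA_VISIBLE_DEVICES before the CUDA runtime initializes. The device index here is illustrative.

```python
import os

# Must be set before TensorFlow (and with it the CUDA runtime) is imported;
# once the process has enumerated the GPUs, changing this has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"   # this process sees only GPU 0

# A second training process would set "1" instead, so the two jobs never
# contend for the same card's memory.
print(os.environ["CUDA_VISIBLE_DEVICES"])

# import tensorflow as tf  # safe to import now; TF sees a single device
```

Note that inside each process the visible devices are renumbered starting from 0, regardless of their physical index.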

@tomlepaine
Owner

tomlepaine commented Feb 9, 2017

@ianni67, I'm glad you resolved your issue. @danieleghisi do you think you have the same problem?

@danieleghisi
Author

danieleghisi commented Mar 3, 2017

Sorry for the late reply. CUDA_VISIBLE_DEVICES solves the memory issues, but now I get a CuDNN version error:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 5005 (compatibility version 5000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

Looks like an issue with my TensorFlow installation (I can't update cuDNN due to other dependencies).
I'll try to google how to solve this...

Daniele

@danieleghisi
Author

danieleghisi commented Mar 3, 2017

@tomlepaine Thanks Tom, I've managed to update cuDNN and its dependencies, and the training works fine! On the other hand, the generation always outputs a constant value (the seed), i.e. a perfectly flat waveform...

Could I be doing something wrong?
I've just added scipy.io.wavfile.write("out.wav", 44100, numpy.array(predictions[0]))
as the very last line, to save the output file...

Thanks again for your support,
Daniele

@tomlepaine
Owner

@danieleghisi why are you saving predictions[0]? What is the shape of predictions?

Glad you are closer to getting the code working!

@danieleghisi
Author

Hi Tom,
the shape of predictions is (1, 32000)
However, don't worry: I've just noticed that the tensorflow-wavenet implementation uses this fast generation; I'll stick to that model for now!
Thanks for your support,
d
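For what it's worth, a self-contained sketch of the save step, given that predictions has shape (1, 32000) and assuming its samples are floats in [-1, 1] (if the network actually emits mu-law class indices, they would need decoding first, which could itself explain a flat waveform). This uses the stdlib wave module rather than scipy.io.wavfile, but the point is the same: samples must be scaled to the range the container expects, or the result plays back silent/flat.

```python
import math
import struct
import wave

# Stand-in for the model output: assumed shape (1, N), float samples in [-1, 1].
N = 32000
predictions = [[math.sin(2 * math.pi * 440 * t / 44100) for t in range(N)]]

audio = predictions[0]  # drop the batch dimension -> N samples

# Convert to 16-bit PCM; clipping guards against out-of-range samples.
pcm = b"".join(
    struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in audio
)

with wave.open("out.wav", "wb") as f:
    f.setnchannels(1)     # mono
    f.setsampwidth(2)     # 16-bit samples
    f.setframerate(44100)
    f.writeframes(pcm)
```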

@ishandutta2007

@danieleghisi if you have succeeded in training it now, can you please share the model?

@francois-baptiste

Setting gpu_fraction = .95 instead of 1 in demo.py solved the problem for me!
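A sketch of what that change amounts to in TensorFlow 1.x terms (assuming demo.py plumbs gpu_fraction into tf.GPUOptions, as the traceback above suggests): asking for slightly less than 100% of the card leaves headroom for the CUDA context and other allocations.

```python
import tensorflow as tf  # TensorFlow 1.x API

gpu_fraction = 0.95  # rather than 1.0: leave headroom for the CUDA context
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_fraction)

# Alternative knob with a similar effect: allocate memory on demand
# instead of grabbing the whole fraction up front.
# gpu_options = tf.GPUOptions(allow_growth=True)

sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```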

@ishandutta2007

@francois-baptiste can you share the model?
