
Memory required for inference using GPU #4

Open
W-QY opened this issue Oct 10, 2024 · 2 comments

W-QY commented Oct 10, 2024

Thank you for your outstanding work!

I would like to know how much GPU memory is required to run inference with "python scripts/infer.py --opts-path configs/infer/lmo.json". I got the error "cudaMalloc error out of memory [2]" after executing the command:

I1010 14:02:55.50 3011677 infer.py:232] Building KNN index for template 5...
I1010 14:02:55.275 3011677 infer.py:232] Building KNN index for template 6...
I1010 14:02:55.488 3011677 infer.py:232] Building KNN index for template 7...
I1010 14:02:55.713 3011677 infer.py:232] Building KNN index for template 8...
I1010 14:02:55.938 3011677 infer.py:232] Building KNN index for template 9...
I1010 14:02:56.154 3011677 infer.py:232] Building KNN index for template 10...
Traceback (most recent call last):
  File "/home/data1/user/foundpose/scripts/infer.py", line 832, in <module>
    main()
  File "/home/data1/user/foundpose/scripts/infer.py", line 828, in main
    infer(opts)
  File "/home/data1/user/foundpose/scripts/infer.py", line 240, in infer
    template_knn_index.fit(template_feats)
  File "/home/data1/user/foundpose/utils/knn_util.py", line 55, in fit
    self.index = faiss.index_cpu_to_gpu(self.res, self.gpu_id, self.index)
  File "/home/user/anaconda3/envs/foundpose_gpu/lib/python3.9/site-packages/faiss/swigfaiss_avx512.py", line 12799, in index_cpu_to_gpu
    return _swigfaiss_avx512.index_cpu_to_gpu(provider, device, index, options)
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /home/circleci/miniconda/conda-bld/faiss-pkg_1709244517602/work/faiss/gpu/StandardGpuResources.cpp:530: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryBuffer dev 0 space Device stream 0x59204770 size 1610612736 bytes (cudaMalloc error out of memory [2])

I saw in your BOP submission results that you used a Tesla P100 with 16 GB, but our RTX 3090 with 24 GB still fails with "cudaMalloc error out of memory [2]". Why is that?


nesi73 commented Oct 11, 2024

Same issue for me. Were you able to solve it?


jerukan commented Oct 24, 2024

Not sure if you all already solved it, but this issue is caused by self.res = faiss.StandardGpuResources() being allocated for every single KNN index, one per template. There are hundreds of templates, and each allocation reserves around a gigabyte of GPU memory for a temporary buffer. I have no idea how the developers ran this in the first place, but I fixed the issue in utils/knn_util.py on line 40 via the following change:

if self.res is None and use_gpu is True:
    self.res = faiss.StandardGpuResources()
    self.res.setTempMemory(0)  # skip the default temporary-buffer reservation
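The fix above is essentially a lazy singleton: the expensive GPU resource object is created once and reused by every per-template index instead of being re-allocated hundreds of times. Here is a minimal, hedged sketch of that pattern in plain Python, with no faiss dependency; GpuResources and KNNIndex below are hypothetical stand-ins for faiss.StandardGpuResources and the KNN wrapper in utils/knn_util.py, not the actual FoundPose code:

```python
class GpuResources:
    """Stand-in for faiss.StandardGpuResources: pretend each instance
    reserves a large temporary GPU buffer on construction."""

    instances = 0  # count how many resource objects get created

    def __init__(self):
        GpuResources.instances += 1
        self.temp_memory = 1_610_612_736  # bytes, the size from the error log

    def set_temp_memory(self, nbytes):
        # Mirrors StandardGpuResources.setTempMemory: shrink the reservation.
        self.temp_memory = nbytes


class KNNIndex:
    """One index per template; all indexes share a single resource object."""

    _shared_res = None  # class-level cache, populated on first use

    def __init__(self, use_gpu=True):
        if KNNIndex._shared_res is None and use_gpu:
            KNNIndex._shared_res = GpuResources()
            KNNIndex._shared_res.set_temp_memory(0)  # skip the big reserve
        self.res = KNNIndex._shared_res


# Building hundreds of per-template indexes now creates the resource
# object exactly once instead of once per index.
indexes = [KNNIndex() for _ in range(300)]
print(GpuResources.instances)  # 1
```

With one resource object per index, 300 templates would try to reserve roughly 300 x 1.5 GB of temporary buffers, which overflows even a 24 GB card; with the shared object and a zero-byte temporary buffer, the per-index cost disappears.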
