How much GPU memory does the device need to run inference with "python scripts/infer.py --opts-path configs/infer/lmo.json"? I got the error "cudaMalloc error out of memory [2]" after executing the command:
I1010 14:02:55.50 3011677 infer.py:232] Building KNN index for template 5...
I1010 14:02:55.275 3011677 infer.py:232] Building KNN index for template 6...
I1010 14:02:55.488 3011677 infer.py:232] Building KNN index for template 7...
I1010 14:02:55.713 3011677 infer.py:232] Building KNN index for template 8...
I1010 14:02:55.938 3011677 infer.py:232] Building KNN index for template 9...
I1010 14:02:56.154 3011677 infer.py:232] Building KNN index for template 10...
Traceback (most recent call last):
  File "/home/data1/user/foundpose/scripts/infer.py", line 832, in <module>
    main()
  File "/home/data1/user/foundpose/scripts/infer.py", line 828, in main
    infer(opts)
  File "/home/data1/user/foundpose/scripts/infer.py", line 240, in infer
    template_knn_index.fit(template_feats)
  File "/home/data1/user/foundpose/utils/knn_util.py", line 55, in fit
    self.index = faiss.index_cpu_to_gpu(self.res, self.gpu_id, self.index)
  File "/home/user/anaconda3/envs/foundpose_gpu/lib/python3.9/site-packages/faiss/swigfaiss_avx512.py", line 12799, in index_cpu_to_gpu
    return _swigfaiss_avx512.index_cpu_to_gpu(provider, device, index, options)
RuntimeError: Error in virtual void* faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest&) at /home/circleci/miniconda/conda-bld/faiss-pkg_1709244517602/work/faiss/gpu/StandardGpuResources.cpp:530: Error: 'err == cudaSuccess' failed: StandardGpuResources: alloc fail type TemporaryMemoryBuffer dev 0 space Device stream 0x59204770 size 1610612736 bytes (cudaMalloc error out of memory [2])
I saw in your BOP submission results that you used a Tesla P100 16 GB device, yet our RTX 3090 24 GB still reports "cudaMalloc error out of memory [2]". Why is that?
Not sure if you have already solved this, but the issue is caused by self.res = faiss.StandardGpuResources() being allocated for every single KNN index, one per template. There are hundreds of templates, and each StandardGpuResources instance reserves around a gigabyte of GPU memory for its temporary buffer. I have no idea how the developers ran this in the first place, but I fixed the issue in utils/knn_util.py on line 40 via the following change
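The exact patch is not shown in the comment, so here is a minimal sketch of the kind of fix described: create faiss.StandardGpuResources() once and share it across all per-template KNN indices, instead of allocating a fresh instance (with its ~1.5 GiB temporary buffer, per the error message) for each of the hundreds of templates. The helper name and the injected factory argument are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical helper for utils/knn_util.py: lazily create the GPU resource
# object once and reuse it for every template's index.
_shared_res = None

def get_shared_gpu_resources(factory):
    """Return a process-wide resource object, constructing it on first call.

    In knn_util.py this would be called as
    get_shared_gpu_resources(faiss.StandardGpuResources), and the result
    passed to faiss.index_cpu_to_gpu() for each template's index, instead of
    calling faiss.StandardGpuResources() inside fit().
    """
    global _shared_res
    if _shared_res is None:
        _shared_res = factory()  # constructed exactly once per process
    return _shared_res
```

As a complementary measure, faiss's StandardGpuResources also exposes setTempMemory() to cap the size of the temporary scratch allocation, which may help on GPUs with less free memory.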
Thank you for your outstanding work!