Improve memory management in clustering_qr.kmeans_plusplus #775

Open · wants to merge 7 commits into main

Conversation

RobertoDF
Contributor

This modification avoids creating, or immediately deletes, unnecessary tensors in clustering_qr.kmeans_plusplus. It helps with the OOM errors (#746) happening at

vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)

Xg can sometimes be quite big (5 GB in the case where I get OOM); in both of the lines below, a copy of Xg was created unnecessarily on the GPU.

#Xg = torch.from_numpy(Xd).to(dev)
vtot = (Xg**2).sum(1)

&
vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)

The solution at line 202 does not impact speed. The solution at line 167 might impact speed, but not in any noticeable way in my tests; for this reason I didn't extend the reach of the clear_cache arg to the kmeans_plusplus func.

Tested on pytorch 2.1.2 and 2.4.1.
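
As a rough sketch of the pattern this PR describes (hedged, not the literal diff; the helper name is made up and the variable names just mirror the snippets quoted above), both problematic expressions can be computed without keeping an extra Xg-sized tensor alive on the GPU: the (Xg**2) temporary is released right after the row-wise reduction, and the matrix product is grouped so the scalar multiply acts on the small result rather than producing a second full copy of Xg.

import torch

def norms_and_projections(Xg, Xc):
    # (Xg**2) is a temporary the same size as Xg; free it as soon as the
    # row-wise reduction is done so it does not linger on the GPU.
    sq = Xg**2
    vtot = sq.sum(1)
    del sq

    # Grouping as 2 * (Xg @ Xc.T) keeps the scalar multiply on the small
    # (n_spikes, n_clusters) product; 2 * Xg @ Xc.T evaluates (2 * Xg) first,
    # which materializes another full copy of Xg on the device.
    vexp = 2 * (Xg @ Xc.T)
    vexp -= (Xc**2).sum(1)   # in-place broadcast, no second vexp-sized temporary
    return vtot, vexp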

@jacobpennington
Collaborator

@RobertoDF Are you able to share the data that you're seeing this problem with so that I can test this myself?

@RobertoDF
Contributor Author

Sure, compressing the files now.

@RobertoDF
Contributor Author

RobertoDF commented Sep 5, 2024

In the zip there is a Jupyter notebook that shows the problem, along with the specific Xd tensor that causes the crash on my machine. I included the standard and modified versions of kmeans_plusplus. The old one should crash; if you run the new one afterwards, it should run without errors.
https://we.tl/t-40kiuNy3Cd

@RobertoDF
Contributor Author

Just noticed that in the notebook I didn't include the change at the line vtot = (Xg**2).sum(1).

@jacobpennington
Collaborator

@RobertoDF Those are not the files I would need. I mean the full recording, either a .bin file or whatever format you converted from, along with the probe file you used.

@RobertoDF
Contributor Author

This last commit seems to really solve the OOM problems.

@Peyton-D

Hello, I tried to use your last commit, but I'm still getting a CUDA OOM error in the final clustering phase. How much dedicated GPU memory do you have? I have 8 GB, and Kilosort used on average 6-7 GB throughout sorting until crashing at the end.

@RobertoDF
Contributor Author

RobertoDF commented Sep 27, 2024

I have 12 GB. Without the modification I would often get OOM inside the kmeans_plusplus func. Which line exactly is problematic for you? What does the error message say? And what is your recording duration?

@Peyton-D

Thanks for the quick response. Yes, kmeans_plusplus inside clustering_qr seems to be the cause of the crash every time. My recording duration is 90 min. Here's the problematic line and the Kilosort log, if it helps:

File "C:\miniconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 215, in kmeans_plusplus
mu[j] = Xg[ix].mean(0)

kilosort4_9_26_1700.log

@RobertoDF
Contributor Author

Hmm, I never had a crash at that line. If you use the normal version, not my fork, does it also crash at the same line?

@Peyton-D

Just ran another attempt with the normal version. Here's the problematic line:

File "C:\Users\ColginLab\miniconda3\envs\kilosort\lib\site-packages\kilosort\clustering_qr.py", line 167, in kmeans_plusplus
vtot = (Xg**2).sum(1)

kilosort4_normal_version.log

@RobertoDF
Contributor Author

OK, that was a problematic line for me too, and indeed I would expect my solution to fix that one. But I never had a problem at the line you showed me before. Maybe it can be optimized further, but I won't have time to check this in the near future. If you have access to a 12 GB GPU, I would expect that to solve the problem.
If you are on Windows, you can try using a debugger that stops at that line and inspect the GPU memory via Task Manager.
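
For anyone debugging this, GPU usage can also be queried from PyTorch itself at the breakpoint rather than through Task Manager; these are standard torch.cuda utilities, and the placement before the vtot line is just an example.

import torch

# Run these just before the suspect line, e.g. vtot = (Xg**2).sum(1),
# to see how much device memory PyTorch is actually holding.
print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
# torch.cuda.memory_summary() gives a more detailed breakdown.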

@Peyton-D

Alright, I'll look into getting more GPU memory. Thanks for the help!
