
[Feature] Memory resource predictor for primitives #156

Open
PabloAndresCQ opened this issue Aug 29, 2024 · 6 comments
Labels: enhancement (New feature or request)

@PabloAndresCQ commented Aug 29, 2024

Hi, I'd like to request a feature.

Context: In the project I develop (pytket-cutensornet), we make extensive use of cuTensorNet's primitive operations on tensors: tensor.decompose (both QRMethod and SVDMethod) and contract (often applied to only two or three tensors). We have encountered multiple cases where we hit OutOfMemory errors, and we would like to improve the user experience around these. To do so, we need to be able to detect whether an OOM error would occur if we were to apply one of these primitives. With this, we may sometimes be able to prevent the OOM error, for instance by truncating tensors more aggressively before applying the primitive. Conceptually, this must be possible: if I set CUTENSORNET_LOG_LEVEL=6, I can see how much workspace memory each primitive requests from the GPU, and I can keep track of how much memory I am using to store my tensor network on the GPU.

Feature request: A method for the user to obtain an upper bound on the GPU memory used by the primitives contract, tensor.decompose (both QRMethod and SVDMethod) and experimental.contract_decompose on the inputs given by the user. Such a method should not run the primitive itself, only report the memory resources it would require. Alternatively, I'd be happy with an optional memory_budget: int parameter passed to these primitives so that, if the operation requires more than memory_budget, it is not applied and the user is informed that it was skipped (either without erroring out, or by throwing an exception that can be handled at the Python level to recover from it).
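To make the requested semantics concrete, here is a pure-Python sketch of the behaviour I have in mind. Everything here is hypothetical: the memory_budget parameter, the MemoryBudgetExceeded exception, and decompose_with_budget are illustrative names, not part of the cuTensorNet API.

```python
class MemoryBudgetExceeded(Exception):
    """Hypothetical recoverable error carrying the sizes involved."""

    def __init__(self, required: int, budget: int):
        super().__init__(
            f"operation needs {required} bytes of workspace, "
            f"but memory_budget is {budget} bytes")
        self.required = required
        self.budget = budget


def decompose_with_budget(required_workspace: int, memory_budget: int) -> bool:
    """Toy stand-in for tensor.decompose with the proposed guard.

    Returns True if the operation would proceed; raises
    MemoryBudgetExceeded (catchable at the Python level, with the
    input tensors left untouched) otherwise.
    """
    if required_workspace > memory_budget:
        raise MemoryBudgetExceeded(required_workspace, memory_budget)
    # In the real API, the decomposition itself would run here.
    return True


# The caller can then recover, e.g. by truncating tensors first:
try:
    decompose_with_budget(required_workspace=2**30, memory_budget=2**20)
except MemoryBudgetExceeded:
    pass  # truncate more aggressively, then retry
```

The key design point is that a skipped operation is reported (via a catchable exception or a return value) rather than crashing the process.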

If this sounds interesting, I'd be happy to provide more details of my use case and refine the feature request.

@daniellowell (Collaborator)

Thanks for the clear description of the feature request. :) I will discuss it with the team.

@daniellowell daniellowell self-assigned this Aug 29, 2024
@daniellowell daniellowell added the enhancement New feature or request label Aug 29, 2024
@yangcal (Collaborator) commented Aug 29, 2024

NetworkOptions.memory_limit is meant to act as the budget guard, but it appears there may be a bug: we don't throw a MemoryError in decompose/contract_decompose when the required memory exceeds the budget. Would it be sufficient if we throw this MemoryError with a message stating the actual required workspace size? Then you may be able to resolve it with try/except handling.

@PabloAndresCQ (Author) commented Aug 30, 2024

Ah, I had not seen NetworkOptions.memory_limit, thanks for pointing that out!

> Would it be sufficient if we throw this MemoryError with a message stating the actual required workspace size?

As long as it is guaranteed that the tensors are not modified when the MemoryError is thrown, this would indeed work for me. Receiving the actual required workspace size in the error message would be very useful.
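For concreteness, this is the kind of recovery loop I have in mind (a pure-Python sketch: apply_decompose and the workspace sizes are made-up stand-ins for the real tensor.decompose call and GPU memory figures):

```python
def apply_decompose(chi: int, workspace_per_bond: int = 2**20) -> int:
    """Stand-in for tensor.decompose: pretend the workspace grows with
    the bond dimension chi, and raise MemoryError past a fixed limit
    (as if the GPU only had 256 MiB of free workspace)."""
    required = chi * workspace_per_bond
    if required > 2**28:
        raise MemoryError(f"required workspace: {required} bytes")
    return required


# Retry with progressively more aggressive truncation until it fits.
# This is only safe if the tensors are guaranteed untouched on failure.
chi = 1024
while True:
    try:
        used = apply_decompose(chi)
        break
    except MemoryError:
        chi //= 2  # truncate the bond dimension and try again
```

Starting from chi = 1024, the loop halves the bond dimension until the (pretend) workspace fits.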

What is the current behaviour of decompose when a memory_limit is set? I'm wondering if there is a workaround I could play with while I wait for the bugfix (and the addition of the required workspace size in the message).

@yangcal (Collaborator) commented Aug 30, 2024

The current decompose doesn't actually check options.memory_limit; this is a bug on our side. Ideally we should check the required workspace like we do in contract, see here.

For decompose, one just needs to insert the memory check here

For contract_decompose, it would be here
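For illustration, the guard being discussed amounts to something like the following. This is a sketch only: the names required_workspace and memory_limit follow this thread, not the actual cuTensorNet source, and the real check would run after workspace sizes are queried but before any kernel launches.

```python
def check_workspace(required_workspace: int, memory_limit: int) -> None:
    """Raise MemoryError before running the primitive if the workspace
    it would request exceeds the user's configured memory limit.

    Raising here, before anything executes, guarantees the input
    tensors are left unmodified, so the caller can catch the error
    and recover (e.g. by truncating and retrying).
    """
    if required_workspace > memory_limit:
        raise MemoryError(
            f"required workspace ({required_workspace} bytes) exceeds "
            f"memory_limit ({memory_limit} bytes)")
```

Including both sizes in the message gives the caller enough information to decide how aggressively to truncate before retrying.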

@PabloAndresCQ (Author)

Thanks! Is there an expected date for a release that includes the bugfix and the extra info in the MemoryError message?

@yangcal (Collaborator) commented Sep 6, 2024

Our next release is planned for around the end of October or early November. Please stay tuned!
