[Question] How does cuTensorNet behave when CONFIG_NUM_HYPER_SAMPLES uses its default value (SamplerAttribute)? #153
Comments
Hi. Second, let me explain some basics of the path optimizer. As a summary:
Looking forward to hearing from you soon,
Thanks for the details! I am attaching the full log of the case where no value for `CONFIG_NUM_HYPER_SAMPLES` is set. My remaining questions are:
Your problem seems to be very large; I can see it requires workspace ranging in the exabytes.

Again, the decreasing workspace is unrelated to hyper samples. Within one sample, if the workspace needed is larger than the available memory, then the pathfinder code will automatically try to slice the network to decrease the workspace, so you might see a monotonically decreasing workspace. Note that when a new hyper sample starts, everything is restarted.

Increasing `CONFIG_NUM_HYPER_SAMPLES` will let the optimizer run longer.

If the contraction cannot be executed using cuTENSOR (there are many reasons this can happen, for example a tensor with a large number of modes, > 64), then the workspace returned is 0 and the optimizer code will iterate and slice, trying to decrease it. The easy way to check further is to have the network pattern printed using the
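For reference, a minimal sketch for pulling the `worksizeNeeded` values out of a `CUTENSORNET_LOG_LEVEL=6` log, e.g. to see the per-sample trend and count the zero entries. The exact log-line format is an assumption here (something like `worksizeNeeded = <bytes>`), so the regex may need adjusting:

```python
import re

# Hypothetical parser for the verbose cuTensorNet log; assumes each relevant
# line contains "worksizeNeeded" followed by an integer number of bytes.
def worksizes(path):
    pattern = re.compile(r"worksizeNeeded\s*[=:]\s*(\d+)")
    sizes = []
    with open(path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                sizes.append(int(m.group(1)))
    return sizes

sizes = worksizes("log.txt")
zeros = sum(1 for s in sizes if s == 0)
print(f"{len(sizes)} worksizeNeeded entries, {zeros} of them zero")
print("smallest nonzero worksize:", min((s for s in sizes if s), default=None))
```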
Thanks, that was helpful. I was not aware that cuTensorNet would do slicing even when using a single GPU (I thought this was only used when parallelisation was enabled). Just to confirm, you are referring to this notion of slicing, right? And just to confirm, from your reply I am inferring that the default value of `CONFIG_NUM_HYPER_SAMPLES`

Indeed, we know that our problem is very large; we were limit testing. Once we saw the logs it was clear to us that these circuits were too large to be simulated with this approach, but we wanted to properly understand what the logs were displaying.
Yes, if the contraction doesn't fit on one GPU, then cuTensorNet will slice it to make it fit in 1 GPU. Similarly, for multi-node multi-GPU, slicing is the technique used to distribute the workload as well as to make sure each workload fits into the GPU.
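As a toy illustration of the idea (plain NumPy, not cuTensorNet itself): slicing splits one summed index into chunks, contracts each chunk independently, and accumulates the partial results, so each partial contraction needs a smaller intermediate/workspace at the cost of some extra arithmetic:

```python
import numpy as np

# Toy "slicing" of a single contraction C = sum_k A[:, k] * B[k, :]:
# instead of contracting over all of k at once, split k into chunks and
# accumulate the partial products. Each chunk needs less working memory.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 64))
B = rng.standard_normal((64, 8))

full = np.einsum("ik,kj->ij", A, B)          # one big contraction over k

chunk = 16
sliced = np.zeros((8, 8))
for k0 in range(0, 64, chunk):               # contract one slice of k at a time
    sliced += np.einsum("ik,kj->ij", A[:, k0:k0+chunk], B[k0:k0+chunk, :])

assert np.allclose(full, sliced)
```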
Hi! I've been doing some experiments with some rather large circuits, trying to see how far we can push contraction-path optimisation. We are using the `sampler_sample` API, essentially reproducing this example. We are keeping track of the memory required by each contraction path by setting the environment variable `CUTENSORNET_LOG_LEVEL=6` and having a look at the logs (particularly, the lines with `worksizeNeeded`).

At first, we tried setting no value for `CONFIG_NUM_HYPER_SAMPLES` and we saw that `worksizeNeeded` monotonically decreases until the optimisation decides to stop. We wanted to give the optimiser more time to try and find better contraction paths, so we set `CONFIG_NUM_HYPER_SAMPLES=100`, but then the reported `worksizeNeeded` no longer decreased monotonically; it fluctuated across the 100 samples. In the end, the `CONFIG_NUM_HYPER_SAMPLES=100` run took way longer, but it did find a `worksizeNeeded` somewhat lower than the default (a bit smaller than a half).

I'm attaching the two logs, showing only the lines with "worksizeNeeded" via `grep "worksizeNeeded" log.txt`. The `_100` log corresponds to that number of samples; `_0` is for the default one. We're talking about petabytes of worksize needed here -- as I said, we are limit testing.

worksizeNeeded_0.log
worksizeNeeded_100.log
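Roughly, our setup looks like the sketch below (shown here with the high-level pythonic API and dummy tensors rather than our actual circuit; `optimize=OptimizerOptions(samples=...)` is assumed to be the pathfinder-level counterpart of the sampler's `CONFIG_NUM_HYPER_SAMPLES`, and `CUTENSORNET_LOG_FILE` is assumed to redirect the log to a file):

```python
import os
# Verbose cuTensorNet logging, as described above. CUTENSORNET_LOG_FILE is
# assumed here to redirect the log to a file (otherwise capture stdout/stderr).
os.environ["CUTENSORNET_LOG_LEVEL"] = "6"
os.environ["CUTENSORNET_LOG_FILE"] = "log.txt"

import numpy as np
from cuquantum import OptimizerOptions, contract_path

# Small stand-in tensor network; the real operands come from the circuit.
a = np.random.rand(2, 2, 2)
b = np.random.rand(2, 2, 2)
c = np.random.rand(2, 2)

# 'samples' is the number of hyper-optimizer samples for the path finder;
# assumed to play the same role as CONFIG_NUM_HYPER_SAMPLES in the sampler API.
path, info = contract_path("ijk,klm,mi->jl", a, b, c,
                           optimize=OptimizerOptions(samples=100))
print(info)   # optimizer info: path, cost, slicing details
```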
I would like to know a couple of things:

- How the optimiser behaves when `CONFIG_NUM_HYPER_SAMPLES` is left to its default value.
- Why the behaviour with `CONFIG_NUM_HYPER_SAMPLES=100` differs from leaving `CONFIG_NUM_HYPER_SAMPLES` to default (assuming it's actually different)?
- What is the meaning of the `worksizeNeeded=0` lines in the log? Are these samples that somehow failed, and should I read that 0 as NaN?

Cheers!
EDIT: I forgot to mention, we were using cuQuantum 24.03 here.