Create a new method to return the final state vector array instead of wrapping it #623

Merged: 37 commits merged into quantumlib:master on Oct 18, 2023

Conversation

@NoureldinYosri (Collaborator) commented Sep 21, 2023

This is to work around the numpy limit on the number of array dimensions (quantumlib/Cirq#6031).

The 1D representation should only be used when the number of qubits is greater than the numpy limit on the number of dimensions, currently set to 32 (numpy/numpy#5744).

_, state_vector, _ = s.simulate_into_1d_array(c)

fixes quantumlib/Cirq#6031
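A minimal usage sketch of the new method (a hedged illustration, not part of the PR itself; it assumes a qsimcirq build that includes this change and uses a small qubit count so it runs anywhere):

import numpy as np
import cirq
import qsimcirq

# 4 qubits keeps the example cheap; the method matters most once the qubit
# count exceeds numpy's dimension cap and the state can no longer be viewed
# as a (2,)*n tensor.
qubits = cirq.LineQubit.range(4)
c = cirq.Circuit(cirq.H(q) for q in qubits)

s = qsimcirq.QSimSimulator()
prs, state_vector, qubit_order = s.simulate_into_1d_array(c)

print(state_vector.shape)         # (16,): flat 1D array, never reshaped to (2,)*n
print(np.abs(state_vector) ** 2)  # probabilities over the computational basis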

@NoureldinYosri NoureldinYosri marked this pull request as ready for review September 22, 2023 19:35
Review thread on qsimcirq/qsim_simulator.py (outdated, resolved):
def simulate_sweep_iter(
    self,
    program: cirq.Circuit,
    params: cirq.Sweepable,
    qubit_order: cirq.QubitOrderOrList = cirq.QubitOrder.DEFAULT,
    initial_state: Optional[Union[int, np.ndarray]] = None,
    as_1d_state_vector: bool = False,
Collaborator (reviewer):

This violates the API defined by cirq.SimulatesFinalState.simulate_sweep_iter. If there are plans to modify that function as well, please link the relevant Cirq PR. (The required cirq version for qsim will also need to be updated if this is the case.)

Collaborator Author (@NoureldinYosri):

It doesn't violate the API; it changes the internal data representation only when as_1d_state_vector=True, which defaults to False, so nothing changes unless the caller explicitly requests the 1D representation. This is done only for the qsim simulator.

The 1D representation is used for only one reason: to report the result. The normal representation is a tensor with one dimension per qubit, which breaks when the number of qubits exceeds the numpy limit on array dimensions (see the issue linked in the docstring and the PR description).
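A tiny illustration of the cap being worked around (behavior shown assumes numpy < 2.0, where the limit is 32 dimensions; numpy 2.x raises it to 64):

import numpy as np

# A per-qubit tensor needs shape (2,) * num_qubits, so it hits numpy's
# dimension cap at 33 qubits even when the array itself is tiny.
try:
    np.zeros((1,) * 33)  # 33 axes, a single element: still rejected on numpy < 2.0
except ValueError as err:
    print(err)  # "maximum supported dimension for an ndarray is 32 ..."

flat = np.zeros(2 ** 10, dtype=np.complex64)  # a flat state vector has one axis,
print(flat.ndim)                              # so it never hits the cap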

Collaborator (reviewer):

QSimSimulator inherits from cirq.SimulatesFinalState, whose simulate_sweep_iter method does not have this new argument. Even though the implementation here can accept any valid input to the function of the parent class, it's still a violation of the API.

Collaborator Author (@NoureldinYosri):

I changed the approach to adding a new method rather than extending the existing API.

@NoureldinYosri (Collaborator Author):

@95-martin-orion This PR is just a workaround for quantumlib/Cirq#6031 until numpy starts to support more than 32 dimensions (numpy/numpy#5744).

@NoureldinYosri NoureldinYosri changed the title allow representing simulation results as 1D array Create a new method to return the final state vector array instead of wrapping it Sep 22, 2023
@95-martin-orion (Collaborator) left a comment:

Some docstring requests, otherwise this LGTM. A new qsimcirq version is necessary to make this generally available - would you like me to cut a new release?

Review thread on qsimcirq/qsim_simulator.py (outdated, resolved)
@NoureldinYosri (Collaborator Author):

@95-martin-orion

> Some docstring requests, otherwise this LGTM. A new qsimcirq version is necessary to make this generally available - would you like me to cut a new release?

yes, please 😄

@95-martin-orion 95-martin-orion added the kokoro:run Trigger Kokoro builds for this PR. label Sep 25, 2023
@qsim-qsimh-bot qsim-qsimh-bot removed the kokoro:run Trigger Kokoro builds for this PR. label Sep 25, 2023
@qsim-qsimh-bot qsim-qsimh-bot removed the kokoro:run Trigger Kokoro builds for this PR. label Oct 3, 2023
@95-martin-orion (Collaborator):

Thank you for the myriad fixes, @NoureldinYosri !

Logs for the Kokoro error can be found here. I unfortunately don't have much context on this, though I do know that the Kokoro tests are not affected by the bazeltest.yml file.

@NoureldinYosri (Collaborator Author) commented Oct 3, 2023:

@95-martin-orion, from the logs:

WARNING: Download from https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/4ce3e4da2e21ae4dfcee9366415e55f408c884ec.tar.gz failed: class java.io.FileNotFoundException GET returned 404 Not Found

It tries to download an old version of the TF runtime that no longer exists: https://storage.googleapis.com/mirror.tensorflow.org/github.com/tensorflow/runtime/archive/4ce3e4da2e21ae4dfcee9366415e55f408c884ec.tar.gz

The versions that are still hosted on storage.googleapis.com/mirror.tensorflow.org are listed at http://mirror.tensorflow.org/. Where does it decide to go for that specific version of the runtime?


Looking deeper into the logs, it looks like it bypasses that error and fetches a CUDA 11 environment, but then decides to look for CUDA 12:

ERROR: An error occurred during the fetch of repository 'ubuntu20.04-gcc9_manylinux2014-cuda11.2-cudnn8.1-tensorrt7.2_config_cuda':
...
No library found under: /usr/local/cuda-12.2/targets/x86_64-linux/lib/libcupti.so.12.2

This looks to be the real problem.

@95-martin-orion 95-martin-orion added the kokoro:run Trigger Kokoro builds for this PR. label Oct 4, 2023
@95-martin-orion (Collaborator):

> {...} Where does it decide to go for that specific version of the runtime?

The files for this are stored in Google-internal repositories - I'll email you the links.

@qsim-qsimh-bot qsim-qsimh-bot removed the kokoro:run Trigger Kokoro builds for this PR. label Oct 4, 2023
@NoureldinYosri NoureldinYosri added the kokoro:run Trigger Kokoro builds for this PR. label Oct 18, 2023
@qsim-qsimh-bot qsim-qsimh-bot removed the kokoro:run Trigger Kokoro builds for this PR. label Oct 18, 2023
@NoureldinYosri NoureldinYosri added the kokoro:run Trigger Kokoro builds for this PR. label Oct 18, 2023
@NoureldinYosri NoureldinYosri merged commit 90d2707 into quantumlib:master Oct 18, 2023
16 checks passed
@rht commented Oct 28, 2023:

@NoureldinYosri thank you once again for this feature. May I know the timeline for the next release for qsim?

@95-martin-orion (Collaborator):

@rht qsim releases are on an "as-needed" basis, which I think this qualifies for. I've opened #631 to cut the release.

@95-martin-orion (Collaborator):

@rht A new release has been cut and should be visible on PyPI in the next 10-20 minutes.

@rht commented Oct 30, 2023:

I see, thank you. Just in time to do huge state vectors for Halloween!

@rht commented Feb 8, 2024:

@NoureldinYosri there was a delay in using this feature in our production instances. We were waiting for the cuQuantum Appliance to have qsimcirq>=0.17.x (NVIDIA/cuQuantum#98), but it hasn't happened.

But I was able to test this PR by patching it directly onto qsimcirq 0.15.0 on cuQuantum Appliance 23.10. I am running a 2xA100 instance with the following code:

import time

from memory_profiler import memory_usage
import cirq
import qsimcirq

def f():
    num_qubits = 33
    qc_cirq = cirq.Circuit()
    qubits = cirq.LineQubit.range(num_qubits)
    for i in range(num_qubits):
        qc_cirq.append(cirq.H(qubits[i]))
    sim = qsimcirq.QSimSimulator()
    tic = time.time()
    # sim = cirq.Simulator()
    print("?", sim.simulate_into_1d_array)
    sim.simulate_into_1d_array(qc_cirq)
    print("Elapsed", time.time() - tic)
# print("Max memory", max(memory_usage(f)))
f()

but still got this OOM error

? <bound method QSimSimulator.simulate_into_1d_array of <qsimcirq.qsim_simulator.QSimSimulator object at 0x7f9e9ac62770>>
CUDA error: out of memory vector_mgpu.h 116

Here is the benchmark result for 32 qubits (haven't measured GPU memory usage from nvidia-smi yet)

Elapsed 14.033143758773804
Max memory 34182.4296875

Here is the manual patch I applied

535c535
<     def simulate_sweep_iter(
---
>     def _simulate_impl(
541,570c541
<     ) -> Iterator[cirq.StateVectorTrialResult]:
<         """Simulates the supplied Circuit.
< 
<         This method returns a result which allows access to the entire
<         wave function. In contrast to simulate, this allows for sweeping
<         over different parameter values.
< 
<         Avoid using this method with `use_gpu=True` in the simulator options;
<         when used with GPU this method must copy state from device to host memory
<         multiple times, which can be very slow. This issue is not present in
<         `simulate_expectation_values_sweep`.
< 
<         Args:
<             program: The circuit to simulate.
<             params: Parameters to run with the program.
<             qubit_order: Determines the canonical ordering of the qubits. This is
<               often used in specifying the initial state, i.e. the ordering of the
<               computational basis states.
<             initial_state: The initial state for the simulation. This can either
<               be an integer representing a pure state (e.g. 11010) or a numpy
<               array containing the full state vector. If none is provided, this
<               is assumed to be the all-zeros state.
< 
<         Returns:
<             List of SimulationTrialResults for this run, one for each
<             possible parameter resolver.
< 
<         Raises:
<             TypeError: if an invalid initial_state is provided.
<         """
---
>     ) -> Iterator[Tuple[cirq.ParamResolver, np.ndarray, Sequence[int]]]:
625a597,649
>             yield prs, qsim_state.view(np.complex64), cirq_order
> 
>     def simulate_into_1d_array(
>         self,
>         program: cirq.AbstractCircuit,
>         param_resolver: cirq.ParamResolverOrSimilarType = None,
>         qubit_order: cirq.QubitOrderOrList = cirq.ops.QubitOrder.DEFAULT,
>         initial_state: Any = None,
>     ) -> Tuple[cirq.ParamResolver, np.ndarray, Sequence[int]]:
>         """Same as simulate() but returns raw simulation result without wrapping it.
>             The returned result is not wrapped in a StateVectorTrialResult but can be used
>             to create a StateVectorTrialResult.
>         Returns:
>             Tuple of (param resolver, final state, qubit order)
>         """
>         params = cirq.study.ParamResolver(param_resolver)
>         return next(self._simulate_impl(program, params, qubit_order, initial_state))
> 
>     def simulate_sweep_iter(
>         self,
>         program: cirq.Circuit,
>         params: cirq.Sweepable,
>         qubit_order: cirq.QubitOrderOrList = cirq.QubitOrder.DEFAULT,
>         initial_state: Optional[Union[int, np.ndarray]] = None,
>     ) -> Iterator[cirq.StateVectorTrialResult]:
>         """Simulates the supplied Circuit.
>         This method returns a result which allows access to the entire
>         wave function. In contrast to simulate, this allows for sweeping
>         over different parameter values.
>         Avoid using this method with `use_gpu=True` in the simulator options;
>         when used with GPU this method must copy state from device to host memory
>         multiple times, which can be very slow. This issue is not present in
>         `simulate_expectation_values_sweep`.
>         Args:
>             program: The circuit to simulate.
>             params: Parameters to run with the program.
>             qubit_order: Determines the canonical ordering of the qubits. This is
>               often used in specifying the initial state, i.e. the ordering of the
>               computational basis states.
>             initial_state: The initial state for the simulation. This can either
>               be an integer representing a pure state (e.g. 11010) or a numpy
>               array containing the full state vector. If none is provided, this
>               is assumed to be the all-zeros state.
>         Returns:
>             Iterator over SimulationTrialResults for this run, one for each
>             possible parameter resolver.
>         Raises:
>             TypeError: if an invalid initial_state is provided.
>         """
> 
>         for prs, state_vector, cirq_order in self._simulate_impl(
>             program, params, qubit_order, initial_state
>         ):
627c651
<                 initial_state=qsim_state.view(np.complex64), qubits=cirq_order
---
>                 initial_state=np.complex64, qubits=cirq_order
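For completeness, a hedged sketch (not part of the patch above) of how the raw triple returned by the new method could be wrapped back into a cirq result while the qubit count is still within numpy's dimension limit. The constructor arguments below follow the pattern in the original simulate_sweep_iter code but are assumptions that may differ across cirq versions:

import numpy as np
import cirq
import qsimcirq

qubits = cirq.LineQubit.range(3)
circuit = cirq.Circuit(
    cirq.H(qubits[0]), cirq.CNOT(qubits[0], qubits[1]), cirq.CNOT(qubits[1], qubits[2])
)

sim = qsimcirq.QSimSimulator()
prs, state_vector, qubit_order = sim.simulate_into_1d_array(circuit)

# qubit_order is assumed here to be the ordered cirq qubits returned by the method.
final_state = cirq.StateVectorSimulationState(
    initial_state=state_vector.view(np.complex64), qubits=qubit_order
)
result = cirq.StateVectorTrialResult(
    params=prs, measurements={}, final_simulator_state=final_state
)
print(result.final_state_vector.shape)  # flat vector of 2**3 = 8 amplitudes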

@rht commented Feb 8, 2024:

Something is still consuming much more GPU memory than in the past. I used to be able to do 33 qubits on a 2xA100 instance.

$ nvidia-smi
Thu Feb  8 00:07:04 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P0    63W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   36C    P0    61W / 400W |      0MiB / 40960MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@NoureldinYosri (Collaborator Author) commented Feb 8, 2024:

For 32 qubits we have a state vector of $2^{32}$ complex entries, each of which is 2 float32 numbers, i.e. 8 bytes, so we should expect usage of at least $2^{35}$ bytes, or 32 GB. The value in #623 (comment) is 34182.4296875 MB, or roughly $34.1$ GB, so we are only using about $2.1$ GB more than the minimum necessary, which I suppose is consumed by numpy overhead, other variables, and auxiliary variables that will eventually be cleaned up by the garbage collector.


Are you sure you could do 33 qubits on this machine? The same calculation gives $2^{36}$ bytes, or 64 GB, for 33 qubits, and per https://www.aime.info/en/shop/product/aime-gpu-cloud-v242xa100/?pid=V28-2XA100-D1 a 2xA100 instance has only 40 GB of RAM per GPU.
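The arithmetic above as a few lines of Python (pure bookkeeping, no simulation; the 8 bytes per amplitude assumes complex64):

def state_vector_bytes(num_qubits: int, bytes_per_amplitude: int = 8) -> int:
    # 2**n amplitudes, each a complex64 (two float32s = 8 bytes).
    return (2 ** num_qubits) * bytes_per_amplitude

for n in (32, 33):
    print(n, "qubits ->", state_vector_bytes(n) / 2 ** 30, "GiB minimum")
# 32 qubits -> 32.0 GiB minimum
# 33 qubits -> 64.0 GiB minimum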

@rht commented Feb 8, 2024:

> Are you sure you could do 33 qubits on this machine?

Yes, we were able to do so with qsimcirq==0.12.1 via the cuQuantum Appliance, which has a multi-GPU backend. Hence, 2x40 GB is more than enough for the 64 GB required by 33 qubits.

I am in the process of measuring the max GPU memory consumption by polling nvidia-smi in the background while the simulation is running, but this will take a while since I have terminated the instance and will have to wait until there is an open slot for the 2xA100 instance.

@rht commented Feb 8, 2024:

Update: all is good! I am able to run 33 qubits on the 2xA100 instance. I confirm this PR works.

The bug in the code in #623 (comment) was that I forgot to specify

    options = qsimcirq.QSimOptions(gpu_mode=2)
    sim = qsimcirq.QSimSimulator(options)

My measurements (I'm not sure why the GPU memory is that low, but anyway, it works):

| Backend  | Qubits | Elapsed (s)        | Peak GPU memory (MiB) | Max CPU memory |
|----------|--------|--------------------|-----------------------|----------------|
| CPU only | 32     | 114.16660451889038 | 3                     | 33086.91015625 |
| GPU      | 31     | 14.6939697265625   | 425                   | 16830.81640625 |
| GPU      | 32     | 28.458886861801147 | 425                   | 33174.0078125  |
| GPU      | 33     | 17.026336431503296 | 853                   | 67345.63671875 |

The GPU memory is measured by reading the output of nvidia-smi --query-gpu=memory.used --format=csv.
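A minimal sketch of that background-polling approach (the query and the 0.01 s interval come from the comments above; the threading, the noheader/nounits flags, and summing usage across GPUs are illustrative assumptions):

import subprocess
import threading
import time

peak_mib = 0

def poll_gpu_memory(stop: threading.Event, interval: float = 0.01) -> None:
    # Track the peak total "memory.used" reported by nvidia-smi across all GPUs.
    global peak_mib
    while not stop.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        ).stdout
        used = sum(int(v) for v in out.split() if v.strip().isdigit())
        peak_mib = max(peak_mib, used)
        time.sleep(interval)

stop = threading.Event()
poller = threading.Thread(target=poll_gpu_memory, args=(stop,), daemon=True)
poller.start()
# ... run the simulation here ...
stop.set()
poller.join()
print("Peak GPU memory usage:", peak_mib, "MiB")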

@rht commented Feb 8, 2024:

> (I'm not sure why the GPU memory is that low, but anyway, it works)

My guess is that the time the state actually spends on the GPU is shorter than the interval at which nvidia-smi is polled for VRAM usage (0.01 s).

Labels

kokoro:run Trigger Kokoro builds for this PR.

Successfully merging this pull request may close these issues:

cirq.sample_state_vector fails when the number of qubits > 32

5 participants