[BUG] Issue with cupy and numpy #326
Comments
Versions of packages that might be helpful
Can you target which line causes this issue? It looks like all you need is to use:
Adding this should be able to fix it: ipie/ipie/trial_wavefunction/particle_hole.py, lines 139 to 141 at ed75292.
Added the below in
Error:
I see. This will cause another issue: ipie/ipie/walkers/uhf_walkers.py, lines 67 to 74 at ed75292.
Can you try changing xp here to numpy?
You need to change xp.complex128 to numpy.complex128 too.
On Mon, Dec 2, 2024 at 12:47 PM Zohim Chandani wrote:
self.phia = numpy.array(
[initial_walker[:, : self.nup].copy() for iw in range(self.nwalkers)],
dtype=xp.complex128,
)
self.phib = numpy.array(
[initial_walker[:, self.nup :].copy() for iw in range(self.nwalkers)],
dtype=xp.complex128,
)
# Preparing MSD wf
# MSD prepared with 100 determinants
Traceback (most recent call last):
File "/home/tutorial_vqe/complete_workflow-cudaq.py", line 114, in <module>
afqmc_msd = AFQMC.build(
File "/usr/local/lib/python3.10/dist-packages/ipie/qmc/afqmc.py", line 379, in build
walkers.build(
File "/usr/local/lib/python3.10/dist-packages/ipie/walkers/uhf_walkers.py", line 243, in build
self.ovlp = trial.calc_greens_function(self)
File "/usr/local/lib/python3.10/dist-packages/ipie/trial_wavefunction/particle_hole.py", line 534, in calc_greens_function
return greens_function_multi_det_wicks_opt(walkers, self)
File "/usr/local/lib/python3.10/dist-packages/ipie/estimators/greens_function_multi_det.py", line 1182, in greens_function_multi_det_wicks_opt
ovlp = numpy.dot(walker_batch.phia[iw].T, trial.psi0a.conj())
File "<__array_function__ internals>", line 200, in dot
File "cupy/_core/core.pyx", line 1719, in cupy._core.core._ndarray_base.__array_function__
File "/usr/local/lib/python3.10/dist-packages/cupy/linalg/_product.py", line 67, in dot
return a.dot(b, out)
TypeError: Argument 'b' has incorrect type (expected cupy._core.core._ndarray_base, got numpy.ndarray)
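For context, the TypeError above comes from mixing device and host arrays: walker_batch.phia lives on the GPU while trial.psi0a is still a numpy array, so numpy.dot dispatches to cupy via __array_function__ and cupy then rejects the host-side operand. A minimal sketch outside of ipie that reproduces the same failure (shapes and names are made up for illustration):
import numpy
import cupy

a = cupy.zeros((4, 4), dtype=numpy.complex128)   # device array, standing in for walker_batch.phia[iw]
b = numpy.zeros((4, 4), dtype=numpy.complex128)  # host array, standing in for trial.psi0a

# numpy.dot dispatches to cupy.dot because 'a' is a cupy ndarray, and cupy.dot then
# raises: TypeError: Argument 'b' has incorrect type (... got numpy.ndarray)
numpy.dot(a.T, b.conj())
Keeping both operands in the same memory space, e.g. building phia/phib with a numpy dtype as suggested above, avoids the mismatch.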
Maybe just move this config use_gpu setting to line 126 in your script.
Added
FYI, I undid each of your suggestions above before trying the next one you mentioned. Not sure if you wanted them to be implemented together.
OK. Can you move this block after this line:
I think this new one is related to #321.
Also, it seems related to this fix: #326 (comment).
Sorry, I don't think I understand what you want me to change now.
Some old versions of cupy will not support batched QR decomposition.
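A quick way to check whether the installed cupy handles a stacked (batched) QR, in case that is the culprit (a sketch; whether cupy.linalg.qr accepts 3-D input depends on the cupy release):
import cupy

print(cupy.__version__)
stack = cupy.random.rand(5, 8, 3)        # a stack of 5 matrices
try:
    q, r = cupy.linalg.qr(stack)         # batched QR; older releases raise here
    print(q.shape, r.shape)
except Exception as err:
    print("batched QR not supported:", err)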
Oh, I just realized this is not using cupy to do the QR, which means the QR is done with scipy. See here:
https://github.com/JoonhoLee-Group/ipie/blob/ed752921f911bdfe98f0be776c76d2341bf6b120/ipie/utils/backend.py#L71
So you still need to have the use_gpu config set up earlier, as you did before. But you can try the second fix I mentioned, moving the block. (This is possibly a bug in the ParticleHoleNonchunked class's GPU code.)
BTW, why was the ParticleHoleNonchunked class used instead of ParticleHole?
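For reference, the backend switch pointed to above works roughly like this (a simplified sketch, not the exact ipie/utils/backend.py code): the xp alias and the qr routine are bound once, at import time, based on the use_gpu option, which is why the config has to be set before the rest of ipie is imported.
from ipie.config import config

if config.get_option("use_gpu"):
    import cupy as xp
    qr = xp.linalg.qr        # GPU QR (assuming the installed cupy supports the call)
else:
    import numpy as xp
    import scipy.linalg
    qr = scipy.linalg.qr     # host-side QR via scipy, as noted above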
On Monday, December 2, 2024 at 18:15, Zohim Chandani wrote:
pip list yields:
cupy-cuda12x 13.3.0 which is already the latest version.
I can try this tomorrow. Would it be possible for you to reproduce the error on your end? Might be easier to find a fix that way. Thanks for all the suggestions today.
Sure, I can probably try the day after tomorrow, if this is still not fixed with the above fix.
Current changes are:
Please specify exactly what change, if any, is required here. Thanks.
config.update_option("use_gpu", True) should be moved to the beginning of the script.
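Concretely, something like this at the very top of the script, before any other ipie imports (a sketch; the AFQMC import is only an example of something that has to come after the config call):
# Set the GPU option first, so modules that bind the xp backend at import time
# pick up cupy instead of numpy.
from ipie.config import config
config.update_option("use_gpu", True)

# only then import the rest of ipie
from ipie.qmc.afqmc import AFQMC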
Moving this to the top of the script
and this change:
error:
I believe adding this can now finally fix it (based on your most recent fix): ipie/ipie/walkers/uhf_walkers.py, lines 67 to 74 at ed75292
self.phia = numpy.array(
at the top:
Sorry, I think I will not be able to set it up and reproduce it by today. But I think you can try to use ParticleHole instead of ParticleHoleNonchunked in your script, like I mentioned before.
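The swap would look roughly like this wherever the trial is built (a sketch; the constructor arguments shown are hypothetical placeholders and should be taken from your existing ParticleHoleNonchunked call):
# use the ParticleHole trial instead of ParticleHoleNonchunked
from ipie.trial_wavefunction.particle_hole import ParticleHole

# hypothetical argument names -- reuse whatever your current call passes
trial = ParticleHole(wavefunction, nelec, num_basis)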
On Tue, Dec 3, 2024 at 9:21 AM Zohim Chandani wrote:
self.phia = numpy.array(
[initial_walker[:, : self.nup].copy() for iw in range(self.nwalkers)],
dtype=numpy.complex128,
)
self.phib = numpy.array(
[initial_walker[:, self.nup :].copy() for iw in range(self.nwalkers)],
dtype=numpy.complex128,
)
self.ovlp = trial.calc_greens_function(self)
if config.get_option("use_gpu"):
self.cast_to_cupy()
self.Ga = xp.asarray(self.Ga)
self.Gb = xp.asarray(self.Gb)
trial._rchola = xp.asarray(trial._rchola)
trial._rcholb = xp.asarray(trial._rcholb)
trial._rchola_act = xp.asarray(trial._rchola_act)
at the top:
from ipie.config import config
config.update_option("use_gpu", True)
# iteration 564: delta_max = 9.99470883e-06: time = 1.55687332e-03
# Orthogonalising Cholesky vectors.
# Time to orthogonalise: 0.161392
# Preparing MSD wf
# MSD prepared with 100 determinants
Traceback (most recent call last):
File "/home/tutorial_vqe/complete_workflow-cudaq.py", line 115, in <module>
afqmc_msd = AFQMC.build(
File "/usr/local/lib/python3.10/dist-packages/ipie/qmc/afqmc.py", line 379, in build
walkers.build(
File "/usr/local/lib/python3.10/dist-packages/ipie/walkers/uhf_walkers.py", line 236, in build
self.ovlp = trial.calc_greens_function(self)
File "/usr/local/lib/python3.10/dist-packages/ipie/trial_wavefunction/particle_hole.py", line 534, in calc_greens_function
return greens_function_multi_det_wicks_opt(walkers, self)
File "/usr/local/lib/python3.10/dist-packages/ipie/estimators/greens_function_multi_det.py", line 1203, in greens_function_multi_det_wicks_opt
dets_a_full, dets_b_full = compute_determinants_batched(
File "/usr/local/lib/python3.10/dist-packages/ipie/propagation/overlap.py", line 618, in compute_determinants_batched
dets_a, dets_b = get_dets_single_excitation_batched_opt(G0a, G0b, trial)
File "/usr/local/lib/python3.10/dist-packages/ipie/propagation/overlap.py", line 279, in get_dets_single_excitation_batched_opt
wk.get_dets_singles(
File "/usr/local/lib/python3.10/dist-packages/ipie/estimators/kernels/gpu/wicks_gpu.py", line 89, in get_dets_singles
get_dets_singles_kernel(
File "cupy/_core/raw.pyx", line 93, in cupy._core.raw.RawKernel.__call__
File "cupy/cuda/function.pyx", line 223, in cupy.cuda.function.Function.__call__
File "cupy/cuda/function.pyx", line 177, in cupy.cuda.function._launch
File "cupy/cuda/function.pyx", line 133, in cupy.cuda.function._pointer
TypeError: You are trying to pass a numpy.ndarray of shape (200, 72, 103) as a kernel parameter. Only numpy.ndarrays of size one can be passed by value. If you meant to pass a pointer to __global__ memory, you need to pass a cupy.ndarray instead.
I changed this to error:
It turns out that if I change the type of something, it breaks something else in the code. It would be best if you could try to reproduce this so we can resolve it more efficiently. Let me know if you need anything else. Thanks.
@zohimchandani please check #327
This now works - thanks. Question: I am using a cluster of 8 H100s. Receiving an out of memory error - see below:
Is there a way to use ipie in a multi-GPU setting? Thanks
Multi-GPU should be able to run. But I noticed that you employed 8 GPUs but only 1 MPI rank.
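Each MPI rank typically drives one GPU, so to use all 8 cards the job needs to be launched with 8 ranks, e.g. mpirun -np 8 --allow-run-as-root python3 complete_workflow.py. A sketch of how a script can pin each rank to its own device (the explicit cupy/mpi4py binding below is an assumption about the user script, not necessarily something ipie does for you):
from mpi4py import MPI
import cupy

# one rank per GPU: rank i uses device i (modulo the number of visible devices)
rank = MPI.COMM_WORLD.Get_rank()
cupy.cuda.Device(rank % cupy.cuda.runtime.getDeviceCount()).use()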
On Sun, Dec 8, 2024 at 7:15 PM Zohim Chandani wrote:
# - CUDA compute capability: 9.0
# - CUDA version: 12.06.0
# - GPU Type: 'NVIDIA H100 80GB HBM3'
# - GPU Mem: 79.097 GB
# - Number of GPUs: 8
# MPI communicator : <class 'mpi4py.MPI.Intracomm'>
# Available memory on the node is 2015.563 GB
# There are unused GPUs (1 MPI tasks but 8 GPUs). Check if this is really what you wanted.
# PhaselessGeneric: expected to allocate 0.0 GB
# PhaselessGeneric: using 25.2642822265625 GB out of 79.09661865234375 GB memory on GPU
# GenericRealChol: expected to allocate 0.13464972376823425 GB
# GenericRealChol: using 25.2642822265625 GB out of 79.09661865234375 GB memory on GPU
# UHFWalkersParticleHole: expected to allocate 0.0 GB
# UHFWalkersParticleHole: using 25.2642822265625 GB out of 79.09661865234375 GB memory on GPU
# Setting up estimator object.
# Writing estimator data to afqmc_data_10q.h5
# Finished settting up estimator object.
Block Weight WeightFactor HybridEnergy ENumer EDenom ETotal E1Body E2Body
0 1.0000000000000000e+04 1.0000000000000000e+04 0.0000000000000000e+00 -2.1237020500117082e+07 1.0000000000000000e+04 -2.1237020500117083e+03 -4.6822529147033019e+03 2.5585508646915937e+03
1 2.7314090682770265e+05 2.4687498571639704e+06 -1.1586580202448267e+03 -2.1237573243674748e+07 1.0000000000000000e+04 -2.1237573243674747e+03 -4.6822587783462113e+03 2.5585014539787362e+03
2 1.0000619061718366e+04 5.3334290771429962e+05 -1.1587034566746970e+03 -2.1238082175784133e+07 1.0000000000000000e+04 -2.1238082175784134e+03 -4.6822622111536966e+03 2.5584539935752832e+03
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 8,933,760,000 bytes (allocated so far: 74,319,149,056 bytes).
I am testing this workflow.
The error message is printed out 8 times; it seems like the script is run 8 times rather than the memory of the 8 GPUs being pooled?
I chatted with Yifei; unfortunately, multi-GPU support for MSD-AFQMC is not enabled in the main branch yet. We expect some improvements can be merged soon.
On Mon, Dec 9, 2024 at 8:37 PM Zohim Chandani wrote:
I am testing this workflow: https://github.com/davidev886/tutorial_vqe/blob/main/complete_workflow.py#L41C1-L54C19
num_walkers = 9500 works with: mpirun -np 1 --allow-run-as-root python3 complete_workflow.py --cudaq-full-stack-trace
num_walkers = 9600 fails with: mpirun -np 8 --allow-run-as-root python3 complete_workflow.py --cudaq-full-stack-trace
# iteration 562: delta_max = 1.01314564e-05: time = 4.09116745e-02
# iteration 563: delta_max = 1.00020910e-05: time = 3.03366184e-02
# iteration 564: delta_max = 9.99470883e-06: time = 4.92453575e-02
# Orthogonalising Cholesky vectors.
# Time to orthogonalise: 1.175176
# Time to orthogonalise: 0.314975
# Preparing MSD wf
# MSD prepared with 100 determinants
# Preparing MSD wf
# MSD prepared with 100 determinants
# Preparing MSD wf
# MSD prepared with 100 determinants
# Preparing MSD wf
# MSD prepared with 100 determinants
Traceback (most recent call last):
File "/home/tutorial_vqe/complete_workflow.py", line 122, in <module>
afqmc_msd = AFQMC.build(
File "/usr/local/lib/python3.10/dist-packages/ipie/qmc/afqmc.py", line 379, in build
walkers.build(
File "/usr/local/lib/python3.10/dist-packages/ipie/walkers/uhf_walkers.py", line 236, in build
self.cast_to_cupy()
File "/usr/local/lib/python3.10/dist-packages/ipie/walkers/uhf_walkers.py", line 106, in cast_to_cupy
cast_to_device(self, verbose)
File "/usr/local/lib/python3.10/dist-packages/ipie/utils/backend.py", line 100, in cast_to_device
self.__dict__[k] = arraylib.array(v)
File "/usr/local/lib/python3.10/dist-packages/cupy/_creation/from_data.py", line 53, in array
return _core.array(obj, dtype, copy, order, subok, ndmin, blocking)
File "cupy/_core/core.pyx", line 2408, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2435, in cupy._core.core.array
File "cupy/_core/core.pyx", line 2578, in cupy._core.core._array_default
File "cupy/_core/core.pyx", line 137, in cupy._core.core.ndarray.__new__
File "cupy/_core/core.pyx", line 225, in cupy._core.core._ndarray_base._init
File "cupy/cuda/memory.pyx", line 738, in cupy.cuda.memory.alloc
File "cupy/cuda/memory.pyx", line 1424, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1445, in cupy.cuda.memory.MemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1116, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
File "cupy/cuda/memory.pyx", line 1137, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
File "cupy/cuda/memory.pyx", line 1382, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
File "cupy/cuda/memory.pyx", line 1385, in cupy.cuda.memory.SingleDeviceMemoryPool._try_malloc
cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 1,629,542,400 bytes (allocated so far: 9,414,626,816 bytes).
Can I expect this to be merged in weeks or months? Just so I can plan for the workloads that I intend to run.
Describe the bug
TypeError: Argument 'b' has incorrect type (expected cupy._core.core._ndarray_base, got numpy.ndarray)
To Reproduce
Pull a CUDA-Q image:
docker pull nvcr.io/nvidia/nightly/cuda-quantum:cu12-latest
Turn the image into a container:
docker run -it --net=host --user root --gpus all -d --name cudaq_zohim_test 05346a75eaf7
The machine I am running on has CUDA Version 12.4 installed
Installing cuda-toolkit 12.4 based on CUDA version:
sudo -S apt-get install -y cuda-toolkit-12.4
Clone a repo where I have a job to run:
git clone https://github.com/davidev886/tutorial_vqe
Install some pip packages including ipie specified in a file in the repo:
pip install -r requirements.txt
Run
unset CUDA_HOME
and
unset CUDA_PATH
to enable the job to look in the right location for the CUDA libraries.
Execute the AFQMC workflow (this file does not run VQE but uses a previously saved statevector from a VQE run):
python3 complete_workflow-cudaq.py
Error message:
Other information: