Cannot run CUDA-Q program on CPU #2497

Open
4 tasks done
clownchrys opened this issue Jan 8, 2025 · 6 comments

@clownchrys

Required prerequisites

  • Consult the security policy. If reporting a security vulnerability, do not report the bug using this form. Use the process described in the policy to report the issue.
  • Make sure you've read the documentation. Your issue may be addressed there.
  • Search the issue tracker to verify that this hasn't already been reported. +1 or comment there if it has.
  • If possible, make a PR with a failing test to give us a starting point to work on!

Describe the bug

libc++abi: terminating due to uncaught exception of type std::runtime_error
Aborted (core dumped)

The error above occurs when I build the program for CPU and run the resulting binary.
However, when I build with an explicit target option, --target qpp-cpu, it runs successfully!

I would really appreciate your help.
Is this a bug, or how can I fix it?

Steps to reproduce the bug

program.cpp:

#include <cudaq.h>
#include <cstdlib> // for atoi

__qpu__ void kernel(int qubit_count) {
  cudaq::qvector qubits(qubit_count); // error!
  h(qubits[0]);
  for (auto i = 1; i < qubit_count; ++i) {
    cx(qubits[0], qubits[i]);
  }
  mz(qubits);
}

int main(int argc, char *argv[]) {
  auto qubit_count = 1 < argc ? atoi(argv[1]) : 2;
  auto result = cudaq::sample(kernel, qubit_count);
  result.dump(); // Example: { 11:500 00:500 }
}

build & execute on bash:

# IT RAISES ERROR!
nvq++ program.cpp -o program.x
./program.x  2

libc++abi: terminating due to uncaught exception of type std::runtime_error
Aborted (core dumped)
# IT RUNS SUCCESSFULLY!
nvq++ program.cpp -o program.x --target qpp-cpu
./program.x  2

{ 00:496 11:504 }   # print

Expected behavior

I want to run the program on the CPU without passing the --target option.

Is this a regression? If it is, put the last known working version (or commit) here.

Not a regression

Environment

  • CUDA-Q version: 0.9.1 (cu12)
  • C++ compiler: g++ 11.4.0
  • Operating system: WSL Ubuntu 22.04.3 LTS (x86_64)
  • For more information:
# Install CUDA-Q

CUDA_Q_VERSION=0.9.1
CUDA_VERSION=cu12
ARCH=x86_64

wget https://github.com/NVIDIA/cuda-quantum/releases/download/${CUDA_Q_VERSION}/install_cuda_quantum_${CUDA_VERSION}.${ARCH}
sudo -E bash install_cuda_quantum*.$(uname -m) --accept
. /etc/profile
# Install MPI

MPI_PATH=/usr/local/openmpi

wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.6.tar.gz
tar -xvzf openmpi-5.0.6.tar.gz

cd openmpi-5.0.6
./configure --prefix=$MPI_PATH --with-cuda=$CUDA_PATH
make -j6 all install
export PATH=$MPI_PATH:$PATH

Suggestions

No response

@clownchrys
Author

clownchrys commented Jan 8, 2025

Here is an additional gdb log:

gdb ./program.x core


GNU gdb (Ubuntu 12.1-0ubuntu1~22.04.2) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./program.x...
/root/Workspace/example-cuda/core: No such file or directory.
(gdb) run
Starting program: /root/Workspace/example-cuda/program.x 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffc3dff000 (LWP 2056221)]
[New Thread 0x7fffc29ff000 (LWP 2056222)]
[New Thread 0x7fffc1dff000 (LWP 2056223)]
[Thread 0x7fffc1dff000 (LWP 2056223) exited]
[New Thread 0x7fffc1dff000 (LWP 2056224)]
libc++abi: terminating due to uncaught exception of type std::runtime_error

Thread 1 "program.x" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737235263488) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737235263488)
    at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737235263488)
    at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737235263488, signo=signo@entry=6)
    at ./nptl/pthread_kill.c:89
#3  0x00007ffff1010476 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/posix/raise.c:26
#4  0x00007ffff0ff67f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff1568f36 in abort_message ()
   from /opt/nvidia/cudaq/lib/libnvqir-custatevec-fp32.so
#6  0x00007ffff15c5a93 in demangling_terminate_handler() ()
   from /opt/nvidia/cudaq/lib/libnvqir-custatevec-fp32.so
#7  0x00007ffff15c5953 in std::__terminate(void (*)()) ()
   from /opt/nvidia/cudaq/lib/libnvqir-custatevec-fp32.so
#8  0x00007ffff15c5086 in __cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) ()
   from /opt/nvidia/cudaq/lib/libnvqir-custatevec-fp32.so
#9  0x00007ffff15c501f in __cxa_throw ()
   from /opt/nvidia/cudaq/lib/libnvqir-custatevec-fp32.so
#10 0x00007ffff14bc2be in (anonymous namespace)::CuStateVecCircuitSimulator<float>::addQubitsToState(unsigned long, void const*) () from /opt/nvidia/cudaq/lib/libnvqir-custatevec-fp32.so
#11 0x00007ffff14c1b1f in nvqir::CircuitSimulatorBase<float>::allocateQubits(unsigned long, void const*, cudaq::simulation_precision) () from /opt/nvidia/cudaq/lib/libnvqir-custatevec-fp32.so
#12 0x00007ffff163dc82 in __quantum__rt__qubit_allocate_array () from /opt/nvidia/cudaq/lib/libnvqir.so
#13 0x0000555555564878 in __nvqpp__mlirgen__function_kernel._Z6kerneli ()
#14 0x00005555555649a8 in function_kernel._Z6kerneli.thunk ()
#15 0x00007ffff17b55b5 in (anonymous namespace)::DefaultQPU::launchKernel(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, cudaq::KernelThunkResultType (*)(void*, bool), void*, unsigned long, unsigned long, std::__1::vector<void*, std::__1::allocator<void*> > const&) () from /opt/nvidia/cudaq/lib/libcudaq-platform-default.so
#16 0x00007ffff7ea0f2e in hybridLaunchKernel () from /opt/nvidia/cudaq/lib/libcudaq.so
#17 0x0000555555564984 in kernel(int) ()
#18 0x000055555557039c in cudaq::invokeKernel<void (&)(int), int&> (fn=@0x555555564930: {void (int)} 0x555555564930 <kernel(int)>, args=@0x7fffffffdf94: 2) at /opt/nvidia/cudaq/include/common/KernelWrapper.h:555
#19 0x000055555556da9c in cudaq::sample<void (&)(int), int&>(void (&)(int), int&)::{lambda()#1}::operator()() (this=0x7fffffffded8) at /opt/nvidia/cudaq/include/cudaq/algorithms/sample.h:223
#20 0x000055555556d282 in cudaq::details::runSampling<cudaq::sample<void (&)(int), int&>(void (&)(int), int&)::{lambda()#1}>(void (&)(int), cudaq::quantum_platform&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int, unsigned long, cudaq::details::future*, unsigned long, unsigned long) (wrappedKernel=..., platform=..., kernelName=..., shots=1000, qpu_id=0, futureResult=0x0, batchIteration=0, totalBatchIters=0) at /opt/nvidia/cudaq/include/cudaq/algorithms/sample.h:95
#21 0x0000555555565572 in cudaq::sample<void (&)(int), int&> (kernel=@0x555555564930: {void (int)} 0x555555564930 <kernel(int)>, args=@0x7fffffffdf94: 2) at /opt/nvidia/cudaq/include/cudaq/algorithms/sample.h:221
#22 0x0000555555564b51 in main (argc=1, argv=0x7fffffffe0b8) at program.cpp:14
(gdb) 

@bmhowe23
Collaborator

bmhowe23 commented Jan 8, 2025

Hi @clownchrys - the title of this issue says "Cannot run CUDA-Q program on CPU", but that is exactly what you are doing when you run with --target qpp-cpu. Did you perhaps mean to say "Cannot run CUDA-Q on GPU"?

In any case, a first debugging step might be to run nvidia-smi and report back here what the output shows. Based on your very helpful stack trace, it is clearly trying to initialize the GPU-based simulator.

@sacpis
Collaborator

sacpis commented Jan 8, 2025

I think @clownchrys wants to run the program on a cpu without mentioning the target as qpp-cpu. I just tried in the latest nightly image.

cudaq@ea401e2-lcedt:~$ nvq++ test.cpp -o test.x && ./test.x
{ 00:502 11:498 }

@bmhowe23
Collaborator

bmhowe23 commented Jan 8, 2025

> I think @clownchrys wants to run the program on a cpu without mentioning the target as qpp-cpu. I just tried in the latest nightly image.
>
> cudaq@ea401e2-lcedt:~$ nvq++ test.cpp -o test.x && ./test.x
> { 00:502 11:498 }

The default behavior of CUDA-Q depends on whether or not a GPU is available. If a GPU is available, the default is the nvidia target, so if running on CPU is desired, then --target qpp-cpu is required. If no GPU is available, the default is already qpp-cpu, so running with --target qpp-cpu is not required to run on the CPU.
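
The rule described above can be sketched as a small function. This is only a model of the behavior stated in this comment, not CUDA-Q's actual selection code; `default_target` and its parameter are hypothetical names made up for illustration.

```cpp
#include <string>

// Hypothetical helper (NOT part of the CUDA-Q API): models the default-target
// rule described in the comment above.
std::string default_target(bool nvidia_gpu_available) {
  // With a GPU present, the default is the GPU simulator target ("nvidia").
  // With no GPU, the default is the CPU simulator target ("qpp-cpu").
  return nvidia_gpu_available ? "nvidia" : "qpp-cpu";
}
```

Under this model, a machine that has a GPU still defaults to `nvidia`, so running on the CPU there needs an explicit `--target qpp-cpu`.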

@clownchrys
Author

clownchrys commented Jan 9, 2025

> The default behavior of CUDA-Q depends on whether or not a GPU is available. If a GPU is available, the default is the nvidia target, so if running on CPU is desired, then --target qpp-cpu is required. If no GPU is available, the default is already qpp-cpu, so running with --target qpp-cpu is not required to run on the CPU.

Aha!
My GPU is NVIDIA GeForce GTX 1070 Ti (compute capability 6.1).
It cannot run CUDA-Q, because CUDA-Q requires compute cap 7.0 or later.

This is the result of --target nvidia on my PC

nvq++ program.cpp -o program.x --target nvidia && ./program.x
GPU not supported. This simulator requires a GPU with compute capability 7.0 or later.
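
The 7.0 threshold in that message can be expressed as a simple comparison on the (major, minor) pair. The sketch below is an illustration only: `meets_min_compute_capability` is a hypothetical name, and this is not how the simulator itself performs the check.

```cpp
// Hypothetical helper (not a CUDA-Q function): compares a GPU's compute
// capability against the 7.0 minimum quoted in the error message above.
constexpr bool meets_min_compute_capability(int major, int minor) {
  constexpr int min_major = 7;
  constexpr int min_minor = 0;
  // A higher major version always passes; an equal major version passes
  // only when the minor version is at least the required minor version.
  return major > min_major || (major == min_major && minor >= min_minor);
}
```

For the GTX 1070 Ti (6.1) this returns false. With the CUDA toolkit installed, the two numbers could be read from the `major` and `minor` fields of `cudaDeviceProp`.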

So you mean that the default behavior, which is what I ran, works the same as --target nvidia, right?
(Actually I am a little confused, because the error log is different, so I want to check that I understood correctly.)

I really appreciate your kindness, again

Additionally, here is the nvidia-smi output:

Thu Jan  9 15:01:00 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 560.94       CUDA Version: 12.6     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1070 Ti     On  | 00000000:01:00.0 Off |                  N/A |
|  0%   25C    P8               4W / 180W |   1067MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

@bmhowe23
Collaborator

bmhowe23 commented Jan 9, 2025

> It cannot run CUDA-Q, because CUDA-Q requires compute cap 7.0 or later.

Yes, that is correct. For future reference, the full list of requirements is captured here: https://nvidia.github.io/cuda-quantum/latest/using/install/local_installation.html#dependencies-and-compatibility.

> So you mean that the default behavior, which is what I ran, works the same as --target nvidia, right?
> (Actually I am a little confused, because the error log is different, so I want to check that I understood correctly.)

Yes, it is my understanding that on a machine with an NVIDIA GPU, running a) without any specified target and b) with --target nvidia should be identical. So I am also confused about why the error log is different.
