Does cuda support vgpu driver ? #51

Open
hw0505 opened this issue May 17, 2021 · 6 comments

Comments

@hw0505

hw0505 commented May 17, 2021

Hello, I have run into a problem. The GPU I am using is an RTX 2070, and the driver version is Linux KVM 7.9. When I install the vGPU driver and CUDA in the virtual machine, the vGPU driver and CUDA do not match. Does CUDA support vGPU? If so, which CUDA version works with the vGPU driver? Thank you very much!

@DualCoder
Owner

The CUDA versions supported by the different vGPU releases are listed here: https://docs.nvidia.com/cuda/vGPU/index.html

Additionally, some feature limitations exist, as documented here: https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/#cuda-open-cl-support-vgpu (in addition to the GPUs listed there, vgpu_unlock also adds support for most Pascal and Turing GPUs, so your 2070 is expected to work).

@hw0505
Author

hw0505 commented May 19, 2021

Thanks again. Using the link you provided, I found which vGPU software release corresponds to each CUDA version. I installed vGPU software release 12.2 (NVIDIA-Linux-x86_64-460.73.02-vgpu-kvm.run) and CUDA 11.2 (cuda_11.2.0_460.27.04_linux.run) in the virtual machine. When I run the nvidia-smi command, I can see the vGPU driver version, but not the CUDA version. When I run a CUDA program that calls cudaGetDeviceCount, it fails with the error "CUDA driver version is insufficient for CUDA runtime version". I have a few questions:
1. Have you successfully used the CUDA library in a virtual machine?
2. How do you verify that a vGPU created by vgpu_unlock works correctly in the virtual machine? Could you share your verification method? I would like to reproduce it in my environment.
By the way, I did not assign a license to the vGPU in the virtual machine. Could this be the cause of the failure? I could not find the specific steps for assigning a license to a virtual machine in the vGPU wiki.
I hope to hear from you soon!

@DualCoder
Owner

> I installed vGPU software release 12.2 (NVIDIA-Linux-x86_64-460.73.02-vgpu-kvm.run) and cuda 11.2 (cuda_11.2.0_460.27.04_linux.run) in the virtual machine.

That is not correct. The *-vgpu-kvm.run driver is meant to be installed on the host, not in the guest. This driver does not support CUDA, so you will not have CUDA on the host. The *-grid.run driver is meant to be installed in the guest; it does support CUDA, so you can use CUDA in the guest (with Q and C profiles).
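The host/guest rule above can be expressed as a small check on the installer file names. This is a sketch that assumes NVIDIA keeps the `-vgpu-kvm.run` and `-grid.run` suffixes seen in this thread; `driver_role` is a hypothetical helper, not part of any NVIDIA tool.

```python
from typing import Optional


def driver_role(installer: str) -> Optional[str]:
    """Classify an NVIDIA vGPU driver installer by where it is meant to be installed.

    Assumes the file-name suffixes used in this thread; returns None for anything else.
    """
    if installer.endswith("-vgpu-kvm.run"):
        # vGPU manager for the KVM host; this driver has no CUDA support.
        return "host"
    if installer.endswith("-grid.run"):
        # Guest driver; provides CUDA inside the VM (with Q and C profiles).
        return "guest"
    return None
```

For example, `driver_role("NVIDIA-Linux-x86_64-460.73.02-vgpu-kvm.run")` returns `"host"`, which flags the installer from the question as the wrong one to put inside the VM.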

> 1. Have you successfully used the cuda library in the virtual machine?

Yes, with cuda_11.3.1_465.19.01_linux.run and NVIDIA-Linux-x86_64-460.32.03-grid.run installed in the VM.

> 2. How do you verify that the vGPU generated by "vgpu_unlock" can work normally in the virtual machine? Can you share your verification method? I want to reproduce it in my environment.

To verify that CUDA and OpenCL run, I have successfully executed the example scripts from hashcat 5.1.0 (https://hashcat.net). I didn't manage to get any of the newer hashcat versions to work with CUDA, though, only with OpenCL. I have not done any performance evaluation of CUDA or OpenCL.

To verify that OpenGL works, I have run the Heaven Benchmark and checked that it runs at 60 fps (with the frame rate limiter enabled). I have also run some other graphics benchmarks to compare performance, but my testing is incomplete and I do not have any reliable conclusions yet.
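A lighter-weight smoke test than hashcat is to load the CUDA driver library directly and ask it for its version and device count. This is not the method used in this thread; it is a sketch that assumes a Linux guest where the *-grid.run driver provides `libcuda.so.1`, and it calls the CUDA driver API functions `cuDriverGetVersion`, `cuInit` and `cuDeviceGetCount` through `ctypes`. The helper names `probe_cuda_driver` and `decode_cuda_version` are hypothetical.

```python
import ctypes
from typing import Optional, Tuple

CUDA_SUCCESS = 0  # CUresult value for success in the CUDA driver API


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Split a driver API version number (1000 * major + 10 * minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def probe_cuda_driver(lib_name: str = "libcuda.so.1") -> Optional[dict]:
    """Return the driver's CUDA version and device count, or None if libcuda is unusable."""
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        # No CUDA driver library, e.g. only the host-side *-vgpu-kvm.run driver is installed.
        return None

    version = ctypes.c_int(0)
    if cuda.cuDriverGetVersion(ctypes.byref(version)) != CUDA_SUCCESS:
        return None

    result = {"cuda_version": decode_cuda_version(version.value), "device_count": 0, "error": None}

    status = cuda.cuInit(0)
    if status != CUDA_SUCCESS:
        # Keep the raw CUresult code; its meaning can be looked up in cuda.h.
        result["error"] = status
        return result

    count = ctypes.c_int(0)
    status = cuda.cuDeviceGetCount(ctypes.byref(count))
    if status == CUDA_SUCCESS:
        result["device_count"] = count.value
    else:
        result["error"] = status
    return result


if __name__ == "__main__":
    print(probe_cuda_driver())
```

In a guest with a working vGPU, one would expect a CUDA version matching the nvidia-smi header and a `device_count` of at least 1; a `None` result points at the driver installation rather than at CUDA itself.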

> By the way, I did not assign a license to the vGPU in the virtual machine. Could this be the reason for the failure? In the vGPU wiki, I did not find the specific steps for assigning a license to a virtual machine.

No, that is not the reason for the failure. Without a license, CUDA works normally for 20 minutes after the guest boots and then suffers severe performance penalties; see https://docs.nvidia.com/grid/latest/grid-licensing-user-guide/index.html#software-enforcement-grid-licensing for details. That page also explains how to assign a license to the virtual machine.
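The unlicensed behavior described above comes down to simple arithmetic on guest uptime. This is a hypothetical sketch: it hard-codes the 20-minute figure quoted in this thread (check the licensing guide for your release), treats seconds since boot as the start of the window, and reads uptime from `/proc/uptime`, which assumes a Linux guest.

```python
# Figure quoted in this thread; verify against NVIDIA's licensing guide for your release.
UNLICENSED_GRACE_SECONDS = 20 * 60


def expected_cuda_state(uptime_seconds: float, licensed: bool) -> str:
    """Return 'normal' or 'degraded' based on license state and time since guest boot."""
    if licensed or uptime_seconds < UNLICENSED_GRACE_SECONDS:
        return "normal"
    return "degraded"


def guest_uptime_seconds(path: str = "/proc/uptime") -> float:
    """Read seconds since boot from the first field of /proc/uptime (Linux only)."""
    with open(path) as f:
        return float(f.read().split()[0])
```

In an unlicensed Linux guest, `expected_cuda_state(guest_uptime_seconds(), licensed=False)` switches from `"normal"` to `"degraded"` once the guest has been up for 1200 seconds.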

@hw0505
Author

hw0505 commented Jul 7, 2021

Thanks again! Have you ever tested vgpu_unlock on a GTX 1080 Ti or a TITAN Xp? I tested it on these two GPUs using the samples that ship with CUDA. The matrixMul sample reports "error: all devices have compute mode prohibited", and the vectorAdd sample reports "error code all CUDA-capable devices are busy or unavailable".

@DualCoder
Owner

Tested on TITAN X (Pascal):

Host:

```
$ nvidia-smi
Sun Jul 11 17:10:12 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.04    Driver Version: 460.32.04    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    On   | 00000000:01:00.0 Off |                  N/A |
|  0%   33C    P8    18W / 250W |   8166MiB / 12287MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     18600      G   vgpu                             8126MiB |
+-----------------------------------------------------------------------------+
```

Guest:

```
$ nvidia-smi
Sun Jul 11 17:07:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID P40-8Q         Off  | 00000000:07:01.0  On |                  N/A |
| N/A   N/A    P8    N/A /  N/A |    666MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       738      G   /usr/lib/xorg/Xorg                 94MiB |
+-----------------------------------------------------------------------------+
```
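The host and guest nvidia-smi headers in this comment show the difference the driver choice makes: the host reports `CUDA Version: N/A`, while the guest reports `CUDA Version: 11.2` for a `GRID P40-8Q` profile with 8192MiB of memory. A minimal sketch of reading those fields from nvidia-smi text output, assuming the column layout shown here (other driver releases may format it differently) and assuming the number in a profile name is the framebuffer size in GB. The regular expressions and helper names are hypothetical; the "Q and C profiles have CUDA" rule is taken from the earlier reply in this thread.

```python
import re
from typing import Optional

# Header row, e.g. "Driver Version: 460.32.04    CUDA Version: N/A".
HEADER_RE = re.compile(r"Driver Version:\s*(?P<driver>\S+)\s+CUDA Version:\s*(?P<cuda>\S+)")

# vGPU profile names such as "GRID P40-8Q": board, framebuffer in GB, series letter.
PROFILE_RE = re.compile(r"GRID\s+(?P<board>[A-Za-z0-9]+)-(?P<fb_gb>\d+)(?P<series>[A-Z])\b")


def parse_header(line: str) -> Optional[dict]:
    """Extract the driver and CUDA versions; CUDA is None when nvidia-smi prints N/A."""
    m = HEADER_RE.search(line)
    if m is None:
        return None
    cuda = m.group("cuda")
    return {"driver": m.group("driver"), "cuda": None if cuda == "N/A" else cuda}


def parse_profile(line: str) -> Optional[dict]:
    """Extract board, framebuffer size and series from a GRID vGPU profile name."""
    m = PROFILE_RE.search(line)
    if m is None:
        return None
    series = m.group("series")
    return {
        "board": m.group("board"),
        "framebuffer_mib": int(m.group("fb_gb")) * 1024,
        "series": series,
        # Per this thread, CUDA is available in the guest with Q and C profiles.
        "cuda_expected": series in ("Q", "C"),
    }
```

Applied to the guest output, `parse_profile` yields an 8192 MiB framebuffer for `P40-8Q`, which matches the memory total nvidia-smi reports inside the VM.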

```
$ ./matrixMul 
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Pascal" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 1203.68 GFlop/s, Time= 0.109 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performancemeasurements. Results may vary when GPU Boost is enabled.

$ ./vectorAdd_nvrtc 
> Using CUDA Device [0]: GRID P40-8Q
> Using CUDA Device [0]: GRID P40-8Q
> GPU Device has SM 6.1 compute capability
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

$ ./vectorAddMMAP 
Vector Addition (Driver API)
> Using CUDA Device [0]: GRID P40-8Q
Device 0 VIRTUAL ADDRESS MANAGEMENT SUPPORTED = 0.
Device 0 doesn't support VIRTUAL ADDRESS MANAGEMENT.
```
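As a sanity check on the matrixMul result from the guest, the reported Size and Performance can be reproduced by hand. This sketch assumes the sample prints matrix dimensions as (width, height), so WA = HA = 320 and WB = 640, and counts 2 * WA * HA * WB floating-point operations (one multiply and one add per term); the helper names are hypothetical.

```python
def matmul_flops(wa: int, ha: int, wb: int) -> int:
    """Floating-point operations for C = A * B: one multiply and one add per term."""
    return 2 * wa * ha * wb


def gflops(ops: int, msec: float) -> float:
    """Convert an operation count and a time in milliseconds to GFlop/s."""
    return ops * 1e-9 / (msec / 1000.0)


ops = matmul_flops(wa=320, ha=320, wb=640)
print(ops)                           # 131072000, the "Size" printed by matrixMul
print(round(gflops(ops, 0.109), 1))  # 1202.5
```

The result is slightly below the printed 1203.68 GFlop/s because the time is printed rounded to three decimals; 1203.68 GFlop/s corresponds to roughly 0.1089 ms. As the sample itself notes, these numbers are not meant as performance measurements.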

@hw0505
Author

hw0505 commented Jul 12, 2021 via email
