vGPU doesn't work with PyTorch/NCCL/vLLM

Hi,

we are struggling with vGPUs for vLLM / PyTorch inside a VM.

Setup:
We operate a node with 4x NVIDIA L40S in a Proxmox cluster (8.4.1). The driver on the host is 580.65.05, and the vGPUs are set up correctly and work reliably and flawlessly for VDIs.

For LLM inference we have mapped 4 vGPUs into a virtual machine. All 4 vGPUs show up correctly in the guest system (using nvidia-smi). We set up vLLM (0.11.0) and torch 2.8.0+cu128 in a Python virtual environment. To our understanding, torch ships with precompiled CUDA and NCCL libraries. Inference with models that fit on a SINGLE vGPU (L40S) works flawlessly.

Problem:
We followed the troubleshooting guide from vLLM to test whether torch + CUDA + NCCL actually work together. Executing the suggested test script fails with ncclUnhandledCudaError. I have attached the error output log. When we use PCI passthrough instead of vGPUs, the test passes and vLLM inference can make use of all L40S cards. We suspect there is a problem between the vGPU driver and the CUDA / NCCL version that comes with PyTorch. Do you have any advice on how to make torch / NCCL / vLLM work with vGPUs?
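For anyone reproducing this, the check can be rerun with NCCL diagnostics switched on. The following is only a minimal sketch: `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are documented NCCL environment variables and `torchrun --nproc-per-node` is the standard PyTorch launcher, but the script name `test.py` and the helper names are assumptions for illustration. Whether disabling peer-to-peer changes anything on this vGPU setup is untested; it is included only as a diagnostic experiment.

```python
import os
import shlex


def nccl_debug_env(base=None, disable_p2p=False):
    """Return a copy of the environment with NCCL logging enabled.

    `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"  # make NCCL print its init and transport choices
    if disable_p2p:
        # Experiment only: forces NCCL to avoid GPU peer-to-peer transfers.
        env["NCCL_P2P_DISABLE"] = "1"
    return env


def torchrun_command(script="test.py", gpus=4):
    """Build the launcher command; `test.py` is a hypothetical file name."""
    return ["torchrun", f"--nproc-per-node={gpus}", script]


def describe(env, cmd):
    """Render the NCCL_* variables and the command as one shell-style line."""
    keys = sorted(k for k in env if k.startswith("NCCL_"))
    prefix = " ".join(f"{k}={env[k]}" for k in keys)
    return f"{prefix} {shlex.join(cmd)}"
```

To actually launch the run, the two values can be passed to `subprocess.run(torchrun_command(), env=nccl_debug_env(disable_p2p=True))`; comparing the log with and without `disable_p2p` should show which transport NCCL selects before the error appears.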

Disclaimer:
I previously opened this ticket with NVIDIA Enterprise Support, but was referred to the hypervisor vendor (Proxmox):
This case involves use of a Linux KVM Partner hypervisor, which is out of scope for NVIDIA Enterprise Support.
All technical issues should be reported to your Linux KVM Hypervisor vendor.
The Linux KVM Hypervisor vendor will contact NVIDIA directly as required, and is also responsible for delivering any SW updates that may be needed to address the issue.

Help would be much appreciated!

Thanks & kind regards,
Tobias

Did you choose the correct vGPU profile and set the NVIDIA license token in the VM?

Would you mind showing the output of
nvidia-smi
nvidia-smi -q
from the VM?
 
sure!
Code:
$ nvidia-smi

Fri Nov  7 14:13:38 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S-48Q                On  |   00000000:06:10.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S-48Q                On  |   00000000:06:11.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L40S-48Q                On  |   00000000:06:1B.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L40S-48Q                On  |   00000000:06:1C.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

I have also attached the output of nvidia-smi -q.
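The vGPU profile in use can be read off the GPU names in the table above (L40S-48Q, i.e. a 48 GB Q-series profile). As a small sketch, the suffix can be parsed like this; the function name and the regular expression are assumptions for illustration, and the series letters follow NVIDIA's usual naming (Q for virtual workstation, C for compute, B for virtual PC, A for virtual applications).

```python
import re

# Usual NVIDIA vGPU series letters; listed here for illustration only.
SERIES = {
    "Q": "virtual workstation",
    "C": "compute",
    "B": "virtual PC",
    "A": "virtual applications",
}

# Matches a trailing "-<framebuffer GB><series letter>", e.g. "-48Q".
PROFILE_RE = re.compile(r"-(\d+)([A-Za-z])$")


def parse_vgpu_profile(name):
    """Return (framebuffer_gb, series_letter), or None if no suffix is found."""
    match = PROFILE_RE.search(name.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2).upper()


def describe_profile(name):
    """Human-readable summary of the profile encoded in a GPU name."""
    parsed = parse_vgpu_profile(name)
    if parsed is None:
        return "no vGPU profile suffix"
    size_gb, letter = parsed
    return f"{size_gb} GB, {SERIES.get(letter, 'unknown')} series ({letter})"
```

A name without a suffix, such as the one reported under PCI passthrough, yields `None`, which makes the two configurations easy to tell apart in a script.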

As far as I know, profile type C is best suited for AI (requires a vCS license, but works with a vDWS license as well).
Unfortunately, I have no other ideas.
Subscribed to the post.