vGPU doesn't work with pytorch/nccl/vllm

bluesnapper · Nov 7, 2025

Hi,

we are struggling with vGPUs for vLLM / PyTorch inside a VM.

Setup:
We operate a node with 4x NVIDIA L40S in a Proxmox cluster (8.4.1). The driver on the host is 580.65.05, and the vGPUs are set up correctly and work reliably and flawlessly for VDIs.

For LLM inference we have mapped 4 vGPUs into a virtual machine. All 4 vGPUs show up correctly in the guest system (using nvidia-smi). We set up vLLM (0.11.0) and torch 2.8.0+cu128 in a Python virtual environment. To our understanding, torch comes with a pre-compiled CUDA + NCCL version. Inference with models that fit on a SINGLE vGPU (L40S) works flawlessly.

Problem:
We followed the troubleshooting guide from vLLM to test whether torch + CUDA + NCCL actually work. Executing the suggested test script fails with ncclUnhandledCudaError. I have attached the error output log. When we use PCI passthrough instead of vGPUs, the test passes and vLLM inference can make use of all L40S cards. We suspect there is a problem between the vGPU driver and the CUDA / NCCL version that comes with PyTorch. Do you have any advice how to make torch / nccl / vllm work with vGPUs?
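
For readers who want to reproduce this kind of check, here is a minimal sketch of an NCCL all-reduce sanity test. It is not the exact script from the vLLM guide; the file name nccl_check.py and the helper names are made up for illustration, and the process count of 4 simply matches the four vGPUs described above. Each rank contributes a tensor of ones, so the reduced value should equal the number of ranks.

```python
import os


def expected_all_reduce_value(world_size: int) -> float:
    """Every rank contributes ones, so the summed value equals world_size."""
    return float(world_size)


def run_check() -> None:
    # Imported lazily so the file can be read on machines without torch.
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # provided by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    world_size = dist.get_world_size()

    data = torch.ones(128, device=f"cuda:{local_rank}")
    dist.all_reduce(data, op=dist.ReduceOp.SUM)  # NCCL errors surface here
    torch.cuda.synchronize()

    value = data.mean().item()
    assert value == expected_all_reduce_value(world_size), value
    print(f"rank {dist.get_rank()}: NCCL all-reduce OK ({value})")
    dist.destroy_process_group()


if "LOCAL_RANK" in os.environ:
    run_check()
else:
    # Hypothetical file name; adjust the process count to the number of GPUs.
    print("launch with: torchrun --nproc-per-node=4 nccl_check.py")
```

Setting NCCL_DEBUG=INFO in the environment before launching usually makes NCCL print more detail about where the failure occurs.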

Disclaimer:
I previously opened this ticket with NVIDIA Enterprise Support, but was referred to the hypervisor vendor (Proxmox):

This case involves use of a Linux KVM Partner hypervisor, which is out of scope for NVIDIA Enterprise Support.
All technical issues should be reported to your Linux KVM Hypervisor vendor.
The Linux KVM Hypervisor vendor will contact NVIDIA directly as required, and is also responsible for delivering any SW updates that may be needed to address the issue.

Help would be much appreciated!

Thanks & kind regards,
Tobias

Whatever · Nov 7, 2025

Did you choose the correct vGPU profile and set the NVIDIA license token in the VM?

Would you mind showing the output of
nvidia-smi
nvidia-smi -q
from the VM?

bluesnapper · Nov 7, 2025

sure!

Code:

$ nvidia-smi

Fri Nov  7 14:13:38 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.95.05              Driver Version: 580.95.05      CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S-48Q                On  |   00000000:06:10.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40S-48Q                On  |   00000000:06:11.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L40S-48Q                On  |   00000000:06:1B.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L40S-48Q                On  |   00000000:06:1C.0 Off |                    0 |
| N/A   N/A    P0            N/A  /  N/A  |       0MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

I have also attached the output of nvidia-smi -q.

Whatever · Nov 7, 2025

As far as I know, profile type C is best suited for AI (requires a vCS license, but works with a vDWS license as well).
Unfortunately, I have no other ideas.
Subscribed to the post.

basavyr · Nov 21, 2025

Hey there, for what it's worth, we are also encountering a similar issue with our server. We have two L40 GPUs allocated to a VM (managed by Proxmox through vGPU), and when we try to use PyTorch DDP to train a model, the execution fails with: CUDA error: operation not supported.

The vGPU license was active at the time of testing:

Bash:

nvidia-smi -q | grep "License Status"

        License Status                    : Licensed (Expiry: 2025-11-22 6:1:40 GMT)
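
If the license check needs to be scripted, the same line can be extracted from the nvidia-smi -q output. The following is only a sketch: the function names are hypothetical, and it assumes that any status not starting with "Licensed" should be treated as unlicensed.

```python
import re
from typing import List

# Matches lines such as "License Status : Licensed (Expiry: ...)".
LICENSE_LINE = re.compile(r"^\s*License Status\s*:\s*(.+?)\s*$", re.MULTILINE)


def license_statuses(nvidia_smi_q_output: str) -> List[str]:
    """Return the License Status value reported for each GPU."""
    return LICENSE_LINE.findall(nvidia_smi_q_output)


def all_licensed(nvidia_smi_q_output: str) -> bool:
    """True only if at least one status was found and all start with 'Licensed'."""
    statuses = license_statuses(nvidia_smi_q_output)
    return bool(statuses) and all(s.startswith("Licensed") for s in statuses)


# Sample line copied from the output above.
sample = "        License Status                    : Licensed (Expiry: 2025-11-22 6:1:40 GMT)\n"
print(license_statuses(sample))
print(all_licensed(sample))
```

In practice the input text would come from running nvidia-smi -q inside the VM, for example via subprocess.run with capture_output=True and text=True.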

As a matter of fact, we tried a minimal test using the NCCL backend, which can be seen here. Unfortunately, we have the same outcome. It seems that GPU inter-communication through NCCL is not possible using vGPU.

Can anyone confirm?
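
One possible way to narrow this down is to ask PyTorch whether the GPUs report peer access to each other. This is a sketch under the assumption that the failure is related to GPU-to-GPU peer access, which nothing in this thread confirms; the helper names are made up for illustration.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def device_pairs(num_devices: int) -> List[Tuple[int, int]]:
    """All ordered (src, dst) pairs of distinct device indices."""
    return list(permutations(range(num_devices), 2))


def report_peer_access() -> Dict[Tuple[int, int], bool]:
    # torch is imported lazily so the helper above works without it.
    import torch

    result = {}
    for src, dst in device_pairs(torch.cuda.device_count()):
        result[(src, dst)] = torch.cuda.can_device_access_peer(src, dst)
        print(f"GPU {src} -> GPU {dst}: peer access = {result[(src, dst)]}")
    return result
```

Calling report_peer_access() once inside the vGPU VM and once with PCI passthrough would show whether the two setups differ in this respect.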

DerDieDas · Jan 16, 2026

Hello!
We encountered a similar NCCL issue with CUDA error: operation not supported.
Our H200s were managed by vGPU. We uninstalled the vGPU drivers (on the Proxmox host and the VM) and installed the latest NVIDIA Linux driver on the VM. After that we used the GPUs via PCI passthrough and the error was gone. So I can confirm that it looks like inter-GPU communication via NCCL is not possible when using vGPU.
Hope this helps someone in the future.

@bluesnapper

Do you have any advice how to make torch / nccl / vllm work with vGPUs?

You can avoid those errors by setting tensor-parallel-size to 1 (in vLLM). This deactivates NCCL, but performance will be reduced. It is worth a try to narrow down the source of the error.
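
As a rough illustration of that suggestion, the sketch below assembles a vllm serve command line with tensor parallelism set to 1. The helper name and the model name are placeholders, not something from this thread; only the --tensor-parallel-size option itself is taken from vLLM.

```python
import shlex
from typing import List


def vllm_serve_command(model: str, tensor_parallel_size: int = 1) -> List[str]:
    """Build the argument list for `vllm serve` with the given TP size."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be at least 1")
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]


# "my-org/my-model" is a placeholder; use a model that fits on one vGPU.
command = vllm_serve_command("my-org/my-model")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run, or the printed line can be pasted into a shell inside the VM.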
