pci passthrough not working with proxmox8 and RTX6000 ada

sulacco

New Member
Dec 28, 2023
I have a problem under Proxmox 8 when assigning an RTX 6000 Ada GPU to a VM via PCI passthrough.

We recently received an ASUS ESC4000A-E12 rack server with 4x NVIDIA RTX 6000 Ada, supplied by AIME, which arrived with a working Ubuntu installation. We reinstalled it with Proxmox 8.1-1 and applied the following configuration:

/etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

/etc/modprobe.d/blacklist.conf:
blacklist nouveau
blacklist nvidia*
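
Note for readers: on a GRUB-booted host, an edit to /etc/default/grub only takes effect after running update-grub and rebooting. A quick way to confirm that the running kernel actually received the parameters is to look for them in /proc/cmdline. The sketch below is only an illustration; the helper name has_kernel_param is made up here, and the example string is the one from the GRUB line above.

```shell
#!/bin/sh
# has_kernel_param CMDLINE PARAM   (hypothetical helper, not a Proxmox tool)
# Succeeds if PARAM appears as a whole space-separated word in CMDLINE.
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with the parameters from the GRUB line above.
# On the host, replace the literal string with "$(cat /proc/cmdline)".
cmdline="quiet amd_iommu=on iommu=pt"
for p in amd_iommu=on iommu=pt; do
    if has_kernel_param "$cmdline" "$p"; then
        echo "present: $p"
    else
        echo "missing: $p"
    fi
done
```

If a parameter shows up as missing on the real host, the bootloader configuration was most likely not regenerated, or the host boots through a different bootloader than the one that was edited.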

After rebooting, the only error reported by dmesg is:
[ 1.663069] ACPI BIOS Error (bug): Failure creating named object [\SMIP], AE_ALREADY_EXISTS (20230331/dsfield-637)

Proxmox seems to detect the GPUs properly:

root@aime1:/var/log# lspci -nn | grep -i nvidia
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
02:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
c1:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
c2:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
c2:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)

The GPUs are also in separate IOMMU groups, as required:

Code:
root@aime1:~# pvesh get /nodes/aime1/hardware/pci --pci-class-blacklist ""
┌──────────┬────────┬──────────────┬────────────┬────────┬───────────────────────────────────────────┬──────┬──────────────────┬──────────────────
│ class    │ device │ id           │ iommugroup │ vendor │ device_name                               │ mdev │ subsystem_device │ subsystem_device_
╞══════════╪════════╪══════════════╪════════════╪════════╪═══════════════════════════════════════════╪══════╪══════════════════╪══════════════════
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:01:00.0 │         39 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:02:00.0 │         40 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:c1:00.0 │         10 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:c2:00.0 │         11 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
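
Note for readers: the table above lists only the .0 function of each card. To see every device in every group (for example, whether the audio function sits in the same group as its GPU and nothing unrelated shares it), the kernel's sysfs tree can be walked directly. This is a minimal sketch; the function name list_iommu_groups and its optional root argument are assumptions made for illustration.

```shell
#!/bin/sh
# list_iommu_groups [ROOT]   (hypothetical helper)
# Prints "group<TAB>pci-address" for every device found under ROOT,
# which defaults to the kernel's sysfs tree.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue     # glob matched nothing
        group="${dev%/devices/*}"     # strip "/devices/<addr>"
        printf '%s\t%s\n' "${group##*/}" "${dev##*/}"
    done
}

# Numeric sort so that group 10 comes after group 9.
list_iommu_groups | sort -n
```

If the directory is empty or missing, the function prints nothing, which usually means the IOMMU is not active on the host.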


Then I created a VM with the following setup:
type: q35
display: standard VGA
processor: host
bios: default (SeaBIOS)
pcidevice: raw (applied to one of the RTX 6000 Ada cards)
[x] all functions
[x] ROM bar
[ ] primary gpu
[ ] pci-express

I installed Ubuntu 22 in it. The problem is that the VM does not properly recognize the GPU; it only detects a generic NVIDIA VGA device:

root@ubuntu22:~# lspci -nn | grep -i nvidia
06:10.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:26b1] (rev a1)
06:10.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
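
Note for readers: the bare "Device" label in lspci generally just means the guest's pci.ids database has no name for that ID, so it says little about whether passthrough works. A more telling check is which kernel driver, if any, is bound to the card. The sketch below filters `lspci -k` output for that; the helper name driver_in_use is made up for this example.

```shell
#!/bin/sh
# driver_in_use   (hypothetical helper)
# Reads `lspci -k` output on stdin and prints the value of each
# "Kernel driver in use:" line; prints nothing if no driver is bound.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# In the guest, restrict lspci to NVIDIA's vendor ID (10de).
if command -v lspci >/dev/null 2>&1; then
    lspci -nnk -d 10de: | driver_in_use
fi
```

Empty output in the guest would suggest that no driver has claimed the GPU. The same filter can be run on the host while the VM is running, where the passed-through card is normally expected to show vfio-pci.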

Trying to install the NVIDIA drivers leads to various errors and never ends with nvidia-smi properly detecting the GPU. I also tried different configurations for the PCI device, without success.

Searching on Google, some threads report that Ada-generation GPUs require the vGPU software/license even when vGPU is not used, just to work via passthrough, but others deny this (which seems reasonable to me).

Could someone help me?
 
Hi sulacco,

I encountered the same issue on almost the same hardware (ESC4000A-E12 with 2x 6000 Ada).

I found a hint to the solution here. In brief, you have to use NVIDIA's displaymodeselector tool (a developer account is required for the download).
With it, you can set the GPU mode to compute ("./displaymodeselector --gpumode compute" on the Proxmox host shell). However, at first this worked only for the second GPU: the tool claimed that the first GPU (ID 0 in the tool) did not support the "physical_display_disabled" mode (see screenshot). The mode of the second GPU (ID 1) could be changed without any issue.

When I added the second GPU to a VM via passthrough (all functions, ROM bar, PCIe) and ran the tool again for the first GPU, the mode could be changed. I suppose it is a bug in the tool.
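
For reference, the passthrough settings mentioned above (all functions, ROM bar, PCIe) correspond to a hostpci entry in the VM configuration, which can also be set from the host shell with qm set. The following is only a sketch: the VM ID 100 is hypothetical, the address is just an example taken from the lspci output in the question, and the helper hostpci_value is made up here. It prints the command instead of running it.

```shell
#!/bin/sh
# hostpci_value ADDR   (hypothetical helper)
# Turns a full address such as 0000:02:00.0 into a hostpci value that
# passes all functions (function suffix dropped) with PCIe and ROM bar on.
hostpci_value() {
    printf '%s,pcie=1,rombar=1\n' "${1%.*}"
}

vmid=100              # hypothetical VM ID
addr=0000:02:00.0     # example address from the lspci output in the question

# Dry run: show the command rather than executing it.
echo qm set "$vmid" --hostpci0 "$(hostpci_value "$addr")"
```

Note that pcie=1 requires the q35 machine type, which the VM in the question already uses.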

Both GPUs show up fine within the VM (nvidia-smi) now.

Hope this helps.

1706286656383.png
 