PCI passthrough not working with Proxmox 8 and RTX 6000 Ada

sulacco

New Member
Dec 28, 2023
I have a problem under Proxmox 8 when assigning an RTX 6000 Ada GPU to a VM using PCI passthrough.

We recently received an ASUS ESC4000A-E12 rack server with 4x NVIDIA RTX 6000 Ada, supplied by AIME, which arrived with a working Ubuntu installation. We reinstalled it with Proxmox 8.1-1 and applied the following configuration:

/etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"

/etc/modprobe.d/blacklist.conf:
blacklist nouveau
blacklist nvidia*
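
Whether the GRUB_CMDLINE_LINUX_DEFAULT parameters actually reached the running kernel can be checked after update-grub and a reboot by inspecting /proc/cmdline. A minimal sketch; `missing_iommu_params` is a hypothetical helper name, not a Proxmox tool, and it only looks for the two parameters listed above:

```shell
# Print each expected IOMMU parameter that is absent from a kernel
# command line read on stdin. No output means both are present.
missing_iommu_params() {
    cmdline=$(cat)
    for p in amd_iommu=on iommu=pt; do
        case " $cmdline " in
            *" $p "*) ;;                  # parameter found as a whole word
            *) printf '%s\n' "$p" ;;      # parameter missing
        esac
    done
}
# usage on the host: missing_iommu_params < /proc/cmdline
```

If something is printed, the edit to /etc/default/grub may not have been applied. On Proxmox hosts that boot through systemd-boot rather than GRUB, the command line is read from /etc/kernel/cmdline instead.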

After rebooting, the only error reported by dmesg is:
[ 1.663069] ACPI BIOS Error (bug): Failure creating named object [\SMIP], AE_ALREADY_EXISTS (20230331/dsfield-637)

Proxmox seems to detect the GPUs properly:

root@aime1:/var/log# lspci -nn | grep -i nvidia
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
02:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
02:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
c1:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
c1:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)
c2:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] [10de:26b1] (rev a1)
c2:00.1 Audio device [0403]: NVIDIA Corporation AD102 High Definition Audio Controller [10de:22ba] (rev a1)

Each GPU is also in its own IOMMU group, as required:

Code:
root@aime1:~# pvesh get /nodes/aime1/hardware/pci --pci-class-blacklist ""
┌──────────┬────────┬──────────────┬────────────┬────────┬───────────────────────────────────────────┬──────┬──────────────────┬──────────────────
│ class    │ device │ id           │ iommugroup │ vendor │ device_name                               │ mdev │ subsystem_device │ subsystem_device_
╞══════════╪════════╪══════════════╪════════════╪════════╪═══════════════════════════════════════════╪══════╪══════════════════╪══════════════════
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:01:00.0 │         39 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:02:00.0 │         40 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:c1:00.0 │         10 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
│ 0x030000 │ 0x26b1 │ 0000:c2:00.0 │         11 │ 0x10de │ AD102GL [L6000 / RTX 6000 Ada Generation] │      │ 0x16a1           │
├──────────┼────────┼──────────────┼────────────┼────────┼───────────────────────────────────────────┼──────┼──────────────────┼──────────────────
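
The same grouping can be read directly from sysfs on the host, which helps when the pvesh table is cut off. A minimal sketch, assuming the standard /sys/kernel/iommu_groups layout; `list_iommu_groups` is a hypothetical helper name:

```shell
# Print "group <n><TAB><pci address>" for every device in every IOMMU group.
# $1 optionally overrides the sysfs root (default: the standard kernel path).
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue         # glob did not match: no groups exist
        group=${dev#"$root"/}             # drop the root prefix
        group=${group%%/*}                # keep only the group number
        printf 'group %s\t%s\n' "$group" "${dev##*/}"
    done
}
# usage on the host: list_iommu_groups
```

No output at all would suggest that the IOMMU is not active. A GPU and its audio function sharing one group is expected; other unrelated devices in the same group would be a concern.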


Then I created a VM with the following setup:
type: q35
display: standard VGA
processor: host
bios: default (SeaBIOS)
pcidevice: raw (applied to one of the RTX 6000 Ada cards)
[x] all functions
[x] ROM bar
[ ] primary gpu
[ ] pci-express

I installed Ubuntu 22 in it. The problem is that the VM does not recognize the GPU properly; it only detects a generic NVIDIA VGA device:

root@ubuntu22:~# lspci -nn | grep -i nvidia
06:10.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:26b1] (rev a1)
06:10.1 Audio device [0403]: NVIDIA Corporation Device [10de:22ba] (rev a1)
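
The generic "Device" label may simply mean that the guest's pci.ids database does not yet know the card, since lspci prints it whenever it cannot look up a name. The numeric IDs are what identify the hardware, and here they match the host (10de:26b1 and 10de:22ba). A minimal sketch for extracting them on both sides for comparison; `nvidia_ids` is a hypothetical helper name:

```shell
# Read `lspci -nn` output on stdin and print the unique vendor:device IDs
# of all NVIDIA functions, one per line.
nvidia_ids() {
    grep -i nvidia \
        | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | sort -u
}
# usage on host and guest: lspci -nn | nvidia_ids
```

Identical output on host and guest suggests the device itself was passed through intact, which points the search toward the driver or the GPU mode rather than the PCI assignment.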

Trying to install the NVIDIA drivers leads to various errors but never ends with nvidia-smi detecting the GPU. I also tried different settings for the PCI device, without success.

Searching with Google, I found some threads reporting that Ada-generation GPUs require the vGPU software/license even when vGPU is not used, just to work via passthrough, but others deny it (which seems reasonable to me).

Could someone help me?
 
Hi sulacco,

I encountered the same issue on almost the same hardware (ESC4000A-E12 with 2x 6000 Ada).

I found a hint to the solution here. In brief, you have to use Nvidia's displaymodeselector tool (a developer account is required for the download).
With it, you can set the GPU mode to compute ("./displaymodeselector --gpumode compute" in the Proxmox host shell). However, this worked only for the second GPU; the first GPU (ID 0 in the tool) allegedly did not support the "physical_display_disabled" mode (see screenshot). The mode of the second GPU (ID 1) could be changed without any issue.

When I added the second GPU to a VM via passthrough (all functions, ROM bar, PCIe) and ran the tool again for the first GPU, the mode could be changed. I suppose it is a bug in the tool.

Both GPUs show up fine within the VM (nvidia-smi) now.
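
A quick way to confirm this inside the VM is to count the GPUs that nvidia-smi lists. A minimal sketch, assuming the usual "GPU <n>: <name> (UUID: ...)" line format of `nvidia-smi -L`; `count_gpus` is a hypothetical helper name:

```shell
# Read `nvidia-smi -L` output on stdin and print the number of GPUs listed.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}
# usage inside the VM: nvidia-smi -L | count_gpus
```

With both cards passed through, the expected result is 2.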

Hope this helps.

1706286656383.png
 
@jfraf This was really helpful, thank you for sharing. I had the same issue as shown in your image, but I was not able to get past it. So I installed Nvidia's displaymodeselector tool in my Windows VM, rebooted the VM, shut it down, and then rebooted Proxmox. Now it's working!