Hi
We are trying to get H100 GPU passthrough working again after upgrading our Proxmox host from PVE v9.1.1 to v9.2.3. It worked fine on v9.1.1.
On the host, the passthrough entry in the config for the VM (which runs AlmaLinux 9.8) is set to:
Code:
hostpci0: 0000:21:00.0,pcie=1
We also tried:
Code:
hostpci0: 0000:21:00.0,pcie=1,rombar=0
We also tried reverting to the 6.x kernel we were using before the upgrade, to see whether that made a difference.
But in both cases we get:
Code:
[root@gpu-h100-1 ~]# lspci | grep -i nvidia
01:00.0 3D controller: NVIDIA Corporation GH100 [H100L 94GB] (rev a1)
[root@gpu-h100-1 ~]#
[root@gpu-h100-1 ~]# dmesg | tail -20 | grep -E "NVRM|nvidia"
[ 8.124633] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 8.128167] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 8.128344] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[ 8.128420] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 8.128422] NVRM: None of the NVIDIA devices were initialized.
[ 8.129543] nvidia-nvlink: Unregistered Nvlink Core, major device number 238
[ 10.156066] nvidia-nvlink: Nvlink Core is being initialized, major device number 238
[ 10.156076] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: BAR0 is 0M @ 0x0 (PCI:0000:01:00.0)
[ 10.159928] nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
[ 10.160008] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 10.160010] NVRM: None of the NVIDIA devices were initialized.
[ 10.162194] nvidia-nvlink: Unregistered Nvlink Core, major device number 238
[root@gpu-h100-1 ~]#
Code:
[root@gpu-h100-1 ~]# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
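Since the driver complains that BAR0 is 0M @ 0x0, it may help to compare the BAR assignments the kernel reports in the VM with those on the host. Below is a minimal sketch of how that could be listed from sysfs. It assumes the standard Linux layout of the per-device `resource` file (one line per region with hex start, end and flags); `print_bars` is a helper name we made up for illustration, not part of any tool.

```shell
# Print the size of each standard BAR (0-5) from a sysfs "resource" file.
# Assumption: each line holds three hex fields (start, end, flags) and an
# unassigned region reads as all zeros. The upper half of a 64-bit BAR
# also reads as zeros, so it shows up as "unassigned" too.
#
# Hypothetical usage in the VM:
#   print_bars /sys/bus/pci/devices/0000:01:00.0/resource
print_bars() {
    idx=0
    # "flags" is read only to consume the third field; it is not used.
    while read -r start end flags && [ "$idx" -lt 6 ]; do
        if [ $(( $start )) -eq 0 ] && [ $(( $end )) -eq 0 ]; then
            echo "BAR${idx}: unassigned"
        else
            echo "BAR${idx}: $(( $end - $start + 1 )) bytes @ ${start}"
        fi
        idx=$(( idx + 1 ))
    done < "$1"
}
```

The same function could be pointed at `/sys/bus/pci/devices/0000:21:00.0/resource` on the host to see whether the regions are assigned there but arrive empty in the guest.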
Does anyone have suggestions for how we should adjust the configuration to get this working?
Thanks in advance!