tl;dr:
If you can't pass through an NVIDIA Super GPU to a VM and Proxmox takes a long time to boot, try this:
Code:
echo "blacklist i2c-nvidia-gpu" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u -k all
reboot
Full info:
I've spent a lot of time trying to get a GeForce GTX 1660 SUPER AERO ITX OC GPU passed through to a Windows 10 VM.
I swapped my AMD RX 570 for an NVIDIA 1660 Super and immediately ran into 2 problems:
1. The Proxmox boot time got 1 min and 33 seconds longer.
2. I could not pass the NVIDIA 1660 Super through to a VM.
I found a post saying that all kernels after 5.0.21-5 have this problem.
You can fix the problem by downgrading to this kernel version:
(spoiler: DON'T DO THIS)
Bash:
apt install pve-kernel-5.0.21-5-pve
reboot
# (select 5.0.21-5 on boot)
grub-editenv - set saved_entry="Proxmox Virtual Environment GNU/Linux, with Linux 5.0.21-5-pve"
# delete newer kernels (5.3-*)
touch '/please-remove-proxmox-ve'
apt remove "pve-kernel-5.3-*"
reboot
On the most recent kernel (5.3.13-3-pve) I get this output after doing the usual GPU passthrough setup:
Code:
lspci -nnk
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:21c4] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
09:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aeb] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
09:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1aec] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: xhci_hcd
09:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1aed] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: nvidia-gpu
Kernel modules: i2c_nvidia_gpu
As you can see, there is a "USB controller" on the GPU and also a "Serial bus controller".
Neither of them is using the vfio-pci kernel driver, which is needed for passthrough to work.
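To spot the functions that still need attention without reading the whole listing, the output can be filtered. This is only a rough sketch: it assumes the `lspci -nnk` layout shown above (a device line starting with the slot address, followed by a "Kernel driver in use:" line), and `non_vfio_functions` is a name I made up for illustration.

```shell
#!/bin/sh
# non_vfio_functions SLOT
# Reads `lspci -nnk` output on stdin and prints "ADDRESS DRIVER" for every
# function of SLOT (e.g. 09:00) whose driver in use is not vfio-pci.
# (Hypothetical helper, not part of pciutils or Proxmox.)
non_vfio_functions() {
    awk -v slot="$1" '
        # Device lines start with an address like 09:00.2; remember it
        $1 ~ /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7]$/ { addr = $1 }
        /Kernel driver in use:/ {
            driver = $NF
            # Only report functions that belong to the requested slot
            if (index(addr, slot ".") == 1 && driver != "vfio-pci")
                print addr, driver
        }
    '
}

# Usage on the host:
#   lspci -nnk | non_vfio_functions 09:00
```

For the listing above, that leaves 09:00.2 (xhci_hcd) and 09:00.3 (nvidia-gpu).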
When I check why the system boots so slowly, I get this:
Code:
systemd-analyze blame
1min 33.115s ifupdown-pre.service
1min 33.101s systemd-udev-settle.service
5.800s smartmontools.service
5.506s zfs-import-cache.service
2.127s systemd-modules-load.service
...
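To pull only the slow units out of a longer blame list, the durations can be converted to seconds and filtered. A rough sketch, assuming the `1min 33.115s` / `5.800s` style shown above (plus `ms` and `h` suffixes); `slow_units` is a made-up helper name, not part of systemd.

```shell
#!/bin/sh
# slow_units LIMIT_SECONDS
# Reads `systemd-analyze blame` output on stdin and prints
# "SECONDS UNIT" for every unit that took at least LIMIT_SECONDS.
# (Hypothetical helper for illustration only.)
slow_units() {
    awk -v limit="$1" '
        {
            secs = 0
            # Every field except the last is a duration token
            for (i = 1; i < NF; i++) {
                t = $i
                if (t ~ /ms$/)       secs += substr(t, 1, length(t) - 2) / 1000
                else if (t ~ /min$/) secs += substr(t, 1, length(t) - 3) * 60
                else if (t ~ /h$/)   secs += substr(t, 1, length(t) - 1) * 3600
                else if (t ~ /s$/)   secs += substr(t, 1, length(t) - 1)
            }
            if (secs >= limit) printf "%.3f %s\n", secs, $NF
        }
    '
}

# Usage on the host:
#   systemd-analyze blame | slow_units 60
```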
If I try to start a Win10 VM with 09:00 passed through, I get the following:
Code:
Feb 9 12:52:04 prox pvedaemon[13664]: start VM 100: UPID:prox:00003560:00005E98:5E3FF264:qmstart:100:root@pam:
Feb 9 12:52:04 prox pvedaemon[13173]: <root@pam> starting task UPID:prox:00003560:00005E98:5E3FF264:qmstart:100:root@pam:
Feb 9 12:55:31 prox kernel: [ 449.004296] xhci_hcd 0000:09:00.2: remove, state 4
Feb 9 12:55:31 prox kernel: [ 449.004300] usb usb6: USB disconnect, device number 1
Feb 9 12:55:31 prox kernel: [ 449.004422] xhci_hcd 0000:09:00.2: USB bus 6 deregistered
Feb 9 12:55:31 prox kernel: [ 449.004428] xhci_hcd 0000:09:00.2: remove, state 4
Feb 9 12:55:31 prox kernel: [ 449.004428] usb usb5: USB disconnect, device number 1
Feb 9 12:55:31 prox kernel: [ 449.005083] xhci_hcd 0000:09:00.2: USB bus 5 deregistered
Feb 9 12:55:31 prox systemd[1]: Created slice qemu.slice.
Feb 9 12:55:31 prox systemd[1]: Started 100.scope.
Feb 9 12:55:31 prox kernel: [ 449.226960] BUG: unable to handle page fault for address: ffff9f9380121000
Feb 9 12:55:31 prox kernel: [ 449.226964] #PF: supervisor read access in kernel mode
...
Notice that nothing happens for around 3:30 minutes (12:52:04 to 12:55:31).
After I try to start the VM and the error messages show up, the system becomes unstable and I have to hard-reset Proxmox.
How to fix this problem:
This problem can be fixed by blacklisting the i2c_nvidia_gpu kernel module:
Code:
echo "blacklist i2c-nvidia-gpu" >> /etc/modprobe.d/blacklist.conf
update-initramfs -u -k all
reboot
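One thing to watch: `>>` appends every time it is run, so repeating the command leaves duplicate lines in blacklist.conf. Below is a minimal sketch of an idempotent variant. The function name `add_blacklist` is my own, and the file is passed as an argument so it can be tried on a scratch file first.

```shell
#!/bin/sh
# add_blacklist MODULE FILE
# Append "blacklist MODULE" to FILE unless that exact line is already there.
# (add_blacklist is a hypothetical helper name, not a Proxmox tool.)
add_blacklist() {
    module=$1
    file=$2
    line="blacklist $module"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if [ -f "$file" ] && grep -qxF "$line" "$file"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage on the host (as root), followed by the same initramfs rebuild and reboot:
#   add_blacklist i2c-nvidia-gpu /etc/modprobe.d/blacklist.conf
#   update-initramfs -u -k all
```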
Now it looks like this (09:00.3 is using vfio-pci):
Code:
lspci -nnk
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:21c4] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
09:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aeb] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
09:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1aec] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: xhci_hcd
09:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1aed] (rev a1)
Subsystem: Micro-Star International Co., Ltd. [MSI] Device [1462:8d94]
Kernel driver in use: vfio-pci
Kernel modules: i2c_nvidia_gpu
Now there is no delay at boot and I can start my VM with GPU passthrough, but /var/log/syslog still reports problems with xhci_hcd on the first boot of the VM:
Code:
Feb 9 13:08:34 prox pvedaemon[1999]: <root@pam> starting task UPID:prox:0000093D:00002B69:5E3FF642:qmstart:100:root@pam:
Feb 9 13:08:35 prox kernel: [ 110.935303] xhci_hcd 0000:09:00.2: remove, state 4
Feb 9 13:08:35 prox kernel: [ 110.935306] usb usb6: USB disconnect, device number 1
Feb 9 13:08:35 prox kernel: [ 110.935426] xhci_hcd 0000:09:00.2: USB bus 6 deregistered
Feb 9 13:08:35 prox kernel: [ 110.935431] xhci_hcd 0000:09:00.2: remove, state 4
Feb 9 13:08:35 prox kernel: [ 110.935432] usb usb5: USB disconnect, device number 1
Feb 9 13:08:35 prox kernel: [ 110.936068] xhci_hcd 0000:09:00.2: USB bus 5 deregistered
This only happens on the first boot of the VM. After that, 09:00.2 is using the vfio-pci kernel driver.
While trying to fix the main problem, I found a way to make "09:00.2 USB controller" use vfio-pci from the start, but it involves creating a script:
Code:
nano unbind-gtx-usb.sh
Bash:
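#!/bin/sh
# PCI address of the GPU's USB controller (without the 0000: domain prefix)
DEVICE1="09:00.2"
modprobe vfio-pci
for dev in "0000:$DEVICE1"; do
    vendor=$(cat "/sys/bus/pci/devices/$dev/vendor")
    device=$(cat "/sys/bus/pci/devices/$dev/device")
    # Unbind from the current driver (xhci_hcd here) if one is attached
    if [ -e "/sys/bus/pci/devices/$dev/driver" ]; then
        echo "$dev" > "/sys/bus/pci/devices/$dev/driver/unbind"
    fi
    # Register the vendor/device ID with vfio-pci so it claims the device
    echo "$vendor $device" > /sys/bus/pci/drivers/vfio-pci/new_id
done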
Code:
chmod u+x unbind-gtx-usb.sh
./unbind-gtx-usb.sh
If you run that before you start the VM there is no error, but it seems like overkill now since it doesn't appear to be needed.
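To confirm which driver ended up on 09:00.2 (with or without the script), the sysfs driver symlink can be read directly instead of scanning `lspci` output. A small sketch under my own naming: `pci_driver` is a hypothetical helper, and the optional second argument exists only so it can be pointed at a fake directory tree for testing.

```shell
#!/bin/sh
# pci_driver ADDRESS [SYSFS_ROOT]
# Print the name of the kernel driver bound to a PCI device, or "none".
# SYSFS_ROOT defaults to /sys (the override is an assumption for testing).
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The driver symlink points at .../drivers/<name>
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Usage on the host:
#   pci_driver 0000:09:00.2
```

Running it before and after the first VM start should show whether the device is still on xhci_hcd or has already moved to vfio-pci.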