Proxmox 8.1.4 and NVIDIA GPU passthrough

rjcab

Member
Mar 1, 2021
Hello,

I want to pass through my GPU, and I followed the guide below:
https://forum.proxmox.com/threads/p...x-ve-8-installation-and-configuration.130218/
Thanks to @asded

It worked with Proxmox 7.4 on the same motherboard.

Code:
root@pve:~# efibootmgr -v
BootCurrent: 0003
Timeout: 2 seconds
BootOrder: 0003,0001,0000,0004,0002,0005,0006
Boot0000* Windows Boot Manager    VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)WINDOWS.........x...B.C.D.O.B.J.E.C.T.=.{.9.d.e.a.8.6.2.c.-.5.c.d.d.-.4.e.7.0.-.a.c.c.1.-.f.3.2.b.3.4.4.d.4.7.9.5.}...a................
Boot0001* debian    VenHw(99e275e7-75a0-4b37-a2e6-c5385e6c00cb)
Boot0002* UEFI:CD/DVD Drive    BBS(129,,0x0)
Boot0003* proxmox    HD(2,GPT,5b03a7b4-beb1-436d-a7be-e0b8836506cf,0x800,0x200000)/File(\EFI\PROXMOX\SHIMX64.EFI)
Boot0004* UEFI OS    HD(2,GPT,5b03a7b4-beb1-436d-a7be-e0b8836506cf,0x800,0x200000)/File(\EFI\BOOT\BOOTX64.EFI)..BO
Boot0005* UEFI:Removable Device    BBS(130,,0x0)
Boot0006* UEFI:Network Device    BBS(131,,0x0)
root@pve:~#

So GRUB is used in UEFI mode.
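
For reference, here is a minimal sketch of how the booted entry could be picked out of `efibootmgr -v` output instead of by eye. The function name `current_boot_entry` is made up for illustration; it only matches the `BootCurrent` number against the `BootXXXX` lines. As far as I know, an entry pointing at `\EFI\PROXMOX\SHIMX64.EFI` (as here) is the shim that normally chain-loads GRUB, not systemd-boot.

```shell
#!/bin/sh
# Print the efibootmgr entry that the firmware actually booted.
# Reads `efibootmgr -v` output on stdin. Hypothetical helper, not a Proxmox tool.
current_boot_entry() {
  awk '
    /^BootCurrent:/ { cur = $2 "" }   # e.g. "0003", kept as a string
    /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
      entries[substr($1, 5, 4)] = $0  # key is the 4-digit entry number
    }
    END {
      if (cur in entries) print entries[cur]
      else exit 1
    }
  '
}

# On a real host: efibootmgr -v | current_boot_entry
```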

Code:
root@pve:~# cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
#GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=vesafb:off video=efifb:off"
#GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt initcall_blacklist=sysfb_init"

If I use the arguments below, it does not work: the boot gets stuck with a message like "loading initial ramdisk":
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init"

If I use:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt initcall_blacklist=sysfb_init"
then the result does not look correct after running update-grub and rebooting:

Code:
root@pve:~# dmesg | grep -e IOMMU
[    0.246259] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 0
root@pve:~#

And if I use the following, it boots correctly:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

then

Code:
root@pve:~# update-grub
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-6.5.11-8-pve
Found initrd image: /boot/initrd.img-6.5.11-8-pve
Found memtest86+ 64bit EFI image: /boot/memtest86+x64.efi
Adding boot menu entry for UEFI Firmware Settings ...
done
root@pve:~#

So I kept this one, GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on", and rebooted.

then:

Code:
root@pve:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt
root@pve:~#

then

Code:
root@pve:~# pve-efiboot-tool refresh
Running hook script 'proxmox-auto-removal'..
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
root@pve:~#

Code:
root@pve:~# dmesg | grep -e IOMMU
[    0.106293] DMAR: IOMMU enabled
[    0.246310] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 0

Code:
root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.
vfio
vfio_iommu_type1
vfio_pci
root@pve:~#

Code:
root@pve:~# update-initramfs -u -k all
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
perl: warning: Setting locale failed.
    LANG = "en_US.UTF-8"
perl: warning: Please check that your locale settings:
    are supported and installed on your system.
    LANGUAGE = (unset),
    LC_ALL = (unset),
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
update-initramfs: Generating /boot/initrd.img-6.5.11-8-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
root@pve:~#


Code:
root@pve:~# dmesg | grep -i vfio
[    3.178643] VFIO - User Level meta-driver version: 0.3
root@pve:~#

Code:
root@pve:~# dmesg | grep 'remapping'
[    0.246313] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.247757] DMAR-IR: Enabled IRQ remapping in x2apic mode
root@pve:~#
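
Since `pcie_acs_override` only matters when the GPU shares an IOMMU group with other devices, it can help to list the groups before adding it. A rough sketch, assuming the usual sysfs layout under `/sys/kernel/iommu_groups`; `list_iommu_groups` is an illustrative name, and the optional argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# List every PCI device together with its IOMMU group number.
# $1 (optional) overrides the sysfs directory, which is handy for dry runs.
list_iommu_groups() {
  root=${1:-/sys/kernel/iommu_groups}
  for dev in "$root"/*/devices/*; do
    [ -e "$dev" ] || continue        # glob did not match: no groups exposed
    group=${dev#"$root"/}            # strip the leading directory
    group=${group%%/*}               # keep only the group number
    printf 'group %s: %s\n' "$group" "${dev##*/}"
  done
}

# On the host: list_iommu_groups
```

If 0000:01:00.0 and 0000:01:00.1 are alone in their group, the ACS override is probably not needed.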

Code:
root@pve:~# lspci -nn | grep 'NVIDIA'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117 [GeForce GTX 1650] [10de:1f82] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
root@pve:~#

When I set the following and reboot, the system fails at boot:

Code:
root@pve:~# echo "options vfio-pci ids=10de:1f82,10de:10fa" > /etc/modprobe.d/vfio.conf
root@pve:~#
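
The `ids=` list can also be generated from `lspci -nn` output rather than copied by hand, which avoids typos in the vendor:device pairs. This is only a sketch: `vfio_options_line` is a made-up helper that extracts every `[vvvv:dddd]` pair it sees, so the input should be filtered to the GPU's functions first.

```shell
#!/bin/sh
# Build an "options vfio-pci ids=..." line from `lspci -nn` lines on stdin.
vfio_options_line() {
  ids=$(grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | sed 's/[][]//g' \
        | awk '!seen[$0]++' \
        | paste -sd, -)
  [ -n "$ids" ] || return 1          # no vendor:device pair found
  printf 'options vfio-pci ids=%s\n' "$ids"
}

# Example with the two functions of the GTX 1650 from this thread:
vfio_options_line <<'EOF'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117 [GeForce GTX 1650] [10de:1f82] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
EOF
# prints: options vfio-pci ids=10de:1f82,10de:10fa
```

On the host, something like `lspci -nn -s 01:00 | vfio_options_line` should give the same line for this card.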

If you have any solutions :)

Many thanks
 
root@pve:~# echo "options vfio-pci ids=10de:1f82,10de:10fa" > /etc/modprobe.d/vfio.conf

I have a working configuration with 8.0.4, but without pcie_acs_override=downstream:
grub:
.."quiet pcie_aspm=off intel_idle.max_cstate=0 intel_pstate=disable processor.max_cstate=1 mitigations=off intel_iommu=on iommu=pt"

#lspci -nn | grep 'NVIDIA'
0d:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A40] [10de:2235] (rev a1)
b5:00.0 3D controller [0302]: NVIDIA Corporation GA102GL [A40] [10de:2235] (rev a1)

#cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:2235

# cat /etc/modules
..
vfio
vfio_iommu_type1
vfio_pci


Maybe revert to 8.0.xx ?
 
Well, I've searched and it seems quite tricky to downgrade, and I would have too much to do if I had to reinstall Proxmox 8.0 :(
 
I've tried:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off intel_idle.max_cstate=0 intel_pstate=disable processor.max_cstate=1 mitigations=off intel_iommu=on iommu=pt"

It boots correctly.
Now, if I add the following, it gets stuck at boot:

Code:
#cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1f82

Code:
root@pve:~# lspci -nn | grep 'NVIDIA'
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117 [GeForce GTX 1650] [10de:1f82] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
root@pve:~#

[Attachment: Untitled.jpg]
Hi @rjcab, before going any further, and to avoid any misunderstanding: I think it's a typo, but what is your CPU, Intel or AMD? Your GRUB lines use both amd_iommu=on and intel_iommu=on.
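
One quick way to settle the Intel vs. AMD question is to read `vendor_id` from `/proc/cpuinfo` and pick the kernel argument from that. A small sketch under that assumption; `iommu_kernel_arg` is an illustrative name. For AMD it prints nothing, on the understanding (as far as I know) that the AMD IOMMU driver is enabled by default and does not need an `=on` switch.

```shell
#!/bin/sh
# Suggest the IOMMU kernel argument for the CPU described on stdin
# (input is /proc/cpuinfo-style text).
iommu_kernel_arg() {
  vendor=$(awk -F': *' '/^vendor_id/ { print $2; exit }')
  case "$vendor" in
    GenuineIntel) echo "intel_iommu=on" ;;
    AuthenticAMD) : ;;               # assumption: AMD IOMMU is on by default
    *) echo "unknown CPU vendor: $vendor" >&2; return 1 ;;
  esac
}

# On the host: iommu_kernel_arg < /proc/cpuinfo
```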

Also, since GRUB is being used in UEFI mode, why add /etc/kernel/cmdline? efibootmgr -v doesn't indicate that systemd-boot is in use. Lastly, GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init" seems to be working fine. In fact, the system is not stuck at startup; it's just that the display no longer goes through the graphics card, since passthrough involves blacklisting the NVIDIA drivers.
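
Whichever file ends up being edited, `/proc/cmdline` shows what the running kernel was actually booted with, so it is a quick way to see whether the GRUB settings or `/etc/kernel/cmdline` took effect. A minimal sketch; `has_kernel_arg` is a hypothetical helper that does an exact word match, so `iommu=pt` is not confused with `intel_iommu=on`.

```shell
#!/bin/sh
# Return 0 if ARG appears as a whole word in the kernel command line.
# Usage: has_kernel_arg ARG [CMDLINE]   (CMDLINE defaults to /proc/cmdline)
has_kernel_arg() {
  arg=$1
  cmdline=${2-$(cat /proc/cmdline)}
  for word in $cmdline; do           # intentional word splitting
    [ "$word" = "$arg" ] && return 0
  done
  return 1
}

# Sample line modelled on this thread; the BOOT_IMAGE and ro parts are illustrative.
sample='BOOT_IMAGE=/boot/vmlinuz-6.5.11-8-pve root=ZFS=rpool/ROOT/pve-1 ro quiet intel_iommu=on'
has_kernel_arg intel_iommu=on "$sample" && echo "intel_iommu=on is active"
has_kernel_arg iommu=pt "$sample" || echo "iommu=pt is missing"
```

On the host, calling `has_kernel_arg iommu=pt` with no second argument checks the live command line.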
 
Thanks for your reply.
You are completely right, I made a mistake:
Code:
#GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt initcall_blacklist=sysfb_init"

It's INTEL :)

In fact, the system is not stuck at startup, it's just that the display no longer goes through the graphics card, since pass-through involves blacklisting nvidia drivers.
Well, I suppose so. In fact, I am able to log in via SSH, but when I type reboot nothing happens and the screen still displays the same messages.
When I was on Proxmox 7, even with passthrough configured, the server booted, and once I started the VM with the PCIe device I lost the display on Proxmox and got it in the VM, which seems normal.
 
I see. In this case the correct line would be: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init". As for the display, this can also depend on the method chosen for excluding drivers ("Modules order" or "Blacklisting drivers"), but the essential thing is that your GPU is ultimately managed by vfio-pci.
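
To check that last point, the driver bound to a device can be read from sysfs, where each PCI device directory has a `driver` symlink when something is bound. A sketch, assuming the standard `/sys/bus/pci/devices` layout; `pci_driver` is an illustrative name, and the second argument is only there to allow testing against a fake tree.

```shell
#!/bin/sh
# Print the kernel driver bound to a PCI device, e.g. "vfio-pci".
# Usage: pci_driver 0000:01:00.0 [SYSFS_PCI_DEVICES_DIR]
pci_driver() {
  link="${2:-/sys/bus/pci/devices}/$1/driver"
  if [ -L "$link" ]; then
    basename "$(readlink "$link")"   # symlink target ends in the driver name
  else
    echo "none"                      # no driver bound (or no such device)
    return 1
  fi
}

# On the host, for the GPU and its audio function from this thread:
#   pci_driver 0000:01:00.0
#   pci_driver 0000:01:00.1
```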
 
@asded, to recap, I should keep:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init"
and remove:
root@pve:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt
root@pve:~#
and keep that:

root@pve:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1f82,10de:10fa
 
For GRUB, use the Intel settings: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt nomodeset pcie_acs_override=downstream initcall_blacklist=sysfb_init". The rest is okay, but which method do you use for driver isolation?
 
I chose the blacklist method:

Code:
root@pve:~# cat /etc/modprobe.d/blacklist.conf
blacklist i915
blacklist bluetooth
install bluetooth /bin/true
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm
root@pve:~#
 
That's okay. Also check that the vfio modules are enabled in /etc/modules:
Bash:
echo "vfio" >> /etc/modules
echo "vfio_iommu_type1" >> /etc/modules
echo "vfio_pci" >> /etc/modules
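
Note that running `echo ... >>` commands like these more than once appends duplicate lines. A hedged alternative is to append only when the exact line is missing; `ensure_line` is a made-up helper name, and the demo writes to a temporary file rather than the real `/etc/modules`.

```shell
#!/bin/sh
# Append LINE to FILE only if FILE does not already contain it verbatim.
ensure_line() {
  grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Demo against a scratch file; on the host the target would be /etc/modules.
modules_file=$(mktemp)
for mod in vfio vfio_iommu_type1 vfio_pci vfio; do
  ensure_line "$mod" "$modules_file"
done
cat "$modules_file"                  # three lines: the repeated "vfio" is skipped
rm -f "$modules_file"
```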
 
Code:
root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
# Parameters can be specified after the module name.
vfio
vfio_iommu_type1
vfio_pci
root@pve:~#
 
Each time I change /etc/modprobe.d/blacklist.conf, do I have to run the command below and reboot?

Code:
root@pve:~# update-initramfs -u -k all
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
update-initramfs: Generating /boot/initrd.img-6.5.11-8-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
root@pve:~#
 
I did all of that with the following configuration:
Bash:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"

root@pve:~# dmesg | grep -e IOMMU
[    0.106394] DMAR: IOMMU enabled
[    0.247236] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 0
root@pve:~#

 /etc/modules
vfio
vfio_iommu_type1
vfio_pci

root@pve:~# cat /etc/modprobe.d/blacklist.conf
blacklist i915
blacklist bluetooth
install bluetooth /bin/true
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist nvidia_drm

root@pve:~# cat /etc/modprobe.d/vfio.conf
#options vfio-pci ids=10de:1f82
options vfio-pci ids=10de:1f82,10de:10fa
root@pve:~#

root@pve:~# dmesg | grep -i vfio
[    3.372791] VFIO - User Level meta-driver version: 0.3
[    3.379729] vfio-pci 0000:01:00.0: vgaarb: deactivate vga console
[    3.379733] vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    3.379905] vfio_pci: add [10de:1f82[ffffffff:ffffffff]] class 0x000000/00000000
[    7.240617] vfio-pci 0000:01:00.0: not ready 1023ms after resume; waiting
[    8.323107] vfio-pci 0000:01:00.0: not ready 2047ms after resume; waiting
[   10.508242] vfio-pci 0000:01:00.0: not ready 4095ms after resume; waiting
[   14.858437] vfio-pci 0000:01:00.0: not ready 8191ms after resume; waiting
[   23.308200] vfio-pci 0000:01:00.0: not ready 16383ms after resume; waiting
[   40.197785] vfio-pci 0000:01:00.0: not ready 32767ms after resume; waiting
[   75.019399] vfio-pci 0000:01:00.0: not ready 65535ms after resume; giving up
[   75.019478] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
[   75.370216] vfio-pci 0000:01:00.0: nv_msi_ht_cap_quirk_leaf+0x0/0x30 took 146827 usecs
[   75.370286] vfio-pci 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible
[   75.570737] vfio-pci 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible
[   75.570750] vfio-pci 0000:01:00.1: Unable to change power state from D3cold to D0, device inaccessible
[   75.570804] vfio_pci: add [10de:10fa[ffffffff:ffffffff]] class 0x000000/00000000
[   75.570813] vfio-pci 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible

The "not ready xxxms after resume; waiting" messages look odd.
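
To pull just the affected devices out of a log like this, the D3cold messages can be filtered by PCI address. A small sketch; `d3cold_devices` is a hypothetical name, and the pattern is based only on the message format seen in this thread.

```shell
#!/bin/sh
# Print the PCI addresses that logged "Unable to change power state from
# D3cold to D0". Reads kernel log text (e.g. from dmesg) on stdin.
d3cold_devices() {
  sed -n 's/.* \([0-9a-f:.]*\): Unable to change power state from D3cold to D0.*/\1/p' \
    | sort -u
}

# On the host: dmesg | d3cold_devices
```

For the log in this post that would list both 0000:01:00.0 and 0000:01:00.1, i.e. the GPU and its audio function.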

Assuming that is OK, when I go to the VM configuration I don't see the device under Mapped Device:
pb1.jpg

but under Raw Device I do see the GPU:

pb2.jpg

But selecting it as a raw device doesn't work. :(
 
Yes, it is normal that the card does not appear under Mapped Device. Give me more details on what is not working when you select the GPU under Raw Device. Also, provide the output of lspci -nnk for the PCI entry of your graphics card.
 
Sure.

My VM configuration:
Code:
root@pve:~# cat /etc/pve/qemu-server/101.conf
agent: 1
bios: ovmf
boot: order=ide0
cores: 2
efidisk0: local:101/vm-101-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:01:00.0,pcie=1,x-vga=1
ide0: local:101/vm-101-disk-1.qcow2,size=52G
machine: pc-q35-7.2
memory: 9000
meta: creation-qemu=7.2.0,ctime=1705441341
name: screen
net0: e1000=36:F3:15:04:F0:E0,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=c3813f08-3999-4871-8a04-1f0b1381c031
sockets: 4
tpmstate0: local:101/vm-101-disk-2.raw,size=4M,version=v2.0
usb0: host=17e9:4324
usb1: host=2575:0401
vga: none
vmgenid: f84bda78-feeb-45bd-ab09-66718aaec3aa

When I start the VM, it takes a long time and then I get the following message:
Code:
TASK ERROR: start failed: command '/usr/bin/kvm -id 101 -name 'screen,debug-threads=on' -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/101.pid -daemonize -smbios 'type=1,uuid=c3813f08-3999-4871-8a04-1f0b1381c031' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.secboot.fd' -drive 'if=pflash,unit=1,id=drive-efidisk0,format=qcow2,file=/var/lib/vz/images/101/vm-101-disk-0.qcow2' -smp '8,sockets=4,cores=2,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga none -nographic -cpu 'kvm64,enforce,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vendor_id=proxmox,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep' -m 9000 -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=f84bda78-feeb-45bd-ab09-66718aaec3aa' -device 'qemu-xhci,p2=15,p3=15,id=xhci,bus=pci.1,addr=0x1b' -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' -device 'vfio-pci,host=0000:01:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' -device 'usb-host,bus=xhci.0,port=1,vendorid=0x17e9,productid=0x4324,id=usb0' -device 'usb-host,bus=xhci.0,port=2,vendorid=0x2575,productid=0x0401,id=usb1' -chardev 'socket,id=tpmchar,path=/var/run/qemu-server/101.swtpm' -tpmdev 'emulator,id=tpmdev,chardev=tpmchar' -device 'tpm-tis,tpmdev=tpmdev' -chardev 'socket,path=/var/run/qemu-server/101.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:86a9738ce211' -drive 
'file=/var/lib/vz/images/101/vm-101-disk-1.qcow2,if=none,id=drive-ide0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on' -device 'ide-hd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=36:F3:15:04:F0:E0,netdev=net0,bus=pci.0,addr=0x12,id=net0' -rtc 'driftfix=slew,base=localtime' -machine 'hpet=off,type=pc-q35-7.2+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout

The lspci -nnk command gives me no output.
So I rebooted Proxmox, but the problem is the same: no output.
 
OK, the problem is "Unable to change power state from D3cold to D0, device inaccessible". This reminds me of a reset bug.
I saw this thread:
https://forum.proxmox.com/threads/a...from-d3cold-to-d0-device-inaccessible.130975/
Are you sure vendor-reset does not fix this when you activate it for the GPU? You probably did this correctly, but sometimes people forget that (or newer kernels).
but I don't understand what it means.
 
