q35 guest Win10 won't start, AMD Threadripper 3970x

jena

Member
Jul 9, 2020
Hello,

I am trying to do GPU passthrough on guest #2.

CPU: AMD Threadripper 3970x
GPU: NVIDIA 2080 Ti Founders Edition Blower Style
MB: MSI TRX40 Pro Wifi
The BIOS is set to CSM+UEFI and the install was successful. IOMMU and SVM are enabled.
When booting the Proxmox install flash drive in pure UEFI mode, it gets stuck after some command-line prompts and the display turns into fuzzy broken lines (the white background never shows up).


My SeaBIOS Win10 guest (guest #1) works fine, but my OVMF (q35) Win10 guest (guest #2) doesn't start at all.

The OVMF guest (guest #2) boots once I change the machine type from q35 to i440fx.
Starting the VM with q35 produces the status "Error: start failed: QEMU exited with code 1".
At this point no PCI device has been added in the Hardware tab yet. (PS: adding the PCI device doesn't let the VM start either; same error message.)

I followed the steps described in this post (it lays out the steps, so it is easier to follow) and cross-checked them against the official Proxmox guide.
https://www.reddit.com/r/homelab/comments/b5xpua/the_ultimate_beginners_guide_to_gpu_passthrough/
https://pve.proxmox.com/wiki/Pci_passthrough

I did every step EXCEPT these optional ones:
(steps that I didn't do)
1. GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"

2.
A. Disabling the Framebuffer: video=vesafb:off,efifb:off
B. ACS Override for IOMMU groups: pcie_acs_override=downstream,multifunction

3. IOMMU interrupt remapping
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
The reason is that my dmesg | grep 'remapping' output looks fine, and the official Proxmox guide seems to suggest that "unsafe interrupts" is not necessary if the output is correct.

I DID put all 4 IDs of my GPU in there; the first two are the GPU and its audio function, the latter two are the NVIDIA USB and UCSI controllers.
echo "options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7 disable_vga=1"> /etc/modprobe.d/vfio.conf

Code:
~/rom-parser# ./rom-parser /tmp/image.rom
Valid ROM signature found @0h, PCIR offset 170h
        PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 1e04, class: 030000
        PCIR: revision 0, vendor revision: 1
Error, ran off the end

Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"

# Disable os-prober, it might add menu entries for each guest
GRUB_DISABLE_OS_PROBER=true

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

VM config file
Code:
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
balloon: 0
bios: ovmf
bootdisk: scsi0
cores: 64
cpu: host,hidden=1,flags=+pcid
efidisk0: local-nvme:vm-500-disk-1,size=1M
ide2: network-proxmox:iso/en_windows_10_consumer_editions_version_1909_x64_dvd_be09950e.iso,media=cdrom
ide3: network-proxmox:iso/virtio-win-0.1.185.iso,media=cdrom,size=402812K
machine: q35
memory: 32768
name: Win10Edu-1909-2080Ti
net0: virtio=3E:80:D5:FD:81:D2,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
scsi0: local-nvme:vm-500-disk-0,cache=writeback,discard=on,size=501G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=51b7e8ff-ae21-45e2-942a-c236379bbd62
sockets: 1
vmgenid: c1d954f9-87b9-47cb-83f4-7a5505137199

Shell
lspci -nn output attached (because it exceeds the 10,000-character limit on this web page)

Code:
dmesg | grep 'remapping'
[    0.851717] AMD-Vi: Interrupt remapping enabled

Code:
dmesg | grep -e DMAR -e IOMMU
[    0.816040] pci 0000:60:00.2: AMD-Vi: IOMMU performance counters supported
[    0.816082] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    0.816106] pci 0000:20:00.2: AMD-Vi: IOMMU performance counters supported
[    0.816122] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.851700] pci 0000:60:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.851706] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.851710] pci 0000:20:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.851714] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.855517] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    0.855530] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[    0.855546] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[    0.855560] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
 

Attachments

  • lspci.txt (11.1 KB)
  • iommo_groups.txt (4.8 KB)
First off, remove the 'args' line from the config file. It is not necessary; we set all of these by default anyway. Secondly, assigning all 64 cores of your CPU to one VM will most likely hurt performance more than it helps. Leave at least 1 or 2 cores to PVE itself.

As to your error: "Error 1" can mean a lot of different things. Try the following, which should print more information:
Code:
# qm showcmd <vmid> --pretty > /tmp/qm.sh
# cat /tmp/qm.sh
# sh /tmp/qm.sh
 

Code:
qm showcmd 500 --pretty > /tmp/qm.sh
root@xxx:~# cat /tmp/qm.sh
/usr/bin/kvm \
  -id 500 \
  -name Win10Edu-1909-2080Ti \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/500.qmp,server,nowait' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/500.pid \
  -daemonize \
  -smbios 'type=1,uuid=51b7e8ff-ae21-45e2-942a-c236379bbd62' \
  -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' \
  -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/dev/zvol/nvmepool/vm-500-disk-1' \
  -smp '62,sockets=1,cores=62,maxcpus=62' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc unix:/var/run/qemu-server/500.vnc,password \
  -no-hpet \
  -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,kvm=off,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid' \
  -m 32768 \
  -object 'memory-backend-ram,id=ram-node0,size=32768M' \
  -numa 'node,nodeid=0,cpus=0-61,memdev=ram-node0' \
  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
  -device 'vmgenid,guid=c1d954f9-87b9-47cb-83f4-7a5505137199' \
  -device 'usb-tablet,id=tablet,bus=ehci.0,port=1' \
  -device 'VGA,id=vga,bus=pcie.0,addr=0x1' \
  -chardev 'socket,path=/var/run/qemu-server/500.qga,server,nowait,id=qga0' \
  -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' \
  -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:baecad992489' \
  -drive 'file=/mnt/pve/network-neoproxmox/template/iso/en_windows_10_consumer_editions_version_1909_x64_dvd_be09950e.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
  -drive 'file=/mnt/pve/network-neoproxmox/template/iso/virtio-win-0.1.185.iso,if=none,id=drive-ide3,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.1,unit=1,drive=drive-ide3,id=ide3,bootindex=201' \
  -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1' \
  -drive 'file=/dev/zvol/nvmepool/vm-500-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' \
  -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=100' \
  -netdev 'type=tap,id=net0,ifname=tap500i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
  -device 'virtio-net-pci,mac=3E:80:D5:FD:81:D2,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
  -rtc 'driftfix=slew,base=localtime' \
  -machine 'type=q35+pve0' \
  -global 'kvm-pit.lost_tick_policy=discard'

Code:
sh /tmp/qm.sh
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
... (the same warning repeated, once per configured vCPU)
kvm: -device ide-cd,bus=ide.1,unit=1,drive=drive-ide3,id=ide3,bootindex=201: Can't create IDE unit 1, bus supports only 1 units
 
I fixed the previous problem by using a SATA CD-ROM instead of IDE.
I was able to install Win10 following the best-practice guide.

Now, with GPU passthrough added, the VM fails to start.

Code:
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
kvm: warning: host doesn't support requested feature: CPUID.01H:ECX.pcid [bit 17]
... (repeated)
kvm: -device vfio-pci,host=0000:21:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: Failed to mmap 0000:21:00.0 BAR 3. Performance may be slow
kvm: -device vfio-pci,host=0000:21:00.0,id=hostpci1.0,bus=ich9-pcie-port-2,addr=0x0.0,multifunction=on: vfio 0000:21:00.0: device is already attached

After re-blacklisting and re-running echo "options vfio-pci ids=XXXX disable_vga=1" > /etc/modprobe.d/vfio.conf,
the display seems to be disabled.
Now the error is:
Code:
kvm: -device vfio-pci,host=0000:21:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio 0000:21:00.0: failed to open /dev/vfio/28: No such file or directory
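If I understand vfio correctly, /dev/vfio should contain one character device per IOMMU group bound to vfio, so a quick check (group number taken from the error above) would be:
Code:
# expect a 'vfio' container node plus one entry per bound group, e.g. /dev/vfio/28
ls -l /dev/vfio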
 
I fixed the previous problem by using SATA CD-ROM instead of IDE.
I was able to install Win10 following the best practice guide.
Tried reproducing your IDE issue here, but can't get the error to happen...

Now error is
Make sure the vfio module is correctly loaded (e.g. 'lsmod | grep vfio'). Also, please post your VM config with the 'hostpci' entry added. One thing I can tell you immediately is that you should disable the 'PCID' setting for the CPU. Your Threadripper does not support that instruction.

Also, do you have a second GPU installed or are you attempting single-GPU passthrough? In the latter case, you do need to add nofb nomodeset video=vesafb:off,efifb:off to your kernel commandline (GRUB setting). Verify with 'cat /proc/cmdline'.
 

Latest host config attached (500.txt)

I turned off "pcid" option for CPU.

I am attempting single-GPU passthrough. [Correction: I can still see the PVE console on a monitor connected to the 2080 Ti.]
I have a very old Quadro FX 580 (not installed) in case I need something, but the slot layout would make things complicated.
Does blocking host use of the single GPU also mean that I cannot see any display output (including entering the BIOS) anymore?

I followed your instructions and added these lines
Code:
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"

# Disable os-prober, it might add menu entries for each guest
GRUB_DISABLE_OS_PROBER=true

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"

I ran
Code:
cat /proc/cmdline
initrd=\EFI\proxmox\5.4.34-1-pve\initrd.img-5.4.34-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs

I ran
Code:
lsmod | grep vfio
It printed nothing. I am not sure how to check whether the vfio module is correctly loaded.

I checked that /etc/modules has these lines:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Code:
dmesg | grep -e DMAR -e IOMMU
[    0.815757] pci 0000:60:00.2: AMD-Vi: IOMMU performance counters supported
[    0.815796] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[    0.815819] pci 0000:20:00.2: AMD-Vi: IOMMU performance counters supported
[    0.815836] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    0.851539] pci 0000:60:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.851545] pci 0000:40:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.851549] pci 0000:20:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.851553] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    0.855435] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    0.855449] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[    0.855463] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[    0.855476] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).

I rebooted the computer and ran the VM again;
still the same error:
Code:
sh /tmp/qm.sh
kvm: -device vfio-pci,host=0000:21:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio 0000:21:00.0: failed to open /dev/vfio/28: No such file or directory

The card is a PNY RTX 2080 Ti Founders Edition blower style.
Code:
root@mars:~# cd /sys/bus/pci/devices/0000:21:00.0
root@mars:/sys/bus/pci/devices/0000:21:00.0# echo 1 > rom
root@mars:/sys/bus/pci/devices/0000:21:00.0# cat rom > /usr/share/kvm/rtx2080ti.bin
root@mars:/sys/bus/pci/devices/0000:21:00.0# echo 0 > rom

cd rom-parser
./rom-parser /usr/share/kvm/rtx2080ti.bin
Valid ROM signature found @0h, PCIR offset 170h
        PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 1e04, class: 030000
        PCIR: revision 0, vendor revision: 1
Error, ran off the end
 

Attachments

  • 500.txt (1.5 KB)
I followed your instructions and added these lines
You're booting from UEFI. This means you need to set your command line not in /etc/default/grub, but in /etc/kernel/cmdline. Just one single line with your arguments, i.e. a file containing only quiet amd_iommu=on iommu=pt nofb nomodeset video=vesafb:off,efifb:off. Then run pve-efiboot-tool refresh. See here for more info.

Also, you probably don't need or want the 'pcie_acs_override=downstream,multifunction' in there.

I am not sure how to check if vfio module is correctly loaded.
Did the lsmod command print something? If not, it's not loaded. Might have to do with your kernel commandline being wrong though.

Does blocking the use of single GPU also means that I cannot see any display output (including entering BIOS) anymore?
BIOS will be fine, but PVE will be unable to display anything. There are ways to get that working too (at least when the VM is not running), but that's rather complex and requires messing with manual driver loading/binding etc...
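
Roughly, the whole workflow looks like this (a sketch; keep your existing root= options, the ZFS line below is taken from your GRUB config):
Code:
# /etc/kernel/cmdline must be a single line; keep the existing root= part
echo 'root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet amd_iommu=on iommu=pt nofb nomodeset video=vesafb:off,efifb:off' > /etc/kernel/cmdline
pve-efiboot-tool refresh
reboot
# after the reboot, verify the arguments were actually applied
cat /proc/cmdline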
 

My host system is booted in CSM+UEFI mode. Pure UEFI mode gave a garbled display output when I installed PVE.

I put this in /etc/kernel/cmdline, all on one line:
Code:
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet amd_iommu=on iommu=pt nofb nomodeset video=vesafb:off,efifb:off

I deleted the extra options in GRUB and returned it to the original:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
I updated both and rebooted.

lsmod | grep vfio still prints nothing.

I read this: "Note that in the 5.4 based kernel (will be used for Proxmox VE 6.2 in Q2/2020) some of those modules are already built into the kernel directly." https://pve.proxmox.com/wiki/Pci_passthrough
I installed the latest version (6.2-4); I don't know if that has anything to do with vfio not loading.

Code:
full printout of lsmod
lsmod
Module                  Size  Used by
tcp_diag               16384  0
inet_diag              24576  1 tcp_diag
md4                    16384  0
cmac                   16384  1
nls_utf8               16384  2
cifs                 1015808  2
fscache               372736  1 cifs
libdes                 24576  1 cifs
ebtable_filter         16384  0
ebtables               36864  1 ebtable_filter
ip_set                 53248  0
ip6table_raw           16384  0
iptable_raw            16384  0
ip6table_filter        16384  0
ip6_tables             32768  2 ip6table_filter,ip6table_raw
iptable_filter         16384  0
bpfilter               32768  0
softdog                16384  2
nfnetlink_log          20480  1
nfnetlink              16384  3 ip_set,nfnetlink_log
snd_hda_codec_hdmi     61440  1
edac_mce_amd           32768  0
kvm_amd                98304  0
kvm                   659456  1 kvm_amd
crct10dif_pclmul       16384  1
crc32_pclmul           16384  0
ghash_clmulni_intel    16384  0
iwlmvm                376832  0
mac80211              843776  1 iwlmvm
snd_usb_audio         258048  0
aesni_intel           372736  1
libarc4                16384  2 cifs,mac80211
btusb                  57344  0
snd_hda_intel          53248  0
btrtl                  20480  1 btusb
crypto_simd            16384  1 aesni_intel
btbcm                  16384  1 btusb
snd_intel_dspcfg       24576  1 snd_hda_intel
snd_usbmidi_lib        36864  1 snd_usb_audio
cryptd                 24576  2 crypto_simd,ghash_clmulni_intel
input_leds             16384  0
glue_helper            16384  1 aesni_intel
btintel                24576  1 btusb
snd_hda_codec         131072  2 snd_hda_codec_hdmi,snd_hda_intel
snd_rawmidi            36864  1 snd_usbmidi_lib
wmi_bmof               16384  0
pcspkr                 16384  0
snd_seq_device         16384  1 snd_rawmidi
iwlwifi               331776  1 iwlmvm
snd_hda_core           90112  3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec
bluetooth             577536  5 btrtl,btintel,btbcm,btusb
joydev                 24576  0
mc                     53248  1 snd_usb_audio
snd_hwdep              20480  2 snd_usb_audio,snd_hda_codec
ecdh_generic           16384  1 bluetooth
snd_pcm               102400  5 snd_hda_codec_hdmi,snd_hda_intel,snd_usb_audio,snd_hda_codec,snd_hda_core
ecc                    32768  1 ecdh_generic
ucsi_ccg               20480  0
snd_timer              36864  1 snd_pcm
cfg80211              704512  3 iwlmvm,iwlwifi,mac80211
typec_ucsi             40960  1 ucsi_ccg
snd                    86016  10 snd_seq_device,snd_hda_codec_hdmi,snd_hwdep,snd_hda_intel,snd_usb_audio,snd_usbmidi_lib,snd_hda_codec,snd_timer,snd_pcm,snd_rawmidi
typec                  45056  1 typec_ucsi
soundcore              16384  1 snd
ccp                    86016  1 kvm_amd
k10temp                16384  0
mxm_wmi                16384  0
mac_hid                16384  0
vhost_net              32768  0
vhost                  49152  1 vhost_net
tap                    24576  1 vhost_net
ib_iser                53248  0
rdma_cm                61440  1 ib_iser
iw_cm                  49152  1 rdma_cm
ib_cm                  57344  1 rdma_cm
ib_core               311296  4 rdma_cm,iw_cm,ib_iser,ib_cm
iscsi_tcp              24576  0
libiscsi_tcp           32768  1 iscsi_tcp
libiscsi               57344  3 libiscsi_tcp,iscsi_tcp,ib_iser
scsi_transport_iscsi   110592  5 libiscsi_tcp,iscsi_tcp,ib_iser,libiscsi
sunrpc                393216  1
adm1021                20480  0
ip_tables              28672  2 iptable_filter,iptable_raw
x_tables               45056  7 ebtables,ip6table_filter,ip6table_raw,iptable_filter,ip6_tables,iptable_raw,ip_tables
autofs4                45056  2
zfs                  3891200  12
zunicode              331776  1 zfs
zlua                  143360  1 zfs
zavl                   16384  1 zfs
icp                   278528  1 zfs
zcommon                86016  2 zfs,icp
znvpair                81920  2 zfs,zcommon
spl                   110592  5 zfs,icp,znvpair,zcommon,zavl
btrfs                1241088  0
xor                    24576  1 btrfs
zstd_compress         155648  1 btrfs
raid6_pq              114688  1 btrfs
libcrc32c              16384  1 btrfs
hid_logitech_hidpp     40960  0
hid_logitech_dj        24576  0
hid_generic            16384  0
usbmouse               16384  0
usbkbd                 16384  0
usbhid                 57344  1 hid_logitech_dj
hid                   131072  4 usbhid,hid_generic,hid_logitech_dj,hid_logitech_hidpp
ahci                   40960  2
igb                   221184  0
libahci                32768  1 ahci
i2c_algo_bit           16384  1 igb
dca                    16384  1 igb
i2c_nvidia_gpu         16384  0
i2c_piix4              28672  0
wmi                    32768  2 wmi_bmof,mxm_wmi
 
I think the problem is still that the vfio module is somehow not loaded.

/etc/modules has always been like this since I started following the tutorial
Code:
# /etc/modules: kernel modules to load at boot time.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

Now I also tried this method
https://lunar.computer/posts/gpu-passthrough-proxmox-60/

Code:
$ sudo echo "vfio" > \
  /etc/modules-load.d/vfio.conf
$ sudo echo "vfio_iommu_type1" >> \
  /etc/modules-load.d/vfio.conf
$ sudo echo "vfio_pci" >> \
  /etc/modules-load.d/vfio.conf
$ sudo echo "vfio_virqfd" >> \
  /etc/modules-load.d/vfio.conf
$ sudo echo "options vfio-pci ids=10de:1e04,10de:10f7" > \
  /etc/modprobe.d/vfio.conf

update-initramfs -u -k all
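
To double-check whether the vfio modules actually made it into the generated initramfs, I believe this should work (assuming initramfs-tools' lsinitramfs is available):
Code:
# lists files inside the image generated by update-initramfs;
# the ESP copy is refreshed by the zz-pve-efiboot hook afterwards
lsinitramfs /boot/initrd.img-$(uname -r) | grep vfio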

Code:
cat /proc/cmdline
initrd=\EFI\proxmox\5.4.34-1-pve\initrd.img-5.4.34-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet amd_iommu=on iommu=pt nofb nomodeset video=vesafb:off,efifb:off

lsmod | grep vfio still prints nothing.
 
I read this "Note that in the 5.4 based kernel (will be used for Proxmox VE 6.2 in Q2/2020) some of those modules are already built into the kernel directly." https://pve.proxmox.com/wiki/Pci_passthrough
That was true for a bit, but on the latest versions we went back to building them as modules. Maybe update your system with 'apt dist-upgrade' to be sure you're on the latest version.

Also, what happens when you load the module manually, i.e. with modprobe vfio? Does the passthrough still not work?
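
Something along these lines should tell you whether the modules can be loaded at all and whether the vfio character devices show up:
Code:
modprobe -a vfio vfio_pci vfio_iommu_type1
lsmod | grep vfio
ls /dev/vfio
# any load errors should also appear at the end of dmesg
dmesg | tail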
 

I did apt dist-upgrade and rebooted.
lsmod | grep vfio still prints nothing.
modprobe vfio prints nothing.

It still shows:
Code:
sh /tmp/qm.sh
kvm: -device vfio-pci,host=0000:21:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio 0000:21:00.0: failed to open /dev/vfio/28: No such file or directory
 
Hm, I'm honestly at a loss then... Maybe just try some other options (e.g. multifunction on/off, etc...) or try passing through a different device. Try different PCIe slots as well.

Can you do 'modinfo vfio' successfully?
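
If modinfo fails, you can also check directly whether the module files exist for the running kernel at all:
Code:
# no output here means the vfio modules were never installed for this kernel
find /lib/modules/$(uname -r) -name 'vfio*'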
 

Code:
modinfo vfio
modinfo: ERROR: Module vfio not found.

The fact that both lsmod | grep vfio and modprobe vfio print nothing suggests that the vfio module is not loaded; I don't think it's related to a specific VM.
So we need to figure out why the vfio module is not being loaded.

I saw this when Proxmox boots.

(screenshot attached: Error_20200825.JPG)
 
I saw this when Proxmox boots.
This is normal on certain AMD platforms.

modinfo vfio
modinfo: ERROR: Module vfio not found.
That means that the vfio module is not available in your installed kernel. Are you sure you've rebooted? Have you done anything funky with your kernel? What does 'pveversion -v' say?
 

Yes. I reboot every time I make a change to /etc/modules, /etc/default/grub, or /etc/modules-load.d/vfio.conf, of course after running update-initramfs -u -k all or update-grub.

Code:
update-initramfs -u -k all
update-initramfs: Generating /boot/initrd.img-5.4.34-1-pve
Running hook script 'zz-pve-efiboot'..
Re-executing '/etc/kernel/postinst.d/zz-pve-efiboot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/B53C-EB08
        Copying kernel and creating boot-entry for 5.4.34-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/B53D-235B
        Copying kernel and creating boot-entry for 5.4.34-1-pve


Code:
pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1


This shows that I have those iommu and nofb lines in there.
Code:
cat /proc/cmdline
initrd=\EFI\proxmox\5.4.34-1-pve\initrd.img-5.4.34-1-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet amd_iommu=on iommu=pt nofb nomodeset video=vesafb:off,efifb:off
 
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
Even our enterprise repo already ships with 5.4.44 ... Your system is obviously not up to date. Please make sure you run a successful 'apt update' and 'apt dist-upgrade' and do a full reboot. Which apt sources do you have configured?
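
For reference, a setup without a subscription would typically use the pve-no-subscription repository (PVE 6.x is based on Debian Buster):
Code:
# /etc/apt/sources.list.d/pve-no-subscription.list
deb http://download.proxmox.com/debian/pve buster pve-no-subscription

# then
apt update && apt dist-upgrade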
 
I didn't know that I needed to edit the apt sources.

PS: I am evaluating Proxmox.
Once I'm confident Proxmox suits our needs (we are a research lab at a university), our lab will get a subscription for more stable updates and support.

I updated to:
Code:
pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.55-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-5
pve-kernel-helper: 6.2-5
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-10
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-2
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-12
pve-xtermjs: 4.7.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1

As soon as I add this to /etc/modules:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

OR add
Code:
echo "vfio" > \
  /etc/modules-load.d/vfio.conf
echo "vfio_iommu_type1" >> \
  /etc/modules-load.d/vfio.conf
echo "vfio_pci" >> \
  /etc/modules-load.d/vfio.conf
echo "vfio_virqfd" >> \
  /etc/modules-load.d/vfio.conf
echo "options vfio-pci ids=10de:1e04,10de:10f7" > \
  /etc/modprobe.d/vfio.conf
and then running update-initramfs -u -k all causes boot to get stuck at "Reading all physical volumes..." when booting into 5.4.55-1-pve.
I had to boot into 5.4.34-1-pve and remove those lines; then I was able to boot into 5.4.55-1-pve again.

Both lsmod | grep vfio and modprobe vfio still print nothing.
The GPU passthrough VM still doesn't start:
Code:
-device vfio-pci,host=0000:21:00.0,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,multifunction=on: vfio 0000:21:00.0: failed to open /dev/vfio/28: No such file or directory

modinfo vfio now prints
Code:
modinfo vfio
filename:       /lib/modules/5.4.55-1-pve/kernel/drivers/vfio/vfio.ko
softdep:        post: vfio_iommu_type1 vfio_iommu_spapr_tce
alias:          devname:vfio/vfio
alias:          char-major-10-196
description:    VFIO - User Level meta-driver
author:         Alex Williamson <alex.williamson@redhat.com>
license:        GPL v2
version:        0.3
srcversion:     ECAC5D90BAEDD386CFDF593
depends:      
retpoline:      Y
intree:         Y
name:           vfio
vermagic:       5.4.55-1-pve SMP mod_unload modversions
parm:           enable_unsafe_noiommu_mode:Enable UNSAFE, no-IOMMU mode.  This mode provides no device isolation, no DMA translation, no host kernel protection, cannot be used for device assignment to virtual machines, requires RAWIO permissions, and will taint the kernel.  If you do not know what this is for, step away. (default: false) (bool)

The problem now is that the system gets stuck at boot if I load
vfio_iommu_type1
vfio_pci
vfio_virqfd
but if I don't, passthrough won't work.
 
and then running update-initramfs -u -k all causes boot to get stuck at "Reading all physical volumes..." when booting into 5.4.55-1-pve.
Well, you only have one physical GPU installed, correct? In that case, this actually means it's working. The GPU was reserved for passthrough, meaning the host will not be able to use it anymore. The "Reading all physical volumes..." is typically the last thing printed via the UEFI frame buffer before the GPU driver would kick in.

Try adding the configs and when it's "stuck" on that message, wait a bit and see if your PVE appears via the network.
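
For example, from another machine on the LAN (replace <pve-host-ip> with your host's address):
Code:
ping -c 3 <pve-host-ip>
ssh root@<pve-host-ip>
# or simply open the web UI at https://<pve-host-ip>:8006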
 
You are right. I can still get to the web console.

lsmod | grep vfio
Code:
vfio_pci               49152  0
vfio_virqfd            16384  1 vfio_pci
irqbypass              16384  2 vfio_pci,kvm
vfio_iommu_type1       32768  0
vfio                   32768  2 vfio_iommu_type1,vfio_pci

modprobe vfio still prints nothing.

/etc/kernel/cmdline with pcie_acs_override=downstream
Code:
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet amd_iommu=on iommu=pt pcie_acs_override=downstream nofb nomodeset video=vesafb:off,efifb:off

Code:
 lspci -nnk | grep "NVIDIA"
21:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] [10de:1e04] (rev a1)
21:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
21:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Controller [10de:1ad6] (rev a1)
21:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 UCSI Controller [10de:1ad7] (rev a1)

/etc/modprobe.d/vfio.conf (all four functions are added)
Code:
options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7 disable_vga=1

/etc/modules
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
# Chip drivers
adm1021

Problem:
I already passed all four functions of group 28 (VM config attached), but it says group 28 is not viable:
Code:
kvm: -device vfio-pci,host=0000:21:00.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: vfio 0000:21:00.0: group 28 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

Code:
IOMMU Group 28:
        21:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] [10de:1e04] (rev a1)
        21:00.1 Audio device [0403]: NVIDIA Corporation TU102 High Definition Audio Controller [10de:10f7] (rev a1)
        21:00.2 USB controller [0c03]: NVIDIA Corporation TU102 USB 3.1 Controller [10de:1ad6] (rev a1)
        21:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU102 UCSI Controller [10de:1ad7] (rev a1)
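
To see which driver each function in the group is currently bound to, I believe something like this works:
Code:
# print the driver currently bound to every device in IOMMU group 28
for dev in /sys/kernel/iommu_groups/28/devices/*; do
    if [ -e "$dev/driver" ]; then
        echo "${dev##*/}: $(basename "$(readlink "$dev/driver")")"
    else
        echo "${dev##*/}: no driver bound"
    fi
done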
 

Attachments

  • 20200830_500.txt (1.6 KB)
  • 20200830_iommu_full.txt (12.5 KB)
  • 20200830_grub.txt (1.3 KB)
/etc/kernel/cmdline with pcie_acs_override=downstream
Try without the pcie_acs_override and list the IOMMU groups then. Also, you have the 'hostpci' device configured 4 times in your config; once is enough if you enable "All functions" in the GUI.
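
For reference, with "All functions" enabled the config should end up with a single entry along these lines (exact options depend on your setup):
Code:
# one hostpci line; no function suffix means all functions of 21:00 are passed through
hostpci0: 21:00,pcie=1,x-vga=1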
 
