Problems with GPU passthrough

kvickis

New Member
Aug 20, 2022
Hi!
I have trouble getting my nVidia RTX 3060 GPU card passed through to a VM.
I have tried passing it through both to a Linux Mint 20.3 VM and a Windows 10 VM, both with the same result: "Error: Start failed: QEMU exited with code 1".
I am running Proxmox 7.2-7 on an Acer Nitro N50-620 desktop PC with an Intel Core i5-11400F CPU and 48GB of RAM.
The graphics card is an nVidia RTX3060 v2 12G from Asus.
I have followed the instructions on the wiki to isolate the GPU from the host, enabled IOMMU and so forth.
I have added the card as a PCI-device in the respective VM's hardware setup, making sure to use the correct device.
I am not trying to run more than one VM using this at once.
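For reference, the isolation changes I made follow the wiki and look roughly like this (a sketch from memory; the device IDs are placeholders, taken from lspci -nn | grep -i nvidia on the host):

Code:
# /etc/modprobe.d/vfio.conf - bind the GPU and its audio function to vfio-pci
# and keep the host drivers away from them (then: update-initramfs -u)
options vfio-pci ids=<gpu-vendor:device>,<audio-vendor:device>
blacklist nouveau
blacklist nvidia

# /etc/modules - load the VFIO modules at boot
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

# /etc/default/grub - enable IOMMU (followed by update-grub and a reboot)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"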

Both attempts (Linux and Windows) fail with the same error. (Both VMs run OK if I remove the PCI device.)

Here is what I get from running the command "journalctl -f" right before I attempt to start the VM:

Code:
-- Journal begins at Wed 2022-03-09 23:07:25 CET. --
Aug 20 17:26:23 pve pveproxy[1230]: worker 10796 started
Aug 20 17:36:19 pve pvedaemon[1224]: <root@pam> successful auth for user 'root@pam'
Aug 20 17:37:18 pve pvedaemon[1223]: worker exit
Aug 20 17:37:18 pve pvedaemon[1221]: worker 1223 finished
Aug 20 17:37:18 pve pvedaemon[1221]: starting 1 worker(s)
Aug 20 17:37:18 pve pvedaemon[1221]: worker 12456 started
Aug 20 17:42:14 pve pvedaemon[1222]: worker exit
Aug 20 17:42:14 pve pvedaemon[1221]: worker 1222 finished
Aug 20 17:42:14 pve pvedaemon[1221]: starting 1 worker(s)
Aug 20 17:42:14 pve pvedaemon[1221]: worker 13281 started
Aug 20 17:45:00 pve pvedaemon[1224]: <root@pam> starting task UPID:pve:0000358B:00071065:6301017C:qmstart:108:root@pam:
Aug 20 17:45:00 pve pvedaemon[13707]: start VM 108: UPID:pve:0000358B:00071065:6301017C:qmstart:108:root@pam:
Aug 20 17:45:00 pve systemd[1]: Started 108.scope.
Aug 20 17:45:01 pve systemd-udevd[13747]: Using default interface naming scheme 'v247'.
Aug 20 17:45:01 pve systemd-udevd[13747]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Aug 20 17:45:01 pve kernel: device tap108i0 entered promiscuous mode
Aug 20 17:45:01 pve systemd-udevd[13747]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Aug 20 17:45:01 pve systemd-udevd[13747]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Aug 20 17:45:01 pve systemd-udevd[13746]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Aug 20 17:45:01 pve systemd-udevd[13746]: Using default interface naming scheme 'v247'.
Aug 20 17:45:01 pve kernel: vmbr0: port 3(fwpr108p0) entered blocking state
Aug 20 17:45:01 pve kernel: vmbr0: port 3(fwpr108p0) entered disabled state
Aug 20 17:45:01 pve kernel: device fwpr108p0 entered promiscuous mode
Aug 20 17:45:01 pve kernel: vmbr0: port 3(fwpr108p0) entered blocking state
Aug 20 17:45:01 pve kernel: vmbr0: port 3(fwpr108p0) entered forwarding state
Aug 20 17:45:01 pve kernel: fwbr108i0: port 1(fwln108i0) entered blocking state
Aug 20 17:45:01 pve kernel: fwbr108i0: port 1(fwln108i0) entered disabled state
Aug 20 17:45:01 pve kernel: device fwln108i0 entered promiscuous mode
Aug 20 17:45:01 pve kernel: fwbr108i0: port 1(fwln108i0) entered blocking state
Aug 20 17:45:01 pve kernel: fwbr108i0: port 1(fwln108i0) entered forwarding state
Aug 20 17:45:01 pve kernel: fwbr108i0: port 2(tap108i0) entered blocking state
Aug 20 17:45:01 pve kernel: fwbr108i0: port 2(tap108i0) entered disabled state
Aug 20 17:45:01 pve kernel: fwbr108i0: port 2(tap108i0) entered blocking state
Aug 20 17:45:01 pve kernel: fwbr108i0: port 2(tap108i0) entered forwarding state
Aug 20 17:45:01 pve kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
Aug 20 17:45:01 pve kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
Aug 20 17:45:01 pve kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
Aug 20 17:45:01 pve kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
Aug 20 17:45:01 pve kernel: vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
Aug 20 17:45:01 pve kernel: fwbr108i0: port 2(tap108i0) entered disabled state
Aug 20 17:45:02 pve kernel: fwbr108i0: port 1(fwln108i0) entered disabled state
Aug 20 17:45:02 pve kernel: vmbr0: port 3(fwpr108p0) entered disabled state
Aug 20 17:45:02 pve kernel: device fwln108i0 left promiscuous mode
Aug 20 17:45:02 pve kernel: fwbr108i0: port 1(fwln108i0) entered disabled state
Aug 20 17:45:02 pve kernel: device fwpr108p0 left promiscuous mode
Aug 20 17:45:02 pve kernel: vmbr0: port 3(fwpr108p0) entered disabled state
Aug 20 17:45:02 pve pvedaemon[1224]: VM 108 qmp command failed - VM 108 not running
Aug 20 17:45:02 pve pvedaemon[13707]: start failed: QEMU exited with code 1
Aug 20 17:45:02 pve pvedaemon[1224]: <root@pam> end task UPID:pve:0000358B:00071065:6301017C:qmstart:108:root@pam: start failed: QEMU exited with code 1
Aug 20 17:45:02 pve systemd[1]: 108.scope: Succeeded.

Here is the configuration file from my Linux Mint VM:

Code:
agent: 1
audio0: device=ich9-intel-hda,driver=spice
boot: order=scsi0;net0
cores: 6
hostpci0: 0000:01:00,pcie=1,x-vga=1
machine: q35
memory: 4096
meta: creation-qemu=6.1.1,ctime=1649444193
name: LinuxMint20.3
net0: virtio=2E:46:5D:E0:7D:2F,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: local-lvm:vm-108-disk-0,size=50G
scsihw: virtio-scsi-pci
smbios1: uuid=6cfcaf4d-1537-4a8c-bf0f-bb23d5c00464
sockets: 1
spice_enhancements: foldersharing=1,videostreaming=filter
vga: qxl,memory=64
vmgenid: 14f84aa9-18d0-40a0-840c-0b4809e46c94

Can someone help me make sense of this and point me in the right direction to get this working, or at least give me some hints on how to troubleshoot? I am stuck.
 
Just an additional bit of information: By changing to UEFI BIOS in the Linux Mint VM, I managed to start the VM with the GPU passed through. However, I cannot attach to it. It seems that the networking interface stops working or something, because it is no longer possible to ping the machine, let alone attach to it using RDP.
 
Was there no more information in the Task log than Error: Start failed: QEMU exited with code 1? Often it's because you assign the VM more memory than is available at the time.
Are you using the same GPU for booting the Proxmox host? What is your hardware? What do your IOMMU groups (without pcie_acs_override) look like?
Just an additional bit of information: By changing to UEFI BIOS in the Linux Mint VM, I managed to start the VM with the GPU passed through. However, I cannot attach to it. It seems that the networking interface stops working or something, because it is no longer possible to ping the machine, let alone attach to it using RDP.
Did you actually get output on a physical display connected to the GPU? Probably the VM network does not stop; rather, the VM freezes. You don't have qxl/Spice anymore because you enabled Primary GPU (,x-vga=1). Or did the whole Proxmox host become unreachable?
 
Thanks leesteken for taking the time to deal with my problem. As I am a newbie as far as Proxmox goes, I don't know my way around it very well, except for what I see in the GUI. So, to your question about the task log: all I see is this single line at the bottom of the main GUI Summary screen. Perhaps there is a way to look at an actual log file? I wouldn't know where to look for it, though.

As I have 48GB of RAM and have allocated 8GB of RAM to this VM, and just one other VM (using 8GB) is started, I don't think RAM would be a problem.

The GPU is not used for booting the Proxmox host. I guess that all the fiddling with GRUB, configuration files, etc. serves the purpose of preventing Proxmox from using the GPU for itself, right? However, if I connect the DP output of the GPU to a monitor, I occasionally see a blinking underline character on the screen, indicating that Proxmox is indeed displaying something. Or is it? It is not displaying any text, though.

I have 16 IOMMU groups. The nVidia card is using two of them, one for graphics and one for audio. Nobody else uses the same groups.

No, the whole Proxmox host did not become unreachable, just the VM in question (Linux Mint).
 
Thanks leesteken for taking the time to deal with my problem. As I am a newbie as far as Proxmox goes, I don't know my way around it very well, except for what I see in the GUI. So, to your question about the task log: all I see is this single line at the bottom of the main GUI Summary screen. Perhaps there is a way to look at an actual log file? I wouldn't know where to look for it, though.
You can double-click tasks at the bottom of the web GUI; sometimes they show more information.
There is the Syslog (under System) for your node/host in the web GUI. You can also log in to the Proxmox host console or connect via SSH and run journalctl. Both the Syslog and journalctl show the same information; you'll need to look at the time when you tried starting the VM.
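For example, something along these lines on the host (a sketch; adjust the time window to when you pressed Start):

Code:
# show host log entries around the failed start attempt
journalctl --since "2022-08-20 17:44" --until "2022-08-20 17:46"
# or follow the log live while starting the VM from the GUI
journalctl -f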
As I have 48GB of RAM and have allocated 8GB of RAM to this VM, and just one other VM (using 8GB) is started, I don't think RAM would be a problem.
If a VM with passthrough cannot start due to a time-out, it's usually because it cannot gather enough contiguous free memory. It's probably not your problem.
However, if I connect the DP output of the GPU to a monitor, I occasionally see a blinking underline character on the screen, indicating that Proxmox is indeed displaying something. Or is it? It is not displaying any text, though.
Sounds like the VM is freezing/stuck after starting, which is probably because the GPU is used during boot (see below).
I have 16 IOMMU groups. The nVidia card is using two of them, one for graphics and one for audio. Nobody else uses the same groups.
Are you using pcie_acs_override (because that breaks the IOMMU grouping)? But it's probably fine.
The GPU is not used for booting the Proxmox host. I guess that all the fiddling with GRUB, configuration files, etc. serves the purpose of preventing Proxmox from using the GPU for itself, right?
Sorry, I seem to have missed the hardware information in your first post. Your system only has a single GPU and therefore it is used for boot.
You can check if this is the problem by looking for BOOTFB in the output of the command cat /proc/iomem and/or for BAR cannot reserve memory errors in the Syslog when starting the VM.
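That is, roughly (a quick sketch; run on the Proxmox host):

Code:
# look for a BOOTFB entry in the physical memory map
grep -i bootfb /proc/iomem
# and look for BAR/reservation errors in the kernel log after a failed VM start
dmesg | grep -iE "bootfb|BAR .*reserve"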
If you are running the latest Proxmox with kernel 5.15, you need the kernel parameter initcall_blacklist=sysfb_init as discovered in this thread.
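The parameter goes onto the kernel command line in /etc/default/grub, roughly like this (a sketch; keep whatever options you already have, e.g. intel_iommu=on, and run update-grub plus a reboot afterwards):

Code:
# /etc/default/grub on the Proxmox host
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"

# apply the change and reboot
update-grub
reboot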
 
Thanks for all the additional information! I will look into it later today. In the meantime, I rebooted the PC with a monitor attached, and this is what I found:

1. Grub displays its menu, and I choose "Start Proxmox"
2. Proxmox displays three lines of text related to a file system volume and stays there, until...
3. ...I start the VM using the GPU, at which point the display goes blank (indicating to me that the VM has now taken over the GPU).
4. After 10 seconds or so, the display becomes active again, this time only showing a blinking underscore character. I have no clue what this means.

The VM is not reachable. Running # nmap -Pn <ip.address.of.VM> gives:

Code:
Host is up (0.00022s latency).
All 1000 scanned ports on linuxmint.lan <ip-address> are filtered

Will get back later this afternoon (Swedish time).
 
OK, I looked at the thread linked above ("discovered in this thread") and made the changes to GRUB. Now the system behaves differently:

1. The host now stops displaying anything after the initial message about loading the boot image.
2. When starting the VM, the display first goes blank and then displays the boot loader from the VM!
3. The boot loader of the VM complains about not being able to load the EFI disk and asks what to do. As there is no way to communicate with the boot loader at this point, it goes on to try PXE boot and HTTP boot over IP, both of which fail, of course. I am then dropped to a shell (of the boot loader, not the actual Linux Mint shell). It stops there.

I have seen this problem before, and it is related to using UEFI boot. So I tried using SeaBIOS instead. That doesn't display anything on the monitor; on the other hand, I don't think any PCIe device is passed through, and the VM is not usable.

I also tried to remove the EFI disk and create a new one on a different volume. No change.
Still stuck, but on a different level...
 
OK, I looked at the thread linked above ("discovered in this thread") and made the changes to GRUB. Now the system behaves differently:

1. The host now stops displaying anything after the initial message about loading the boot image.
This is normal, and what you need for single GPU passthrough. If you don't want this, make the system boot with another GPU or use a modern AMD GPU (no guarantees).
You didn't get back about checking for BOOTFB issues or error messages, but this settles the original problem: booting with the passed-through NVidia GPU on kernel 5.15+.
2. When starting the VM, the display first goes blank and then displays the boot loader from the VM!
This is good. GPU passthrough is working: the device is available inside the VM. No changes to the Proxmox host are necessary.
3. The boot loader of the VM complains about not being able to load the EFI disk and asks what to do. As there is no way to communicate with the boot loader at this point, it goes on to try PXE boot and HTTP boot over IP, both of which fail, of course. I am then dropped to a shell (of the boot loader, not the actual Linux Mint shell). It stops there.

I have seen this problem before, and it is related to using UEFI boot. So I tried using SeaBIOS instead. That doesn't display anything on the monitor; on the other hand, I don't think any PCIe device is passed through, and the VM is not usable.

I also tried to remove the EFI disk and create a new one on a different volume. No change.
Still stuck, but on a different level...
You probably installed the VM with SeaBIOS, but passthrough works better with OVMF (EFI). Try reinstalling Linux Mint in the VM with OVMF (with an EFI disk) or running from the installation ISO. You may want to pass through USB ports or a USB controller (like the GPU) to have access to input devices.
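If you prefer the command line over the GUI, switching the firmware and adding an EFI disk can be done roughly like this (a sketch, assuming VM ID 108 and local-lvm storage as in your posted config; the USB ID is a placeholder, take yours from lsusb):

Code:
# switch VM 108 to OVMF firmware and give it an EFI vars disk
qm set 108 --bios ovmf
qm set 108 --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=1

# optionally pass a USB keyboard/mouse through (vendor:device ID from lsusb)
qm set 108 --usb0 host=<vendor-id>:<device-id>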
 
You probably installed the VM with SeaBIOS, but passthrough works better with OVMF (EFI). Try reinstalling Linux Mint in the VM with OVMF (with an EFI disk) or running from the installation ISO. You may want to pass through USB ports or a USB controller (like the GPU) to have access to input devices.

Sorry for my ignorance here, but what do you mean by "installed the VM with SeaBIOS"? Does it matter which BIOS I use when installing the guest operating system? Are there different versions of the guest operating systems made for SeaBIOS and OVMF (EFI)? Can I not change the BIOS type after a VM has been installed?

I looked at all my VMs and they all use SeaBIOS. Trying to change to OVMF and adding an EFI disk make them ALL fail. This is what is displayed when trying to boot them up:

Code:
Guest has not initialized display (yet)

BdsDxe: failed to load Boot0003 "UEFI QEMU QEMU HARDDISK " from PciRoot (0x0)/Pci (0x5,  0x0)/Scsi(0x0,0x0)

Double-clicking the error message in the task bar, on one of the lines related to Linux Mint, gave the following output:

VM 108 qmp command 'set_password' failed - Could not set password TASK ERROR: Failed to run vncproxy.

It seems that something is wrong whenever I try to run OVMF BIOS. This is not at all related to the passthrough of the GPU. Maybe I need to post a different issue about this?
 
Sorry for my ignorance here, but what do you mean by "installed the VM with SeaBIOS"? Does it matter which BIOS I use when installing the guest operating system? Are there different versions of the guest operating systems made for SeaBIOS and OVMF (EFI)? Can I not change the BIOS type after a VM has been installed?
The boot process inside the VM is different when using SeaBIOS (legacy boot) or OVMF (EFI), just like the boot process of a real motherboard is different with and without EFI. Windows really does not like switching between them, and the boot of a Linux system also often works either with SeaBIOS or with OVMF, but not both.
You can repair a Linux boot process with a Linux Live/Installer ISO, but I think reinstalling is simpler for you.
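For completeness, the repair from a Live/Installer ISO booted inside the VM usually boils down to something like this (a rough sketch, assuming an EFI system partition already exists on /dev/sda1 and the root filesystem on /dev/sda2; device names will differ, and reinstalling really is the simpler route):

Code:
# from a live session inside the VM
mount /dev/sda2 /mnt
mount /dev/sda1 /mnt/boot/efi
for d in /dev /proc /sys; do mount --bind "$d" "/mnt$d"; done
chroot /mnt grub-install --target=x86_64-efi --efi-directory=/boot/efi
chroot /mnt update-grub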
I looked at all my VMs and they all use SeaBIOS. Trying to change to OVMF and adding an EFI disk make them ALL fail.
Yes, because they expect one way to boot and you changed it to another; see my remarks above. I don't know why you would change all your VMs, but it appears that your GPU passthrough works only with OVMF and not with SeaBIOS.
It seems that something is wrong whenever I try to run OVMF BIOS. This is not at all related to the passthrough of the GPU. Maybe I need to post a different issue about this?
It is not directly related but the boot process also involves the ROM BIOS/firmware of the GPU which is initialized during boot in order to display stuff. Your passthrough GPU works better with EFI, therefore I suggest you create a VM with OVMF. This might require reinstalling the operating system (works for any operating system) or repairing the boot process inside the VM (works only for Linux).
 
leesteken, you are a really nice person! Thanks a lot! I now have a working configuration passing the GPU through to Linux Mint, and everything is working as expected. I also passed through the keyboard and mouse to the VM; otherwise I would not have been able to enroll the "Owners key" that UEFI requires to use proprietary drivers in Linux Mint. Anyway, thanks a lot, I wouldn't have done this without you!
 
Just one more thing...
I seem to be totally unable to make a new installation of Windows 10 using the OVMF BIOS. I am dropped into the UEFI shell. The Windows installer never starts. I have followed the advice here on the Proxmox Wiki, but it didn't work out for me.
 
I seem to be totally unable to make a new installation of Windows 10 using the OVMF BIOS. I am dropped into the UEFI shell.
Do you have a virtual CD/DVD Drive in the VM? Is it connected to the VM as IDE 0 or 2, because Windows does not come with VirtIO drivers?
Is it the first option in the Boot Order under Options? Or do you press the Esc button during the start of the VM and choose the boot drive manually?
If you would share the VM configuration file (from the /etc/pve/qemu-server/ directory on the Proxmox host), I could check this for you.
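That is, just post the output of (replace <vmid> with the VM's ID):

Code:
cat /etc/pve/qemu-server/<vmid>.conf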

EDIT: Do you want help here or move over to the new thread?
 
Yes, I have a virtual CD/DVD Drive in the VM. It is connected as IDE 2. It is the first option in the Boot Order under Options, and all the other virtual drives are unchecked. I don't press the Esc button during boot.
Here is the VM configuration file:

Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=ide2
cores: 6
efidisk0: local-lvm:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide0: nfsproxmox:iso/virtio-win-0.1.215.iso,media=cdrom,size=528322K
ide2: nfsproxmox:iso/Windows_10_Pro_x64_En-US_Activated.iso,media=cdrom,size=3778602K
machine: pc-q35-6.2
memory: 8196
meta: creation-qemu=6.2.0,ctime=1661106007
name: Win10
net0: virtio=A2:C6:2B:08:F1:79,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsi0: local-lvm:vm-101-disk-1,cache=writeback,discard=on,size=100G
scsihw: virtio-scsi-single
smbios1: uuid=bd77da3e-c7ed-48ea-8630-92bbd1cb9450
sockets: 1
vmgenid: e6a4e11c-7c47-4142-9e36-6e2e3785b56b

Maybe it would be better to move over to the new thread for further discussions.
 
