Hi everybody,
I've been trying to get GPU passthrough working on the latest Proxmox VE version (8.4.5 right now) and the latest 6.14 kernel (6.14.8-1~bpo12+1), but keep getting segfaults when starting a VM with the GPU passed through.
These are the relevant specs of my system:
ASRock Rack X570D4U-2L2T motherboard
AMD Ryzen 9 5950X
64 GB RAM
Asus Phoenix GeForce RTX 3060 12 GB
I followed all the standard advice for getting GPU passthrough working. Everything seems OK until I start a VM with the GPU passed through.
IOMMU is enabled and working:
    dmesg | grep -e DMAR -e IOMMU
    [ 0.668197] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
    [ 0.674759] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).

The command:

    pvesh get /nodes/{nodename}/hardware/pci --pci-class-blacklist ""

shows the GPU devices (video/audio) in their own IOMMU group (it prints a large table, which I don't think is a good idea to post here).
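Since the full table is too large to post, here is a minimal sketch of how the group membership of a single device could be listed instead. It assumes the usual sysfs layout, where each PCI device has an `iommu_group/devices` directory. The function name `iommu_group_peers` and the `SYSFS_ROOT` override are my own, hypothetical names and not part of any Proxmox tool.

```shell
#!/bin/sh
# Sketch: print every PCI address that shares an IOMMU group with the
# given device. SYSFS_ROOT is a hypothetical override (defaults to /sys)
# so the function can be pointed at a test directory.
iommu_group_peers() {
    addr="$1"
    root="${SYSFS_ROOT:-/sys}"
    group_dir="$root/bus/pci/devices/$addr/iommu_group/devices"

    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $addr" >&2
        return 1
    fi

    # Each entry in the directory is named after a PCI address in the group.
    for dev in "$group_dir"/*; do
        [ -e "$dev" ] || continue
        basename "$dev"
    done
}
```

Calling `iommu_group_peers 0000:2d:00.0` on the host should then print only the addresses in the GPU's group. If the group is clean, I would expect just the two GPU functions.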
I blacklisted the nvidia drivers and enabled the vfio modules, which seems to have worked, since lspci -nnk shows:

    2d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106 [GeForce RTX 3060 Lite Hash Rate] [10de:2504] (rev a1)
            Subsystem: ASUSTeK Computer Inc. GA106 [GeForce RTX 3060 Lite Hash Rate] [1043:8810]
            Kernel driver in use: vfio-pci
            Kernel modules: nvidiafb, nouveau
    2d:00.1 Audio device [0403]: NVIDIA Corporation GA106 High Definition Audio Controller [10de:228e] (rev a1)
            Subsystem: ASUSTeK Computer Inc. GA106 High Definition Audio Controller [1043:8810]
            Kernel driver in use: vfio-pci
            Kernel modules: snd_hda_intel

The VM config looks good I think:
    agent: 1
    balloon: 0
    bios: ovmf
    boot: order=scsi0;ide2;net0
    cores: 4
    cpu: host
    efidisk0: local-lvm:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
    hostpci0: 0000:2d:00,pcie=1
    ide2: local:iso/Fedora-Server-netinst-x86_64-42-1.1.iso,media=cdrom,size=943370K
    machine: q35
    memory: 4096
    meta: creation-qemu=9.2.0,ctime=1752926522
    name: ai
    net0: virtio=BC:24:11:C5:CC:56,bridge=vmbr0,firewall=1
    numa: 0
    onboot: 1
    ostype: l26
    scsi0: local-lvm:vm-101-disk-1,discard=on,iothread=1,size=128G,ssd=1
    scsihw: virtio-scsi-single
    smbios1: uuid=1c103652-aaa9-4724-b46d-2307e239337a
    sockets: 1
    tpmstate0: local-lvm:vm-101-disk-2,size=4M,version=v2.0
    vga: none
    vmgenid: 82ba3e2f-63eb-49b6-8e8d-0c4f7517e905

And yet, when I start the VM I immediately get a QEMU error in the GUI:
    TASK ERROR: start failed: QEMU exited with code 1

And journalctl shows this:
    Jul 19 14:26:42 pve1 pvedaemon[23026]: start VM 101: UPID:pve1:000059F2:00043ABC:687B8F02:qmstart:101:root@pam:
    Jul 19 14:26:42 pve1 pvedaemon[2178]: <root@pam> starting task UPID:pve1:000059F2:00043ABC:687B8F02:qmstart:101:root@pam:
    Jul 19 14:26:42 pve1 kernel: vfio-pci 0000:2d:00.0: resetting
    Jul 19 14:26:42 pve1 kernel: vfio-pci 0000:2d:00.0: reset done
    Jul 19 14:26:43 pve1 systemd[1]: Started 101.scope.
    Jul 19 14:26:43 pve1 audit[23043]: AVC apparmor="DENIED" operation="capable" class="cap" profile="swtpm" pid=23043 comm="swtpm" capability=21 capname="sys_admin"
    Jul 19 14:26:43 pve1 kernel: audit: type=1400 audit(1752928003.044:35): apparmor="DENIED" operation="capable" class="cap" profile="swtpm" pid=23043 comm="swtpm" capability=21 capname="sys_admin"
    Jul 19 14:26:43 pve1 kernel: tap101i0: entered promiscuous mode
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
    Jul 19 14:26:43 pve1 kernel: vmbr0: port 5(fwpr101p0) entered disabled state
    Jul 19 14:26:43 pve1 kernel: fwln101i0 (unregistering): left allmulticast mode
    Jul 19 14:26:43 pve1 kernel: fwln101i0 (unregistering): left promiscuous mode
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
    Jul 19 14:26:43 pve1 kernel: fwpr101p0 (unregistering): left allmulticast mode
    Jul 19 14:26:43 pve1 kernel: fwpr101p0 (unregistering): left promiscuous mode
    Jul 19 14:26:43 pve1 kernel: vmbr0: port 5(fwpr101p0) entered disabled state
    Jul 19 14:26:43 pve1 kernel: vmbr0: port 5(fwpr101p0) entered blocking state
    Jul 19 14:26:43 pve1 kernel: vmbr0: port 5(fwpr101p0) entered disabled state
    Jul 19 14:26:43 pve1 kernel: fwpr101p0: entered allmulticast mode
    Jul 19 14:26:43 pve1 kernel: fwpr101p0: entered promiscuous mode
    Jul 19 14:26:43 pve1 kernel: vmbr0: port 5(fwpr101p0) entered blocking state
    Jul 19 14:26:43 pve1 kernel: vmbr0: port 5(fwpr101p0) entered forwarding state
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
    Jul 19 14:26:43 pve1 kernel: fwln101i0: entered allmulticast mode
    Jul 19 14:26:43 pve1 kernel: fwln101i0: entered promiscuous mode
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 1(fwln101i0) entered forwarding state
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 2(tap101i0) entered blocking state
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
    Jul 19 14:26:43 pve1 kernel: tap101i0: entered allmulticast mode
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 2(tap101i0) entered blocking state
    Jul 19 14:26:43 pve1 kernel: fwbr101i0: port 2(tap101i0) entered forwarding state
    Jul 19 14:26:43 pve1 kernel: vfio-pci 0000:2d:00.0: resetting
    Jul 19 14:26:44 pve1 kernel: vfio-pci 0000:2d:00.0: reset done
    Jul 19 14:26:44 pve1 kernel: kvm[23048]: segfault at b8 ip 000064aebf9fd9e5 sp 00007ffedd8de640 error 4 in qemu-system-x86_64[7659e5,64aebf5cd000+6ba000] likely on CPU 18 (core 2, socket 0)
    Jul 19 14:26:44 pve1 kernel: Code: 48 85 c0 75 f0 48 8b 6b 60 48 89 b3 80 00 00 00 e8 d0 7f 00 00 48 8b 7b 40 83 05 d1 96 30 01 01 48 85 ff 74 05 e8 1b 6d 07 00 <48> 8b 85 b8 00 00 00 48 85 c0 74 7f 8b 93 b0 00 00 00 eb 13 0f 1f
    Jul 19 14:26:44 pve1 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
    Jul 19 14:26:44 pve1 kernel: tap101i0 (unregistering): left allmulticast mode
    Jul 19 14:26:44 pve1 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
    Jul 19 14:26:44 pve1 pvedaemon[22245]: VM 101 qmp command failed - VM 101 not running
    Jul 19 14:26:44 pve1 pvedaemon[23034]: stopping swtpm instance (pid 23043) due to QEMU startup error
    Jul 19 14:26:44 pve1 pvedaemon[23026]: start failed: QEMU exited with code 1
    Jul 19 14:26:44 pve1 pvedaemon[2178]: <root@pam> end task UPID:pve1:000059F2:00043ABC:687B8F02:qmstart:101:root@pam: start failed: QEMU exited with code 1
    Jul 19 14:26:44 pve1 systemd[1]: 101.scope: Deactivated successfully.

I tried different settings in the GUI for the PCI device (Primary GPU on/off, ROM-Bar on/off, PCI-Express on/off), but the error is always the same.
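Most of the journal is bridge and tap interface noise, so here is a rough sketch of a filter that keeps only the lines that look relevant to the failed start. The function name `filter_passthrough_log` is hypothetical, and the pattern list is only my guess at what matters (vfio resets, the segfault, the AppArmor denial and the pvedaemon errors), so it may well hide something important.

```shell
#!/bin/sh
# Sketch: read journal text on stdin and keep only lines that look
# related to the failed passthrough start. The pattern list is an
# assumption, not an exhaustive set of relevant messages.
filter_passthrough_log() {
    grep -E 'vfio-pci|segfault|apparmor="DENIED"|start failed|qmp command failed'
}
```

It could be used as `journalctl -b --no-pager | filter_passthrough_log` on the host right after a failed start attempt.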
Does anybody have any clue what I might be doing wrong?
Searching for segfault errors when passing through a GPU on Proxmox on this forum or the Internet doesn't give any helpful results.
Thanks in advance!