Proxmox VE 7.4 + HUANANZHI X79 GPU Passthrough trouble

kallibr44

Mar 3, 2023

Hello guys, please help me.

I'm trying to set up a gaming VM (Windows 10) with an RTX 2060 SUPER passed through to it. Here is my GRUB config:

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
# info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init pcie_acs_override=downstream,multifunction nofb nomodeset"
#GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt nomodeset video=vesafb:off video=efifb:off"
GRUB_CMDLINE_LINUX=""

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
#GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
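Changes to GRUB_CMDLINE_LINUX_DEFAULT only take effect after running update-grub and rebooting; hosts that boot through systemd-boot instead (for example some UEFI + ZFS installs) read /etc/kernel/cmdline and need proxmox-boot-tool refresh. A quick way to confirm that the running kernel really received the parameters is to compare /proc/cmdline with the intended list. The sketch below does that, assuming python3 is available on the host; the EXPECTED list is simply copied from the active GRUB_CMDLINE_LINUX_DEFAULT line above.

```python
#!/usr/bin/env python3
"""Report which expected kernel parameters are missing from the running kernel.

Sketch only: EXPECTED mirrors the active GRUB_CMDLINE_LINUX_DEFAULT line above.
"""
import os

EXPECTED = [
    "intel_iommu=on",
    "iommu=pt",
    "initcall_blacklist=sysfb_init",
    "pcie_acs_override=downstream,multifunction",
    "nofb",
    "nomodeset",
]


def missing_params(cmdline, expected):
    """Return the entries of `expected` that are not whole tokens of `cmdline`."""
    present = set(cmdline.split())
    return [p for p in expected if p not in present]


def main():
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    missing = missing_params(cmdline, EXPECTED)
    if missing:
        print("missing from /proc/cmdline:", " ".join(missing))
    else:
        print("all expected parameters are active")


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    main()
```

The match is on exact tokens, so a parameter written differently (for example intel_iommu=on,igfx_off) is reported as missing.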

Module options and blacklist:

options vfio_iommu_type1 allow_unsafe_interrupts=1

options kvm ignore_msrs=1

options vfio-pci ids=10de:1f06,10de:10f9,10de:1ada,10de:1adb disable_vga=1

blacklist radeon
blacklist nouveau
blacklist nvidia
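The options and blacklist lines above are modprobe.d syntax, so they are only read from files under /etc/modprobe.d/ (followed by update-initramfs -u -k all and a reboot so the initramfs picks them up); /etc/default/grub itself is sourced as a shell script by update-grub. Whether vfio-pci actually claimed every function of the card can be checked with lspci -nnk -s 03:00 ("Kernel driver in use"). The Python sketch below does the same from sysfs and flags functions whose ID is in the vfio-pci ids= list but which are bound to some other driver. That seems worth checking here because xhci_pci, i2c_nvidia_gpu and ucsi_ccg all appear in the module list in the log, and they may be bound to the card's USB and USB-C functions. The slot 0000:03:00 is taken from hostpci0; the script name vfio_check.py is arbitrary.

```python
#!/usr/bin/env python3
"""Show vendor:device ID and bound driver for each function of one PCI slot.

Sketch assuming the standard sysfs layout (/sys/bus/pci/devices/<bdf>/...).
Usage: python3 vfio_check.py 0000:03:00
"""
import os
import sys

# IDs copied from the 'options vfio-pci ids=...' line above.
VFIO_IDS = {"10de:1f06", "10de:10f9", "10de:1ada", "10de:1adb"}


def slot_functions(slot, sysfs_root="/sys"):
    """Return [(bdf, 'vvvv:dddd', driver or None), ...] for all functions of `slot`."""
    devices = os.path.join(sysfs_root, "bus", "pci", "devices")
    if not os.path.isdir(devices):
        return []
    result = []
    for bdf in sorted(os.listdir(devices)):
        if not bdf.startswith(slot + "."):  # e.g. '0000:03:00.0', '0000:03:00.1'
            continue
        dev_dir = os.path.join(devices, bdf)
        ids = []
        for name in ("vendor", "device"):
            with open(os.path.join(dev_dir, name)) as f:
                ids.append("%04x" % int(f.read(), 16))  # sysfs stores e.g. '0x10de\n'
        link = os.path.join(dev_dir, "driver")
        driver = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
        result.append((bdf, ":".join(ids), driver))
    return result


def misbound(functions, vfio_ids=VFIO_IDS):
    """BDFs whose ID is meant for vfio-pci but which are bound elsewhere (or unbound)."""
    return [bdf for bdf, ident, drv in functions if ident in vfio_ids and drv != "vfio-pci"]


def main(slot):
    functions = slot_functions(slot)
    for bdf, ident, drv in functions:
        print(f"{bdf}  {ident}  driver={drv or '-'}")
    for bdf in misbound(functions):
        print(f"WARNING: {bdf} is not bound to vfio-pci")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```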

VM config:
agent: 0
bios: ovmf
boot: order=scsi0;ide0
cores: 6
cpu: host,flags=+pcid
efidisk0: local-lvm:vm-112-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:03:00,pcie=1
ide0: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: q35
memory: 8192
meta: creation-qemu=7.2.0,ctime=1680009706
name: GPU-VM
net0: virtio=16:3D:79:A9:A3:0F,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: NVME:vm-112-disk-0,cache=writeback,iothread=1,size=60G,ssd=1
scsi1: NVME:vm-112-disk-1,cache=writeback,iothread=1,size=150G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=0b6ed0d4-92ed-4032-8226-fb46179d4215
sockets: 1
vga: none
vmgenid: 0f00905e-18dc-44ec-8b21-c54cd7e72056
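Because hostpci0: 0000:03:00 has no function number, all functions of the card are passed to the VM, and every other device in the same IOMMU group has to be detached from the host as well. Note that pcie_acs_override=downstream,multifunction only makes the kernel treat devices as isolated; it does not change what the hardware actually isolates. A minimal sketch (assuming python3 on the host; the script name is arbitrary) that lists the IOMMU groups containing a given slot together with anything else that shares them:

```python
#!/usr/bin/env python3
"""List the IOMMU groups that contain functions of one PCI slot.

Sketch assuming the standard layout /sys/kernel/iommu_groups/<n>/devices/<bdf>.
Usage: python3 iommu_groups.py 0000:03:00
"""
import os
import sys


def iommu_groups(sysfs_root="/sys"):
    """Map group number -> sorted list of device BDFs; empty if no groups exist."""
    base = os.path.join(sysfs_root, "kernel", "iommu_groups")
    if not os.path.isdir(base):
        return {}
    return {
        int(group): sorted(os.listdir(os.path.join(base, group, "devices")))
        for group in os.listdir(base)
    }


def groups_for_slot(groups, slot):
    """Subset of `groups` that contains at least one function of `slot`."""
    prefix = slot + "."
    return {n: devs for n, devs in groups.items() if any(d.startswith(prefix) for d in devs)}


def main(slot):
    groups = iommu_groups()
    if not groups:
        # An empty /sys/kernel/iommu_groups usually means the IOMMU is not active.
        print("no IOMMU groups found: the IOMMU is probably not enabled")
        return
    for n, devs in sorted(groups_for_slot(groups, slot).items()):
        print(f"IOMMU group {n}: {' '.join(devs)}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Any device listed next to the 03:00.x functions that does not belong to the card would be taken away from the host when the VM starts.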
When I start the VM (and sometimes even without starting it), I get a kernel panic. I can't read it, because it scrolls off the screen and I don't know how to save the whole thing.
Here is what I got in kern.log:
Apr 02 14:27:31 srv1 pvedaemon[1545]: <root@pam> end task UPID:srv1:00000CF3:00004356:64292E5E:qmstart:112:root@pam: OK
Apr 02 14:34:35 srv1 kernel: general protection fault, probably for non-canonical address 0xb1dd01200c00: 0000 [#1] SMP PTI
Apr 02 14:34:35 srv1 kernel: CPU: 1 PID: 2353 Comm: CPU 0/KVM Tainted: P O 5.15.102-1-pve #1
Apr 02 14:34:35 srv1 kernel: Hardware name: HUANANZHI /X79, BIOS 4.6.5 10/28/2019
Apr 02 14:34:35 srv1 kernel: RIP: 0010:tdp_iter_refresh_sptep+0x55/0x90 [kvm]
Apr 02 14:34:35 srv1 kernel: Code: 00 00 44 8d 68 ff 4d 63 ed 49 83 fd 04 77 31 4a 8b 44 eb 10 49 d3 ec 41 81 e4 ff 01 00 00 4a 8d 04 e0 48 89 43 38 48 8b 43 38 <48> 8b 00 48 89 43 58 48 83 c4 08 5b 41 5c 41 5d 5d c3 cc cc cc cc
Apr 02 14:34:35 srv1 kernel: RSP: 0018:ffffb9b5caae3ab0 EFLAGS: 00010202
Apr 02 14:34:35 srv1 kernel: RAX: 0000b1dd01200c00 RBX: ffffb9b5caae3b18 RCX: 000000000000000c
Apr 02 14:34:35 srv1 kernel: RDX: 0000b1dd01200000 RSI: 00006630c0000000 RDI: ffffb9b5caae3b18
Apr 02 14:34:35 srv1 kernel: RBP: ffffb9b5caae3ad0 R08: 0000000000141d80 R09: 0000000000000000
Apr 02 14:34:35 srv1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000180
Apr 02 14:34:35 srv1 kernel: R13: 0000000000000000 R14: ffffb9b5caae3c28 R15: 0000000000000000
Apr 02 14:34:35 srv1 kernel: FS: 00007fe01a321700(0000) GS:ffff99dacfa40000(0000) knlGS:000000a870237000
Apr 02 14:34:35 srv1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 02 14:34:35 srv1 kernel: CR2: 00006bc6207a8380 CR3: 000000031aad2005 CR4: 00000000000626e0
Apr 02 14:34:35 srv1 kernel: Call Trace:
Apr 02 14:34:35 srv1 kernel: <TASK>
Apr 02 14:34:35 srv1 kernel: tdp_iter_next+0x18b/0x1c0 [kvm]
Apr 02 14:34:35 srv1 kernel: ? tdp_iter_start+0x83/0xc0 [kvm]
Apr 02 14:34:35 srv1 kernel: kvm_tdp_mmu_get_walk+0x88/0xd0 [kvm]
Apr 02 14:34:35 srv1 kernel: kvm_mmu_page_fault+0x6a4/0x8e0 [kvm]
Apr 02 14:34:35 srv1 kernel: ? kvm_io_bus_get_first_dev+0x58/0xe0 [kvm]
Apr 02 14:34:35 srv1 kernel: ? __kvm_io_bus_write+0x2d/0xc0 [kvm]
Apr 02 14:34:35 srv1 kernel: handle_ept_misconfig+0x57/0x130 [kvm_intel]
Apr 02 14:34:35 srv1 kernel: vmx_handle_exit+0x775/0x8d0 [kvm_intel]
Apr 02 14:34:35 srv1 kernel: kvm_arch_vcpu_ioctl_run+0xdd6/0x1730 [kvm]
Apr 02 14:34:35 srv1 kernel: ? kvm_vcpu_ioctl+0x2bb/0x6b0 [kvm]
Apr 02 14:34:35 srv1 kernel: kvm_vcpu_ioctl+0x252/0x6b0 [kvm]
Apr 02 14:34:35 srv1 kernel: ? kvm_on_user_return+0x80/0xf0 [kvm]
Apr 02 14:34:35 srv1 kernel: ? __fget_files+0x86/0xc0
Apr 02 14:34:35 srv1 kernel: __x64_sys_ioctl+0x95/0xd0
Apr 02 14:34:35 srv1 kernel: do_syscall_64+0x5c/0xc0
Apr 02 14:34:35 srv1 kernel: ? put_timespec64+0x3d/0x70
Apr 02 14:34:35 srv1 kernel: ? exit_to_user_mode_prepare+0x37/0x1b0
Apr 02 14:34:35 srv1 kernel: ? syscall_exit_to_user_mode+0x27/0x50
Apr 02 14:34:35 srv1 kernel: ? do_syscall_64+0x69/0xc0
Apr 02 14:34:35 srv1 kernel: ? do_syscall_64+0x69/0xc0
Apr 02 14:34:35 srv1 kernel: ? sysvec_apic_timer_interrupt+0x4e/0x90
Apr 02 14:34:35 srv1 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Apr 02 14:34:35 srv1 kernel: RIP: 0033:0x7fe02676c5f7
Apr 02 14:34:35 srv1 kernel: Code: 00 00 00 48 8b 05 99 c8 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 c8 0d 00 f7 d8 64 89 01 48
Apr 02 14:34:35 srv1 kernel: RSP: 002b:00007fe01a31c288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 02 14:34:35 srv1 kernel: RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fe02676c5f7
Apr 02 14:34:35 srv1 kernel: RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000021
Apr 02 14:34:35 srv1 kernel: RBP: 0000561c078790f0 R08: 0000561c059c8240 R09: 00000000ffffffff
Apr 02 14:34:35 srv1 kernel: R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
Apr 02 14:34:35 srv1 kernel: R13: 0000561c060d3020 R14: 0000000000000000 R15: 0000000000000000
Apr 02 14:34:35 srv1 kernel: </TASK>
Apr 02 14:34:35 srv1 kernel: Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio kvm_intel snd_hda_intel snd_intel_dspcfg kvm snd_intel_sdw_acpi crct10dif_pclmul btusb ghash_clmulni_intel snd_hda_codec btrtl aesni_intel snd_hda_core crypto_simd btbcm snd_hwdep cryptd btintel snd_pcm bluetooth ucsi_ccg mei_me cp210x snd_timer rapl typec_ucsi ecdh_generic snd ecc intel_cstate usbserial typec soundcore mei efi_pstore serio_raw pcspkr zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) mac_hid zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_type1 vfio drm sunrpc ip_tables
Apr 02 14:34:35 srv1 kernel: x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32_pclmul psmouse xhci_pci ahci i2c_i801 xhci_pci_renesas r8169 i2c_smbus lpc_ich libahci ehci_pci nvme realtek ehci_hcd xhci_hcd i2c_nvidia_gpu nvme_core
Apr 02 14:34:35 srv1 kernel: ---[ end trace ebaa0857a60fe92f ]---
Apr 02 14:34:35 srv1 kernel: RIP: 0010:tdp_iter_refresh_sptep+0x55/0x90 [kvm]
Apr 02 14:34:35 srv1 kernel: Code: 00 00 44 8d 68 ff 4d 63 ed 49 83 fd 04 77 31 4a 8b 44 eb 10 49 d3 ec 41 81 e4 ff 01 00 00 4a 8d 04 e0 48 89 43 38 48 8b 43 38 <48> 8b 00 48 89 43 58 48 83 c4 08 5b 41 5c 41 5d 5d c3 cc cc cc cc
Apr 02 14:34:35 srv1 kernel: RSP: 0018:ffffb9b5caae3ab0 EFLAGS: 00010202
Apr 02 14:34:35 srv1 kernel: RAX: 0000b1dd01200c00 RBX: ffffb9b5caae3b18 RCX: 000000000000000c
Apr 02 14:34:35 srv1 kernel: RDX: 0000b1dd01200000 RSI: 00006630c0000000 RDI: ffffb9b5caae3b18
Apr 02 14:34:35 srv1 kernel: RBP: ffffb9b5caae3ad0 R08: 0000000000141d80 R09: 0000000000000000
Apr 02 14:34:35 srv1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000180
Apr 02 14:34:35 srv1 kernel: R13: 0000000000000000 R14: ffffb9b5caae3c28 R15: 0000000000000000
Apr 02 14:34:35 srv1 kernel: FS: 00007fe01a321700(0000) GS:ffff99dacfa40000(0000) knlGS:000000a870237000
Apr 02 14:34:35 srv1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 02 14:34:35 srv1 kernel: CR2: 00006bc6207a8380 CR3: 000000031aad2005 CR4: 00000000000626e0
Apr 02 14:34:43 srv1 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
Apr 02 14:34:43 srv1 kernel: #PF: supervisor read access in kernel mode
Apr 02 14:34:43 srv1 kernel: #PF: error_code(0x0000) - not-present page
-- Reboot --
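The panic scrolls off the console, but if the journal is persistent (the excerpt above suggests the messages do reach the disk), the kernel messages of the previous boot can be dumped after the reboot with journalctl -k -b -1 --no-pager > prev-boot.log. The last lines before a hard crash can still be lost if they never made it to disk, which may be what happened to the BUG: report at the end. A small Python sketch that cuts each oops report out of such a dump so the complete trace can be posted; the START and END patterns are assumptions based on the reports above, not an exhaustive list:

```python
#!/usr/bin/env python3
"""Extract kernel oops reports from a saved kernel log.

Sketch: START/END patterns are assumptions covering the reports seen above.
Usage: journalctl -k -b -1 --no-pager > prev-boot.log
       python3 extract_oops.py prev-boot.log
"""
import re
import sys

# A line matching START opens a new report; a line matching END closes it.
START = re.compile(r"general protection fault|BUG: |Oops: |Kernel panic - not syncing")
END = re.compile(r"---\[ end (trace|Kernel panic)")


def extract_oops(lines):
    """Return a list of reports, each a list of lines (trailing newlines stripped)."""
    reports, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        if current is not None and END.search(line):
            current.append(line)
            reports.append(current)
            current = None
        elif START.search(line) and not END.search(line):
            if current:
                reports.append(current)  # previous report had no END marker
            current = [line]
        elif current is not None:
            current.append(line)
    if current:
        reports.append(current)  # log ended (e.g. reboot) before an END marker
    return reports


def main(path):
    with open(path, errors="replace") as f:
        for i, report in enumerate(extract_oops(f), 1):
            print(f"===== report {i}: {len(report)} lines =====")
            print("\n".join(report))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```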

I don't know what to do next; please help me get this GPU working.

P.S. This is the only GPU in the server.
 
Hi,

Which Proxmox VE version are you running? Please post the output of pveversion -v.
Code:
root@srv1:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
root@srv1:~#
 
UPD: Sometimes the API server (which sends data to the web interface) stops working: no metrics show up in the web interface and I can't see the config of any VM, but I can still connect to the noVNC console.
 
