PVE 9.05: host disk devices lost when starting a VM with PCIe passthrough. Only the root drive is not affected.

kobemtl

Similar to the threads "Proxmox VE 9 PCIe Passthrough TrueNAS VM" and "PCIe-Passthrough no Longer Working on PVE 9.0.3 with kernel 6.14 - VM Hangs on Start", I have a problem when a VM uses PCIe passthrough. Everything worked fine with PVE 8.4; the problem appeared after the upgrade to 9.05. I thought I had done something wrong, so I reinstalled from scratch, but the problem is still the same. Every time I create a VM with PCIe passthrough and start it, the host's disk devices disappear, except the root drive. After a reboot, all disks show up again.

You can see all the details below. Please let me know if any more information would help. Thank you.

Before VM start:
Bash:
# lsblk -o+FSTYPE,MODEL,TRAN,VENDOR
NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS FSTYPE      MODEL              TRAN   VENDOR
sda                  8:0    0 476.9G  0 disk                         Timetec SD08       sas    ATA   
sdb                  8:16   0 117.4G  0 disk                         SanDisk SDSSDP128G sata   ATA   
├─sdb1               8:17   0  1007K  0 part
├─sdb2               8:18   0     1G  0 part             vfat
└─sdb3               8:19   0 116.4G  0 part             LVM2_member
  ├─pve-swap       252:0    0     8G  0 lvm  [SWAP]      swap
  ├─pve-root       252:1    0  39.1G  0 lvm  /           ext4
  ├─pve-data_tmeta 252:2    0     1G  0 lvm
  │ └─pve-data     252:4    0  52.8G  0 lvm
  └─pve-data_tdata 252:3    0  52.8G  0 lvm
    └─pve-data     252:4    0  52.8G  0 lvm
sdc                  8:32   1  57.3G  0 disk                         SanDisk 3.2Gen1    usb     USB 
├─sdc1               8:33   1  57.3G  0 part             exfat
└─sdc2               8:34   1    32M  0 part             vfat
nvme0n1                    259:0    0 953.9G  0 disk             LVM2_member TEAM TM8FP4001T    nvme
└─nvme--01-vm--999--disk--0 252:5    0    32G  0 lvm

After VM start, nvme0n1 and sda are gone:
Bash:
# lsblk -o+FSTYPE,MODEL,TRAN,VENDOR
NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS FSTYPE      MODEL              TRAN   VENDOR
sdb                  8:16   0 117.4G  0 disk                         SanDisk SDSSDP128G sata   ATA
├─sdb1               8:17   0  1007K  0 part
├─sdb2               8:18   0     1G  0 part             vfat
└─sdb3               8:19   0 116.4G  0 part             LVM2_member
  ├─pve-swap       252:0    0     8G  0 lvm  [SWAP]      swap
  ├─pve-root       252:1    0  39.1G  0 lvm  /           ext4
  ├─pve-data_tmeta 252:2    0     1G  0 lvm
  │ └─pve-data     252:4    0  52.8G  0 lvm
  └─pve-data_tdata 252:3    0  52.8G  0 lvm
    └─pve-data     252:4    0  52.8G  0 lvm
sdc                  8:32   1  57.3G  0 disk                         SanDisk 3.2Gen1    usb     USB
├─sdc1               8:33   1  57.3G  0 part             exfat
└─sdc2               8:34   1    32M  0 part             vfat

The PCIe device being passed through:
Bash:
# lspci -nnk
....
01:00.0 3D controller [0302]: NVIDIA Corporation GP104GL [Tesla P4] [10de:1bb3] (rev a1)
        Subsystem: NVIDIA Corporation Device [10de:11d8]
        Kernel modules: nvidiafb, nouveau
....

journalctl -f
Bash:
Aug 26 15:20:49 pve-dellr330 pvedaemon[1315]: <root@pam> starting task UPID:pve-dellr330:0000299D:00031CC4:68AE0911:qmstart:999:root@pam:
Aug 26 15:20:49 pve-dellr330 pvedaemon[10653]: start VM 999: UPID:pve-dellr330:0000299D:00031CC4:68AE0911:qmstart:999:root@pam:
Aug 26 15:20:50 pve-dellr330 kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Aug 26 15:20:50 pve-dellr330 kernel: mpt3sas_cm0: mpt3sas_transport_port_remove: removed: sas_addr(0x4433221106000000)
Aug 26 15:20:50 pve-dellr330 kernel: mpt3sas_cm0: removing handle(0x000a), sas_addr(0x4433221106000000)
Aug 26 15:20:50 pve-dellr330 kernel: mpt3sas_cm0: enclosure logical id(0x54cd98f04e0aaa00), slot(0)
Aug 26 15:20:50 pve-dellr330 kernel: mpt3sas_cm0: enclosure level(0x0001), connector name(     )
Aug 26 15:20:50 pve-dellr330 kernel: mpt3sas_cm0: sending message unit reset !!
Aug 26 15:20:50 pve-dellr330 kernel: mpt3sas_cm0: message unit reset: SUCCESS
Aug 26 15:20:51 pve-dellr330 systemd[1]: Created slice qemu.slice - Slice /qemu.
Aug 26 15:20:51 pve-dellr330 systemd[1]: Started 999.scope.
Aug 26 15:20:51 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=2048, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:20:52 pve-dellr330 kernel: tap999i0: entered promiscuous mode
Aug 26 15:20:52 pve-dellr330 kernel: vmbr0: port 2(fwpr999p0) entered blocking state
Aug 26 15:20:52 pve-dellr330 kernel: vmbr0: port 2(fwpr999p0) entered disabled state
Aug 26 15:20:52 pve-dellr330 kernel: fwpr999p0: entered allmulticast mode
Aug 26 15:20:52 pve-dellr330 kernel: fwpr999p0: entered promiscuous mode
Aug 26 15:20:52 pve-dellr330 kernel: vmbr0: port 2(fwpr999p0) entered blocking state
Aug 26 15:20:52 pve-dellr330 kernel: vmbr0: port 2(fwpr999p0) entered forwarding state
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 1(fwln999i0) entered blocking state
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 1(fwln999i0) entered disabled state
Aug 26 15:20:52 pve-dellr330 kernel: fwln999i0: entered allmulticast mode
Aug 26 15:20:52 pve-dellr330 kernel: fwln999i0: entered promiscuous mode
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 1(fwln999i0) entered blocking state
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 1(fwln999i0) entered forwarding state
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 2(tap999i0) entered blocking state
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 2(tap999i0) entered disabled state
Aug 26 15:20:52 pve-dellr330 kernel: tap999i0: entered allmulticast mode
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 2(tap999i0) entered blocking state
Aug 26 15:20:52 pve-dellr330 kernel: fwbr999i0: port 2(tap999i0) entered forwarding state
Aug 26 15:20:52 pve-dellr330 kernel: vfio-pci 0000:01:00.0: Enabling HDA controller
Aug 26 15:20:53 pve-dellr330 kernel: worker: attempt to access beyond end of device
                                     nvme0n1: rw=2048, sector=2048, nr_sectors = 1 limit=0
Aug 26 15:20:54 pve-dellr330 pvedaemon[10653]: VM 999 started with PID 10689.
Aug 26 15:20:55 pve-dellr330 pvedaemon[1315]: <root@pam> end task UPID:pve-dellr330:0000299D:00031CC4:68AE0911:qmstart:999:root@pam: OK
Aug 26 15:20:57 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 1 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:36 pve-dellr330 kernel: kvm: attempt to access beyond end of device
                                     nvme0n1: rw=0, sector=2048, nr_sectors = 8 limit=0
Aug 26 15:21:41 pve-dellr330 kernel: bio_check_eod: 103 callbacks suppressed
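
The journal shows mpt3sas_cm0 being torn down in the same second the VM starts, and nvme0n1 reporting limit=0 right after. That would be consistent with the SAS HBA and the NVMe controller sitting in the same IOMMU group as the passed-through GPU, although nothing above confirms it. A minimal sketch for checking this from sysfs follows; the function name `group_mates` is made up for illustration, and the optional second argument exists only so the function can be pointed at a different directory.

```shell
#!/usr/bin/env bash
# group_mates PCI_ADDR [IOMMU_ROOT]
# Print every other device that shares an IOMMU group with PCI_ADDR.
# IOMMU_ROOT defaults to the standard sysfs location.
# (Hypothetical helper, not part of Proxmox or any package.)
group_mates() {
    local addr="$1"
    local root="${2:-/sys/kernel/iommu_groups}"
    local link mate

    # Each group is a directory <root>/<group-number>/devices/<pci-address>
    for link in "$root"/*/devices/"$addr"; do
        # Skip the unexpanded glob when the address is in no group
        [ -e "$link" ] || continue
        for mate in "${link%/*}"/*; do
            if [ "${mate##*/}" != "$addr" ]; then
                printf '%s\n' "${mate##*/}"
            fi
        done
    done
}
```

For the VM config in this post the call would be `group_mates 0000:01:00.0`. Empty output means the GPU is alone in its group; if the addresses of the HBA or the NVMe controller are printed, they are in the same group as the GPU and would be expected to go away with it when the VM starts.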

INI:
# /etc/pve/qemu-server# cat 999.conf

boot: order=scsi0;ide2;net0
cores: 8
cpu: x86-64-v2-AES
hostpci0: 0000:01:00.0
ide2: none,media=cdrom
memory: 2048
meta: creation-qemu=10.0.2,ctime=1756231267
name: test
net0: virtio=BC:24:11:8A:F5:AE,bridge=vmbr0,firewall=1
numa: 0
ostype: l26
scsi0: nvme-01:vm-999-disk-0,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=fda878ef-ec81-41b2-b3ab-39c867273835
sockets: 1
vmgenid: b5196a27-1adb-44e5-8073-80b44cacd5c8
 
Problem resolved with the changes below.

grub:
Add pcie_acs_override=downstream to GRUB_CMDLINE_LINUX_DEFAULT.

modprobe:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1 report_ignored_msrs=0" > /etc/modprobe.d/kvm.conf
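
For anyone who wants to script the grub change instead of editing the file by hand, here is a rough sketch. It assumes the host boots via GRUB with the usual /etc/default/grub layout (a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line); the function name `add_kernel_param` is made up for illustration.

```shell
#!/usr/bin/env bash
# add_kernel_param PARAM [FILE]
# Append PARAM to the GRUB_CMDLINE_LINUX_DEFAULT line of FILE
# (default: /etc/default/grub) unless it is already there.
# (Hypothetical helper; assumes PARAM contains no regex or '|' characters.)
add_kernel_param() {
    local param="$1"
    local file="${2:-/etc/default/grub}"

    # Nothing to do if the parameter is already on the line
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
        return 0
    fi

    # Insert the parameter just before the closing double quote
    sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file"
}
```

Usage would be `add_kernel_param pcie_acs_override=downstream`, followed by `update-grub` and a reboot. After the modprobe changes, `update-initramfs -u -k all` is normally needed as well. Once the host is back up, `cat /proc/cmdline` should show the new parameter.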