I've got a storage VM that worked under Proxmox 5.4, and I've now upgraded to 6.0.
This VM has two PCIe SAS cards passed through: one LSI SAS2116 (in the form of an LSI SAS9201-16e) and one LSI SAS2008 (in the form of a Dell 85M9R mezzanine controller for the C1100/C2100).
Both controllers passed through correctly on 5.4, but on 6.0, when starting the VM, dmesg outputs the following:
Code:
[ 2660.462336] INFO: task vgs:18985 blocked for more than 120 seconds.
[ 2660.462990] Tainted: P O L 5.0.15-1-pve #1
[ 2660.463587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2660.464180] vgs D 0 18985 4446 0x00000000
[ 2660.464182] Call Trace:
[ 2660.464190] __schedule+0x2d4/0x870
[ 2660.464192] schedule+0x2c/0x70
[ 2660.464199] raid5_make_request+0x20b/0xc80 [raid456]
[ 2660.464203] ? __split_and_process_bio+0x1b0/0x2b0
[ 2660.464206] ? wait_woken+0x80/0x80
[ 2660.464208] ? kmem_cache_alloc_node+0x1c7/0x200
[ 2660.464211] md_handle_request+0x127/0x1a0
[ 2660.464212] md_make_request+0x7b/0x170
[ 2660.464214] generic_make_request+0x19e/0x400
[ 2660.464216] submit_bio+0x49/0x140
[ 2660.464218] ? bio_set_pages_dirty+0x39/0x50
[ 2660.464219] blkdev_direct_IO+0x3ec/0x450
[ 2660.464221] ? free_ioctx+0x80/0x80
[ 2660.464224] generic_file_read_iter+0xb3/0xd60
[ 2660.464228] ? security_file_permission+0x9d/0xf0
[ 2660.464229] blkdev_read_iter+0x35/0x40
[ 2660.464230] aio_read+0xf8/0x160
[ 2660.464233] ? handle_mm_fault+0xe1/0x210
[ 2660.464235] ? __do_page_fault+0x25a/0x4c0
[ 2660.464237] ? mem_cgroup_try_charge+0x8b/0x190
[ 2660.464238] ? _cond_resched+0x19/0x30
[ 2660.464239] ? io_submit_one+0x96/0xb60
[ 2660.464241] io_submit_one+0x170/0xb60
[ 2660.464243] ? page_fault+0x1e/0x30
[ 2660.464245] __x64_sys_io_submit+0xa9/0x190
[ 2660.464246] ? __x64_sys_io_submit+0xa9/0x190
[ 2660.464250] do_syscall_64+0x5a/0x110
[ 2660.464251] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2660.464253] RIP: 0033:0x7f2b1705df59
[ 2660.464257] Code: Bad RIP value.
[ 2660.464258] RSP: 002b:00007ffcf4052128 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[ 2660.464259] RAX: ffffffffffffffda RBX: 00007f2b16cfe700 RCX: 00007f2b1705df59
[ 2660.464260] RDX: 00007ffcf40521d0 RSI: 0000000000000001 RDI: 00007f2b176c4000
[ 2660.464261] RBP: 00007f2b176c4000 R08: 000055d6624a1000 R09: 0000000000000000
[ 2660.464261] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000001
[ 2660.464262] R13: 0000000000000000 R14: 00007ffcf40521d0 R15: 0000000000000000
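(If it helps, the IOMMU grouping of the two controllers on the host can be checked with something like the snippet below; this is only a rough sketch, and the PCI addresses 01:00.0 and 04:00.0 are the ones passed through in the VM config further down.)
Code:
# rough sketch, run on the PVE host: list each IOMMU group and the devices in it
for d in /sys/kernel/iommu_groups/*/devices/*; do
    grp=${d#/sys/kernel/iommu_groups/}; grp=${grp%%/*}
    printf 'IOMMU group %s: ' "$grp"
    lspci -nns "${d##*/}"
done | sort -V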
If I remove the offending mezzanine controller, the VM boots and all attached drives behave normally. For reference, here are my pveversion output and the VM config:
Code:
root@pve2:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-4.15: 5.4-6
pve-kernel-4.13: 5.2-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.4.35-1-pve: 4.4.35-76
ceph: 14.2.1-pve2
ceph-fuse: 14.2.1-pve2
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-4
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1
Code:
root@pve2:~# cat /etc/pve/qemu-server/100.conf
bios: ovmf
boot: cdn
bootdisk: virtio0
cores: 3
efidisk0: local-lvm:vm-100-disk-2,size=4M
hotplug: disk,network,usb
hostpci0: 04:00.0,pcie=1
hostpci1: 01:00.0,pcie=1
ide2: none,media=cdrom
machine: q35,kernel_irqchip=on
memory: 12000
name: Storage
net0: virtio=1A:34:38:5D:EE:FE,bridge=vmbr0,queues=1,tag=3
numa: 0
onboot: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=8381717e-a35c-43e3-bc35-55e18576e222
sockets: 1
virtio0: local-lvm:vm-100-disk-1,cache=writeback,size=16G
virtio1: local-lvm:vm-100-disk-3,cache=writeback,size=8G
virtio2: local-lvm:vm-100-disk-0,backup=0,cache=writeback,size=110G
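(For completeness, the two hostpci lines above correspond to setting the devices from the host shell roughly as follows, with 100 being this VM's ID; shown only as a sketch.)
Code:
# attach the two SAS controllers to VM 100 as PCIe devices
qm set 100 -hostpci0 04:00.0,pcie=1
qm set 100 -hostpci1 01:00.0,pcie=1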
I have also tried changing the machine type to "pc-q35-3.1", both with and without the "kernel_irqchip=on" option, as described in https://forum.proxmox.com/threads/gpu-passthought-ko-with-kernel-5-0-pve-6-0.56059/
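In other words, the machine line in /etc/pve/qemu-server/100.conf was set to either of these two variants during those tests:
Code:
machine: pc-q35-3.1,kernel_irqchip=on
machine: pc-q35-3.1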