Proxmox 6 LSI SAS2008 PCIe Passthrough

Zombielinux

I've got a storage VM that worked in Proxmox 5.4; I've since upgraded to 6.0.

This VM has two PCIe SAS cards passed through: one LSI SAS2116 (in the form of an LSI SAS9201-16e) and one LSI SAS2008 (in the form of a Dell 85M9R mezzanine controller for a C1100/C2100).

Both controllers passed through correctly on 5.4, but on 6.0, starting the VM produces this in dmesg on the host:

Code:
[ 2660.462336] INFO: task vgs:18985 blocked for more than 120 seconds.
[ 2660.462990]       Tainted: P           O L    5.0.15-1-pve #1
[ 2660.463587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2660.464180] vgs             D    0 18985   4446 0x00000000
[ 2660.464182] Call Trace:
[ 2660.464190]  __schedule+0x2d4/0x870
[ 2660.464192]  schedule+0x2c/0x70
[ 2660.464199]  raid5_make_request+0x20b/0xc80 [raid456]
[ 2660.464203]  ? __split_and_process_bio+0x1b0/0x2b0
[ 2660.464206]  ? wait_woken+0x80/0x80
[ 2660.464208]  ? kmem_cache_alloc_node+0x1c7/0x200
[ 2660.464211]  md_handle_request+0x127/0x1a0
[ 2660.464212]  md_make_request+0x7b/0x170
[ 2660.464214]  generic_make_request+0x19e/0x400
[ 2660.464216]  submit_bio+0x49/0x140
[ 2660.464218]  ? bio_set_pages_dirty+0x39/0x50
[ 2660.464219]  blkdev_direct_IO+0x3ec/0x450
[ 2660.464221]  ? free_ioctx+0x80/0x80
[ 2660.464224]  generic_file_read_iter+0xb3/0xd60
[ 2660.464228]  ? security_file_permission+0x9d/0xf0
[ 2660.464229]  blkdev_read_iter+0x35/0x40
[ 2660.464230]  aio_read+0xf8/0x160
[ 2660.464233]  ? handle_mm_fault+0xe1/0x210
[ 2660.464235]  ? __do_page_fault+0x25a/0x4c0
[ 2660.464237]  ? mem_cgroup_try_charge+0x8b/0x190
[ 2660.464238]  ? _cond_resched+0x19/0x30
[ 2660.464239]  ? io_submit_one+0x96/0xb60
[ 2660.464241]  io_submit_one+0x170/0xb60
[ 2660.464243]  ? page_fault+0x1e/0x30
[ 2660.464245]  __x64_sys_io_submit+0xa9/0x190
[ 2660.464246]  ? __x64_sys_io_submit+0xa9/0x190
[ 2660.464250]  do_syscall_64+0x5a/0x110
[ 2660.464251]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2660.464253] RIP: 0033:0x7f2b1705df59
[ 2660.464257] Code: Bad RIP value.
[ 2660.464258] RSP: 002b:00007ffcf4052128 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[ 2660.464259] RAX: ffffffffffffffda RBX: 00007f2b16cfe700 RCX: 00007f2b1705df59
[ 2660.464260] RDX: 00007ffcf40521d0 RSI: 0000000000000001 RDI: 00007f2b176c4000
[ 2660.464261] RBP: 00007f2b176c4000 R08: 000055d6624a1000 R09: 0000000000000000
[ 2660.464261] R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000001
[ 2660.464262] R13: 0000000000000000 R14: 00007ffcf40521d0 R15: 0000000000000000

If I remove the offending mezzanine controller from the VM, the VM boots and all attached drives behave normally.
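For reference, the hung task in that trace is vgs (an LVM scan) stuck in raid5_make_request on the host kernel (5.0.15-1-pve), which suggests the host itself may still be touching an md array on disks behind one of these controllers when the VM grabs them. The host-side state can be checked with something like the following (addresses taken from the hostpci lines in the config below, adjust as needed):

Code:
# which kernel driver currently claims each passed-through card on the host?
lspci -nnk -s 04:00.0
lspci -nnk -s 01:00.0

# is the host itself assembling md arrays or scanning LVM on disks behind them?
cat /proc/mdstat
lsblk -o NAME,FSTYPE,MOUNTPOINT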

Code:
root@pve2:~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.15-1-pve)
pve-manager: 6.0-4 (running version: 6.0-4/2a719255)
pve-kernel-5.0: 6.0-5
pve-kernel-helper: 6.0-5
pve-kernel-4.15: 5.4-6
pve-kernel-4.13: 5.2-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.4-1-pve: 4.13.4-26
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
pve-kernel-4.10.17-1-pve: 4.10.17-18
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.4.35-1-pve: 4.4.35-76
ceph: 14.2.1-pve2
ceph-fuse: 14.2.1-pve2
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-2
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-5
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-61
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-4
pve-container: 3.0-4
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-5
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-5
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve1

Code:
root@pve2:~# cat /etc/pve/qemu-server/100.conf
bios: ovmf
boot: cdn
bootdisk: virtio0
cores: 3
efidisk0: local-lvm:vm-100-disk-2,size=4M
hotplug: disk,network,usb
hostpci0: 04:00.0,pcie=1
hostpci1: 01:00.0,pcie=1
ide2: none,media=cdrom
machine: q35,kernel_irqchip=on
memory: 12000
name: Storage
net0: virtio=1A:34:38:5D:EE:FE,bridge=vmbr0,queues=1,tag=3
numa: 0
onboot: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=8381717e-a35c-43e3-bc35-55e18576e222
sockets: 1
virtio0: local-lvm:vm-100-disk-1,cache=writeback,size=16G
virtio1: local-lvm:vm-100-disk-3,cache=writeback,size=8G
virtio2: local-lvm:vm-100-disk-0,backup=0,cache=writeback,size=110G
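For completeness, the IOMMU grouping of the two cards can be listed with a plain sysfs walk (nothing Proxmox-specific), in case the groups changed between the 4.15 and 5.0 kernels:

Code:
# list every PCI device together with its IOMMU group
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#/sys/kernel/iommu_groups/}; n=${n%%/*}
    printf 'IOMMU group %s: ' "$n"
    lspci -nns "${d##*/}"
done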

I have tried changing the machine type to "pc-q35-3.1", both with and without the "kernel_irqchip=on" option, as described in https://forum.proxmox.com/threads/gpu-passthought-ko-with-kernel-5-0-pve-6-0.56059/
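(In case anyone wants to reproduce this, that was just the machine line in /etc/pve/qemu-server/100.conf, i.e. one of:)

Code:
machine: pc-q35-3.1
machine: pc-q35-3.1,kernel_irqchip=on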
 
Did you blacklist the appropriate kernel driver for those cards on the host?

The dmesg output indicates that the host is still using the card (maybe the name of the driver changed from kernel 4.15 to 5.0).
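For SAS2008/SAS2116 HBAs that driver would be mpt2sas on the 4.15 kernel and mpt3sas on 5.0 (mpt2sas was folded into mpt3sas around kernel 4.18), so a blacklist written for the old kernel may no longer match. A rough sketch of what the host-side config could look like (the vendor:device IDs below are examples only, take the real ones from lspci -nn):

Code:
# /etc/modprobe.d/passthrough.conf  (example file name)
blacklist mpt3sas
blacklist mpt2sas

# alternatively, instead of a blanket blacklist, hand the cards to vfio-pci by ID:
#   options vfio-pci ids=1000:0072,1000:0064
#   softdep mpt3sas pre: vfio-pci

# then rebuild the initramfs and reboot the host
update-initramfs -u -k all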
 
