H11DSi / EPYC 7402 and VFIO issues

zeroservices

Jan 12, 2021
Hello everyone,

We have an H11DSi running Proxmox 6.3-3 with the latest BIOS. The system boots from an NVMe M.2 SSD.
What we want to do is pass through the onboard SATA controllers.

What we tried is:
Code:
$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1022:7901,15d9:7901

lspci confirms the controllers are using vfio-pci:
Code:
$ lspci -knn | grep -A2 SATA
43:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
    Subsystem: Super Micro Computer Inc FCH SATA Controller [AHCI mode] [15d9:7901]
    Kernel driver in use: vfio-pci
    Kernel modules: ahci
44:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
    Subsystem: Super Micro Computer Inc FCH SATA Controller [AHCI mode] [15d9:7901]
    Kernel driver in use: vfio-pci
    Kernel modules: ahci
--
86:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
    Subsystem: Super Micro Computer Inc FCH SATA Controller [AHCI mode] [15d9:7901]
    Kernel driver in use: vfio-pci
    Kernel modules: ahci
87:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
    Subsystem: Super Micro Computer Inc FCH SATA Controller [AHCI mode] [15d9:7901]
    Kernel driver in use: vfio-pci
    Kernel modules: ahci
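
For anyone who wants to repeat this check without reading the output by eye, here is a minimal sketch. The function name `sata_drivers` is made up for illustration and is not part of any tool; it only assumes the `lspci -knn` layout shown above (device line at column 0, indented "Kernel driver in use:" line below it).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads `lspci -knn`
# output on stdin and prints "<address> <driver>" for each SATA controller.
sata_drivers() {
    awk '
        # A device line starts at column 0 with the PCI address.
        /^[0-9a-f]/ {
            addr = ""
            if ($0 ~ /SATA controller/) addr = $1
        }
        # Indented detail line naming the currently bound driver.
        /Kernel driver in use:/ && addr != "" {
            print addr, $NF
        }
    '
}

# Usage on a live host:
#   lspci -knn | sata_drivers
```

Any controller that still reports `ahci` instead of `vfio-pci` here would not have been claimed by the `vfio-pci ids=` option.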

We then created a new VM with the PCI device passed through:
[Attachment: Screenshot 2021-01-12 at 12.31.31.png]

When we start the VM, we get messages like the ones below and the system starts to hang:
Code:
[16343.254314] vfio-pci 0000:43:00.0: not ready 1023ms after FLR; waiting
[16345.302265] vfio-pci 0000:43:00.0: not ready 2047ms after FLR; waiting
[16348.406183] vfio-pci 0000:43:00.0: not ready 4095ms after FLR; waiting
[16353.526033] vfio-pci 0000:43:00.0: not ready 8191ms after FLR; waiting
[16362.741776] vfio-pci 0000:43:00.0: not ready 16383ms after FLR; waiting
[16380.917293] vfio-pci 0000:43:00.0: not ready 32767ms after FLR; waiting
[16415.732320] vfio-pci 0000:43:00.0: not ready 65535ms after FLR; giving up
[16426.873342] INFO: NMI handler (ghes_notify_nmi) took too long to run: 999.434 msecs
[16445.862658] watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [task UPID:pve01:30500]
[16445.862692] Modules linked in: binfmt_misc udp_diag tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink ipmi_ssif zfs(PO) zunicode(PO) zlua(PO) amd64_edac_mod zavl(PO) edac_mce_amd icp(PO) kvm_amd kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper pcspkr input_leds joydev ast drm_vram_helper ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress hid_generic usbmouse usbkbd usbhid hid raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c ixgbe igb xfrm_algo i2c_algo_bit mdio dca xhci_pci xhci_hcd
[16445.862732]  i2c_piix4
[16445.862735] CPU: 13 PID: 30500 Comm: task UPID:pve01 Tainted: P           O      5.4.78-2-pve #1
[16445.862735] Hardware name: Supermicro Super Server/H11DSi, BIOS 2.1 02/21/2020
[16445.862739] RIP: 0010:pci_mmcfg_read+0x9c/0xd0
[16445.862741] Code: ff 01 74 0d 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 49 63 c5 48 01 d8 8a 00 0f b6 c0 41 89 04 24 eb e2 49 63 c5 48 01 d8 8b 00 <41> 89 04 24 eb d4 49 63 c5 48 01 d8 66 8b 00 0f b7 c0 41 89 04 24
[16445.862742] RSP: 0018:ffffb50a00b23c90 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
[16445.862743] RAX: 00000000ffffffff RBX: ffffb50a14300000 RCX: 0000000000000ffc
[16445.862743] RDX: 00000000000000ff RSI: 0000000000000043 RDI: 0000000000000000
[16445.862744] RBP: ffffb50a00b23cb8 R08: 0000000000000004 R09: ffffb50a00b23cec
[16445.862744] R10: 0000000000000000 R11: 00000eef1a7d3c15 R12: ffffb50a00b23cec
[16445.862744] R13: 0000000000000ffc R14: 0000000000000000 R15: 0000000000000004
[16445.862745] FS:  00007f610fd6b1c0(0000) GS:ffff8b174e540000(0000) knlGS:0000000000000000
[16445.862746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[16445.862746] CR2: 000055ba44ded0f0 CR3: 0000000f6c738000 CR4: 0000000000340ee0
[16445.862747] Call Trace:
[16445.862751]  raw_pci_read+0x35/0x40
[16445.862753]  pci_read+0x2c/0x30
[16445.862755]  pci_bus_read_config_dword+0x4a/0x70
[16445.862756]  pci_read_config_dword+0x23/0x40
[16445.862758]  pci_find_next_ext_capability.part.21+0x66/0xc0
[16445.862759]  pci_find_ext_capability.part.22+0x12/0x20
[16445.862760]  pci_restore_state.part.47+0xb9/0x420
[16445.862762]  pci_dev_restore+0x4b/0x60
[16445.862763]  pci_reset_function+0x49/0x70
[16445.862764]  reset_store+0x5e/0xa0
[16445.862766]  dev_attr_store+0x17/0x30
[16445.862769]  sysfs_kf_write+0x3b/0x40
[16445.862770]  kernfs_fop_write+0xda/0x1c0
[16445.862772]  __vfs_write+0x1b/0x40
[16445.862773]  vfs_write+0xab/0x1b0
[16445.862774]  ksys_write+0x61/0xe0
[16445.862775]  __x64_sys_write+0x1a/0x20
[16445.862777]  do_syscall_64+0x57/0x190
[16445.862779]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[16445.862781] RIP: 0033:0x7f610ff78471
[16445.862782] Code: 00 00 75 05 48 83 c4 58 c3 e8 0b 4d ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 8b 05 da ef 00 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 41 54 49 89 d4 55 48
[16445.862782] RSP: 002b:00007fffcfba76e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[16445.862783] RAX: ffffffffffffffda RBX: 000055ccd220a260 RCX: 00007f610ff78471
[16445.862783] RDX: 0000000000000001 RSI: 000055ccd7a76480 RDI: 0000000000000009
[16445.862784] RBP: 000055ccd7a76480 R08: 0000000000000000 R09: aaaaaaaaaaaaaaab
[16445.862784] R10: 000055ccd79f25b8 R11: 0000000000000246 R12: 0000000000000001
[16445.862784] R13: 000055ccd220a260 R14: 0000000000000009 R15: 000055ccd7a75360
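
To pick the affected device out of a long kernel log, something like the following sketch could help. The function name `flr_giveups` is hypothetical, and the pattern assumes the exact message format shown in the log above ("not ready ...ms after FLR; giving up").

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads kernel log text
# on stdin and prints the PCI address of each device whose function-level
# reset (FLR) finally timed out, i.e. the "giving up" lines only.
flr_giveups() {
    sed -n 's/.*vfio-pci \([0-9a-f:.]*\): not ready [0-9]*ms after FLR; giving up.*/\1/p' |
        sort -u
}

# Usage on a live host:
#   dmesg | flr_giveups
```

The call trace above shows the lockup inside `pci_reset_function`, reached through `reset_store`, so the stuck task was in the middle of a reset requested via sysfs. Whether a controller advertises FLR at all can usually be seen as `FLReset+` or `FLReset-` in the `DevCap` line of `lspci -vv -s 43:00.0`; that this helps on this particular board is an assumption, not something verified here.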

Does anyone have an idea how to get the controller passed through?
I did not expect this to be an issue in 2021... it has worked like a charm with Intel for years :/

Thanks in advance!
 
Hello, have you tried passing through all 4 controllers/ports one by one? Or maybe all together?
And what does the exact hardware config look like? SATA 0-3 are served by CPU1; SATA 4-7 and the SATA DOM are served by CPU2.
 
In case anyone else stumbles across this issue:

We tried every possible combination with the onboard SATA controllers and never got passthrough working.
We then ordered a Broadcom 9400-8i and passed that through instead. It worked as expected right away.
 
