Buggy behavior of the fnic driver in kernel 6.8.4-2-pve

May 7, 2024
Hi. We have a Proxmox cluster whose servers (blades) boot from disks reached over FCoE with the fnic driver.
The CPUs in some servers are a few years old, so their kernels boot with mitigations for L1TF (nosmt kvm-intel.vmentry_l1d_flush=always). The nosmt parameter means the kernel disables hyperthreading on those servers.
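
To confirm what the kernel actually did with these parameters, the SMT and L1TF state can be read from sysfs. This is a minimal sketch, not part of the original report: the function name is hypothetical, and it assumes the standard files under /sys/devices/system/cpu are present (missing files are returned as None).

```python
from pathlib import Path


def read_smt_and_l1tf(sysfs_root="/sys"):
    """Return the SMT control state and L1TF status reported by the kernel."""
    cpu = Path(sysfs_root) / "devices" / "system" / "cpu"
    files = {
        "smt_control": cpu / "smt" / "control",       # e.g. on / off / forceoff
        "smt_active": cpu / "smt" / "active",         # 1 if sibling threads are in use
        "l1tf": cpu / "vulnerabilities" / "l1tf",     # mitigation summary line
    }
    state = {}
    for key, path in files.items():
        try:
            state[key] = path.read_text().strip()
        except OSError:
            state[key] = None  # file missing on this kernel or platform
    return state


if __name__ == "__main__":
    for key, value in read_smt_and_l1tf().items():
        print(f"{key}: {value}")
```

On a server booted with nosmt, smt_control would be expected to read "off"; comparing this output between the working and the failing boot configuration shows whether the two setups really differ only in where hyperthreading is disabled.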

While investigating the problems below, I found that many changes have been made to fnic to introduce multiqueue (MQ) support.

After upgrading to PVE 8.2 and booting kernel 6.8.4-2-pve, several problems have appeared:
  1. Kernel warning (with stack trace) from the workqueue code, triggered by the fnic driver

    Code:
    [    8.725389] ------------[ cut here ]------------
    [    8.735795] WARNING: CPU: 8 PID: 422 at kernel/workqueue.c:1790 __queue_work+0x3b3/0x4e0
    [    8.744831] Modules linked in: hid fnic libfcoe libfc ehci_pci crc32_pclmul scsi_transport_fc enic ehci_hcd lpc_ich wmi
    [    8.756873] CPU: 8 PID: 422 Comm: (udev-worker) Tainted: G        W          6.8.4-2-pve #1
    [    8.766193] Hardware name: Cisco Systems Inc UCSB-B200-M4/UCSB-B200-M4, BIOS B200M4.4.0.1d.0.1003181546 10/03/2018
    [    8.777743] RIP: 0010:__queue_work+0x3b3/0x4e0
    [    8.782694] Code: 25 80 43 03 00 f6 47 2c 20 75 77 0f 0b 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b e9 3d fe ff ff 48 8b 0b 44 89 e0 49 8d 57 68 83 c8 07 83 e1
    [    8.803652] RSP: 0018:ffffb45f4c850e58 EFLAGS: 00010086
    [    8.809482] RAX: 0000000000000000 RBX: ffff96905d3c2520 RCX: 0000000000000000
    [    8.817445] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    [    8.825408] RBP: ffffb45f4c850e90 R08: 0000000000000000 R09: 0000000000000000
    [    8.833370] R10: 0000000000000000 R11: 0000000000000000 R12: ffff96afca1ab600
    [    8.841335] R13: ffff96905d3c2528 R14: ffff969040214000 R15: ffff96afca1c2a00
    [    8.849298] FS:  00007868402d08c0(0000) GS:ffff96af3f800000(0000) knlGS:0000000000000000
    [    8.858327] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    8.864740] CR2: 0000786840941346 CR3: 000000208b306004 CR4: 00000000001706f0
    [    8.872703] Call Trace:
    [    8.875428]  <IRQ>
    [    8.877670]  ? show_regs+0x6d/0x80
    [    8.881466]  ? __warn+0x89/0x160
    [    8.885069]  ? __queue_work+0x3b3/0x4e0
    [    8.889348]  ? report_bug+0x17e/0x1b0
    [    8.893434]  ? handle_bug+0x46/0x90
    [    8.897326]  ? exc_invalid_op+0x18/0x80
    [    8.901605]  ? asm_exc_invalid_op+0x1b/0x20
    [    8.906275]  ? __queue_work+0x3b3/0x4e0
    [    8.910556]  ? __queue_work+0x101/0x4e0
    [    8.914834]  queue_work_on+0x67/0x70
    [    8.918825]  fnic_wq_copy_cmpl_handler+0x4b2/0x7a0 [fnic]
    [    8.924859]  fnic_isr_msix_wq_copy+0x81/0xe0 [fnic]
    [    8.930311]  __handle_irq_event_percpu+0x4f/0x1c0
    [    8.935559]  handle_irq_event+0x39/0x80
    [    8.939838]  handle_edge_irq+0x8c/0x250
    [    8.944116]  __common_interrupt+0x41/0xb0
    [    8.948588]  common_interrupt+0x9f/0xb0
    [    8.952867]  </IRQ>
    [    8.955205]  <TASK>
    [    8.957543]  asm_common_interrupt+0x27/0x40
    [    8.962211] RIP: 0010:lookup_fast+0x70/0x100
    [    8.966977] Code: 4c 89 e0 41 5c 41 5d 5d 31 d2 31 f6 31 ff c3 cc cc cc cc 4c 89 ef e8 ef 4b 01 00 49 89 c4 48 85 c0 74 48 41 f6 04 24 04 74 d5 <49> 8b 44 24 68 8b 73 38 4c 89 e7 48 8b 00 e8 8d 5c cb 00 41 89 c5
    [    8.987934] RSP: 0018:ffffb45f4e387ac0 EFLAGS: 00000202
    [    8.993763] RAX: ffff96905a99a300 RBX: ffffb45f4e387bd0 RCX: 0000000000000000
    [    9.001726] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    [    9.009688] RBP: ffffb45f4e387ad8 R08: 0000000000000000 R09: 0000000000000000
    [    9.017650] R10: 0000000000000000 R11: 0000000000000000 R12: ffff96905a99a300
    [    9.025612] R13: ffff96905a99be00 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
    [    9.033576]  walk_component+0x2c/0x190
    [    9.037758]  ? inode_permission+0x74/0x1b0
    [    9.042329]  link_path_walk.part.0.constprop.0+0x2af/0x3c0
    [    9.048450]  ? path_init+0x298/0x3d0
    [    9.052437]  path_lookupat+0x3e/0x1a0
    [    9.056522]  filename_lookup+0xe4/0x200
    [    9.060802]  vfs_statx+0x95/0x1d0
    [    9.064501]  vfs_fstatat+0xaa/0xe0
    [    9.068297]  __do_sys_newfstatat+0x44/0x90
    [    9.072868]  __x64_sys_newfstatat+0x1c/0x30
    [    9.077536]  do_syscall_64+0x87/0x180
    [    9.081621]  ? __fput+0x15e/0x2e0
    [    9.085320]  ? syscall_exit_to_user_mode+0x86/0x260
    [    9.090764]  ? do_syscall_64+0x93/0x180
    [    9.095041]  ? do_syscall_64+0x93/0x180
    [    9.099318]  ? do_syscall_64+0x93/0x180
    [    9.103596]  entry_SYSCALL_64_after_hwframe+0x73/0x7b
    [    9.109232] RIP: 0033:0x7868409d375a
    [    9.113221] Code: 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 0b 00 00 00 66 2e 0f 1f 84 00 00 00 00 00 90 41 89 ca b8 06 01 00 00 0f 05 <3d> 00 f0 ff ff 77 07 31 c0 c3 0f 1f 40 00 48 8b 15 71 a6 0d 00 f7
    [    9.134175] RSP: 002b:00007ffd89d76518 EFLAGS: 00000202 ORIG_RAX: 0000000000000106
    [    9.142621] RAX: ffffffffffffffda RBX: 00005d57009940e0 RCX: 00007868409d375a
    [    9.150584] RDX: 00007ffd89d76570 RSI: 00005d5700992f20 RDI: 00000000ffffff9c
    [    9.158546] RBP: 00005d570097de27 R08: 0000786840aaed00 R09: 0000000000000000
    [    9.166508] R10: 0000000000000100 R11: 0000000000000202 R12: 00007ffd89d76658
    [    9.174471] R13: 00007ffd89d76570 R14: 0000000000000000 R15: 00007ffd89d76548
    [    9.182434]  </TASK>
    [    9.184869] ---[ end trace 0000000000000000 ]---

    After two of these traces (perhaps one per HBA interface), the kernel continues booting.

  2. Servers booted with the mitigations above cannot find their boot disk, so they do not boot

    The libfc messages that normally follow fnic initialization do not appear in the kernel ring buffer, even after 60 seconds, and the HBA interfaces are not activated.
    The boot drops to a BusyBox shell.

  3. If I remove the nosmt mitigation parameter and disable hyperthreading in the BIOS instead, the kernel boots.
    I don't understand the relationship between disabling hyperthreading at the operating-system level and the fnic hang.

  4. Even when the servers boot without the nosmt mitigation, they take about 20 seconds to find their disks. With kernel 6.5.13-5-pve, detection is immediate.

    kernel 6.8.4-2-pve

    Code:
    [    2.998153] scsi host1: fnic
    [    3.532183] fnic: Resetting the read idx
    [    3.532335] host0: libfc: Link up on port (000000)
    [    3.543134] host0: Assigned Port ID eb0003
    [    3.911345] host1: libfc: Link up on port (000000)
    [    3.923807] host1: Assigned Port ID 9c0006
    [   23.955279] scsi 1:0:0:0: Direct-Access     NETAPP   LUN C-Mode       9131 PQ: 0 ANSI: 5
    [   23.955614] scsi 0:0:0:0: Direct-Access     NETAPP   LUN C-Mode       9131 PQ: 0 ANSI: 5
    [   23.958037] sd 1:0:0:0: Attached scsi generic sg0 type 0
    [   23.958282] sd 1:0:0:0: Power-on or device reset occurred
    [   23.958506] sd 0:0:0:0: Attached scsi generic sg1 type 0
    [   23.958651] sd 1:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
    [   23.958768] sd 0:0:0:0: Power-on or device reset occurred
    [   23.958778] sd 1:0:0:0: [sda] 4096-byte physical blocks
    [   23.958859] scsi 1:0:1:0: Direct-Access     NETAPP   LUN C-Mode       9131 PQ: 0 ANSI: 5
    [   23.959126] sd 0:0:0:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
    [   23.959245] sd 1:0:0:0: [sda] Write Protect is off
    [   23.959252] sd 0:0:0:0: [sdb] 4096-byte physical blocks

    kernel 6.5.13-5-pve

    Code:
    [    3.117890] scsi host1: fnic
    [    3.646505] fnic: Resetting the read idx
    [    3.646645] host0: libfc: Link up on port (000000)
    [    3.659189] host0: Assigned Port ID eb0003
    [    3.663652] scsi 0:0:0:0: Direct-Access     NETAPP   LUN C-Mode       9131 PQ: 0 ANSI: 5
    [    3.665961] sd 0:0:0:0: Attached scsi generic sg0 type 0
    [    3.666219] sd 0:0:0:0: Power-on or device reset occurred
    [    3.666556] sd 0:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
    [    3.666686] sd 0:0:0:0: [sda] 4096-byte physical blocks
    [    3.666914] sd 0:0:0:0: [sda] Write Protect is off
    [    3.666980] scsi 0:0:1:0: Direct-Access     NETAPP   LUN C-Mode       9131 PQ: 0 ANSI: 5
    [    3.667001] sd 0:0:0:0: [sda] Mode Sense: af 00 00 08
    [    3.667289] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    [    3.667769] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
    [    3.667861] sd 0:0:0:0: [sda] Optimal transfer size 65536 bytes
    [    3.669276] sd 0:0:1:0: Attached scsi generic sg1 type 0
    [    3.669570] sd 0:0:1:0: Power-on or device reset occurred
    [    3.669857] sd 0:0:1:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
    [    3.669983] sd 0:0:1:0: [sdb] 4096-byte physical blocks
    [    3.670283] sd 0:0:1:0: [sdb] Write Protect is off
    [    3.670371] sd 0:0:1:0: [sdb] Mode Sense: af 00 00 08
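
The difference between the two logs can be put into numbers by measuring the time from the first "Assigned Port ID" message to the first "Direct-Access" LUN report. This is a rough sketch: the function name is hypothetical, the sample lines are copied from the excerpts above, and measuring first-to-first ignores which host each message belongs to.

```python
import re

# Matches "[   23.955279] message" and captures timestamp and message.
LINE_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+(.*)$")


def lun_discovery_delay(dmesg_text):
    """Seconds between the first 'Assigned Port ID' and the first LUN report."""
    port_ts = None
    for line in dmesg_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        ts, msg = float(match.group(1)), match.group(2)
        if port_ts is None:
            if "Assigned Port ID" in msg:
                port_ts = ts
        elif "Direct-Access" in msg:
            return ts - port_ts
    return None  # one of the two events was not found


sample_68 = """\
[    3.543134] host0: Assigned Port ID eb0003
[    3.911345] host1: libfc: Link up on port (000000)
[    3.923807] host1: Assigned Port ID 9c0006
[   23.955279] scsi 1:0:0:0: Direct-Access     NETAPP   LUN C-Mode       9131 PQ: 0 ANSI: 5
"""
sample_65 = """\
[    3.659189] host0: Assigned Port ID eb0003
[    3.663652] scsi 0:0:0:0: Direct-Access     NETAPP   LUN C-Mode       9131 PQ: 0 ANSI: 5
"""

print(f"6.8.4-2-pve:  {lun_discovery_delay(sample_68):.3f} s")
print(f"6.5.13-5-pve: {lun_discovery_delay(sample_65):.3f} s")
```

For the excerpts above this gives roughly 20.4 seconds on 6.8.4-2-pve against a few milliseconds on 6.5.13-5-pve. The same function can be fed the full output of dmesg from other hosts to compare them.
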
Is anyone else having problems with the fnic driver on kernel 6.8?
 
We already saw this issue with PVE 7.2. It is related to the buggy fnic driver versions shipped in newer kernels. The SUSE team reported it back in January 2023, but Cisco no longer seems interested in fixing FC issues.

Our workaround: proxmox-boot-tool kernel pin 5.13.19-6-pve
This kernel uses fnic 1.6.0.53, which works fine,
... and we hope that PVE will keep supporting this kernel until we can retire our Cisco hardware.
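
After pinning and rebooting, it may be worth checking which fnic version the running kernel actually loaded. A minimal sketch, assuming the driver exposes its version at /sys/module/fnic/version (as modules that declare a version normally do); the function name and the KNOWN_GOOD constant are illustrative, with the version string taken from the workaround above.

```python
import platform
from pathlib import Path

KNOWN_GOOD = "1.6.0.53"  # fnic version reported as working in the post above


def fnic_version(sysfs_root="/sys"):
    """Return the loaded fnic driver version, or None if it is not exposed."""
    path = Path(sysfs_root) / "module" / "fnic" / "version"
    try:
        return path.read_text().strip()
    except OSError:
        return None  # module not loaded, or no version file


if __name__ == "__main__":
    version = fnic_version()
    print(f"running kernel: {platform.release()}")
    if version is None:
        print("fnic version not available (module not loaded?)")
    elif version == KNOWN_GOOD:
        print(f"fnic {version}: matches the version reported as working")
    else:
        print(f"fnic {version}: differs from {KNOWN_GOOD}")
```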