Kernel oops in lpfc after migration to 7.1

I have two HP servers:
- DL380 G6
- DL360p G8

Both servers are connected to a storage system (NetApp 3400) via Fibre Channel adapters:

- Fibre Channel: Emulex Corporation Zephyr-X LightPulse Fibre Channel Host Adapter (rev 02)

Everything worked fine up to Proxmox 7.0.

After migrating to 7.1 with kernel 5.13.19-1-pve, the server boots extremely slowly and nothing works (VMs do not start, there is no network, etc.).
If I boot with kernel 5.11.22-7-pve (the last one from 7.0), everything works fine.

I tried to find the cause and found this problem:
when booting with 5.13.19-1-pve, a kernel oops occurs in lpfc during boot.

The problem exists on both servers.

Is this a bug, or something else?

This is the part of /var/log/messages that contains the oops:

Code:
Nov 19 00:16:54 lb-pve01 kernel: [    2.792069] sd 2:2:2:0: Attached scsi generic sg4 type 0
Nov 19 00:16:54 lb-pve01 kernel: [    2.792108] sd 2:2:2:0: [sdc] 3902341120 512-byte logical blocks: (2.00 TB/1.82 TiB)
Nov 19 00:16:54 lb-pve01 kernel: [    2.792153] sd 2:2:1:0: [sdb] Write Protect is off
Nov 19 00:16:54 lb-pve01 kernel: [    2.792156] sd 2:2:2:0: [sdc] Write Protect is off
Nov 19 00:16:54 lb-pve01 kernel: [    2.792223] sd 2:2:2:0: [sdc] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
Nov 19 00:16:54 lb-pve01 kernel: [    2.792290] sd 2:2:1:0: [sdb] Write cache: enabled, read cache: disabled, doesn't support DPO or FUA
Nov 19 00:16:54 lb-pve01 kernel: [    2.816611]  sdb: sdb1
Nov 19 00:16:54 lb-pve01 kernel: [    2.848210] sd 2:2:1:0: [sdb] Attached SCSI disk
Nov 19 00:16:54 lb-pve01 kernel: [    2.858735] usb 6-1: new full-speed USB device number 2 using uhci_hcd
Nov 19 00:16:54 lb-pve01 kernel: [    2.871950] sd 2:2:2:0: [sdc] Attached SCSI disk
Nov 19 00:16:54 lb-pve01 kernel: [    2.919551] e1000e 0000:0a:00.1 eth1: (PCI Express:2.5GT/s:Width x4) 00:26:55:dc:03:0f
Nov 19 00:16:54 lb-pve01 kernel: [    2.919557] e1000e 0000:0a:00.1 eth1: Intel(R) PRO/1000 Network Connection
Nov 19 00:16:54 lb-pve01 kernel: [    2.919633] e1000e 0000:0a:00.1 eth1: MAC: 0, PHY: 4, PBA No: D51930-007
Nov 19 00:16:54 lb-pve01 kernel: [    2.921120] e1000e 0000:0a:00.0 ens2f0: renamed from eth0
Nov 19 00:16:54 lb-pve01 kernel: [    2.963054] e1000e 0000:0a:00.1 ens2f1: renamed from eth1
Nov 19 00:16:54 lb-pve01 kernel: [    2.992558] random: lvm: uninitialized urandom read (4 bytes read)
Nov 19 00:16:54 lb-pve01 kernel: [    3.023016] random: lvm: uninitialized urandom read (4 bytes read)
Nov 19 00:16:54 lb-pve01 kernel: [    3.027957] random: lvm: uninitialized urandom read (2 bytes read)
Nov 19 00:16:54 lb-pve01 kernel: [    3.029959] usb 6-1: New USB device found, idVendor=03f0, idProduct=1027, bcdDevice= 0.02
Nov 19 00:16:54 lb-pve01 kernel: [    3.029965] usb 6-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Nov 19 00:16:54 lb-pve01 kernel: [    3.029968] usb 6-1: Product: Virtual Keyboard
Nov 19 00:16:54 lb-pve01 kernel: [    3.029969] usb 6-1: Manufacturer: HP
Nov 19 00:16:54 lb-pve01 kernel: [    3.038861] PGD 0 P4D 0
Nov 19 00:16:54 lb-pve01 kernel: [    3.038905] Oops: 0000 [#1] SMP NOPTI
Nov 19 00:16:54 lb-pve01 kernel: [    3.038949] CPU: 12 PID: 275 Comm: systemd-udevd Tainted: G          I       5.13.19-1-pve #1
Nov 19 00:16:54 lb-pve01 kernel: [    3.039002] Hardware name: HP ProLiant DL380 G6, BIOS P62 08/16/2015
Nov 19 00:16:54 lb-pve01 kernel: [    3.039050] RIP: 0010:lpfc_dmp_dbg.part.0+0x2f/0xc0 [lpfc]
Nov 19 00:16:54 lb-pve01 kernel: [    3.039163] Code: 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 e8 e6 bb 02 00 48 89 c6 48 85 c0 74 28 41 0f b7 8c 24 98 17 00 00 31 c0 eb 12 <8b> 92 e4 02 00 00 85 d2 75 6c 48 83 c0 01 39 c1 7c 09 48 8b 14 c6
Nov 19 00:16:54 lb-pve01 kernel: [    3.039235] RSP: 0018:ffffb91d07e738f0 EFLAGS: 00010202
Nov 19 00:16:54 lb-pve01 kernel: [    3.039282] RAX: 0000000000000001 RBX: ffff8a4222219700 RCX: 00000000000000ff
Nov 19 00:16:54 lb-pve01 kernel: [    3.039332] RDX: 000000313a333432 RSI: ffff8a3d08b4d810 RDI: ffff8a3d17481788
Nov 19 00:16:54 lb-pve01 kernel: [    3.039383] RBP: ffffb91d07e73918 R08: 0000000000000008 R09: ffff8a3d08b4d810
Nov 19 00:16:54 lb-pve01 kernel: [    3.039432] R10: 0000000000000007 R11: ffff8a3d17482ea9 R12: ffff8a3d17480000
Nov 19 00:16:54 lb-pve01 kernel: [    3.039482] R13: 00000000ffffffff R14: ffff8a3d15337850 R15: 0000000000000000
Nov 19 00:16:54 lb-pve01 kernel: [    3.039536] FS:  00007f04bcb5d8c0(0000) GS:ffff8a4207b80000(0000) knlGS:0000000000000000
Nov 19 00:16:54 lb-pve01 kernel: [    3.039616] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 19 00:16:54 lb-pve01 kernel: [    3.039677] CR2: 000000313a333716 CR3: 0000000115aea000 CR4: 00000000000006e0
Nov 19 00:16:54 lb-pve01 kernel: [    3.039738] Call Trace:
Nov 19 00:16:54 lb-pve01 kernel: [    3.039795]  lpfc_pci_probe_one+0x1fc3/0x2320 [lpfc]
Nov 19 00:16:54 lb-pve01 kernel: [    3.039908]  ? mutex_lock+0x13/0x40
Nov 19 00:16:54 lb-pve01 kernel: [    3.039974]  local_pci_probe+0x48/0x80
Nov 19 00:16:54 lb-pve01 kernel: [    3.040036]  pci_device_probe+0x105/0x1c0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040093]  really_probe+0x24b/0x4c0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040153]  driver_probe_device+0xf0/0x160
Nov 19 00:16:54 lb-pve01 kernel: [    3.040209]  device_driver_attach+0xab/0xb0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040265]  __driver_attach+0xb2/0x140
Nov 19 00:16:54 lb-pve01 kernel: [    3.040320]  ? device_driver_attach+0xb0/0xb0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040377]  bus_for_each_dev+0x7e/0xc0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040433]  driver_attach+0x1e/0x20
Nov 19 00:16:54 lb-pve01 kernel: [    3.040488]  bus_add_driver+0x135/0x1f0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040544]  driver_register+0x91/0xf0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040601]  __pci_register_driver+0x57/0x60
Nov 19 00:16:54 lb-pve01 kernel: [    3.040659]  lpfc_init+0x10c/0x1000 [lpfc]
Nov 19 00:16:54 lb-pve01 kernel: [    3.040746]  ? 0xffffffffc01bf000
Nov 19 00:16:54 lb-pve01 kernel: [    3.040799]  do_one_initcall+0x46/0x1d0
Nov 19 00:16:54 lb-pve01 kernel: [    3.040858]  ? kmem_cache_alloc_trace+0xfb/0x240
Nov 19 00:16:54 lb-pve01 kernel: [    3.040918]  do_init_module+0x62/0x290
Nov 19 00:16:54 lb-pve01 kernel: [    3.040975]  load_module+0x265e/0x2720
Nov 19 00:16:54 lb-pve01 kernel: [    3.041031]  __do_sys_finit_module+0xc2/0x120
Nov 19 00:16:54 lb-pve01 kernel: [    3.041088]  __x64_sys_finit_module+0x1a/0x20
Nov 19 00:16:54 lb-pve01 kernel: [    3.041143]  do_syscall_64+0x61/0xb0
Nov 19 00:16:54 lb-pve01 kernel: [    3.041201]  ? syscall_exit_to_user_mode+0x27/0x50
Nov 19 00:16:54 lb-pve01 kernel: [    3.041258]  ? __x64_sys_mmap+0x33/0x40
Nov 19 00:16:54 lb-pve01 kernel: [    3.041313]  ? do_syscall_64+0x6e/0xb0
Nov 19 00:16:54 lb-pve01 kernel: [    3.041368]  ? syscall_exit_to_user_mode+0x27/0x50
Nov 19 00:16:54 lb-pve01 kernel: [    3.041424]  ? __x64_sys_lseek+0x1a/0x20
Nov 19 00:16:54 lb-pve01 kernel: [    3.041481]  ? do_syscall_64+0x6e/0xb0
Nov 19 00:16:54 lb-pve01 kernel: [    3.041535]  ? syscall_exit_to_user_mode+0x27/0x50
Nov 19 00:16:54 lb-pve01 kernel: [    3.041591]  ? __x64_sys_newstat+0x16/0x20
Nov 19 00:16:54 lb-pve01 kernel: [    3.041647]  ? do_syscall_64+0x6e/0xb0
Nov 19 00:16:54 lb-pve01 kernel: [    3.041707]  ? syscall_exit_to_user_mode+0x27/0x50
Nov 19 00:16:54 lb-pve01 kernel: [    3.041770]  ? do_syscall_64+0x6e/0xb0
Nov 19 00:16:54 lb-pve01 kernel: [    3.041824]  ? asm_sysvec_call_function+0xa/0x20
Nov 19 00:16:54 lb-pve01 kernel: [    3.041882]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 19 00:16:54 lb-pve01 kernel: [    3.041939] RIP: 0033:0x7f04bd0169b9
Nov 19 00:16:54 lb-pve01 kernel: [    3.041993] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
Nov 19 00:16:54 lb-pve01 kernel: [    3.042102] RSP: 002b:00007ffe18c29798 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Nov 19 00:16:54 lb-pve01 kernel: [    3.042178] RAX: ffffffffffffffda RBX: 0000562e25770f10 RCX: 00007f04bd0169b9
Nov 19 00:16:54 lb-pve01 kernel: [    3.042238] RDX: 0000000000000000 RSI: 00007f04bd1a1e2d RDI: 0000000000000011
Nov 19 00:16:54 lb-pve01 kernel: [    3.042298] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000562e2573a530
Nov 19 00:16:54 lb-pve01 kernel: [    3.042357] R10: 0000000000000011 R11: 0000000000000246 R12: 00007f04bd1a1e2d
Nov 19 00:16:54 lb-pve01 kernel: [    3.042417] R13: 0000000000000000 R14: 0000562e2576d1b0 R15: 0000562e25770f10
Nov 19 00:16:54 lb-pve01 kernel: [    3.042478] Modules linked in: gpio_ich lpfc(+) nvmet_fc nvmet nvme_fc nvme_fabrics nvme_core ehci_pci uhci_hcd hpsa psmouse pata_acpi lpc_ich ehci_hcd bnx2 megaraid_sas e1000e scsi_transport_fc scsi_transport_sas
Nov 19 00:16:54 lb-pve01 kernel: [    3.042601] CR2: 000000313a333716
Nov 19 00:16:54 lb-pve01 kernel: [    3.042668] ---[ end trace 85b36dcdab1dd495 ]---
Nov 19 00:16:54 lb-pve01 kernel: [    3.042725] RIP: 0010:lpfc_dmp_dbg.part.0+0x2f/0xc0 [lpfc]
Nov 19 00:16:54 lb-pve01 kernel: [    3.042845] Code: 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 e8 e6 bb 02 00 48 89 c6 48 85 c0 74 28 41 0f b7 8c 24 98 17 00 00 31 c0 eb 12 <8b> 92 e4 02 00 00 85 d2 75 6c 48 83 c0 01 39 c1 7c 09 48 8b 14 c6
Nov 19 00:16:54 lb-pve01 kernel: [    3.042956] RSP: 0018:ffffb91d07e738f0 EFLAGS: 00010202
Nov 19 00:16:54 lb-pve01 kernel: [    3.043013] RAX: 0000000000000001 RBX: ffff8a4222219700 RCX: 00000000000000ff
Nov 19 00:16:54 lb-pve01 kernel: [    3.043074] RDX: 000000313a333432 RSI: ffff8a3d08b4d810 RDI: ffff8a3d17481788
Nov 19 00:16:54 lb-pve01 kernel: [    3.043134] RBP: ffffb91d07e73918 R08: 0000000000000008 R09: ffff8a3d08b4d810
Nov 19 00:16:54 lb-pve01 kernel: [    3.043194] R10: 0000000000000007 R11: ffff8a3d17482ea9 R12: ffff8a3d17480000
Nov 19 00:16:54 lb-pve01 kernel: [    3.043255] R13: 00000000ffffffff R14: ffff8a3d15337850 R15: 0000000000000000
Nov 19 00:16:54 lb-pve01 kernel: [    3.043315] FS:  00007f04bcb5d8c0(0000) GS:ffff8a4207b80000(0000) knlGS:0000000000000000
Nov 19 00:16:54 lb-pve01 kernel: [    3.043392] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 19 00:16:54 lb-pve01 kernel: [    3.043450] CR2: 000000313a333716 CR3: 0000000115aea000 CR4: 00000000000006e0
Nov 19 00:16:54 lb-pve01 kernel: [    3.044713] hid: raw HID events driver (C) Jiri Kosina
Nov 19 00:16:54 lb-pve01 kernel: [    3.054038] usbcore: registered new interface driver usbhid
Nov 19 00:16:54 lb-pve01 kernel: [    3.054098] usbhid: USB HID core driver
Nov 19 00:16:54 lb-pve01 kernel: [    3.055327] usbcore: registered new interface driver usbmouse
Nov 19 00:16:54 lb-pve01 kernel: [    3.055340] usbcore: registered new interface driver usbkbd
Nov 19 00:16:54 lb-pve01 kernel: [    3.057038] input: HP Virtual Keyboard as /devices/pci0000:00/0000:00:1e.0/0000:01:04.4/usb6/6-1/6-1:1.0/0003:03F0:1027.0001/input/input4
Nov 19 00:16:54 lb-pve01 kernel: [    3.115172] hid-generic 0003:03F0:1027.0001: input,hidraw0: USB HID v1.01 Keyboard [HP Virtual Keyboard] on usb-0000:01:04.4-1/input0
Nov 19 00:16:54 lb-pve01 kernel: [    3.115417] input: HP Virtual Keyboard as /devices/pci0000:00/0000:00:1e.0/0000:01:04.4/usb6/6-1/6-1:1.1/0003:03F0:1027.0002/input/input5
Nov 19 00:16:54 lb-pve01 kernel: [    3.116011] hid-generic 0003:03F0:1027.0002: input,hidraw1: USB HID v1.01 Mouse [HP Virtual Keyboard] on usb-0000:01:04.4-1/input1
Nov 19 00:16:54 lb-pve01 kernel: [    3.650740] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
Nov 19 00:16:54 lb-pve01 kernel: [    4.301366] random: crng init done
Nov 19 00:16:54 lb-pve01 kernel: [    4.301434] random: 1 urandom warning(s) missed due to ratelimiting
 
I experienced the same problem with an IBM x3650 7979KZG (Intel Xeon E5405) with identical HBAs when upgrading from PVE 6.4 (kernel 5.4.174-2-pve) to PVE 7.1 (kernel 5.13.19-6-pve).

It seems that there is some bug in the lpfc module in the 5.13 kernel.

In the end I opted for jumping to the 5.15 kernel, skipping 5.13 altogether, i.e.: apt-get install pve-kernel-5.15 && apt-get purge pve-kernel-5.13

(Now I have pve-manager/7.1-12/b3c09de3 (running kernel: 5.15.30-2-pve))
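
For anyone who wants to confirm which kernel series a host actually booted after the switch, here is a rough sketch. The function names kernel_series and is_affected_series are made up for this example, and treating the whole 5.13 series as affected is an assumption based only on the reports in this thread.

```shell
#!/bin/sh
# Extract "major.minor" from a kernel release string such as "5.15.30-2-pve".
kernel_series() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Return 0 (true) when the given release belongs to the 5.13 series.
# Assumption: every 5.13 build behaves like the ones reported in this thread.
is_affected_series() {
    [ "$(kernel_series "$1")" = "5.13" ]
}

running=$(uname -r)
if is_affected_series "$running"; then
    echo "Running $running: 5.13 series, lpfc oops reported in this thread"
else
    echo "Running $running: not a 5.13 kernel"
fi
```

Run it after rebooting; if it still reports a 5.13 kernel, the boot loader is probably still selecting the old entry.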

Hope this helps.
 
EDIT @2022-05-06. Please see my next post.

Following up on this topic.

TL;DR (original, superseded by the edit above): The LPe1150 HBAs are obsolete and don't work with Proxmox 7. Replace them with something newer.
TL;DR (corrected): The LPe1150 HBAs are old. They must be upgraded to the latest firmware in order for them to work with Proxmox 7.


As it turns out, it was an incompatibility issue between the Fibre HBA and the lpfc driver. It seems that the LPe1150L-F4-F8C adapter was supported up to the 5.4 kernel branch (lpfc driver v12.6.0.4), but is not on 5.13 onwards (lpfc driver v12.8.0.9).

After upgrading to pve-kernel-5.15 (lpfc driver v14.0.0.4) the server doesn't crash on startup. The error handling in the lpfc module is better than on pve-kernel-5.13. It boots up OK, but there is still no access to the SAN (i.e. the server doesn't detect any LUNs).
dmesg log shows this message:
lpfc 0000:07:00.0: 1:0431 Failed to enable interrupt.
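
To look for this message on other hosts, a small filter like the following may help. It is only a sketch: the function name lpfc_irq_failures is invented here, and the pattern is modelled on the single log line quoted above, so other driver versions may word the message differently.

```shell
#!/bin/sh
# Read kernel log lines on stdin and keep only lpfc
# "Failed to enable interrupt" messages like the one quoted above.
lpfc_irq_failures() {
    grep -E 'lpfc [0-9a-fA-F:.]+: .*Failed to enable interrupt'
}

# Usage (reading the kernel ring buffer may require root):
#   dmesg | lpfc_irq_failures
#   journalctl -k -b | lpfc_irq_failures
```

No output means no matching line was found in the log that was piped in.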

Reviewing this discussion, I see that the lpfc driver has dropped SLI-2 mode support (... "Additionally, the driver no longer supports SLI-2, so only SLI-3 mode should be allowed."), and SLI-2 was the latest mode available on my LPe-1150-E HBAs with the firmware they had at the time. I initially wasn't able to find any firmware upgrade for these HBAs, so I ended up replacing them with LPe-11000 cards.

Make sure to upgrade the HBAs to the latest firmware in order to enable SLI-3 mode support. FW version and SLI mode can be checked with:
cat /sys/class/scsi_host/host*/fwrev
2.82X6 (Z3D2.82X6), sli-3
2.82X6 (Z3D2.82X6), sli-3
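
If there are several hosts to check, the same fwrev files can be read in a loop. This is a minimal sketch that assumes every fwrev line ends in ", sli-N" as in the output above. The function names sli_mode and check_hosts are hypothetical, and the optional directory argument exists only to make the function easy to try against a copy of the files.

```shell
#!/bin/sh
# Print the SLI mode token (e.g. "sli-3") from an fwrev line such as
# "2.82X6 (Z3D2.82X6), sli-3". Prints nothing if the line has no such suffix.
sli_mode() {
    printf '%s\n' "$1" | sed -n 's/.*,[[:space:]]*\(sli-[0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Walk every SCSI host that exposes an fwrev attribute and report its mode.
# The base directory defaults to /sys/class/scsi_host.
check_hosts() {
    base="${1:-/sys/class/scsi_host}"
    for f in "$base"/host*/fwrev; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        line=$(cat "$f")
        host=$(basename "$(dirname "$f")")
        if [ "$(sli_mode "$line")" = "sli-3" ]; then
            echo "$host: OK ($line)"
        else
            echo "$host: needs attention ($line)"
        fi
    done
}

check_hosts
```

Hosts reported as "needs attention" are the ones whose firmware does not advertise SLI-3.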
 
I found the release notes for the Emulex LPe1150 (see the attached document). The 2.82x6 firmware is also available for them, so they definitely have SLI-3 mode support. Good news, you don't have to throw them away!

The firmware upgrade process is tricky, though. I wasn't able to do it from Linux, so I had to use a Windows 10 PC.

You need the following files from the Broadcom support documents and downloads page:
1.- LPe1150 Firmware and LPe1150 Pair Boot Code (navigating from Legacy Products -> Legacy FC Host Bus Adapters -> LPe1150).
2.- OneInstall-Setup-10.4.255.26 driver. You can directly search for "OneInstall", but you have to go all the way back to version 10.4 because Broadcom removed support for LPe1150 (and many others) starting with version 10.6. If you install the latest driver kit, Windows won't recognize the card in the device manager, and you won't be able to flash the firmware.
3.- elxflashStandalone-windows-10.4.255.16-1 utility. Again, you can directly search for "elxflash", but you have to go all the way back to version 10.4

Unzip the elxflash utility, put WF282A4.ALL in win\x64\firmware\ and WP513A10.PRG (extract it from the .zip first!) in win\x64\boot\
Start an elevated command prompt and run:

D:\elxflashStandalone-windows-10.4.255.16-1\win\x64>elxflash.exe /ff /auto
HBA=LPe1150, Port Type=FC, WWN=10:00:00:00:XX:XX:XX:XX, Update=Firmware, Image=WF282A4.ALL, New=282A4, Old=250A6, Status=Success
elxflash.exe: All required firmware downloads succeeded - Return Code=0

If it says there are no supported HBAs, check the entries in win\x64\fwmatrix.txt: LPe1150 has to be present in it. If it is not, download earlier OneInstall drivers until you find it included in fwmatrix.txt :)

I plugged the HBAs back into the server and confirmed they worked as expected, just like the LPe11000 ones.
 

Attachments

  • Emulex_2.82x6 Firmware_Release_Notes.pdf
Thank you very much!
 
