Opt-in Linux 6.8 Kernel for Proxmox VE 8 available on test & no-subscription

t.lamprecht

Proxmox Staff Member
We recently uploaded a 6.8 kernel to our repositories; it will be used as the new default kernel in the next Proxmox VE 8.2 point release (Q2'2024).
This follows our tradition of upgrading the Proxmox VE kernel to match the current Ubuntu version until we reach an (Ubuntu) LTS release. This kernel is based on the upcoming Ubuntu 24.04 Noble release.

We have run this kernel on some parts of our test setups over the last few days without any notable issues.

How to install:
  1. Ensure that either the pve-no-subscription or pvetest repository is set up correctly.
    You can do so via CLI text-editor or using the web UI under Node -> Repositories.
  2. Open a shell as root, e.g. through SSH or using the integrated shell on the web UI.
  3. apt update
  4. apt install proxmox-kernel-6.8
  5. reboot
Future updates to the 6.8 kernel will now be installed automatically when upgrading a node.
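
After the reboot it can be worth confirming that the node actually booted the new kernel. The following is only a minimal sketch: the helper name is_68_kernel is made up for illustration, and it simply checks the release string reported by uname -r.

```shell
#!/bin/sh
# is_68_kernel is a hypothetical helper, not part of any Proxmox tooling.
# It returns success if the given kernel release string is from the 6.8 series.
is_68_kernel() {
    case "$1" in
        6.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Compare against the kernel that is currently running.
if is_68_kernel "$(uname -r)"; then
    echo "running a 6.8 kernel: $(uname -r)"
else
    echo "still running $(uname -r)"
fi
```

If the second message shows up, the node most likely booted an older kernel entry, and the package installation and boot selection are the first things to check.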

Please note:
  • The current 6.5 kernel is still supported and will receive updates until 6.8 becomes the new default.
  • There were many changes, including improved hardware support and performance improvements all over the place.
    Examples include the new EEVDF (Earliest Eligible Virtual Deadline First) task scheduler for improved latencies, shadow stacks to prevent exploits, and a new advisor for automated tuning of the KSM (Kernel Same-page Merging) subsystem. For a more complete list of changes, we recommend checking out the kernel-newbies site for 6.6 and 6.7, and LWN's 6.8 merge window part 1 and part 2.
  • The kernel is also available on the test and no-subscription repositories of Proxmox Backup Server and Proxmox Mail Gateway.
  • If you're unsure, we recommend continuing to use the 6.5-based kernel for now.

Feedback about how the new kernel performs in any of your setups is welcome!
Please provide basic details like CPU model, storage types used, whether ZFS is the root file system, and the like, both for positive feedback and if you ran into issues where the 6.8 kernel seems to be the likely cause.
 
Unable to build gasket-dkms

Code:
Building initial module for 6.8.1-1-pve
Deprecated feature: REMAKE_INITRD (/var/lib/dkms/gasket/1.0/source/dkms.conf)
Error! Bad return status for module build on kernel: 6.8.1-1-pve (x86_64)
Consult /var/lib/dkms/gasket/1.0/build/make.log for more information.
dpkg: error processing package gasket-dkms (--install):
 installed gasket-dkms package post-installation script subprocess returned error exit status 10
Errors were encountered while processing:
 gasket-dkms

Contents of make.log:

Code:
DKMS make.log for gasket-1.0 for kernel 6.8.1-1-pve (x86_64)
Sat Apr  6 04:10:05 AM PDT 2024
make: Entering directory '/usr/src/linux-headers-6.8.1-1-pve'
  CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_core.o
  CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_ioctl.o
  CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_interrupt.o
  CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_page_table.o
  CC [M]  /var/lib/dkms/gasket/1.0/build/gasket_sysfs.o
  CC [M]  /var/lib/dkms/gasket/1.0/build/apex_driver.o
/var/lib/dkms/gasket/1.0/build/gasket_interrupt.c: In function ‘gasket_handle_interrupt’:
/var/lib/dkms/gasket/1.0/build/gasket_interrupt.c:161:17: error: too many arguments to function ‘eventfd_signal’
  161 |                 eventfd_signal(ctx, 1);
      |                 ^~~~~~~~~~~~~~
In file included from /var/lib/dkms/gasket/1.0/build/gasket_interrupt.h:11,
                 from /var/lib/dkms/gasket/1.0/build/gasket_interrupt.c:4:
./include/linux/eventfd.h:87:20: note: declared here
   87 | static inline void eventfd_signal(struct eventfd_ctx *ctx)
      |                    ^~~~~~~~~~~~~~
make[2]: *** [scripts/Makefile.build:243: /var/lib/dkms/gasket/1.0/build/gasket_interrupt.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [/usr/src/linux-headers-6.8.1-1-pve/Makefile:1926: /var/lib/dkms/gasket/1.0/build] Error 2
make: *** [Makefile:240: __sub-make] Error 2
make: Leaving directory '/usr/src/linux-headers-6.8.1-1-pve'
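
The make.log above shows the 6.8 headers declaring eventfd_signal() with a single argument, while the module still calls it with two. A quick way to see which form a given header tree uses is sketched below. The function name eventfd_signal_style is made up for illustration, and the header path is an assumption based on the directory shown in the log.

```shell
#!/bin/sh
# eventfd_signal_style is a hypothetical helper for illustration only.
# It reports whether a header declares eventfd_signal() with one argument
# (as in the 6.8 header quoted in the log) or with a second argument.
eventfd_signal_style() {
    hdr="$1"
    if grep -Eq 'eventfd_signal\(struct eventfd_ctx \*ctx\)' "$hdr"; then
        echo one-arg
    elif grep -Eq 'eventfd_signal\(struct eventfd_ctx \*ctx,' "$hdr"; then
        echo two-arg
    else
        echo unknown
    fi
}

# Assumed location of the header for the running kernel.
header="/usr/src/linux-headers-$(uname -r)/include/linux/eventfd.h"
if [ -r "$header" ]; then
    eventfd_signal_style "$header"
fi
```

If this prints one-arg, an out-of-tree module that still passes a count, like the eventfd_signal(ctx, 1) call in the log, will presumably need a source change before it builds.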
 
Just updated and noticed that the r8125-dkms drivers won't compile with the new kernel.

A fix has already been provided for Ubuntu users: https://bugs.launchpad.net/ubuntu/+source/r8125/+bug/2059256
If they could just be bothered to create actual good upstream drivers... But oh well, as this package is available from the Debian repos and has a significant number of users, we will look into this and provide an update soonish. Thanks for the report in any case!
 
The network did not come up after the kernel update, due to an interface name change. My 10G Intel NICs suffered a name change,
from enp2s0f0 to enp2s0f0np0, so I had to adapt /etc/network/interfaces accordingly.

Code:
[root@pve-mini ~]$ ethtool -i enp2s0f0np0
driver: i40e
version: 6.8.1-1-pve
firmware-version: 9.20 0x8000d8c5 0.0.0
expansion-rom-version:
bus-info: 0000:02:00.0
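
To spot this kind of rename before or after rebooting, one could compare the names used in the network config with what the kernel currently exposes. This is a rough sketch only: missing_ifaces is a made-up helper, it looks only at "iface" lines, and bridges, bonds or VLANs that are not up yet will be listed as well.

```shell
#!/bin/sh
# missing_ifaces is a hypothetical helper for illustration only.
# It prints every name declared via "iface <name> ..." in an ifupdown-style
# config that has no entry in a sysfs-style directory of network devices.
missing_ifaces() {
    conf="$1"      # e.g. /etc/network/interfaces
    netdir="$2"    # e.g. /sys/class/net
    awk '$1 == "iface" && $2 != "lo" { print $2 }' "$conf" | sort -u |
    while read -r name; do
        [ -e "$netdir/$name" ] || echo "$name"
    done
}

# Only run against the real files if they are present and readable.
if [ -r /etc/network/interfaces ] && [ -d /sys/class/net ]; then
    missing_ifaces /etc/network/interfaces /sys/class/net
fi
```

In the case described above, this would print enp2s0f0, because only enp2s0f0np0 exists under the new kernel.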
 
This module is not available from us or Debian, so it is best to check whether upstream has a fix.

From a quick look, it seems this has already been reported: https://github.com/google/gasket-driver/issues/23
That report includes a diff for a fix you could try to apply locally (open a new thread if you have further questions about this).
I believe I have implemented the fix for both recent kernels: https://github.com/google/gasket-driver/issues/24#issuecomment-2041102866
 
Hi @all

For me, Proxmox no longer wants to start with the 6.8 kernel.

I have no idea what it is.

It starts and then ends in rescue mode.

I have now started again with the 6.5.13-3-pve kernel.

I have attached an excerpt from the journal as errorlog.txt

Proxmox version:

Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.13-3-pve)
pve-manager: 8.1.10 (running version: 8.1.10/4b06efb5db453f29)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.1-1
proxmox-kernel-6.8.1-1-pve-signed: 6.8.1-1
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5: 6.5.13-3
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-network-perl: 0.9.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.10-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1

I have attached my hardware, as reported by lshw, as hardware.txt


Best regards
Marcel
 

I can only see in your error log that /dev/disk/by-uuid/99d194ec-763a-425e-81f0-8c45dd214b03 is missing.
Maybe you replaced a disk and forgot to update something, but that's not the reason it doesn't boot.

Otherwise there is absolutely nothing in your error log; it looks to me like everything works as it should. The only thing is that the boot log is somewhat incomplete: I don't see the Proxmox systemd services starting, as if part of the boot log is missing.

Cheers
 
Hi Ramalama

Nothing was changed on the hard drives.
I just ran apt update, apt install proxmox-kernel-6.8, reboot.
Afterwards Proxmox only started in rescue mode.
There was nothing else to be found in the journal after the last start.
I created the errorlog.txt with the command "journalctl -S today > /errorlog.txt" from rescue mode and then deleted everything that was before the current boot process.
/dev/disk/by-uuid/99d194ec-763a-425e-81f0-8c45dd214b03 is an HDD that is mounted directly via fstab and is used for backups.
I have no idea where the problem is.
With kernel 6.5.13-3-pve everything works great, but with kernel 6.8.1-1-pve Proxmox no longer starts.
Any idea where I could look to track down the error?

Best regards
Marcel
 
The journalctl approach to get the last boot log was already great; sadly it just doesn't contain anything useful. It looks as if booting stops at some point and the real issue isn't logged.

https://pve.proxmox.com/wiki/Kernel_Crash_Trace_Log
Before rebooting into the 6.8 kernel, you could add netconsole=5555@10.10.10.1/eth0,5555@10.10.10.2/0c:c4:7a:44:1e:fe loglevel=7 to the Linux cmdline, then run update-initramfs -k all -u once to be sure, then reboot into the 6.8 kernel and get the log files from the other system.
Then reboot back into your 6.5 kernel, remove that string from the cmdline again, and update the initramfs again to be sure. Then reboot once more into your 6.5 kernel to stop logging to the other system.
Just change the IP/MAC and so on, but that's clear I think.
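
To make it clearer which part of the netconsole string is which, here is a small sketch that assembles it from its parts. The function name netconsole_param is made up for illustration. The field order is source port, source IP and sending device, followed by target port, target IP and target MAC, with the values taken from the example above.

```shell
#!/bin/sh
# netconsole_param is a hypothetical helper for illustration only.
# It builds the kernel command line fragment in the form
#   netconsole=<src-port>@<src-ip>/<dev>,<dst-port>@<dst-ip>/<dst-mac> loglevel=7
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s loglevel=7\n' \
        "$1" "$2" "$3" "$4" "$5" "$6"
}

# Example values from this post; replace them with your own sender and receiver.
netconsole_param 5555 10.10.10.1 eth0 5555 10.10.10.2 0c:c4:7a:44:1e:fe
```

Where the resulting string goes depends on the bootloader (typically /etc/default/grub followed by update-grub, or /etc/kernel/cmdline followed by proxmox-boot-tool refresh), and the receiving system needs something listening on the chosen UDP port, for example netcat.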

Other than that, if you have a BMC or something similar, you could take screenshots there, or take a photo of your physical screen at the right moment.
But you said that it simply returns to rescue mode, so you probably cannot take a photo/screenshot of the issue itself, which leaves only the crash trace method.

Cheers
 
Updated and everything went smoothly.
I did check
dpkg -l | grep dkms
to make sure that I don't have dkms, though.
 
infiniband bnxt_re0 also hangs with firmware-error on 6.8 and there is no network after boot.
back to 6.5 everything is fine again
 

Thanks for your feedback, and yeah changes in kernel release, systemd version and moving around HW can unfortunately result in such name changes. IME the ones from kernel updates stabilize once all features of a HW are supported correctly and no new issues come up.

One way to avoid such changes is to pin the interface names manually. E.g., one could name a network interface net0 by matching its MAC address in a /etc/systemd/network/00-net0.link configuration like:
Code:
[Match]
MACAddress=aa:bb:cc:12:34:56
# only apply on actual ethernet links to avoid that virtual links (like bridges) match
# could be also wlan or wwan
Type=ether

[Link]
Name=net0

This is something that might be worth exposing as an option in our installer.

Edit: change from eth0 to net0 for the example to avoid potential race between kernel and udev.
Edit 2: specify Type to avoid issues when matching virtual links.
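
For several interfaces, the link files could be generated with a small helper. The sketch below is only an illustration under assumptions: write_link_file is a made-up name, and the target directory is a parameter so the result can be inspected before anything is copied to /etc/systemd/network.

```shell
#!/bin/sh
# write_link_file is a hypothetical helper for illustration only.
# It writes a systemd .link file that pins an interface name to a MAC address.
write_link_file() {
    name="$1"   # desired interface name, e.g. net0 (avoid ethX)
    mac="$2"    # MAC address, e.g. read from /sys/class/net/<iface>/address
    dir="$3"    # output directory
    cat > "$dir/00-$name.link" <<EOF
[Match]
MACAddress=$mac
# only match real ethernet links, not virtual ones like bridges
Type=ether

[Link]
Name=$name
EOF
}

# Demo: write into a scratch directory and show the result.
out_dir=$(mktemp -d)
write_link_file net0 aa:bb:cc:12:34:56 "$out_dir"
cat "$out_dir/00-net0.link"
```

After placing such a file, /etc/network/interfaces has to reference the new name, and refreshing the initramfs may also be needed so the rename already applies early in boot.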
 
infiniband bnxt_re0 also hangs with firmware-error on 6.8 and there is no network after boot.
back to 6.5 everything is fine again
Thanks for your report, can you please post the full error line from the kernel (e.g. check journalctl -b-1 for the system log of the last boot)?
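
To narrow the previous boot's log down to one driver, a small filter can help. This is a sketch under assumptions: driver_lines is a made-up helper, and reading the previous boot with journalctl -b -1 only works if the journal is stored persistently.

```shell
#!/bin/sh
# driver_lines is a hypothetical helper for illustration only.
# It reads log lines from stdin and keeps those mentioning the given string.
driver_lines() {
    grep -F -- "$1"
}

# Intended usage on the affected node (kernel messages of the previous boot):
#   journalctl -k -b -1 --no-pager | driver_lines bnxt
```

Reading from stdin means the same filter also works on a saved log file, e.g. one copied off the node while it is running the older kernel.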
 
installed the kernel on the following system and is running fine:

mini-pc with intel pentium gold 8505 (alder-lake u)
intel s3610 running as zfs mirror (also boot-pool)
noname nvme-drive running as xfs media store and swap
6x intel i226v nics

kernel was installed, system rebooted and everything came up the way it was.
no changed device name, no missing functionality.
just works.

edit: also installed it on an ancient atom j1900 pbs system (former igel thinclient, no zfs) with the same result.
system came up and just works as it used to.
 
Thanks for your feedback, and yeah changes in kernel release, systemd version and moving around HW can unfortunately result in such name changes. IME the ones from kernel updates stabilize once all features of a HW are supported correctly and no new issues come up.

One thing to avoid such changes is to pin the name of the interfaces manually. E.g., one could name a network interface eth0 through matching their MAC address in a /etc/systemd/network/00-eth0.link configuration like:
Code:
[Match]
MACAddress=aa:bb:cc:12:34:56
[Link]
Name=eth0

This is something that might be worth to expose as option in our installer
Hi,

I think this should be done by default (with a checkbox to disable it); every kernel/driver upgrade brings breaking name changes.

Note that using ethX can be dangerous, according to the systemd documentation:
https://www.freedesktop.org/software/systemd/man/latest/systemd.link.html

"Note that specifying a name that the kernel might use for another interface (for example "eth0") is dangerous because the name assignment done by udev will race with the assignment done by the kernel, and only one interface may use the name. Depending on the order of operations, either udev or the kernel will win, making the naming unpredictable. It is best to use some different prefix, for example "internal0"/"external0" or "lan0"/"lan1"/"lan3"."
 
Thanks for your report, can you please post the full error line from the kernel (e.g. check journalctl -b-1 for the system log of the last boot)?
Yes, thank you.

bnxt_en 0000:3d:00.0 (unnamed net_device) (uninitialized): Device requests max timeout of 100 seconds, may trigger hung task watchdog
bnxt_en 0000:3d:00.0: Unable to read VPD

Apr 07 00:07:44 fbo-vmh-024 kernel: ------------[ cut here ]------------
Apr 07 00:07:44 fbo-vmh-024 kernel: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13
Apr 07 00:07:44 fbo-vmh-024 kernel: shift exponent 64 is too large for 64-bit type 'long unsigned int'
Apr 07 00:07:44 fbo-vmh-024 kernel: CPU: 45 PID: 1471 Comm: (udev-worker) Tainted: P O 6.8.1-1-pve #1
Apr 07 00:07:44 fbo-vmh-024 kernel: Hardware name: Supermicro Super Server/X13DEI-T, BIOS 2.1 12/13/2023
Apr 07 00:07:44 fbo-vmh-024 kernel: Call Trace:
Apr 07 00:07:44 fbo-vmh-024 kernel: <TASK>
Apr 07 00:07:44 fbo-vmh-024 kernel: dump_stack_lvl+0x48/0x70
Apr 07 00:07:44 fbo-vmh-024 kernel: dump_stack+0x10/0x20
Apr 07 00:07:44 fbo-vmh-024 kernel: __ubsan_handle_shift_out_of_bounds+0x1ac/0x360
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_qplib_alloc_init_hwq.cold+0x8c/0xd7 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_qplib_create_qp+0x1d5/0x8c0 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_re_create_qp+0x71d/0xf30 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? bnxt_qplib_create_cq+0x247/0x330 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __kmalloc+0x1ab/0x400
Apr 07 00:07:44 fbo-vmh-024 kernel: create_qp+0x17a/0x290 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? create_qp+0x17a/0x290 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ib_create_qp_kernel+0x3b/0xe0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: create_mad_qp+0x8e/0x100 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_qp_event_handler+0x10/0x10 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ib_mad_init_device+0x2c2/0x8a0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: add_client_context+0x127/0x1c0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: enable_device_and_get+0xe6/0x1e0 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: ib_register_device+0x506/0x610 [ib_core]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: auxiliary_bus_probe+0x3e/0xa0
Apr 07 00:07:44 fbo-vmh-024 kernel: really_probe+0x1c9/0x430
Apr 07 00:07:44 fbo-vmh-024 kernel: __driver_probe_device+0x8c/0x190
Apr 07 00:07:44 fbo-vmh-024 kernel: driver_probe_device+0x24/0xd0
Apr 07 00:07:44 fbo-vmh-024 kernel: __driver_attach+0x10b/0x210
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx___driver_attach+0x10/0x10
Apr 07 00:07:44 fbo-vmh-024 kernel: bus_for_each_dev+0x8a/0xf0
Apr 07 00:07:44 fbo-vmh-024 kernel: driver_attach+0x1e/0x30
Apr 07 00:07:44 fbo-vmh-024 kernel: bus_add_driver+0x156/0x260
Apr 07 00:07:44 fbo-vmh-024 kernel: driver_register+0x5e/0x130
Apr 07 00:07:44 fbo-vmh-024 kernel: __auxiliary_driver_register+0x73/0xf0
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:07:44 fbo-vmh-024 kernel: do_one_initcall+0x5b/0x340
Apr 07 00:07:44 fbo-vmh-024 kernel: do_init_module+0x97/0x290
Apr 07 00:07:44 fbo-vmh-024 kernel: load_module+0x213a/0x22a0
Apr 07 00:07:44 fbo-vmh-024 kernel: init_module_from_file+0x96/0x100
Apr 07 00:07:44 fbo-vmh-024 kernel: ? init_module_from_file+0x96/0x100
Apr 07 00:07:44 fbo-vmh-024 kernel: idempotent_init_module+0x11c/0x2b0
Apr 07 00:07:44 fbo-vmh-024 kernel: __x64_sys_finit_module+0x64/0xd0
Apr 07 00:07:44 fbo-vmh-024 kernel: do_syscall_64+0x84/0x180
Apr 07 00:07:44 fbo-vmh-024 kernel: ? syscall_exit_to_user_mode+0x86/0x260
Apr 07 00:07:44 fbo-vmh-024 kernel: ? do_syscall_64+0x93/0x180
Apr 07 00:07:44 fbo-vmh-024 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Apr 07 00:07:44 fbo-vmh-024 kernel: RIP: 0033:0x7ac146137719
Apr 07 00:07:44 fbo-vmh-024 kernel: Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 f>
Apr 07 00:07:44 fbo-vmh-024 kernel: RSP: 002b:00007ffc8a83b208 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Apr 07 00:07:44 fbo-vmh-024 kernel: RAX: ffffffffffffffda RBX: 00005f4b75018a80 RCX: 00007ac146137719
Apr 07 00:07:44 fbo-vmh-024 kernel: RDX: 0000000000000000 RSI: 00007ac1462caefd RDI: 000000000000000f
Apr 07 00:07:44 fbo-vmh-024 kernel: RBP: 00007ac1462caefd R08: 0000000000000000 R09: 00005f4b74fd8720
Apr 07 00:07:44 fbo-vmh-024 kernel: R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
Apr 07 00:07:44 fbo-vmh-024 kernel: R13: 0000000000000000 R14: 00005f4b7500f170 R15: 00005f4b74858ec1
Apr 07 00:07:44 fbo-vmh-024 kernel: </TASK>
Apr 07 00:07:44 fbo-vmh-024 kernel: ---[ end trace ]---

Apr 07 00:08:45 fbo-vmh-024 systemd-udevd[1463]: bnxt_en.rdma.0: Worker [1642] processing SEQNUM=18223 is taking a long time
Apr 07 00:08:45 fbo-vmh-024 systemd-udevd[1463]: bnxt_en.rdma.1: Worker [1471] processing SEQNUM=18226 is taking a long time
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102422 > 100000) msec active 1
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0 bnxt_re0: Failed to modify HW QP
Apr 07 00:09:26 fbo-vmh-024 kernel: infiniband bnxt_re0: Couldn't change QP1 state to INIT: -110
Apr 07 00:09:26 fbo-vmh-024 kernel: infiniband bnxt_re0: Couldn't start port
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0 bnxt_re0: Failed to destroy HW QP
Apr 07 00:09:26 fbo-vmh-024 kernel: ------------[ cut here ]------------
Apr 07 00:09:26 fbo-vmh-024 kernel: WARNING: CPU: 11 PID: 1471 at drivers/infiniband/core/cq.c:322 ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: Modules linked in: ipmi_ssif intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common intel_ifs i10nm_edac nfit x86_pk>
Apr 07 00:09:26 fbo-vmh-024 kernel: nvme_auth i2c_i801 spi_intel_pci megaraid_sas xhci_hcd libahci i2c_smbus spi_intel i2c_ismt wmi pinctrl_emmitsburg
Apr 07 00:09:26 fbo-vmh-024 kernel: CPU: 11 PID: 1471 Comm: (udev-worker) Tainted: P O 6.8.1-1-pve #1
Apr 07 00:09:26 fbo-vmh-024 kernel: Hardware name: Supermicro Super Server/X13DEI-T, BIOS 2.1 12/13/2023
Apr 07 00:09:26 fbo-vmh-024 kernel: RIP: 0010:ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: Code: e8 fc 9c 02 00 65 ff 0d 9d 87 e5 3e 0f 85 70 ff ff ff 0f 1f 44 00 00 e9 66 ff ff ff 48 8d 7f 50 e8 0c 3a 33 df e9 35 ff ff ff <0f> 0b 31 c0 3>
Apr 07 00:09:26 fbo-vmh-024 kernel: RSP: 0018:ff6fb876ceb3b6f0 EFLAGS: 00010202
Apr 07 00:09:26 fbo-vmh-024 kernel: RAX: 0000000000000002 RBX: 0000000000000001 RCX: 0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff4118c220ef4400
Apr 07 00:09:26 fbo-vmh-024 kernel: RBP: ff6fb876ceb3b760 R08: 0000000000000000 R09: 0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ff4118c235c00000
Apr 07 00:09:26 fbo-vmh-024 kernel: R13: ff4118c209bb8500 R14: 00000000ffffff92 R15: ff4118c22e88f000
Apr 07 00:09:26 fbo-vmh-024 kernel: FS: 00007ac145a2a8c0(0000) GS:ff4118e0ff780000(0000) knlGS:0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 07 00:09:26 fbo-vmh-024 kernel: CR2: 00005f4b7509f1e8 CR3: 0000000131da2003 CR4: 0000000000f71ef0
Apr 07 00:09:26 fbo-vmh-024 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 07 00:09:26 fbo-vmh-024 kernel: DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
Apr 07 00:09:26 fbo-vmh-024 kernel: PKRU: 55555554
Apr 07 00:09:26 fbo-vmh-024 kernel: Call Trace:
Apr 07 00:09:26 fbo-vmh-024 kernel: <TASK>
Apr 07 00:09:26 fbo-vmh-024 kernel: ? show_regs+0x6d/0x80
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __warn+0x89/0x160
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? report_bug+0x17e/0x1b0
Apr 07 00:09:26 fbo-vmh-024 kernel: ? handle_bug+0x46/0x90
Apr 07 00:09:26 fbo-vmh-024 kernel: ? exc_invalid_op+0x18/0x80
Apr 07 00:09:26 fbo-vmh-024 kernel: ? asm_exc_invalid_op+0x1b/0x20
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_free_cq+0x109/0x150 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_mad_init_device+0x54c/0x8a0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: add_client_context+0x127/0x1c0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: enable_device_and_get+0xe6/0x1e0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? ib_mad_init_device+0x54c/0x8a0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: add_client_context+0x127/0x1c0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: enable_device_and_get+0xe6/0x1e0 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: ib_register_device+0x506/0x610 [ib_core]
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_re_probe+0xe7d/0x11a0 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: auxiliary_bus_probe+0x3e/0xa0
Apr 07 00:09:26 fbo-vmh-024 kernel: really_probe+0x1c9/0x430
Apr 07 00:09:26 fbo-vmh-024 kernel: __driver_probe_device+0x8c/0x190
Apr 07 00:09:26 fbo-vmh-024 kernel: driver_probe_device+0x24/0xd0
Apr 07 00:09:26 fbo-vmh-024 kernel: __driver_attach+0x10b/0x210
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx___driver_attach+0x10/0x10
Apr 07 00:09:26 fbo-vmh-024 kernel: bus_for_each_dev+0x8a/0xf0
Apr 07 00:09:26 fbo-vmh-024 kernel: driver_attach+0x1e/0x30
Apr 07 00:09:26 fbo-vmh-024 kernel: bus_add_driver+0x156/0x260
Apr 07 00:09:26 fbo-vmh-024 kernel: driver_register+0x5e/0x130
Apr 07 00:09:26 fbo-vmh-024 kernel: __auxiliary_driver_register+0x73/0xf0
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_re_mod_init+0x3e/0xff0 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re]
Apr 07 00:09:26 fbo-vmh-024 kernel: do_one_initcall+0x5b/0x340
Apr 07 00:09:26 fbo-vmh-024 kernel: do_init_module+0x97/0x290
Apr 07 00:09:26 fbo-vmh-024 kernel: load_module+0x213a/0x22a0
Apr 07 00:09:26 fbo-vmh-024 kernel: init_module_from_file+0x96/0x100
Apr 07 00:09:26 fbo-vmh-024 kernel: ? init_module_from_file+0x96/0x100
Apr 07 00:09:26 fbo-vmh-024 kernel: idempotent_init_module+0x11c/0x2b0
Apr 07 00:09:26 fbo-vmh-024 kernel: __x64_sys_finit_module+0x64/0xd0
Apr 07 00:09:26 fbo-vmh-024 kernel: do_syscall_64+0x84/0x180
Apr 07 00:09:26 fbo-vmh-024 kernel: ? syscall_exit_to_user_mode+0x86/0x260
Apr 07 00:09:26 fbo-vmh-024 kernel: ? do_syscall_64+0x93/0x180
Apr 07 00:09:26 fbo-vmh-024 kernel: ? exc_page_fault+0x94/0x1b0
Apr 07 00:09:26 fbo-vmh-024 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76
Apr 07 00:09:26 fbo-vmh-024 kernel: RIP: 0033:0x7ac146137719
Apr 07 00:09:26 fbo-vmh-024 kernel: Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 f>
Apr 07 00:09:26 fbo-vmh-024 kernel: RSP: 002b:00007ffc8a83b208 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Apr 07 00:09:26 fbo-vmh-024 kernel: RAX: ffffffffffffffda RBX: 00005f4b75018a80 RCX: 00007ac146137719
Apr 07 00:09:26 fbo-vmh-024 kernel: RDX: 0000000000000000 RSI: 00007ac1462caefd RDI: 000000000000000f
Apr 07 00:09:26 fbo-vmh-024 kernel: RBP: 00007ac1462caefd R08: 0000000000000000 R09: 00005f4b74fd8720
Apr 07 00:09:26 fbo-vmh-024 kernel: R10: 000000000000000f R11: 0000000000000246 R12: 0000000000020000
Apr 07 00:09:26 fbo-vmh-024 kernel: R13: 0000000000000000 R14: 00005f4b7500f170 R15: 00005f4b74858ec1
Apr 07 00:09:26 fbo-vmh-024 kernel: </TASK>
Apr 07 00:09:26 fbo-vmh-024 kernel: ---[ end trace 0000000000000000 ]---
Apr 07 00:09:26 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.0 bnxt_re0: Free MW failed: 0xffffff92
Apr 07 00:09:26 fbo-vmh-024 kernel: infiniband bnxt_re0: Couldn't open port 1


Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1: QPLIB: bnxt_re_is_fw_stalled: FW STALL Detected. cmdq[0xe]=0x3 waited (102345 > 100000) msec active 1
Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1 bnxt_re1: Failed to modify HW QP
Apr 07 00:11:09 fbo-vmh-024 kernel: infiniband bnxt_re1: Couldn't change QP1 state to INIT: -110
Apr 07 00:11:09 fbo-vmh-024 kernel: infiniband bnxt_re1: Couldn't start port
Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1 bnxt_re1: Failed to destroy HW QP
Apr 07 00:11:09 fbo-vmh-024 kernel: bnxt_en 0000:3d:00.1 bnxt_re1: Free MW failed: 0xffffff92
Apr 07 00:11:09 fbo-vmh-024 kernel: infiniband bnxt_re1: Couldn't open port 1
 
This is something that might be worth to expose as option in our installer
Not sure what you had in mind there. Maybe
* name & MACAddress match -> proceed without user intervention
* name or MACAddress match -> user confirmation required to update the non-matching item, noting the total requiring user confirmation
* neither name nor MACAddress match but outstanding interfaces yet to match -> user confirmation to assign from all remaining
* mismatch in the total number of interfaces to match -> report error

Or maybe a simplified subset of the above or something different entirely.
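
As a rough illustration of the per-interface part of that idea, the first three cases could be sketched as below. This is purely hypothetical: classify_iface and its return values are made up here and do not describe any existing installer behaviour. The count mismatch case would have to be checked separately over the whole set.

```shell
#!/bin/sh
# classify_iface is a hypothetical helper for illustration only.
# It compares a stored name/MAC pair with what is found on the system.
classify_iface() {
    want_name="$1"; want_mac="$2"; have_name="$3"; have_mac="$4"
    if [ "$want_name" = "$have_name" ] && [ "$want_mac" = "$have_mac" ]; then
        echo proceed      # both match: no user intervention
    elif [ "$want_name" = "$have_name" ] || [ "$want_mac" = "$have_mac" ]; then
        echo confirm      # only one matches: ask before updating the other
    else
        echo unmatched    # neither matches: leave for manual assignment
    fi
}

# Same MAC but a changed name, as with a rename after a kernel update.
classify_iface enp2s0f0 aa:bb:cc:12:34:56 enp2s0f0np0 aa:bb:cc:12:34:56
```

The MAC comparison here is a plain string match, so both sides would need the same letter case.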
 