Opt-in Linux 6.5 Kernel with ZFS 2.2 for Proxmox VE 8 available on test & no-subscription

Hi,
I have an R620 with a Broadcom BCM5720 and a Mellanox ConnectX-4; it's booting fine with 6.5.3 (I'll test other, newer 6.5.x kernels to compare).

(I can't tell if the physical console is working, as I only manage them remotely through iDRAC.)

Code:
# lspci |grep -i connect
41:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
41:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

# lspci | grep BCM5720
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
# uname -a
Linux formationkvm3 6.5.3-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.3-1 (2023-10-23T08:03Z) x86_64 GNU/Linux
Our cluster is:

PVE221: Dell PowerEdge R240 Xeon(R) E-2234
PVE222: Dell PowerEdge R610 Xeon(R) CPU L5520 (just checked, I thought it was R620)
PVE223: Dell PowerEdge R6515 AMD EPYC 7313P

Only PVE221 can't boot with kernel 6.5 - no matter whether the tg3 module is loaded or not - and it's not a display issue.
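For reference - one way to test a boot with tg3 kept from loading (just a sketch, not necessarily how it was tested here; the file name is only an example):

Code:
# for a single boot: append this to the kernel command line
modprobe.blacklist=tg3

# persistently: blacklist the module and rebuild the initramfs
echo "blacklist tg3" > /etc/modprobe.d/blacklist-tg3.conf
update-initramfs -u -k all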


Regards,
 
Hello everyone,

I would also like to add my two cents. I have exactly the same problem: with PVE kernel 6.5 I can no longer boot my Fujitsu S740 (mini PC) system. It just freezes.
Not sure if this has already been suggested here - but in the boot loader, try removing the 'quiet' flag from the kernel command line - maybe this helps in finding out where the issue is.
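In case it's unclear where that flag lives: on a default PVE install booting via GRUB it can be changed once at the boot menu or persistently - roughly like this (entry names differ per system; installs booting via systemd-boot, e.g. ZFS on UEFI, use /etc/kernel/cmdline plus 'proxmox-boot-tool refresh' instead):

Code:
# one-time: at the GRUB menu press 'e' on the boot entry,
# delete 'quiet' from the line starting with 'linux', then boot with Ctrl-X

# persistent (GRUB): edit /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT=""
# then regenerate the config
update-grub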
 
Hello,
thanks for the quick response. I did; the system also seems to be stuck on the network adapter :-(.
(screenshot attached: boot.jpeg)
What can I do now? It is not a production system, only a backup system at another location.

But if I update the production system, it may also run into problems, and that would be very bad for a production system.
 
Hello,

when I start the server with the network connected, it gets stuck in this state. If I start the server without a network, the system remains stuck at this point for 5-10 minutes and then restarts. After plugging the NIC back in, I still cannot reach the system. :-(

Thanks for any further help!
 

Ah, that looks very promising, actually. So, this is not a problem with the kernel failing to start. That should be much easier to debug. While I don't know about issues that are specific to Proxmox, this is a situation that you could encounter/debug with any Linux distribution. So, let's look at some of the normal tools you'd use in this kind of situation.

As a first attempt, I would reboot and then edit the command line so that you remove the quiet part and add systemd.debug-shell instead. If everything in Proxmox is configured how I usually expect it to be on Linux, you should then be able to press CTRL-ALT-F9 when the system appears to be hanging. This brings you to a different text console, where you can start a shell and start poking at things.
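For instance, the GRUB 'linux' line could end up looking roughly like this (the kernel version and root= here are just placeholders, yours will differ):

Code:
linux /boot/vmlinuz-6.5.11-6-pve root=/dev/mapper/pve-root ro systemd.debug-shell=1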

I would use commands such as ps -Helf and dmesg to get a general idea of what is trying to run at this point. Then run systemctl, systemctl status, and systemctl status networking to get a better idea of what systemd thinks is happening right now; yes, all three of these commands show different information. Another similar command to try would be systemctl list-jobs. journalctl -xef could also be a good idea, and so is ip a.
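For convenience, that could look something like this from the debug shell on tty9 (purely illustrative; run whichever ones seem useful):

Code:
ps -Helf                      # process tree: what is running / where things are stuck
dmesg | tail -n 50            # recent kernel messages
systemctl                     # list of units and their states
systemctl status              # overall system state summary
systemctl status networking   # the networking unit specifically
systemctl list-jobs           # jobs systemd is still waiting on
journalctl -xef               # follow the journal, with explanations
ip a                          # link/address state of the NICs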

Unfortunately, Proxmox VE doesn't use systemd for managing networking. So, the information that you'll get here is going to be a little limited. You might have to dig deeper into how Proxmox's implementation of ifup (ifupdown2) works. And since we don't really have a good theory yet about what is going wrong, I can't really make any further suggestions about what to do past this point.

But hopefully, you'll be able to collect useful data.

If systemd.debug-shell doesn't allow you to make progress, try changing the kernel command line to say systemd.unit=rescue.target instead. That stops at a different point during the boot process. It can be helpful, but it can also be more confusing and require you to know a little bit more about systemd. So, this would be my second choice to try.
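For example, instead of the debug-shell parameter the kernel command line would simply get (sketch):

Code:
# append on the kernel command line instead of systemd.debug-shell
systemd.unit=rescue.target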
 
I can confirm that a Dell PowerEdge T140 with a BCM5720 won't boot with kernel 6.5.

This is a very serious issue, because a lot of DELL servers use this NIC.
Please resolve!!
 
Dell T340 same problem, same NIC

About the VM migrations, they hang when moving from/to any host running kernel 6.5.
In this cluster we have a Dell R620 and a Dell R6515 (running 6.5) and the R240 (running 6.2).


If I see this right, the only affected Dell servers with the BCM5720 are 14th gen? T140, R240, T340 (the 4 as second digit indicates 14th gen)

because - we have reports of the tg3/BCM5720 NICs working in an R620 and R630 by @spirit - and an R630 I have access to also runs fine with kernel 6.5; also I think there were users with working R610 (or similar) and R650 (or similar) servers?
Could the affected users maybe share their firmware versions?:

Code:
ethtool -i eno1
driver: tg3
version: 6.5.11-6-pve
firmware-version: FFV22.00.6 bc 5720-v1.39
expansion-rom-version: 
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
 
Affected T140

ethtool -i eno1
driver: tg3
version: 6.2.16-19-pve
firmware-version: 5719-v1.46 NCSI v1.5.33.0
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
 
Same network card in our Dell R720. The new kernel is working fine.
ethtool -i eno1
driver: tg3
version: 6.5.11-4-pve
firmware-version: FFV21.60.16 bc 5720-v1.39
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
 
Affected T140

ethtool -i eno1
driver: tg3
version: 6.2.16-19-pve
firmware-version: 5719-v1.46 NCSI v1.5.33.0
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
not too familiar with the versioning of Broadcom NICs (in Dell servers) - but is the NIC a BCM5720 - or a BCM5719?
(`lspci -nnk | grep -i broadcom` should provide the information)
 
root@xxx:~# lspci -nnk | grep -i broadcom
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] [1000:005f] (rev 02)
05:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
05:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]

on a Dell T140 which won't boot with 6.5
 
On Dell PowerEdge T340

root@pve-t340:~# ethtool -i eno1
driver: tg3
version: 6.2.16-19-pve
firmware-version: FFV22.61.8 bc 5720-v1.39
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Removing the 'quiet' flag from the kernel command line doesn't give me more information. It stops after "Loading initial ramdisk".

Disabling the BCM5720 NIC in the BIOS doesn't change anything.
 
I have some Dell R620s with the Broadcom BCM5720, and they are booting fine.

Code:
# lspci -nnk | grep -i broadcom
01:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
01:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
02:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
02:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]

# uname -a
#1 SMP PREEMPT_DYNAMIC PMX 6.5.11-6 (2023-11-29T08:32Z) x86_64 GNU/Linux
 
Booting the 6.5 kernel doesn't generate any log in /var/log/ifupdown2/ for me.
For my part, on the T340 the server hangs after "Loading initial ramdisk".
Replacing "quiet" with "systemd.debug-shell=1" doesn't give me more info on the console, and CTRL-ALT-F9 gives nothing.
 
Does replacing "quiet" with "break=top" make any difference? That's usually a good way to get into the initramfs, and it should be the very first thing that happens after the kernel has initialized itself and started userspace.
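Roughly what to expect with that (a sketch, assuming the standard Debian initramfs-tools behaviour that PVE uses):

Code:
# on the kernel command line, replace 'quiet' with:
break=top
# initramfs-tools then drops to a busybox "(initramfs)" shell right at the start of the initramfs;
# from there you can look around, e.g.:
dmesg | tail -n 30
cat /proc/modules
# and type 'exit' to let the boot continue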
 
