Opt-in Linux 6.5 Kernel with ZFS 2.2 for Proxmox VE 8 available on test & no-subscription

Hi,
I have an R620 with Broadcom BCM5720 && Mellanox ConnectX-4; it's booting fine with 6.5.3 (I'll test other, newer 6.5.x builds to compare)

(I can't tell if the physical console is working, as I only manage them remotely through iDRAC.)

Code:
# lspci |grep -i connect
41:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
41:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

# lspci |grep BCM5720
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
# uname -a
Linux formationkvm3 6.5.3-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.3-1 (2023-10-23T08:03Z) x86_64 GNU/Linux
Our cluster is:

PVE221: Dell PowerEdge R240 Xeon(R) E-2234
PVE222: Dell PowerEdge R610 Xeon(R) CPU L5520 (just checked, I thought it was R620)
PVE223: Dell PowerEdge R6515 AMD EPYC 7313P

Only PVE221 can't boot with kernel 6.5.
It makes no difference whether the tg3 module is loaded or not, and it's not a display issue.
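For reference, a boot without the driver can be tested from the boot menu with a kernel parameter (a sketch, not necessarily how we tested it):

Code:
# append to the kernel command line for a single boot to keep the
# tg3 driver from being loaded:
module_blacklist=tg3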


Regards,
 
Hello everyone,

I would also like to add my two cents. I have exactly the same problem: with PVE kernel 6.5 I can no longer boot my Fujitsu S740 (mini PC) system. It just freezes.
Not sure if this has already been suggested here - but in the boot loader, try removing the 'quiet' flag from the kernel command line - maybe this helps in finding out where the issue is.
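For example, on a stock Proxmox VE install with GRUB (a sketch - the kernel version and root device will differ on your system):

Code:
# at the GRUB menu, press 'e' on the boot entry and edit the 'linux' line:
#   before: linux /boot/vmlinuz-6.5.3-1-pve root=/dev/mapper/pve-root ro quiet
#   after:  linux /boot/vmlinuz-6.5.3-1-pve root=/dev/mapper/pve-root ro
# then boot the edited entry with CTRL-X or F10; kernel messages stay visible.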
 
Hello,
thanks for the quick response. I did; the system also seems to be stuck on the network adapter :-(.
[screenshot attached: boot.jpeg]
What can I do now? It is not a production system, only a backup system at another location.

But if I update the production system, it may run into the same problem, and that would be very bad for a production system.
 
Hello,

when I start the server with the network connected, it gets stuck in this state. If I start the server without a network, it remains stuck at this point for 5-10 minutes and then restarts. After plugging the NIC back in, I still cannot reach the system. :-(

Thanks for any further help!
 

Ah, that screenshot looks very promising, actually. So, this is not a problem with the kernel failing to start; that should be much easier to debug. While I don't know about issues that are specific to Proxmox, this is a situation that you could encounter and debug with any Linux distribution. So, let's look at some of the normal tools you'd use in this kind of situation.

As a first attempt, I would reboot and then edit the kernel command line so that you remove the quiet part and add systemd.debug-shell=1 instead. If everything in Proxmox is configured how I usually expect it to be on Linux, you should then be able to press CTRL-ALT-F9 when the system appears to be hanging. This brings you to a different text console, where you can start a shell and start poking at things.
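Roughly like this, assuming GRUB and a default Proxmox layout (kernel version and root device are just examples):

Code:
# kernel command line with 'quiet' removed and the early debug shell added:
linux /boot/vmlinuz-6.5.11-6-pve root=/dev/mapper/pve-root ro systemd.debug-shell=1
# systemd then opens a root shell on tty9 very early during boot;
# switch to it with CTRL-ALT-F9 once the boot appears to hang.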

I would use commands such as ps -Helf and dmesg to get a general idea of what is trying to run at this point. Then run systemctl, systemctl status, and systemctl status networking to get a better idea of what systemd thinks is happening right now; yes, all three of these commands show different information. Another similar command to try would be systemctl list-jobs. journalctl -xef could also be a good idea, and so could ip a.
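Collected in one place, with a note on what each command tells you:

Code:
ps -Helf                      # process tree: what is running or blocked right now
dmesg | tail -n 50            # recent kernel messages (driver probes, errors)
systemctl                     # all units and their current state
systemctl status              # overall system state and unit tree
systemctl status networking   # state of the network bring-up unit
systemctl list-jobs           # jobs systemd is still waiting on - often the culprit
journalctl -xef               # follow the journal, with explanatory text
ip a                          # which interfaces exist and whether they are up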

Unfortunately, Proxmox VE doesn't use systemd for managing networking, so the information that you'll get here is going to be a little limited. You might have to dig deeper into how Proxmox's ifupdown2-based implementation of ifup works. And since we don't really have a good theory yet about what is going wrong, I can't make any further suggestions for what to do past this point.
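If it does come down to the network bring-up, a few places to look (paths as used by ifupdown2 on Proxmox VE; treat this as a starting point, not a recipe):

Code:
journalctl -u networking.service   # the systemd unit that runs ifupdown2 at boot
ls /var/log/ifupdown2/             # per-run logs written by ifupdown2
ifreload -a -d                     # re-apply /etc/network/interfaces with debug output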

But hopefully, you'll be able to collect useful data.

If systemd.debug-shell doesn't allow you to make progress, try changing the kernel command line to say systemd.unit=rescue.target instead. That stops at a different point during the boot process. It can be helpful, but it can also be more confusing and require you to know a little bit more about systemd. So, this would be my second choice to try.
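As a command-line sketch (same example layout as above):

Code:
# instead of systemd.debug-shell=1, stop the boot at the rescue target:
linux /boot/vmlinuz-6.5.11-6-pve root=/dev/mapper/pve-root ro systemd.unit=rescue.target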
 
I confirm DELL with BCM5720 + kernel 6.5 won't boot on PowerEdge T140.

This is a very serious issue, because a lot of DELL servers use this NIC.
Please resolve!!
 
Dell T340 same problem, same NIC

About the VM migrations: they hang when moving from/to any host running kernel 6.5.
In this cluster we have a Dell R620 and a Dell R6515 (running 6.5) and the R240 (running 6.2).

I confirm DELL with BCM5720 + kernel 6.5 won't boot on PowerEdge T140.

If I see this right, the only affected Dell servers with BCM5720 are 14th gen? T140, R240, T340 (the 4 as second digit indicates 14th gen)

because - we have reports of the tg3/BCM5720 NICs working in an R620 and R630 by @spirit - and an R630 I have access to also runs fine with kernel 6.5; also I think there were users with working R610 (or similar) and R650 (or similar) servers?
Could the affected users maybe share their firmware versions?

Code:
ethtool -i eno1
driver: tg3
version: 6.5.11-6-pve
firmware-version: FFV22.00.6 bc 5720-v1.39
expansion-rom-version: 
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
 
Affected T140

ethtool -i eno1
driver: tg3
version: 6.2.16-19-pve
firmware-version: 5719-v1.46 NCSI v1.5.33.0
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
 
If I see this right, the only affected Dell servers with BCM5720 are 14th gen? T140, R240, T340 (the 4 as second digit indicates 14th gen)

because - we have reports of the tg3/BCM5720 NICs working in an R620 and R630 by @spirit - and an R630 I have access to also runs fine with kernel 6.5; also I think there were users with working R610 (or similar) and R650 (or similar) servers?
Same network card in our Dell R720. The new kernel is working fine.
ethtool -i eno1
driver: tg3
version: 6.5.11-4-pve
firmware-version: FFV21.60.16 bc 5720-v1.39
expansion-rom-version:
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
 
Affected T140

ethtool -i eno1
driver: tg3
version: 6.2.16-19-pve
firmware-version: 5719-v1.46 NCSI v1.5.33.0
expansion-rom-version:
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
Not too familiar with the versioning of Broadcom NICs (in Dell servers) - but is the NIC a BCM5720 or a BCM5719?
(`lspci -nnk | grep -i broadcom` should provide the information)
 
root@xxx:~# lspci -nnk | grep -i broadcom
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] [1000:005f] (rev 02)
05:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
05:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]

on a Dell T140 which won't boot with 6.5
 
On Dell PowerEdge T340

root@pve-t340:~# ethtool -i eno1
driver: tg3
version: 6.2.16-19-pve
firmware-version: FFV22.61.8 bc 5720-v1.39
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

Removing the 'quiet' flag from the kernel command line doesn't give me more information. It stops after "Loading initial ramdisk".

Disabling the BCM5720 NIC in the BIOS doesn't change anything.
 
I have some Dell R620s with Broadcom BCM5720, and they are booting fine.

Code:
# lspci -nnk | grep -i broadcom
01:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
01:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
02:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]
02:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe [14e4:165f]

# uname -a
#1 SMP PREEMPT_DYNAMIC PMX 6.5.11-6 (2023-11-29T08:32Z) x86_64 GNU/Linux
 
Booting the 6.5 kernel doesn't generate any log in /var/log/ifupdown2/ for me.
For my part, on the T340 the server hangs after "Loading initial ramdisk".
Replacing "quiet" with "systemd.debug-shell=1" doesn't give me more info on the console, and CTRL-ALT-F9 gives nothing.
 
Does replacing "quiet" with "break=top" make any difference? That's usually a good way to get into the initramfs, and it should be the very first thing that happens after the kernel initialized itself and started userspace.
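For example (a sketch; you land in the minimal busybox shell of the initramfs):

Code:
# on the kernel command line, replace 'quiet' with:
break=top
# the boot then drops to a '(initramfs)' busybox shell before anything else
# runs; from there you can probe step by step, e.g.:
#   modprobe tg3     # load the NIC driver by hand and watch for the hang
#   dmesg | tail     # check what the kernel logged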