Proxmox VE 8.2 released!

Getting random complete server freeze after the update:
cpu: epyc 3
mobo: supermicro h12-ssl
nic: intel X520-DA2
kernel: 6.8.4-2-pve

8 nodes, licensed proxmox, hyperconverged ceph (17.2.7-pve3)
happened 5 times in ~24 hours
random load, random timings, random servers

What is the safe way to revert to kernel 6.5?


Thank you.
 
Has anyone provided an official answer to these questions?
It's probably fine, considering stock Debian (on which the Proxmox OS is based) runs kernel 6.5 out of the box, and all Ubuntu releases short of the new LTS do as well (the Proxmox kernel is based on the Ubuntu one). No issues so far for me.
 
I needed to do a clean install of Proxmox 8.2 because I broke something.

I'd like to test a DKMS module under 8.2 with the 6.5.13 kernel, but I don't see it in the repos with the Enterprise repo enabled, or the pvetest repo.

Is there a way I can get it back?

EDIT: I was searching for "pve-kernel," but it shows up when searching for "proxmox-kernel" as:
Code:
proxmox-kernel-6.5.13-5-pve/stable 6.5.13-5 amd64
  Proxmox Kernel Image

Can I just install that without breaking something?
The kernel packages were renamed a while back; proxmox-kernel is the prefix they use now (they also provide the old package name as a compat alias).
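If it helps, a minimal sketch of installing it (package name exactly as it shows up in your search above), followed by a reboot into that kernel:
Code:
apt install proxmox-kernel-6.5.13-5-pve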
 
My nodes are constantly crashing (every few minutes or hours) on different hardware:
1 node - Mac mini 2012
2 nodes - HP ProDesk 400 G4 (9th gen Intel i5-9500T)

These are the logs from both machines:

Code:
Apr 26 11:57:16 hp-mini-9500T-1 corosync[1143]:   [KNET  ] link: host: 2 link: 0 is down
Apr 26 11:57:16 hp-mini-9500T-1 corosync[1143]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 26 11:57:16 hp-mini-9500T-1 corosync[1143]:   [KNET  ] host: host: 2 has no active links
Apr 26 11:57:17 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] Token has not been received in 2737 ms
Apr 26 11:57:18 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650>
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] Sync members[1]: 1
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] Sync left[1]: 2
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] A new membership (1.3b84) was formed. Members left: 2
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] Failed to receive the leave message. failed: 2
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: members: 1/1053
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [status] notice: members: 1/1053
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] This node is within the non-primary component and will NOT provide a>
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] Members[1]: 1
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [status] notice: node lost quorum
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: received write while not quorate - trigger resync
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: leaving CPG group
Apr 26 11:57:23 hp-mini-9500T-1 pve-ha-lrm[1239]: lost lock 'ha_agent_hp-mini-9500T-1_lock - cfs lock update failed - Permissio>
Apr 26 11:57:23 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: start cluster connection
Apr 26 11:57:23 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: cpg_join failed: 14
Apr 26 11:57:23 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: can't initialize service
Apr 26 11:57:23 hp-mini-9500T-1 pve-ha-crm[1201]: status change slave => wait_for_quorum
Apr 26 11:57:28 hp-mini-9500T-1 pve-ha-lrm[1239]: status change active => lost_agent_lock
Apr 26 11:57:29 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: members: 1/1053
Apr 26 11:57:29 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: all data is up to date
Apr 26 11:58:12 hp-mini-9500T-1 pvescheduler[111238]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Apr 26 11:58:12 hp-mini-9500T-1 pvescheduler[111237]: replication: cfs-lock 'file-replication_cfg' error: no quorum!


Code:
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] link: host: 1 link: 0 is down
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] link: host: 3 link: 0 is down
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 1 has no active links
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 3 has no active links
Apr 26 12:09:23 mac-mini12 corosync[1232]:   [TOTEM ] Token has not been received in 2737 ms
Apr 26 12:09:24 mac-mini12 corosync[1232]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), >
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] Sync members[1]: 2
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] Sync left[2]: 1 3
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [TOTEM ] A new membership (2.3b95) was formed. Members left: 1 3
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [dcdb] notice: members: 2/1138
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [status] notice: members: 2/1138
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] This node is within the non-primary component and will NOT provide any se>
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] Members[1]: 2
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [status] notice: node lost quorum
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [dcdb] crit: received write while not quorate - trigger resync
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [dcdb] crit: leaving CPG group
Apr 26 12:09:28 mac-mini12 pve-ha-crm[1291]: lost lock 'ha_manager_lock - cfs lock update failed - Operation not permitted
Apr 26 12:09:29 mac-mini12 pmxcfs[1138]: [dcdb] notice: start cluster connection
Apr 26 12:09:29 mac-mini12 pmxcfs[1138]: [dcdb] crit: cpg_join failed: 14
Apr 26 12:09:29 mac-mini12 pve-ha-lrm[1399]: lost lock 'ha_agent_mac-mini12_lock - cfs lock update failed - Device or resource >
Apr 26 12:09:29 mac-mini12 pmxcfs[1138]: [dcdb] crit: can't initialize service
Apr 26 12:09:30 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:30 mac-mini12 pve-ha-crm[1291]: status change master => lost_manager_lock
Apr 26 12:09:30 mac-mini12 pve-ha-crm[1291]: watchdog closed (disabled)
Apr 26 12:09:30 mac-mini12 pve-ha-crm[1291]: status change lost_manager_lock => wait_for_quorum
Apr 26 12:09:31 mac-mini12 pvestatd[1272]: status update time (10.722 seconds)
Apr 26 12:09:32 mac-mini12 pve-ha-lrm[1399]: status change active => lost_agent_lock
Apr 26 12:09:35 mac-mini12 pmxcfs[1138]: [dcdb] notice: members: 2/1138
Apr 26 12:09:35 mac-mini12 pmxcfs[1138]: [dcdb] notice: all data is up to date
Apr 26 12:09:36 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:36 mac-mini12 pvestatd[1272]: status update time (5.725 seconds)
Apr 26 12:09:47 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:47 mac-mini12 pvestatd[1272]: status update time (5.749 seconds)
Apr 26 12:09:56 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:57 mac-mini12 pvestatd[1272]: status update time (5.741 seconds)
 
My nodes are constantly crashing (every few minutes or hours) on different hardware
Do you have HA enabled guests? The Corosync logs indicate a network issue. Check the logs to see if anything regarding the NICs/network was logged beforehand. Check your cables, and if that does not help, please consider opening a new thread.
As a workaround, you can set all HA enabled guests to the "ignore" state. Then, after ten minutes, the LRM on each node should switch back to idle mode.
Once in idle mode, the nodes will not fence (hard reset) themselves if the corosync connection is down for too long (1 minute), giving you time to troubleshoot.
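For example, from the CLI (vm:100 is a placeholder HA resource ID; repeat for each HA-managed guest):
Code:
ha-manager set vm:100 --state ignored
# check CRM/LRM state afterwards
ha-manager status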
 
upgraded 3 hosts with ceph cluster yesterday, coming from 8.1.4
no ping after reboot

checked: ip link show
all interfaces DOWN

Broadcom BCM57416 & BCM57508

tried ifup eno1np0
Error: another instance of ifup is already running

found this
https://forum.proxmox.com/threads/network-wont-start-on-boot.83772/post-368240

- systemctl restart systemd-udevd
- systemctl restart networking

This solves the problem until the next reboot.

The startup jobs run into a timeout at boot (see the check below):
- systemd-udev-settle.service/start running
- ifupdown2-pre.service/start running
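To see why those units hang, something like this (run once the host is up again) should show their boot-time logs and any jobs still queued:
Code:
journalctl -b -u systemd-udev-settle.service -u ifupdown2-pre.service
systemctl list-jobs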


no crashes so far with kernel 6.8
 
Do you have HA enabled guests? The Corosync logs indicate a network issue. Check the logs to see if anything regarding the NICs/network was logged beforehand. Check your cables, and if that does not help, please consider opening a new thread.
As a workaround, you can set all HA enabled guests to the "ignore" state. Then, after ten minutes, the LRM on each node should switch back to idle mode.
Once in idle mode, the nodes will not fence (hard reset) themselves if the corosync connection is down for too long (1 minute), giving you time to troubleshoot.
Indeed, it looks like it was a faulty cable. I swapped the cable and uptime is currently looking good; it was a bad coincidence that it started happening right after the upgrade.
 
upgraded 3 hosts with ceph cluster yesterday, coming from 8.1.4
no ping after reboot
[...]
no crashes so far with kernel 6.8
Do you have an error from the bnxt_en kernel module in your logs? Then it could be the same error as here:

https://forum.proxmox.com/threads/o...le-on-test-no-subscription.144557/post-652507

Putting the kernel module on the blocklist should help.
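Something like this, as a sketch; double-check the exact module name against the linked post first (I am assuming it is the bnxt_re RDMA companion module rather than bnxt_en itself, since blocklisting bnxt_en would take the NIC down completely):
Code:
echo "blacklist bnxt_re" > /etc/modprobe.d/blacklist-bnxt_re.conf
update-initramfs -u -k all
# reboot afterwards so the blocklist takes effect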
 
Has anyone provided an official answer to these questions?
Update: No official word, but I've not experienced any problems pinning the latest version of 6.5 in PVE 8.2.

If you upgraded from Proxmox 8.1, you should have it. Otherwise, you'll need to install the image.
Code:
# apt search proxmox-kernel-6.5.13-5
Sorting... Done
Full Text Search... Done
proxmox-kernel-6.5.13-5-pve/stable,now 6.5.13-5 amd64 [installed]
  Proxmox Kernel Image

proxmox-kernel-6.5.13-5-pve-signed/stable 6.5.13-5 amd64
  Proxmox Kernel Image (signed)

proxmox-kernel-6.5.13-5-pve-signed-template/stable 6.5.13-5 amd64
  Template for signed kernel package

You'd need to install one of those and probably reboot.

Someone whose system has working secure boot support should confirm, but I believe the "signed" version is only for secure boot. If there are reasons to use it without secure boot, I'd really like to know.

To pin the kernel, use the Proxmox boot tool:
First, list the installed kernels that you can pin using the list command below.
Then use the proxmox-boot-tool kernel pin command (I don't show this, as mine's already done), using the kernel name exactly as it's listed (e.g., "6.5.13-5-pve" without the quotes).

Code:
root@andromeda2:~# proxmox-boot-tool kernel list
Manually selected kernels:
None.

Automatically selected kernels:
6.5.13-5-pve
6.8.4-2-pve

Pinned kernel:
6.5.13-5-pve

It'll refresh the boot configuration, and when you list the kernels again the one you chose should show as pinned. You'll still need a reboot to actually boot into the pinned kernel.
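For reference, the pin step itself (not shown in the output above) would look like this:
Code:
proxmox-boot-tool kernel pin 6.5.13-5-pve
# to undo later and boot the newest kernel again:
# proxmox-boot-tool kernel unpin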

If you're going to be doing any compiling/building of modules, be sure to grab the appropriate header package for your pinned kernel.
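Assuming the headers follow the same naming scheme as the kernel image (please verify with apt search), that would be something like:
Code:
apt install proxmox-headers-6.5.13-5-pve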
 
I just finished the first batch of updates and had no issues on any of the systems I updated. Because of the possible issue around interface renaming, I applied the network interface naming changes that have been mentioned before (see the sketch after the list below) and rebooted after applying them, before doing the update.

So far my list of successful updates to 8.2 includes:
  • My Workstation - (ROG Gaming-E Wifi II with a 5950x and 128GB RAM, 2 x GPUs, 4 x SSDs, 2 x NVMEs and 4 x HDDs)
  • My Media Server - (Asus Motherboard with a 5600G and 32GB RAM, 1 x GPU, 2 x SSDs and 24 x HDDs)
  • 3 QNAP white boxes working in a Ceph Cluster (TS-453A, 2 x TS-451)
  • My Backup Server - (Asus Motherboard with a i3-3220 and 16GB RAM, no GPUs, 2 SSDs and 8 x HDDs)
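For anyone wondering what those naming changes look like, this is roughly the systemd link-file approach (MAC address and interface name are placeholders for your own hardware; pick a name the kernel won't claim, e.g. not ethX):
Code:
# /etc/systemd/network/10-lan0.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
[Link]
Name=lan0
Then rebuild the initramfs (update-initramfs -u -k all), update /etc/network/interfaces to use the new name, and reboot.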
 
Update: No official word, but I've not experienced any problems pinning the latest version of 6.5 in PVE 8.2.
[...]
If you're going to be doing any compiling/building of modules, be sure to grab the appropriate header package for your pinned kernel.
At least pinning that kernel helped as a workaround… but if it's a kernel issue, shouldn't there be a lot more complaints, or am I wrong?
 
At least pinning that kernel helped as a workaround… but if it's a kernel issue, shouldn't there be a lot more complaints, or am I wrong?
Kernel 6.8 hit Ubuntu LTS 24.04 this week.

Third-party hardware vendors are going to need a little time to get working drivers released.

I've been around for a few major PVE updates, and this one seemed to be particularly rough on hardware compatibility. Kernel 6.8 is a huge change.
 
I tried using Proxmox Automated Installation but found that the proxmox-auto-install-assistant command was included neither on nodes updated to PVE 8.2 nor on nodes freshly installed from the PVE 8.2 ISO.

I was able to resolve this by installing the command with apt install proxmox-auto-install-assistant (a usage sketch follows the questions below). Could you please clarify the following:
  • Is it intended that the command is not included in the installation from an update or ISO?
  • Are there any plans to include this command in future releases for installation from updates or ISOs?
  • If there are no such plans, could you please document the necessity of installing the command in the manual?
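For anyone else hitting this, the workaround plus a typical invocation look roughly like this (the ISO filename and answer.toml are placeholders; see the Automated Installation wiki page for the full options):
Code:
apt install proxmox-auto-install-assistant
# embed an answer file into a stock installer ISO
proxmox-auto-install-assistant prepare-iso proxmox-ve_8.2-1.iso --fetch-from iso --answer-file answer.toml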
 
Getting random complete server freeze after the update:
cpu: epyc 3, mobo: supermicro h12-ssl, nic: intel X520-DA2, kernel: 6.8.4-2-pve
[...]
What is the safe way to revert to kernel 6.5?
I also get freezes/crashes, only on nodes with hyperconverged Ceph.

I'm using encrypted NVMe OSDs on these servers.

EPYC v3 on a Lenovo server (I posted info and logs earlier in this thread).

Do you have any logs?
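For example, assuming the journal is persisted across reboots, something like this should pull kernel errors from the boot before the freeze:
Code:
journalctl -b -1 -k -p err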
 
