Proxmox VE 8.2 released!

Getting random complete server freeze after the update:
cpu: epyc 3
mobo: supermicro h12-ssl
nic: intel X520-DA2
kernel: 6.8.4-2-pve

8 nodes, licensed proxmox, hyperconverged ceph (17.2.7-pve3)
happened 5 times in ~24 hours
random load, random timings, random servers

What is the safe way to revert to kernel 6.5?


Thank you.
 
Has anyone provided an official answer to these questions?
It's probably fine, considering stock Debian (on which the Proxmox OS is based) runs kernel 6.5 out of the box, and all Ubuntu releases short of the new LTS do as well (the Proxmox kernel is based on the Ubuntu one). No issues so far for me.
 
I needed to do a clean install of Proxmox 8.2 because I broke something.

I'd like to test a DKMS module under 8.2 with the 6.5.13 kernel, but I don't see it in the repos with the Enterprise repo enabled, or the pvetest repo.

Is there a way I can get it back?

EDIT: I was searching for "pve-kernel," but it shows up when searching for "proxmox-kernel" as:
Code:
proxmox-kernel-6.5.13-5-pve/stable 6.5.13-5 amd64
  Proxmox Kernel Image

Can I just install that without breaking something?
The kernel packages were renamed a while back; proxmox-kernel is the prefix they use now (they also provide the old package name as a compat alias).
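If it helps, a minimal sketch of installing it (package name exactly as it shows up in your search above), followed by a reboot into that kernel:
Code:
apt install proxmox-kernel-6.5.13-5-pve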
 
My nodes are constantly crashing (every few minutes or hours) on different hardware:
1 node - Mac mini 2012
2 nodes - HP ProDesk 400 G4 (9th gen Intel i5-9500T)

These are the logs from both machines:

Code:
Apr 26 11:57:16 hp-mini-9500T-1 corosync[1143]:   [KNET  ] link: host: 2 link: 0 is down
Apr 26 11:57:16 hp-mini-9500T-1 corosync[1143]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Apr 26 11:57:16 hp-mini-9500T-1 corosync[1143]:   [KNET  ] host: host: 2 has no active links
Apr 26 11:57:17 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] Token has not been received in 2737 ms
Apr 26 11:57:18 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650>
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] Sync members[1]: 1
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] Sync left[1]: 2
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] A new membership (1.3b84) was formed. Members left: 2
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [TOTEM ] Failed to receive the leave message. failed: 2
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: members: 1/1053
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [status] notice: members: 1/1053
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] This node is within the non-primary component and will NOT provide a>
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [QUORUM] Members[1]: 1
Apr 26 11:57:22 hp-mini-9500T-1 corosync[1143]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [status] notice: node lost quorum
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: received write while not quorate - trigger resync
Apr 26 11:57:22 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: leaving CPG group
Apr 26 11:57:23 hp-mini-9500T-1 pve-ha-lrm[1239]: lost lock 'ha_agent_hp-mini-9500T-1_lock - cfs lock update failed - Permissio>
Apr 26 11:57:23 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: start cluster connection
Apr 26 11:57:23 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: cpg_join failed: 14
Apr 26 11:57:23 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] crit: can't initialize service
Apr 26 11:57:23 hp-mini-9500T-1 pve-ha-crm[1201]: status change slave => wait_for_quorum
Apr 26 11:57:28 hp-mini-9500T-1 pve-ha-lrm[1239]: status change active => lost_agent_lock
Apr 26 11:57:29 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: members: 1/1053
Apr 26 11:57:29 hp-mini-9500T-1 pmxcfs[1053]: [dcdb] notice: all data is up to date
Apr 26 11:58:12 hp-mini-9500T-1 pvescheduler[111238]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Apr 26 11:58:12 hp-mini-9500T-1 pvescheduler[111237]: replication: cfs-lock 'file-replication_cfg' error: no quorum!


Code:
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] link: host: 1 link: 0 is down
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] link: host: 3 link: 0 is down
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 1 has no active links
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Apr 26 12:09:21 mac-mini12 corosync[1232]:   [KNET  ] host: host: 3 has no active links
Apr 26 12:09:23 mac-mini12 corosync[1232]:   [TOTEM ] Token has not been received in 2737 ms
Apr 26 12:09:24 mac-mini12 corosync[1232]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), >
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] Sync members[1]: 2
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] Sync left[2]: 1 3
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [TOTEM ] A new membership (2.3b95) was formed. Members left: 1 3
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [dcdb] notice: members: 2/1138
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [status] notice: members: 2/1138
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] This node is within the non-primary component and will NOT provide any se>
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [QUORUM] Members[1]: 2
Apr 26 12:09:28 mac-mini12 corosync[1232]:   [MAIN  ] Completed service synchronization, ready to provide service.
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [status] notice: node lost quorum
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [dcdb] crit: received write while not quorate - trigger resync
Apr 26 12:09:28 mac-mini12 pmxcfs[1138]: [dcdb] crit: leaving CPG group
Apr 26 12:09:28 mac-mini12 pve-ha-crm[1291]: lost lock 'ha_manager_lock - cfs lock update failed - Operation not permitted
Apr 26 12:09:29 mac-mini12 pmxcfs[1138]: [dcdb] notice: start cluster connection
Apr 26 12:09:29 mac-mini12 pmxcfs[1138]: [dcdb] crit: cpg_join failed: 14
Apr 26 12:09:29 mac-mini12 pve-ha-lrm[1399]: lost lock 'ha_agent_mac-mini12_lock - cfs lock update failed - Device or resource >
Apr 26 12:09:29 mac-mini12 pmxcfs[1138]: [dcdb] crit: can't initialize service
Apr 26 12:09:30 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:30 mac-mini12 pve-ha-crm[1291]: status change master => lost_manager_lock
Apr 26 12:09:30 mac-mini12 pve-ha-crm[1291]: watchdog closed (disabled)
Apr 26 12:09:30 mac-mini12 pve-ha-crm[1291]: status change lost_manager_lock => wait_for_quorum
Apr 26 12:09:31 mac-mini12 pvestatd[1272]: status update time (10.722 seconds)
Apr 26 12:09:32 mac-mini12 pve-ha-lrm[1399]: status change active => lost_agent_lock
Apr 26 12:09:35 mac-mini12 pmxcfs[1138]: [dcdb] notice: members: 2/1138
Apr 26 12:09:35 mac-mini12 pmxcfs[1138]: [dcdb] notice: all data is up to date
Apr 26 12:09:36 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:36 mac-mini12 pvestatd[1272]: status update time (5.725 seconds)
Apr 26 12:09:47 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:47 mac-mini12 pvestatd[1272]: status update time (5.749 seconds)
Apr 26 12:09:56 mac-mini12 pvestatd[1272]: storage 'NAS' is not online
Apr 26 12:09:57 mac-mini12 pvestatd[1272]: status update time (5.741 seconds)
 
My nodes are constantly crashing (every few minutes or hours) on different hardware
Do you have HA enabled guests? The Corosync logs indicate a network issue. Check the logs to see if anything regarding the NICs/network was logged beforehand. Check your cables, and if that does not help, please consider opening a new thread.
As a workaround, you can set all HA enabled guests to the "ignore" state. Then, after ten minutes, the LRM on each node should switch back to idle mode.
Once in idle mode, the nodes will not fence (hard reset) themselves if the corosync connection is down for too long (1 minute), giving you time to troubleshoot.
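For example, from the CLI (vm:100 is a placeholder HA resource ID; repeat for each HA-managed guest):
Code:
ha-manager set vm:100 --state ignored
# check CRM/LRM state afterwards
ha-manager status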
 
upgraded 3 hosts with ceph cluster yesterday, coming from 8.1.4
no ping after reboot

checked: ip link show
all interfaces DOWN

Broadcom BCM57416 & BCM57508

tried ifup eno1np0
Error: another instance of ifup is already running

found this
https://forum.proxmox.com/threads/network-wont-start-on-boot.83772/post-368240

- systemctl restart systemd-udevd
- systemctl restart networking

This solves the problem until the next reboot.

The startup jobs run into a timeout at boot (see the check below):
- systemd-udev-settle.service/start running
- ifupdown2-pre.service/start running
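To see why those units hang, something like this (run once the host is up again) should show their boot-time logs and any jobs still queued:
Code:
journalctl -b -u systemd-udev-settle.service -u ifupdown2-pre.service
systemctl list-jobs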


no crashes so far with kernel 6.8
 
Do you have HA enabled guests? The Corosync logs indicate a network issue. Check the logs to see if anything regarding the NICs/network was logged beforehand. Check your cables, and if that does not help, please consider opening a new thread.
As a workaround, you can set all HA enabled guests to the "ignore" state. Then, after ten minutes, the LRM on each node should switch back to idle mode.
Once in idle mode, the nodes will not fence (hard reset) themselves if the corosync connection is down for too long (1 minute), giving you time to troubleshoot.
Indeed, it looks like it was a faulty cable. I swapped the cable and uptime is currently looking good; it was a bad coincidence that it started happening right after the upgrade.
 
upgraded 3 hosts with ceph cluster yesterday, coming from 8.1.4
no ping after reboot
[...]
no crashes so far with kernel 6.8
Do you have an error from the bnxt_en kernel module in your logs? Then it could be the same error as here:

https://forum.proxmox.com/threads/o...le-on-test-no-subscription.144557/post-652507

Putting the kernel module on the blocklist should help.
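Something like this, as a sketch; double-check the exact module name against the linked post first (I am assuming it is the bnxt_re RDMA companion module rather than bnxt_en itself, since blocklisting bnxt_en would take the NIC down completely):
Code:
echo "blacklist bnxt_re" > /etc/modprobe.d/blacklist-bnxt_re.conf
update-initramfs -u -k all
# reboot afterwards so the blocklist takes effect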
 
Has anyone provided an official answer to these questions?
Update: No official word, but I've not experienced any problems pinning the latest version of 6.5 in PVE 8.2.

If you upgraded from Proxmox 8.1, you should have it. Otherwise, you'll need to install the image.
Code:
# apt search proxmox-kernel-6.5.13-5
Sorting... Done
Full Text Search... Done
proxmox-kernel-6.5.13-5-pve/stable,now 6.5.13-5 amd64 [installed]
  Proxmox Kernel Image

proxmox-kernel-6.5.13-5-pve-signed/stable 6.5.13-5 amd64
  Proxmox Kernel Image (signed)

proxmox-kernel-6.5.13-5-pve-signed-template/stable 6.5.13-5 amd64
  Template for signed kernel package

You'd need to install one of those and probably reboot.

Someone whose system has working secure boot support should confirm, but I believe the "signed" version is only for secure boot. If there are reasons to use it without secure boot, I'd really like to know.

To pin the kernel, use the Proxmox boot tool:
First, list the installed kernels that you can pin using the list command below.
Then use the proxmox-boot-tool kernel pin command (I don't show this, as mine's already done), using the kernel name exactly as it's listed (e.g., "6.5.13-5-pve" without the quotes).

Code:
root@andromeda2:~# proxmox-boot-tool kernel list
Manually selected kernels:
None.

Automatically selected kernels:
6.5.13-5-pve
6.8.4-2-pve

Pinned kernel:
6.5.13-5-pve

It'll refresh the boot configuration, and when you list the kernels again the one you chose should show as pinned. You'll still need a reboot to actually boot into the pinned kernel.
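For reference, the pin step itself (not shown in the output above) would look like this:
Code:
proxmox-boot-tool kernel pin 6.5.13-5-pve
# to undo later and boot the newest kernel again:
# proxmox-boot-tool kernel unpin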

If you're going to be doing any compiling/building of modules, be sure to grab the appropriate header package for your pinned kernel.
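Assuming the headers follow the same naming scheme as the kernel image (please verify with apt search), that would be something like:
Code:
apt install proxmox-headers-6.5.13-5-pve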
 
I just finished the first batch of updates and had no issues on any of the systems I updated. Because of the possible issue around interface renaming, I applied the network interface naming changes that have been mentioned before (see the sketch after the list below) and rebooted after applying them, before doing the update.

So far my list of successful updates to 8.2 includes:
  • My Workstation - (ROG Gaming-E Wifi II with a 5950x and 128GB RAM, 2 x GPUs, 4 x SSDs, 2 x NVMEs and 4 x HDDs)
  • My Media Server - (Asus Motherboard with a 5600G and 32GB RAM, 1 x GPU, 2 x SSDs and 24 x HDDs)
  • 3 QNAP white boxes working in a Ceph Cluster (TS-453A, 2 x TS-451)
  • My Backup Server - (Asus Motherboard with a i3-3220 and 16GB RAM, no GPUs, 2 SSDs and 8 x HDDs)
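For anyone wondering what those naming changes look like, this is roughly the systemd link-file approach (MAC address and interface name are placeholders for your own hardware; pick a name the kernel won't claim, e.g. not ethX):
Code:
# /etc/systemd/network/10-lan0.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff
[Link]
Name=lan0
Then rebuild the initramfs (update-initramfs -u -k all), update /etc/network/interfaces to use the new name, and reboot.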
 
Update: No official word, but I've not experienced any problems pinning the latest version of 6.5 in PVE 8.2.
[...]
If you're going to be doing any compiling/building of modules, be sure to grab the appropriate header package for your pinned kernel.
At least pinning that kernel helped as a workaround… but if it's a kernel issue, shouldn't there be a lot more complaints, or am I wrong?
 
At least pinning that kernel helped as a workaround… but if it's a kernel issue, shouldn't there be a lot more complaints, or am I wrong?
Kernel 6.8 hit Ubuntu LTS 24.04 this week.

Third-party hardware vendors are going to need a little time to get working drivers released.

I've been around for a few major PVE updates, and this one seemed to be particularly rough on hardware compatibility. Kernel 6.8 is a huge change.
 
I tried using Proxmox Automated Installation but found that the proxmox-auto-install-assistant command was included neither on nodes updated to PVE 8.2 nor on nodes freshly installed from the PVE 8.2 ISO.

I was able to resolve this by installing the command with apt install proxmox-auto-install-assistant (a usage sketch follows the questions below). Could you please clarify the following:
  • Is it intended that the command is not included in the installation from an update or ISO?
  • Are there any plans to include this command in future releases for installation from updates or ISOs?
  • If there are no such plans, could you please document the necessity of installing the command in the manual?
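For anyone else hitting this, the workaround plus a typical invocation look roughly like this (the ISO filename and answer.toml are placeholders; see the Automated Installation wiki page for the full options):
Code:
apt install proxmox-auto-install-assistant
# embed an answer file into a stock installer ISO
proxmox-auto-install-assistant prepare-iso proxmox-ve_8.2-1.iso --fetch-from iso --answer-file answer.toml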
 
Getting random complete server freeze after the update:
cpu: epyc 3, mobo: supermicro h12-ssl, nic: intel X520-DA2, kernel: 6.8.4-2-pve
[...]
What is the safe way to revert to kernel 6.5?
I also get freezes/crashes, only on nodes with hyperconverged Ceph.

I'm using encrypted NVMe OSDs on these servers.

EPYC v3 on a Lenovo server (I posted info and logs earlier in this thread).

Do you have any logs?
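For example, assuming the journal is persisted across reboots, something like this should pull kernel errors from the boot before the freeze:
Code:
journalctl -b -1 -k -p err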
 
