Proxmox VE 8.1 released!

Hi All,

I have a (maybe) weird issue on a pair of Dell servers (PowerEdge R240):

When I try to boot the 6.5.11-4-pve kernel, it freezes at "Loading initial ramdisk" - no further messages show up and the server doesn't come up.

The older kernels work without issue (and one older HP server in the infrastructure does not have this problem).

Any idea what could have gone wrong?

Thanks
Tobias
 
When I try to boot the 6.5.11-4-pve kernel, it freezes at "Loading initial ramdisk" - no further messages show up and the server doesn't come up.
This is likely an issue with the handover to the correct framebuffer inside the initrd, which blocks further output on the monitor; the server itself most likely comes up just fine. See https://forum.proxmox.com/threads/pve-8-0-and-8-1-hangs-on-boot.137033/#post-609213
 
If autotrim is enabled on your SSD ZFS pool, try turning it off and rebooting your host:
Code:
zpool set autotrim=off yourpool
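To confirm the change took effect (a quick check; "yourpool" is a placeholder for your pool name):
Code:
zpool get autotrim yourpool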
...
Pretty good finding; appreciate it. I changed autotrim to off on my two SSD pools, and the system load went back to normal after restarting my system with the new 6.5 kernel.
 
Cluster unstable after upgrade 8.0 -> 8.1

I have a "small" cluster at home using 5 x Intel NUCs + 1 x homemade machine.

After I upgraded and rebooted the machines, they all started rebooting about every 15 minutes, even when set to maintenance mode.
I downgraded the kernel from 6.5.11-4-pve to 6.2.16-19-pve on 3 machines, and now they "only" reboot once a day.
So in short: the machines running kernel 6.5.11-4-pve reboot every 15 minutes while in maintenance mode, and the machines running 6.2.16-19-pve reboot once a day.

syslog shows no errors before the restart.

Also, HA stopped working: starting a VM would set its HA state to "started", but the VM would not actually start.

Solved that with this workaround:
Code:
cd /etc/pve/ha/
# move the HA resource config out of the way
mv resources.cfg resources.cfg.tmp
# replace it with an empty file for a moment
touch resources.cfg
# waited a few seconds
mv resources.cfg.tmp resources.cfg
And the VMs could be started again.
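For anyone trying the same workaround: the HA stack's view of the resources can be checked before and after with the standard pve-ha-manager CLI:
Code:
# list the configured HA resources and their requested/current state
ha-manager status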
 
Cluster unstable after upgrade 8.0 -> 8.1

I have a "small" cluster at home using 5 x Intel NUCs + 1 x homemade machine.

There is a known bug with Intel NUC NICs and offloading (search the forum).

They hang sporadically if you don't turn offloading off, for example with:

Code:
ethtool -K eth0 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

That could explain the reboots: if HA is enabled and the network hangs, the node fences (reboots) itself.
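To make that setting survive reboots, one option is a post-up hook in /etc/network/interfaces (a sketch, assuming ifupdown and that eth0 is the physical NIC; adjust the interface name to match your system):
Code:
iface eth0 inet manual
        # re-apply the offload settings every time the interface comes up
        post-up ethtool -K eth0 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off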
 
This 8.1 is fantastic!
Is it possible to add support for Telegram notifications in the future? Gotify is Android-only and we only use iOS, so it would be very nice to have Telegram support too.

Last question :) is it safe to do a "zpool upgrade" of my pools?

Thanks for your work.
 
Hello all,
When I upgraded Proxmox from 8.0 to 8.1, I had a Debian 12 VM running in production with QEMU 8.0.2, and the server (a Dell R730xd) became unstable.
Reviewing the Proxmox 8.1 changelog, the QEMU version is 8.1.2, so I stopped the VM, changed the version in its meta file, and for now all is OK.
Regards

 
Upgraded, and now the server sends backup e-mails even when the backup is perfectly successful (we use "on error only"). Why?
 
Will amd-pstate-epp be coming to proxmox kernels?

Kernel 6.5 is the version where it was made the default, but that does not seem to be the case in Proxmox, nor can I activate it by adding "amd_pstate=active" to the kernel command line in GRUB. Maybe there are reasons the pstate EPP driver should not be used in Proxmox?

It would be really nice to get this supported, to allow for more energy efficiency or to squeeze a little more performance out of such AMD CPUs. In my testing, the EPP driver with the performance governor and "performance" preference outperforms acpi-cpufreq with the performance governor, both in speed and slightly in efficiency; the powersave governor with "balance_performance" gets even better single-thread performance and higher efficiency still, but lower multicore (the remaining preferences just increase efficiency further at the cost of performance). This seems to correlate with the Phoronix testing done on the different drivers.
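For reference, this is how I added the parameter (on a GRUB-booted host; systems booting via systemd-boot would edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_pstate=active"
# then regenerate the bootloader config and reboot
update-grub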
 
Will amd-pstate-epp be coming to proxmox kernels?

amd_pstate=active with EPP is the default for me (AMD EPYC 7543 32-Core Processor)

Code:
cat /boot/config-6.5.11-1-pve |grep PSTATE
CONFIG_X86_INTEL_PSTATE=y
CONFIG_X86_AMD_PSTATE=y
CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3

Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver 
amd-pstate

Code:
# cpupower frequency-info
analyzing CPU 0:
  driver: amd-pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 20.0 us
  hardware limits: 400 MHz - 3.74 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 400 MHz and 3.74 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.77 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 255. Maximum Frequency: 3.74 GHz.
    AMD PSTATE Nominal Performance: 191. Nominal Frequency: 2.80 GHz.
    AMD PSTATE Lowest Non-linear Performance: 103. Lowest Non-linear Frequency: 1.51 GHz.
    AMD PSTATE Lowest Performance: 28. Lowest Frequency: 400 MHz.
 
Hi,
Upgraded, and now the server sends backup e-mails even when the backup is perfectly successful (we use "on error only"). Why?
are you using notification mode Email (legacy) or Auto with a mail address configured? If no mail address is set and you use mode Auto, the new notification system will be used. There is already a pending patch to make this more obvious in the UI. Otherwise, please share the backup job configuration (from /etc/pve/jobs.cfg).
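For reference, a vzdump entry in /etc/pve/jobs.cfg looks roughly like this (a sketch; the job id, schedule, and address are placeholders):
Code:
vzdump: backup-f3a9b2c1
        schedule 21:00
        mode snapshot
        mailto admin@example.com
        mailnotification failure
        notification-mode legacy-sendmail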
 
Has anyone tried to pass through AMD iGPUs/GPUs? Is the reset bug fixed, or has the situation at least improved?

I'm asking with a passthrough of a 5750GE / Cezanne iGPU in mind.
 
amd_pstate=active with EPP is the default for me (AMD EPYC 7543 32-Core Processor)

Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
amd-pstate

That is not amd_pstate=active but rather amd_pstate=passive. If you had amd-pstate EPP, you would see:
Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
amd-pstate-epp
&
Code:
cpupower frequency-info
analyzing CPU 10:
  driver: amd-pstate-epp
  ..
  ..

This is also visible if you look at the available governors and the energy performance preference:
Code:
cd /sys/devices/system/cpu/cpu0/cpufreq/
..
energy_performance_available_preferences (typically: performance balance_performance balance_power power)
energy_performance_preference (the set EPP)
scaling_available_governors (only: performance powersave)
scaling_governor (the set governor)
..

When amd-pstate-epp is used, the ondemand, schedutil, and other such governors will NOT be available (only the EPP-specific performance and powersave; see scaling_available_governors).
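As a quick check-and-set sequence (standard cpufreq sysfs files; only meaningful once the driver really is amd-pstate-epp):
Code:
# confirm which driver is active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
# list the available EPP preferences, then set one on all cores (as root)
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences
echo balance_performance | tee /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference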

Relevant resources: Phoronix, docs.kernel.org

Interesting, however, that your EPYC defaults to amd_pstate=passive while my Ryzen 5600X defaults to acpi-cpufreq. Or maybe I changed it and forgot, though I don't know why I would have done that.

EDIT:
However, CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3 does indeed mean it should default to active (EPP), so it seems EPP is not supported there, or something else is forcing passive mode.
 
FYI: There's an updated kernel, proxmox-kernel-6.5.11-6-pve (and its signed variant), that includes a targeted fix for the long-standing ZFS dnode issue. The fix has been accepted upstream and backported, and is already released, even as upstream continues working on their 2.2.2 release.

With this update, we could not reproduce the issue using the reproducer script, which is designed to specifically trigger the edge case where this problem could occur.

Please note that while we have updated both the kernel for the module and the user space packages to avoid potential confusion about mismatching versions (as reported by, for example, zfs -V), the actual fix is specific to the kernel module and thus requires a reboot to be applied.
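After the reboot, it's easy to confirm that the fixed module is actually the one loaded (the loaded module's version is exposed in sysfs):
Code:
# running kernel and loaded ZFS module version
uname -r
cat /sys/module/zfs/version
# user space tools, for comparison
zfs -V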

The kernel is available on pvetest and pve-no-subscription. As we've only made the minimal changes required to fix the issue, this dramatically reduces the potential for regressions. Therefore, it should become available on pve-enterprise soon (in a few days at most), assuming no issues arise. After that, we will proceed to rebuild the ISO. Additionally, we're working on a backport for older but still supported kernel releases and their ZFS versions.
 
I just want to be abundantly clear on the upgrade path. We have a 3-node cluster running 8.0.4 with Ceph (Quincy).

Besides the normal precautions (rigorous backups, with verification and testing), is this really as simple as pressing the "Upgrade" button and letting it fly? And then, separately, the Ceph upgrade as shown in the docs?

During the Ceph upgrade, is it OK that some hosts run Quincy and others run Reef for a short period?
If you follow the Ceph upgrade guide exactly, it is all good; I did the upgrade. Yes, it's OK to have some nodes running the older version for a while. The Ceph status page will report it, but it works. Do the upgrade as the guide says and restart the daemons.
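While the cluster is mixed, the daemon versions can be checked per component from any node with the standard Ceph CLI:
Code:
# shows which daemons are running which Ceph release
ceph versions
# overall cluster health during the rolling upgrade
ceph -s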
 
