Proxmox VE 8.1 released!

Hi All,

I have a (maybe) weird issue on a pair of Dell servers (PowerEdge R240):

When I try to boot the 6.5.11-4-pve kernel, it freezes at "Loading initial ramdisk" - no further messages show up and the server doesn't come up.

The older kernels work without issue (and one older HP server in the infrastructure does not have this problem).

Any idea what could have gone wrong?

Thanks
Tobias
 
When I try to boot the 6.5.11-4-pve kernel, it freezes at "Loading initial ramdisk" - no further messages show up and the server doesn't come up.
This is likely an issue with the handover to the correct framebuffer inside the initrd, which blocks further output on the monitor; the server itself most likely comes up just fine. See https://forum.proxmox.com/threads/pve-8-0-and-8-1-hangs-on-boot.137033/#post-609213
 
If autotrim is enabled on your SSD ZFS pool, try turning it off and rebooting your host:
Code:
zpool set autotrim=off yourpool
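To confirm the change took effect (a quick check; "yourpool" is a placeholder for your pool name):
Code:
zpool get autotrim yourpool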
...
Pretty good finding; appreciate it. I changed autotrim to off on my two SSD pools, and the system load went back to normal after restarting my system with the new 6.5 kernel.
 
Cluster unstable after upgrade 8.0 -> 8.1

I have a "small" cluster at home using 5 x Intel NUCs + 1 x homemade machine.

After I upgraded and rebooted the machines, they all started rebooting about every 15 minutes, even when set to maintenance mode.
I downgraded the kernel from 6.5.11-4-pve to 6.2.16-19-pve on 3 machines, and now they "only" reboot once a day.
So in short: the machines running kernel 6.5.11-4-pve reboot every 15 minutes while in maintenance mode, and the machines running 6.2.16-19-pve reboot once a day.

syslog shows no errors before the restart.

Also, HA stopped working: starting a VM would set its HA state to "started", but the VM would not actually start.

Solved that with this workaround:
Code:
cd /etc/pve/ha/
# move the HA resource config out of the way
mv resources.cfg resources.cfg.tmp
# replace it with an empty file for a moment
touch resources.cfg
# waited a few seconds
mv resources.cfg.tmp resources.cfg
And the VMs could be started again.
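For anyone trying the same workaround: the HA stack's view of the resources can be checked before and after with the standard pve-ha-manager CLI:
Code:
# list the configured HA resources and their requested/current state
ha-manager status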
 
Cluster unstable after upgrade 8.0 -> 8.1

I have a "small" cluster at home using 5 x Intel NUCs + 1 x homemade machine.

There is a known bug with Intel NUC NICs and offloading (search the forum).

They hang sporadically if you don't turn offloading off, for example with:

Code:
ethtool -K eth0 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

That could explain the reboots: if HA is enabled and the network hangs, the node fences (reboots) itself.
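To make that setting survive reboots, one option is a post-up hook in /etc/network/interfaces (a sketch, assuming ifupdown and that eth0 is the physical NIC; adjust the interface name to match your system):
Code:
iface eth0 inet manual
        # re-apply the offload settings every time the interface comes up
        post-up ethtool -K eth0 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off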
 
This 8.1 is fantastic!
Is it possible to add support for Telegram notifications in the future? Gotify is Android-only and we only use iOS, so it would be very nice to have Telegram support too.

Last question :) is it safe to do a "zpool upgrade" of my pools?

Thanks for your work.
 
Hello all,
When I upgraded Proxmox from 8.0 to 8.1, I had a Debian 12 VM running in production with QEMU 8.0.2, and the server (a Dell R730xd) became unstable.
Reviewing the Proxmox 8.1 changelog, the QEMU version is 8.1.2, so I stopped the VM, changed the version in its meta file, and for now all is OK.
Regards

 
Upgraded, and now the server sends backup e-mails even when the backup is perfectly successful (we use "on error only"). Why?
 
Will amd-pstate-epp be coming to proxmox kernels?

Kernel 6.5 is the version where it was made the default, but that does not seem to be the case in Proxmox, nor can I activate it by adding "amd_pstate=active" to the kernel command line in GRUB. Maybe there are reasons the pstate EPP driver should not be used in Proxmox?

It would be really nice to get this supported, to allow for more energy efficiency or to squeeze a little more performance out of such AMD CPUs. In my testing, the EPP driver with the performance governor and "performance" preference outperforms acpi-cpufreq with the performance governor, both in speed and slightly in efficiency; the powersave governor with "balance_performance" gets even better single-thread performance and higher efficiency still, but lower multicore (the remaining preferences just increase efficiency further at the cost of performance). This seems to correlate with the Phoronix testing done on the different drivers.
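For reference, this is how I added the parameter (on a GRUB-booted host; systems booting via systemd-boot would edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead):
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_pstate=active"
# then regenerate the bootloader config and reboot
update-grub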
 
Will amd-pstate-epp be coming to proxmox kernels?

amd_pstate=active with EPP is the default for me (AMD EPYC 7543 32-Core Processor)

Code:
cat /boot/config-6.5.11-1-pve |grep PSTATE
CONFIG_X86_INTEL_PSTATE=y
CONFIG_X86_AMD_PSTATE=y
CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3

Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver 
amd-pstate

Code:
# cpupower frequency-info
analyzing CPU 0:
  driver: amd-pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 20.0 us
  hardware limits: 400 MHz - 3.74 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 400 MHz and 3.74 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.77 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
    AMD PSTATE Highest Performance: 255. Maximum Frequency: 3.74 GHz.
    AMD PSTATE Nominal Performance: 191. Nominal Frequency: 2.80 GHz.
    AMD PSTATE Lowest Non-linear Performance: 103. Lowest Non-linear Frequency: 1.51 GHz.
    AMD PSTATE Lowest Performance: 28. Lowest Frequency: 400 MHz.
 
Hi,
Upgraded, and now the server sends backup e-mails even when the backup is perfectly successful (we use "on error only"). Why?
are you using notification mode Email (legacy) or Auto with a mail address configured? If no mail address is set and you use mode Auto, the new notification system will be used. There is already a pending patch to make this more obvious in the UI. Otherwise, please share the backup job configuration (from /etc/pve/jobs.cfg).
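For reference, a vzdump entry in /etc/pve/jobs.cfg looks roughly like this (a sketch; the job id, schedule, and address are placeholders):
Code:
vzdump: backup-f3a9b2c1
        schedule 21:00
        mode snapshot
        mailto admin@example.com
        mailnotification failure
        notification-mode legacy-sendmail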
 
Has anyone tried to pass through AMD iGPUs/GPUs? Is the reset bug fixed, or has the situation at least improved?

I'm asking with a passthrough of a 5750GE / Cezanne iGPU in mind.
 
amd_pstate=active with EPP is the default for me (AMD EPYC 7543 32-Core Processor)

Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
amd-pstate

That is not amd_pstate=active but rather amd_pstate=passive. If you had amd-pstate EPP, you would see:
Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
amd-pstate-epp
&
Code:
cpupower frequency-info
analyzing CPU 10:
  driver: amd-pstate-epp
  ..
  ..

This is also visible if you look at the available governors and the energy performance preference:
Code:
cd /sys/devices/system/cpu/cpu0/cpufreq/
..
energy_performance_available_preferences (typically: performance balance_performance balance_power power)
energy_performance_preference (the set EPP)
scaling_available_governors (only: performance powersave)
scaling_governor (the set governor)
..

When amd-pstate-epp is used, the ondemand, schedutil, and other such governors will NOT be available (only the EPP-specific performance and powersave; see scaling_available_governors).
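As a quick check-and-set sequence (standard cpufreq sysfs files; only meaningful once the driver really is amd-pstate-epp):
Code:
# confirm which driver is active
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
# list the available EPP preferences, then set one on all cores (as root)
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences
echo balance_performance | tee /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference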

Relevant resources: Phoronix, docs.kernel.org

Interesting, however, that your EPYC defaults to amd_pstate=passive while my Ryzen 5600X defaults to acpi-cpufreq. Or maybe I changed it and forgot, though I don't know why I would have done that.

EDIT:
However, CONFIG_X86_AMD_PSTATE_DEFAULT_MODE=3 does indeed mean it should default to active (EPP), so it seems EPP is not supported there, or something else is forcing passive mode.
 
FYI: There's an updated kernel, proxmox-kernel-6.5.11-6-pve (and its signed variant), that includes a targeted fix for the long-standing ZFS dnode issue. The fix has been accepted upstream and backported, and is already released, even as upstream continues working on their 2.2.2 release.

With this update, we could not reproduce the issue using the reproducer script, which is designed to specifically trigger the edge case where this problem could occur.

Please note that while we have updated both the kernel for the module and the user space packages to avoid potential confusion about mismatching versions (as reported by, for example, zfs -V), the actual fix is specific to the kernel module and thus requires a reboot to be applied.
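After the reboot, it's easy to confirm that the fixed module is actually the one loaded (the loaded module's version is exposed in sysfs):
Code:
# running kernel and loaded ZFS module version
uname -r
cat /sys/module/zfs/version
# user space tools, for comparison
zfs -V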

The kernel is available on pvetest and pve-no-subscription. As we've only made the minimal changes required to fix the issue, this dramatically reduces the potential for regressions. Therefore, it should become available on pve-enterprise soon (in a few days at most), assuming no issues arise. After that, we will proceed to rebuild the ISO. Additionally, we're working on a backport for older but still supported kernel releases and their ZFS versions.
 
I just want to be abundantly clear on the upgrade path. We have a 3-node cluster running 8.0.4 with Ceph (Quincy).

Besides the normal precautions (rigorous backups, with verification and testing), is this really as simple as pressing the "Upgrade" button and letting it fly? And then, separately, the Ceph upgrade as shown in the docs?

During the Ceph upgrade, is it OK that some hosts run Quincy and others run Reef for a short period?
If you follow the Ceph upgrade guide exactly, it is all good; I did the upgrade. Yes, it's OK to have some nodes running the older version for a while. The Ceph status page will report it, but it works. Do the upgrade as the guide says and restart the daemons.
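While the cluster is mixed, the daemon versions can be checked per component from any node with the standard Ceph CLI:
Code:
# shows which daemons are running which Ceph release
ceph versions
# overall cluster health during the rolling upgrade
ceph -s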
 
