[SOLVED] [A lot of Errors] Systemd + Kernel on newer Hardware

Ramalama

Well-Known Member
Dec 26, 2020
Hi, to make it short:

The Hardware:
MB: Asrock Rack x570D4i-2T (Agesa 1.1.0.0)
CPU: Ryzen 7 5800x

The bugs/errors I get when booting:
-> cgroup2: Unknown parameter 'memory_recursiveprot'
---> An incompatibility between the current systemd and kernel 5.4 (the option has only been implemented since kernel 5.7)
---> https://github.com/systemd/systemd/blob/main/NEWS (Search for "memory_recursiveprot")

-> proc: Bad value for 'hidepid'
---> https://github.com/systemd/systemd/blob/main/NEWS (Search for "hidepid") (support since Kernel 5.8)

-> EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
---> Ryzen 5000 (ECC/EDAC) model fixes have been implemented since kernel 5.10
---> This is actually less important, because ECC works anyway.

-> snd_hda_intel 0000:2e:00.4: no codecs found!
---> Fixed in newer Kernels, since 5.7.
---> Not important at all, who needs sound on a hypervisor xD, but the message is annoying and could be fixed with a newer kernel.

-> do_IRQ: 1.55 No irq handler for vector
---> https://bbs.archlinux.org/viewtopic.php?id=256227&p=3
---> As far as I understand, these are issues with an older kernel plus BIOS problems, related to IOMMU.

-> Failed to start Import ZFS pool XXXX
---> Not really an error; the Proxmox team just made 3 services for the same thing...
---> The error appears because the pool is already imported, so it's not really an error...
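
For reference, I collected most of these from the boot log, nothing special, just standard filtering like:
Bash:
# warnings and errors of the current boot
journalctl -b -p warning
# or only the kernel ring buffer
dmesg --level=err,warn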


I can live with all of the above, but now there is a systemd error that drives me absolutely crazy on almost every boot:
- console-setup.service: Failed to set invocation ID for unit: File exists
- systemd-tmpfiles-setup.service: Failed to set invocation ID for unit: File exists
- systemd-timesyncd.service: Failed to set invocation ID for unit: File exists
- time-sync.target: Failed to set invocation ID for unit: File exists
- systemd-update-utmp.service: Failed to set invocation ID for unit: File exists
- rsyslog.service: Failed to set invocation ID for unit: File exists
- pve-lxc-syscalld.service: Failed to set invocation ID for unit: File exists
---> https://github.com/systemd/systemd/issues/18184
---> RDRAND issues with older AGESA, I guess.
---> If you have these issues too: add "SYSTEMD_RDRAND=0" to the kernel command line in "/etc/default/grub"
---> Here is an example (just an example) of my cmdline:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nvme_core.default_ps_max_latency_us=1200 textonly video=astdrmfb video=efifb:off SYSTEMD_RDRAND=0"
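---> Afterwards apply it with the usual commands (the same ones mentioned further down in this thread):
Bash:
update-initramfs -u
update-grub
reboot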


However, in the end it's not only AGESA issues, we need a newer kernel too. 5.10+ would be optimal.
Thank god & thanks to the devs, we at least have 5.4 and not the default 4.19 from Debian.

Just please, when the next LTS kernel comes, jump on it xD
Thanks everyone & hopefully I could help someone with the systemd issue!

Aside from all the hell I went through, Proxmox runs really stable once it's booted :)
And the overall performance is really good!

Cheers :)
 
Hi,

thanks for the feedback, some comments on that.
-> cgroup2: Unknown parameter 'memory_recursiveprot'
---> An incompatibility between the current systemd and kernel 5.4 (the option has only been implemented since kernel 5.7)
---> https://github.com/systemd/systemd/blob/main/NEWS (Search for "memory_recursiveprot")
Why do you enable it then? This is not a bug.

-> proc: Bad value for 'hidepid'
---> https://github.com/systemd/systemd/blob/main/NEWS (Search for "hidepid") (support since Kernel 5.8)
Unavailability of new features is not a bug; once a new kernel is available with the next major release, this will be available too.

-> EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
---> Ryzen 5000 (ECC/EDAC) model fixes have been implemented since kernel 5.10
---> This is actually less important, because ECC works anyway.
If ECC did not work, this could be considered a problem, as ECC is a major feature for virtualization platforms.
But as it works, I see no immediate reason to start backporting stuff (which can be its own problem).

-> Failed to start Import ZFS pool XXXX
---> Not really an error; the Proxmox team just made 3 services for the same thing...
---> The error appears because the pool is already imported, so it's not really an error...
Huh? We do not make three services for the same thing...
But there was some improvement to the "is a pool already mounted" detection in libpve-storage-perl version 6.3-7, so it may be worth trying out that one.
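You can check which version is currently installed with, for example:
Bash:
pveversion -v | grep libpve-storage-perl
# or
apt policy libpve-storage-perl
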
-> do_IRQ: 1.55 No irq handler for vector
---> https://bbs.archlinux.org/viewtopic.php?id=256227&p=3
---> As far as I understand, these are issues with an older kernel plus BIOS problems, related to IOMMU.
This is a recent Arch Linux thread, so an old kernel is hardly the problem, or at least it is not yet fixed in newer ones.

I can live with all of the above, but now there is a systemd error that drives me absolutely crazy on almost every boot:
- console-setup.service: Failed to set invocation ID for unit: File exists
- systemd-tmpfiles-setup.service: Failed to set invocation ID for unit: File exists
- systemd-timesyncd.service: Failed to set invocation ID for unit: File exists
- time-sync.target: Failed to set invocation ID for unit: File exists
- systemd-update-utmp.service: Failed to set invocation ID for unit: File exists
- rsyslog.service: Failed to set invocation ID for unit: File exists
- pve-lxc-syscalld.service: Failed to set invocation ID for unit: File exists
---> https://github.com/systemd/systemd/issues/18184
---> RDRAND issues with older AGESA, I guess.
---> If you have these issues too: add "SYSTEMD_RDRAND=0" to the kernel command line in "/etc/default/grub"
---> Here is an example (just an example) of my cmdline:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nvme_core.default_ps_max_latency_us=1200 textonly video=astdrmfb video=efifb:off SYSTEMD_RDRAND=0"
Older AMD CPU models had this problem and RDRAND was "disabled" there (removed from the CPUID info so software did not see that it would be supported), but there's no such patch in the current Linux tree from Linus, nor is there any patch posted to LKML that I could find. Here I'd actually prefer to hear this from an AMD dev before starting to patch around in CPUID flags, which can be a bit of a delicate matter...

But thanks for posting a workaround. Tip: you can use the [icode][/icode] BBCode tag for one-liners and the [code][/code] tag for multi-line text which the forum should not touch (e.g., turn into smileys) and which should be formatted in monospace.

However, in the end it's not only AGESA issues, we need a newer kernel too. 5.10+ would be optimal.
Thank god & thanks to the devs, we at least have 5.4 and not the default 4.19 from Debian.

Just please, when the next LTS kernel comes, jump on it xD
Thanks everyone & hopefully I could help someone with the systemd issue!
Stable updates nowadays involve lots of backporting, and if there are issues, often just pinging the right devs on a patch and suggesting stable inclusion will do the trick for those times something was not deemed worthy of a backport (or was simply forgotten).

Proxmox VE 6.x will stay on the 5.4 LTS branch, but we may make a newer one available as opt-in, nothing set in stone yet, though.
 
Hi,

thanks for the feedback, some comments on that.

Why do you enable it then? This is not a bug.

?
Before I ask what the hell I have enabled: what do I need to disable? xD
Because it's almost a vanilla Proxmox install, the only things I do are:
Code:
- blacklist the nvidia/radeon/nouveau/ixgbevf modules (GPU passthrough + X550 SR-IOV virtualization)
- load these modules: vfio vfio_iommu_type1 vfio_pci vfio_virqfd overlay aufs (I probably don't need aufs, but overlay is for Docker in a container)
- GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt nvme_core.default_ps_max_latency_us=1200 textonly video=astdrmfb video=efifb:off SYSTEMD_RDRAND=0"
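In file form that boils down to roughly this (the paths are the standard Debian ones, the exact file names on my side may differ):
Code:
# /etc/modprobe.d/blacklist.conf
blacklist nvidia
blacklist radeon
blacklist nouveau
blacklist ixgbevf

# /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
overlay
aufs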

Then I have some custom scripts that I wrapped into a service:
Code:
[Unit]
Description=Script to enable SR-IOV on boot

[Service]
Type=oneshot
# Starting SR-IOV
ExecStart=/usr/bin/bash -c '/usr/bin/echo 2 > /sys/class/net/enp35s0f0/device/sriov_numvfs'
# Setting static MAC for VFs
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp35s0f0 vf 0 mac d0:50:99:db:fb:75'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp35s0f0 vf 1 mac d0:50:99:db:fb:76'
# Allow OPNsense to change mac & more
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp35s0f0 vf 0 trust on'
ExecStart=/usr/bin/bash -c '/usr/bin/ip link set enp35s0f0 vf 1 trust on'

# Add VM Mac-Addr to dev + myfixes
ExecStart=/usr/bin/bash -c '/root/scripts/vf_add_maddr.sh'
ExecStart=/usr/bin/bash -c '/root/scripts/checkservices.sh'

[Install]
WantedBy=multi-user.target
Code:
#!/usr/bin/bash

# vf_add_maddr.sh script
# script to register mac address of container or VM to the forwarding db of the bridge
# add it to crontab to run every minute
#
CTCONFDIR=/etc/pve/nodes/proxmox/lxc
VMCONFDIR=/etc/pve/nodes/proxmox/qemu-server
IFBRIDGE=enp35s0f0
LBRIDGE=vmbr0

echo "=== start ==="

MAC_LIST_VMS=" $(cat ${VMCONFDIR}/*.conf | grep bridge | grep -Eo '([[:xdigit:]]{1,2}[:-]){5}[[:xdigit:]]{1,2}' | tr '[:upper:]' '[:lower:]') $(cat ${CTCONFDIR}/*.conf | grep hwaddr | grep -Eo '([[:xdigit:]]{1,2}[:-]){5}[[:xdigit:]]{1,2}' | tr '[:upper:]' '[:lower:]')"
MAC_ADD2LIST="$(cat /sys/class/net/$LBRIDGE/address)"
MAC_LIST="$MAC_LIST_VMS $MAC_ADD2LIST"

for mactoregister in ${MAC_LIST}
do
        if (/usr/sbin/bridge fdb show | grep "${IFBRIDGE} self permanent" | grep -q $mactoregister)
          then
                echo "MAC $mactoregister already configured"
          else
                echo "MAC $mactoregister not configured"
                echo "i add it with : /usr/sbin/bridge fdb add $mactoregister dev ${IFBRIDGE}"
                /usr/sbin/bridge fdb add $mactoregister dev ${IFBRIDGE}
                echo "Return code : $? : done for $mactoregister"
        fi
done

echo "=== end ==="
Code:
#!/usr/bin/bash
# checkservices.sh script

systemctl is-active --quiet pve-firewall.service || systemctl restart pve-firewall.service
systemctl is-active --quiet pvestatd || systemctl restart pvestatd
systemctl is-active --quiet cron || systemctl restart cron
systemctl is-active --quiet pveproxy.service || systemctl restart pveproxy.service
systemctl is-active --quiet spiceproxy || systemctl restart spiceproxy
systemctl is-active --quiet pve-lxc-syscalld || systemctl restart pve-lxc-syscalld
systemctl is-active --quiet smartmontools || systemctl restart smartmontools

And as a last thing, I compiled/installed ixgbe 5.10.2 from Intel, because the 5.1 driver in the 5.4 kernel doesn't support 2.5Gb/s on the X550-T2.
Sometimes it works with the 5.1 ixgbe, I get a link at 2.5Gb/s, but it randomly disconnects and is hella unstable.
5.10.2 compiled with the LRO-disabled flag works at 2.5Gb/s!
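Roughly, the build looked like this (from memory; the tarball name and the CFLAGS_EXTRA define come from Intel's README, so double-check them there):
Bash:
# build prerequisites
apt install pve-headers-$(uname -r) build-essential
# Intel's out-of-tree ixgbe driver
tar xzf ixgbe-5.10.2.tar.gz
cd ixgbe-5.10.2/src
make CFLAGS_EXTRA="-DIXGBE_NO_LRO" install   # build & install with LRO disabled
depmod -a
update-initramfs -u   # in case ixgbe ends up in the initrd
reboot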

That's all my modifications to the default Proxmox install xD

Unavailability of new features is not a bug; once a new kernel is available with the next major release, this will be available too.
?
That's the systemd that comes with Debian/Proxmox, not an updated one from buster-backports or something....
So the question should rather be: why are features included in systemd that aren't supported by kernel 5.4 or 4.19?
That means we get half-baked, unchecked updates that could cause issues. (The systemd version is 241.)

Huh? We do not make three services for the same thing...
But there was some improvement to the "is a pool already mounted" detection in libpve-storage-perl version 6.3-7, so it may be worth trying out that one.
Ehm (sorry, it's 2 now, I was pretty sure there were 3 xD), but still 2:
- zfs-import-cache.service
- zfs-import@.service

[Attachments: systemctl-failed.png, systemctl-status.png, importbycache.png, zfspools.png]

Okay, I found my error :eek:
HDD-Win doesn't exist, I forgot that I deleted that pool xD
And I don't remember if I destroyed that pool via CLI or GUI xD
So I can simply delete that service or deactivate it. Sorry, this is my fault, I didn't check :-(

Older AMD CPU models had this problem and RDRAND was "disabled" there (removed from the CPUID info so software did not see that it would be supported), but there's no such patch in the current Linux tree from Linus, nor is there any patch posted to LKML that I could find. Here I'd actually prefer to hear this from an AMD dev before starting to patch around in CPUID flags, which can be a bit of a delicate matter...

But thanks for posting a workaround. Tip: you can use the [icode][/icode] BBCode tag for one-liners and the [code][/code] tag for multi-line text which the forum should not touch (e.g., turn into smileys) and which should be formatted in monospace.


Stable updates nowadays involve lots of backporting, and if there are issues, often just pinging the right devs on a patch and suggesting stable inclusion will do the trick for those times something was not deemed worthy of a backport (or was simply forgotten).

Proxmox VE 6.x will stay on the 5.4 LTS branch, but we may make a newer one available as opt-in, nothing set in stone yet, though.

About RDRAND, for me it clearly looks like a BIOS/AGESA/hardware problem.
I contacted Asrock Rack again, we'll see if they reply at all, because I already sent them a message (a month ago) and they simply ignored it xD

About the opt-in kernel, that would be just amazing :)
I already tried the pve-edge kernel from that GitHub repo, and well, it booted, but almost nothing worked, so I keep my fingers away from that kernel xD
But I've seen that many people had success with pve-edge. So yeah. As far as I remember, many drivers were missing on my system.
Then I tried to compile my own pve-5.4 kernel (vanilla, without modifications); the compilation itself was flawless. Then I installed that kernel on my Proxmox host and had the same issues as with pve-edge: almost nothing worked.

However, an opt-in version would be amazing!
If that ever comes, I'm here to try it :)

Cheers :)
 
Hi,

I have very similar hardware:

MB: Asrock Rack X570D4U-2L2T
CPU: Ryzen 5950x

I updated PVE this morning with the new kernel (5.4.98) and it has taken my system from being stable to completely unstable. The previous kernel was (5.4.78).
I suspect this is mainly caused by the unstable Intel NIC drivers included in the new kernel as mentioned above.

I also have the other error messages as mentioned above too.

How do I revert to the previous kernel by default on boot?
How do I get a PVE kernel with the stable Intel drivers?
How do I update in the future, skipping only the newer kernel while including the other updates - as these new kernels are unstable on new hardware?

On PVE I have a router, a web server and various VMs running, and everything is either unstable or not working after the kernel upgrade.

I've been using PVE for years and this is the first time I have experienced such large issues after an upgrade.

You can see the NICs behaving badly in the screenshot below, and also the ipcc failures - not sure what these are, but I need this machine running properly (as it's my router too), so I have just reinstalled PVE from the ISO. I made the mistake of upgrading ZFS when the new kernel was just installed, which means that the previous kernels can't boot it.

Reading here: https://forum.openmediavault.org/index.php?thread/38190-zfs-packages-update-issue-omv-5-6-0-1/
They are having issues with the new kernel (PVE) and with the upgraded ZFS. I believe my instability was also related to the upgraded ZFS file system.

Pete

[Attachment: 1614166019757.png]
 
We have the same board; the D4I & D4U are identical, just a different form factor.

The kernel is okay, your NIC will come up again, it just takes forever after booting.
I have the same issue.
Check my GRUB command line to disable the RDRAND systemd issue, then update initramfs & update grub.

Additionally, you can install pve-headers and compile the new Intel ixgbe NIC driver, just don't forget to compile the driver with LRO disabled (simply Google it).

I don't have time right now to get all the info together, because I'm at work. But I can do this later in the evening.

Cheers
 
Please open a new thread, besides some shared HW this seems to be completely unrelated.

You can see the NICs behaving badly in the screenshot below
How so? I see nothing wrong. The links come up and then the bridge too? The messages seem pretty normal and will also be there in older logs of working boots, if you check. For example, these are mine from my last boot, all working fine:

Bash:
[   23.152695] vmbr0: port 1(eno1) entered blocking state
[   23.154090] vmbr0: port 1(eno1) entered disabled state
[   28.308208] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   28.309966] vmbr0: port 1(eno1) entered blocking state
[   28.310809] vmbr0: port 1(eno1) entered forwarding state
[   28.312123] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr0: link becomes ready

below and also the ipcc failures
The latter actually seems relevant; those normally come from the fact that the essential pve-cluster (pmxcfs) service (which, contrary to what the name suggests, is also relevant for non-clusters) failed to start.

As a starter: check
Bash:
systemctl list-units --failed
# check below for actual errors:
journalctl -b
 
Thanks Ramalama and Thomas for responding.

Normally I would spend a lot more time on this getting it sorted out but since I have the kids at home on lockdown and the internet is needed for school, I had to wipe the update and reinstall PVE from the iso. So right now I am going to wait a long time before upgrading any PVE packages.
 
That's the systemd that comes with Debian/Proxmox, not an updated one from buster-backports or something....
So the question should rather be: why are features included in systemd that aren't supported by kernel 5.4 or 4.19?
That means we get half-baked, unchecked updates that could cause issues. (The systemd version is 241.)
The question is rather where your 'hidepid' comes from; that is a mount option, and it's surely not set up by the Proxmox VE installer. Our packages and Debian's are all well tested and made for each other, so please don't throw out such accusations just because you made some configuration changes, or used a non-official way to install which brings along changes that do not work out. Systemd is meant to be able to boot with different kernel versions in general anyway.

Proxmox VE defaults to cgroupv1; cgroupv2 is not yet deemed fully supported (albeit it should be almost there) and needs to be activated manually. It's not even clear to me in which combinations those are bugs; you just stated those terms and referenced some newer external software versions not yet packaged by Proxmox VE...

I already tried the pve-edge kernel from that GitHub repo, and well, it booted, but almost nothing worked, so I keep my fingers away from that kernel xD
I'd be a bit wary of touching those external repos, especially near anything production-like...
Actually, you could just try out the master branch of the pve-kernel git repository; it already uses the ubuntu-hirsute kernel as base, which is based on 5.10: https://git.proxmox.com/?p=pve-kernel.git;a=summary
It's rather a POC, but it should build on PVE 6.

# checkservices.sh script
Please avoid systemctl restart UNIT, as restarts are often a rather "big hammer" and may interrupt various things; rather use systemctl try-reload-or-restart UNIT, which favours reload if available for a unit (normally not as invasive), but can fall back to restart. Albeit I'm not sure why that's there at all; if you have frequent service failures, it seems like an issue specific to your system's environment which you may want to investigate separately, and if that cannot be fixed, then at least solve it by adding a systemd unit override which adds a restart policy, which is much more nicely integrated.
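Such an override could look roughly like this (the unit name here is just an example):
Code:
# systemctl edit pve-lxc-syscalld.service
# then put this into the drop-in that opens:
[Service]
Restart=on-failure
RestartSec=10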

Before I ask what the hell I have enabled: what do I need to disable? xD

As said, that stuff is not active when installing from Proxmox VE, so either the installation happened over another medium or those things were changed... I'd start by checking /etc/fstab for the hidepid stuff, see if it is included in the mount options there and what value it has (0, 1 and 2 should work out). For the rest, it would probably be best to see how this was installed, check the current pveversion -v and check all the places you tinkered with.
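For reference, a hidepid entry in /etc/fstab would look roughly like:
Code:
proc /proc proc defaults,hidepid=2 0 0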
 
Please open a new thread, besides some shared HW this seems to be completely unrelated.

The latter actually seems relevant; those normally come from the fact that the essential pve-cluster (pmxcfs) service (which, contrary to what the name suggests, is also relevant for non-clusters) failed to start.

As a starter: check
Bash:
systemctl list-units --failed
# check below for actual errors:
journalctl -b
He runs into the same RDRAND issues as me; many services are failing to start and cause a lot of errors.

@peteb
do this:

edit /etc/default/grub
edit this line so that it looks identical:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt textonly video=astdrmfb video=efifb:off SYSTEMD_RDRAND=0"

This part:
amd_iommu=on iommu=pt video=astdrmfb video=efifb:off
is only needed if you do GPU passthrough or SR-IOV; if you don't, remove it. But you can also leave it and it will be fine either way.

When you have edited /etc/default/grub,

update initramfs & grub:
update-initramfs -u
update-grub

Then you reboot and everything should be working fine again :)
 
The question is rather where your 'hidepid' comes from; that is a mount option, and it's surely not set up by the Proxmox VE installer. Our packages and Debian's are all well tested and made for each other, so please don't throw out such accusations just because you made some configuration changes, or used a non-official way to install which brings along changes that do not work out. Systemd is meant to be able to boot with different kernel versions in general anyway.

Proxmox VE defaults to cgroupv1; cgroupv2 is not yet deemed fully supported (albeit it should be almost there) and needs to be activated manually. It's not even clear to me in which combinations those are bugs; you just stated those terms and referenced some newer external software versions not yet packaged by Proxmox VE...
I searched the whole system for 'hidepid', I can't find it in any service file....
The only file it appears in is:
/usr/src/linux-headers-5.4.98-1-pve/include/linux/pid_namespace.h: HIDEPID_OFF = 0,

I think it's something hardcoded in systemd itself. However, I wrote all my modifications above, so I really don't know where this comes from; it seems like it doesn't appear for everyone.

About cgroupv1/v2, which newer packages? The 5.10.2 Intel NIC driver? That's only one module, located here:
/lib/modules/5.4.98-1-pve/updates/drivers/net/ethernet/intel/ixgbe/ixgbe.ko

I'm starting to suspect that 'hidepid' and cgroupv2 probably come from LXC containers? Is that possible?
Because I have one LXC container with Debian 10 (+ many buster-backports packages) and multiple containers with Ubuntu 20.04.
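A quick check I can do (the container ID is just a placeholder):
Bash:
# on the host: is /proc mounted with hidepid in any mount namespace?
grep hidepid /proc/*/mounts 2>/dev/null | head
# and inside a container (100 is a placeholder CTID):
pct exec 100 -- grep hidepid /proc/mounts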

I'd be a bit wary of touching those external repos, especially near anything production-like...
Actually, you could just try out the master branch of the pve-kernel git repository; it already uses the ubuntu-hirsute kernel as base, which is based on 5.10: https://git.proxmox.com/?p=pve-kernel.git;a=summary
It's rather a POC, but it should build on PVE 6.
Thanks! I will try it out :)

Please avoid systemctl restart UNIT, as restarts are often a rather "big hammer" and may interrupt various things; rather use systemctl try-reload-or-restart UNIT, which favours reload if available for a unit (normally not as invasive), but can fall back to restart. Albeit I'm not sure why that's there at all; if you have frequent service failures, it seems like an issue specific to your system's environment which you may want to investigate separately, and if that cannot be fixed, then at least solve it by adding a systemd unit override which adds a restart policy, which is much more nicely integrated.
Sorry, that checkservices script was from the time when I didn't know about the RDRAND issues...
Those services failed to start at boot time, so I made that cheap script to restart them again.
You have to see it this way: the RDRAND issue is hardcore, every time I booted I had to check "journalctl -b" for which service failed... Some services/targets you can't fix, so I had to boot again and try my luck... and on some boots only 1-2 unimportant services failed, and that's where the checkservices.sh script helped me xD
Now, with the RDRAND cmdline parameter, everything works and I don't need that checkservices script at all xD

As said, that stuff is not active when installing from Proxmox VE, so either the installation happened over another medium or those things were changed... I'd start by checking /etc/fstab for the hidepid stuff, see if it is included in the mount options there and what value it has (0, 1 and 2 should work out). For the rest, it would probably be best to see how this was installed, check the current pveversion -v and check all the places you tinkered with.
grep -Ri hidepid /etc
Says that I'm not using it.
Code:
/etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
/dev/pve/root / ext4 errors=remount-ro 0 1
UUID=71FB-26B5 /boot/efi vfat defaults 0 1
/dev/pve/swap none swap sw 0 0
proc /proc proc defaults 0 0

I already wrote above that I suspect this comes from LXC containers, but anyway, here are the versions xD

Code:
pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.98-1-pve)
pve-manager: 6.3-4 (running version: 6.3-4/0a38c56f)
pve-kernel-5.4: 6.3-5
pve-kernel-helper: 6.3-5
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.86-1-pve: 5.4.86-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.8-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-5
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-1
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-5
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve1

Cheers & Thanks @t.lamprecht
 
See our documentation about how to actually edit the command line; the user runs ZFS, which does not use GRUB on recent Proxmox VE if booted with UEFI (which is likely for such a new system):

https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline
Well, this board is different...
Asrock Rack sets it by default to UEFI mode with Secure Boot enabled, but without loaded certificates/keys/db...
So in the end it's the same as UEFI with Secure Boot disabled....

However, I installed Proxmox (PVE 6.3) with the default ISO, in UEFI mode (through IPMI/BMC). And Debian/Proxmox installed in UEFI mode, but with the GRUB bootloader.

Hope that answers why I'm using GRUB, but thanks for your tip! Now I know how to make it work with Secure Boot xD
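For anyone who does boot via systemd-boot (ZFS + UEFI), the linked doc boils down to roughly this instead of editing /etc/default/grub:
Bash:
# the kernel command line for systemd-boot lives here:
nano /etc/kernel/cmdline
# then sync it to the ESP(s):
pve-efiboot-tool refresh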
 
Ah okay, but it doesn't matter, that link helped me understand how I can make Proxmox Secure Boot-able afterwards.
Thanks anyway!
 
Small update:
https://github.com/fabianishere/pve-edge-kernel/issues/59

I have now compiled a 5.11.5 kernel, optimized for Zen 3. If anyone wants to use it, feel free.
Please don't use it on Intel hardware or anything other than Zen 3.

About all the bugs that I mentioned here (RDRAND / cgroups / EDAC / etc...):
- It's all fixed, I don't have a single error anymore, it's just perfect.
Even the lowest NVMe sleep state (980 Pro) seems to be fixed; I need a bit of time to confirm, but knock on wood. I still doubt a bit that this kernel is a solution for everything, but so far all my issues seem to be fixed.

Thanks everyone for the help!
 
@Ramalama Great work!
I'll give your new kernel a try when I get a chance on my AMD Ryzen Zen 3 setup.
Which version of QEMU are you running it with? (pve-qemu-kvm)
Did you include support in the kernel for ZFS 2.0.3 as well?

To the Proxmox DEVS - it would be really good if you could include what @Ramalama did with his kernel compile - noting that the AMD Ryzen 5000 series Zen 3 is a great small server CPU, as it supports ECC and has great performance and low power consumption - making the platform a good candidate for Proxmox VE! Asrock Rack makes specific server motherboards for these CPUs too, with IPMI (BMC) and multiple 10Gb LAN ports.
 
@Ramalama Great work!
I'll give your new kernel a try when I get a chance on my AMD Ryzen Zen 3 setup.
Which version of QEMU are you running it with? (pve-qemu-kvm)
Did you include support in the kernel for ZFS 2.0.3 as well?

To the Proxmox DEVS - it would be really good if you could include what @Ramalama did with his kernel compile - noting that the AMD Ryzen 5000 series Zen 3 is a great small server CPU, as it supports ECC and has great performance and low power consumption - making the platform a good candidate for Proxmox VE! Asrock Rack makes specific server motherboards for these CPUs too, with IPMI (BMC) and multiple 10Gb LAN ports.
Yes, ZFS is the most important thing.

Basically the kernel is the same as the Proxmox default.
It's not exactly my kernel, just Fabian's work, slimmed down a bit and compiled with GCC 11 instead of 10, plus Zen 3 flags.

The only bigger difference is the governor.
While Proxmox uses the performance governor for a very good reason, 5.11 uses the schedutil governor.

It basically clocks up and down, while performance stays at max speed. As far as I understood, schedutil is a faster version of the ondemand governor.
However, I'm running with it and giving it a try. Usually I'm a fan of the performance governor too. But let's see.
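If anyone wants to check or force the governor back at runtime, the usual sysfs knobs should work:
Bash:
# show the current governor of core 0
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
# switch all cores back to performance (not persistent across reboots)
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor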

---
Exactly @t.lamprecht
But I also told you that I tried the kernels before and tried to compile and use them etc., and got more problems.
But maybe I was doing something stupid before, I don't know.
But in the end you are right, and I'm finally happy xD

About the install of the new kernel:
1. Remove the RDRAND cmdline parameter if you use it, because with the new kernel it breaks systemd.
This is totally weird: with the broken RDRAND on 5.4, I need it to boot...
With fixed RDRAND on 5.11, I need to remove it to boot. But this is good either way. So delete SYSTEMD_RDRAND=0 if you used it.

2. AppArmor: you need to make the change! The one that Fabian mentions in the readme, because without it LXC will fail.
However, all LXC containers work perfectly fine, even Docker inside LXC.

3.
pve-efiboot-tool kernel add 5.11.5-1
pve-efiboot-tool refresh
are needed.

4. After you have booted into the new kernel, rebuild the initrd again:
update-initramfs -u -k all
And reboot again.
I don't know if it's needed, but at least I needed it; my first boot into 5.11 was a mess xD

That's basically it.

@t.lamprecht
I can try to contribute a bit, but my programming knowledge is pretty much useless xD
What I could help with is providing some compile rules to get rid of unnecessary stuff in the kernel itself... like ISDN/WiFi/BT and whatever else no one needs.

I mean, I could try to provide patches (backports for RDRAND & EDAC). But I tried once to backport EDAC and completely failed. And with RDRAND, I don't even know where to start searching. It was a surprise for me that this works; before, I thought I needed AGESA 1.2.0.0 xD

Cheers
 
Ah, the only issue I've run into so far is OPNsense (FreeBSD).
It won't boot anymore with CPU = host.
I had to set it to anything else, it doesn't matter what; default or EPYC works perfectly fine.

But this is most likely not a 5.11 issue and not Proxmox either; FreeBSD is weird anyway and I've run into multiple bugs with OPNsense already. Even Unbound on OPNsense crashes every second/third day, and this has absolutely nothing to do with the kernel or Proxmox. Or OPNsense shows that it uses 0-3% CPU, while it really uses 50% CPU with virtual functions. And omg, the list is long. Somehow I'm super disappointed; it was once really good, but somehow FreeBSD can't keep up.

I'm playing with VyOS and it is like 500 times more stable and a lot faster in every network-performance-related workload. But sadly it doesn't have a nice UI, and managing DHCP leases or DNS aliases over the CLI is frustrating. And omg, managing the firewall over the CLI isn't simple either with VyOS. That's where pf/OPNsense excels. However, the basic difference is Linux vs FreeBSD here. And I get it that pf is a nice packet filter and BSD a nice license... but that doesn't provide any benefit if in the end it consumes like 5000% more CPU and has much slower network performance because of bugs... aside from a crashing Unbound that doesn't restart itself...

However, sorry for the off-topic, but I wanted to mention that "host" on FreeBSD doesn't work with my compiled kernel, while everything else with "host" runs perfectly fine!

About any performance increases/decreases, I can test and compare, but so far I see no difference.
Everything was fast as hell with 5.4, and everything is fast as hell with 5.11 too.

Cheers
 
