Greetings,
I'm having the same issue on two Dell Optiplex 390 DT machines (two identical Intel Sandy Bridge systems, except for RAM and hard drive).
The main node in the cluster, a Dell Optiplex 7010 MT, doesn't have this issue.
It has happened twice on each of the two affected nodes since updating to PVE 6.
Judging from the gaps in the logging data, it happens after roughly 12 hours, although this might just be coincidence.
The nodes become completely unresponsive: no web GUI, no SSH, and if I plug in a monitor it goes into standby. A keyboard or mouse plugged in after the freeze doesn't seem to initialise (Caps Lock doesn't toggle on and off). In this state there is no response to the power button either.
They boot back up as if nothing happened after I hold the power button to force them off.
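For anyone who wants to check the timing on their own nodes, here is a rough Python sketch of how the gaps can be pulled out of a log. It assumes the classic syslog timestamp format ("Mon DD HH:MM:SS") at the start of each line, and the year and the 10-minute threshold are placeholders I picked, not anything the logs provide. The function names are my own.

```python
import sys
from datetime import datetime, timedelta


def parse_syslog_time(line, year=2019):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; return None if absent.

    Classic syslog lines carry no year, so one is assumed here.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, min_gap=timedelta(minutes=10)):
    """Return (last_seen, next_seen) pairs where logging paused >= min_gap."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_syslog_time(line)
        if stamp is None:
            continue  # skip lines without a readable timestamp
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 gaps.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for before, after in find_gaps(handle):
            print(f"no log lines between {before} and {after}")
```

Note that a gap ends at the first line after the reboot, so it only gives an upper bound on when the freeze actually happened.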
As for others here, there are also ^@ characters at the time of the crash in /var/log/syslog when I check it afterwards.
pveversion -v is identical on all 3 nodes:
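Those characters are how most editors and pagers display NUL bytes. A small sketch for locating such runs in a log file follows; the function name and the minimum run length of 16 bytes are my own arbitrary choices.

```python
import re
import sys


def find_nul_runs(data: bytes, min_len: int = 16):
    """Return (offset, length) for each run of at least min_len NUL bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 nulruns.py /var/log/syslog
    with open(sys.argv[1], "rb") as handle:
        for offset, length in find_nul_runs(handle.read()):
            print(f"{length} NUL bytes at byte offset {offset}")
```

The text just before each reported offset should be the last thing that made it to disk before the node froze.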
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-8
pve-kernel-helper: 6.0-8
pve-kernel-4.15: 5.4-9
pve-kernel-5.0.21-2-pve: 5.0.21-6
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.12-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2
Some observations:
I previously discussed this problem with some people, and they recommended that I turn off every power management option I could in the BIOS, which I did for one node. That node has now been running for 24 hours without a problem.
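To see from the running system which idle states the kernel still exposes after changing BIOS settings, something like the following sketch can be used. It reads the standard cpuidle sysfs entries for cpu0 only; the function name is mine, and I have not compared its output across the three nodes.

```python
from pathlib import Path


def list_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return (state_dir, name, disabled) for each cpuidle state of one CPU."""
    state_dirs = sorted(
        Path(cpu_dir, "cpuidle").glob("state[0-9]*"),
        key=lambda path: int(path.name[len("state"):]),
    )
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        disabled = (state_dir / "disable").read_text().strip() == "1"
        states.append((state_dir.name, name, disabled))
    return states


if __name__ == "__main__":
    for state, name, disabled in list_idle_states():
        print(f"{state}: {name} ({'disabled' if disabled else 'enabled'})")
```

If the list is empty, the kernel is not exposing any cpuidle states for that CPU.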
I noticed that both the ASRock B450M Pro4 that Mathias and MrSoupman have and my Dell Optiplex 390 DT have an onboard Realtek RTL8111 network chipset, while my main node, the Dell Optiplex 7010 MT that doesn't have the issue, has an Intel chipset.
This ties in with the first observation: on my Intel system, disabling power states also seemed to make the system more stable. Is this really a Ryzen-related problem? (A lot of the posts I can find on the Ryzen C6 problem also involve boards with Realtek chipsets.)
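If others in the thread want to compare network hardware, this sketch lists which kernel driver each physical interface is bound to by following the sysfs driver symlinks. The function name is my own. On Realtek RTL8111 boards I would expect the in-kernel r8169 driver to show up, but that is an assumption to verify rather than something I have confirmed on every affected system.

```python
import os


def nic_drivers(sys_net="/sys/class/net"):
    """Map interface name to bound kernel driver name.

    Virtual interfaces (lo, bridges, taps) have no device/driver link
    and are skipped.
    """
    drivers = {}
    for iface in sorted(os.listdir(sys_net)):
        link = os.path.join(sys_net, iface, "device", "driver")
        if os.path.islink(link):
            drivers[iface] = os.path.basename(os.readlink(link))
    return drivers


if __name__ == "__main__" and os.path.isdir("/sys/class/net"):
    for iface, driver in nic_drivers().items():
        print(f"{iface}: {driver}")
```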