Sadly, the `pveupgrade` command did not help. The node rebooted itself again, but there was a correlation with high IOPS on the system disks (during the upgrade their utilization in iostat stayed at 100%). Can that be a reason why the node reboots itself?
Sure, here is the graph. We can see a peak around 04:00 AM, which is when the reboot happened:
In the syslog there was nothing more than info about a reboot:
Mar 7 04:06:40 pve1a systemd: Starting Daily PVE download activities...
Mar 7 04:06:40 pve1a kernel: [691500.915949] kvm : vcpu4, guest rIP...
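To see more easily what was logged in the minute before a reboot, the syslog lines can be split into fields. This is only a minimal sketch, assuming the classic `Mon D HH:MM:SS host tag: message` layout of the two lines above; the helper names `parse_syslog_line` and `lines_in_minute` are hypothetical and not part of any Proxmox tool.

```python
import re

# Classic syslog layout, as in the lines above:
# "Mar 7 04:06:40 pve1a systemd: Starting Daily PVE download activities..."
SYSLOG_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<tag>[^:\s]+):\s*(?P<message>.*)$"
)

def parse_syslog_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = SYSLOG_RE.match(line.strip())
    return match.groupdict() if match else None

def lines_in_minute(lines, hhmm):
    """Keep only parsed entries whose time starts with hhmm, e.g. '04:06'."""
    parsed = (parse_syslog_line(line) for line in lines)
    return [p for p in parsed if p and p["time"].startswith(hhmm)]
```

Calling `lines_in_minute(open("/var/log/syslog"), "04:06")` would list everything logged in that minute; the log path is an assumption and may differ on your nodes.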
If it is of any use, we have observed that when we run an apt-get update/upgrade on any node, that node also suddenly reboots itself. It goes as follows:
apt-get update --fix-missing
The upgrade takes horribly long to finish (2h)
1. HA state
Yes, the HA manager is set up and running
master pve2a (active, Mon Mar 7 17:24:18 2022)
lrm pve1a (active, Mon Mar 7 17:24:14 2022)
lrm pve2a (active, Mon Mar 7 17:24:21 2022)
lrm pve3a (active, Mon Mar 7 17:24:20 2022)
service vm:100 (pve3...
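If you want to compare the LRM timestamps across nodes, the status lines above can be parsed into structured entries. This is a rough sketch, assuming the `role node (state, timestamp)` layout shown above (presumably from `ha-manager status`) and an English/C locale for the day and month names; `parse_ha_status` and `inactive_nodes` are hypothetical helper names.

```python
import re
from datetime import datetime

# Matches the "master"/"lrm" lines shown above, e.g.
# "lrm pve1a (active, Mon Mar 7 17:24:14 2022)"
NODE_RE = re.compile(r"^(master|lrm)\s+(\S+)\s+\(([^,]+),\s*(.+)\)$")

def parse_ha_status(text):
    """Return one dict per master/lrm line; other lines are skipped."""
    entries = []
    for line in text.splitlines():
        m = NODE_RE.match(line.strip())
        if not m:
            continue  # skip service lines and truncated output
        role, node, state, stamp = m.groups()
        entries.append({
            "role": role,
            "node": node,
            "state": state,
            # Assumes English day/month names in the timestamp
            "seen": datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"),
        })
    return entries

def inactive_nodes(entries):
    """Names of nodes whose reported state is anything but 'active'."""
    return [e["node"] for e in entries if e["state"] != "active"]
```

For the output above, `inactive_nodes(parse_ha_status(output))` would return an empty list, since the master and all three LRMs report `active`.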
We are trying to solve a long-term issue with a 3-node Proxmox Ceph cluster. From time to time, one of the nodes in the cluster unexpectedly reboots.
The physical setup is as seen here:
Let's focus on the pve1 configuration, which is the same as on the other nodes.
Here is the pveversion...