Recent content by radekcz

  1.

    Long term reboots of CEPH cluster

    Sadly, the `pveupgrade` command did not help. The node rebooted itself again, but there was a correlation with high IOPS on the system disks (during the upgrade, their utilization in iostat stayed at 100%). Could that be the reason why the node reboots itself?
  2.

    Long term reboots of CEPH cluster

    Sure, here is the graph. Around 04:00 AM there is a peak, which is when the reboot happened. The syslog contained nothing more than information about a reboot: Mar 7 04:06:40 pve1a systemd[1]: Starting Daily PVE download activities... Mar 7 04:06:40 pve1a kernel: [691500.915949] kvm [7128]: vcpu4, guest rIP...
  3.

    Long term reboots of CEPH cluster

    In case it is of any use: we have observed that whenever we run an apt update/upgrade on any node, that node also suddenly reboots itself. The sequence is: `apt update`, `apt upgrade`, reboot, `apt-get update --fix-missing`, `apt upgrade`. The upgrade then takes a horribly long time to finish (2 h).
  4.

    Long term reboots of CEPH cluster

    1. HA state: Yes, the HA manager is set up and running. Output of `ha-manager status`: quorum OK, master pve2a (active, Mon Mar 7 17:24:18 2022), lrm pve1a (active, Mon Mar 7 17:24:14 2022), lrm pve2a (active, Mon Mar 7 17:24:21 2022), lrm pve3a (active, Mon Mar 7 17:24:20 2022), service vm:100 (pve3...
  5.

    Long term reboots of CEPH cluster

    We are troubleshooting a long-term issue with a 3-node Proxmox CEPH cluster. From time to time, one of the nodes in the cluster unexpectedly reboots. The physical setup is as seen here: Let's focus on the pve1 configuration, which is the same as on the other nodes. Here is the pveversion...
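The first post asks whether 100% disk utilization in iostat could explain the reboots. For reference, iostat's %util is derived from the kernel's "time spent doing I/Os" counter in `/proc/diskstats`: the busy milliseconds accumulated over an interval divided by the length of that interval. The sketch below reproduces that calculation so system-disk load can be logged next to the reboot times. It is a minimal, Linux-only sketch; the function names and the device name `sda` are hypothetical. Keep in mind that 100% only means at least one request was in flight for the whole interval. On SSDs that serve many requests in parallel this does not by itself prove saturation, and it says nothing about whether the load causes the reboot.

```python
import time

# Column index of "time spent doing I/Os" (ms) in a split /proc/diskstats line:
# 0 major, 1 minor, 2 device name, 3-6 read stats, 7-10 write stats,
# 11 I/Os currently in progress, 12 ms spent doing I/Os.
IO_TICKS_FIELD = 12


def read_io_ticks(diskstats_text: str, device: str) -> int:
    """Return the cumulative busy time (ms) of `device` from /proc/diskstats text."""
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) > IO_TICKS_FIELD and parts[2] == device:
            return int(parts[IO_TICKS_FIELD])
    raise KeyError(f"device {device!r} not found in diskstats")


def util_percent(ticks_before: int, ticks_after: int, elapsed_s: float) -> float:
    """Busy time as a percentage of wall-clock time, capped at 100 here."""
    busy_ms = ticks_after - ticks_before
    return min(100.0, 100.0 * busy_ms / (elapsed_s * 1000.0))


def sample_util(device: str, interval_s: float = 1.0,
                path: str = "/proc/diskstats") -> float:
    """Measure %util of `device` over roughly `interval_s` seconds (Linux only)."""
    with open(path) as f:
        before = read_io_ticks(f.read(), device)
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = read_io_ticks(f.read(), device)
    elapsed = time.monotonic() - start  # use real elapsed time, not the nominal interval
    return util_percent(before, after, elapsed)
```

For example, `sample_util("sda", 5.0)` returns the utilization of the hypothetical system disk `sda` averaged over about five seconds. Running it in a loop and writing the results to a file would show whether utilization is pinned at 100% right before a node goes down.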
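In the syslog excerpt from the second post, the bracketed kernel timestamp (`[691500.915949]`) counts seconds since boot, so 691500 s corresponds to roughly 8 days of uptime. When that value drops between two consecutive kernel lines from the same host, the kernel restarted in between. This brackets each reboot even when nothing explicit about it was logged. The sketch below assumes classic syslog-formatted lines, such as those in `/var/log/syslog`; the function name is hypothetical. Messages from the last seconds before a hard reset may never reach the disk, so the "before" timestamp only marks the last message that was actually written.

```python
import re

# Classic syslog line with a kernel printk timestamp, e.g.
# "Mar 7 04:06:40 pve1a kernel: [691500.915949] kvm [7128]: ..."
KERNEL_LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"kernel:\s+\[\s*(?P<uptime>\d+\.\d+)\]"
)


def find_reboots(lines):
    """Return (host, last_stamp_before, first_stamp_after) for each uptime reset.

    The bracketed kernel timestamp counts seconds since boot, so a value lower
    than the previous kernel line from the same host means the kernel
    restarted in between.
    """
    last_seen = {}  # host -> (uptime in seconds, syslog timestamp)
    reboots = []
    for line in lines:
        m = KERNEL_LINE.match(line)
        if m is None:
            continue  # not a kernel message with a printk timestamp
        host, stamp = m.group("host"), m.group("stamp")
        uptime = float(m.group("uptime"))
        prev = last_seen.get(host)
        if prev is not None and uptime < prev[0]:
            reboots.append((host, prev[1], stamp))
        last_seen[host] = (uptime, stamp)
    return reboots
```

Usage: `with open("/var/log/syslog") as f: print(find_reboots(f))`. Comparing the resulting reboot windows with the I/O peaks in the graph, or with the times of the apt upgrades, would test the correlation described in the first and third posts.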