I tried to give all the logs without exposing some of our internal info (IP\MAC). This just occurred again today; I'll try to investigate it further.
What I don't understand is why a network issue (if that is the case) would trigger a server reboot, and moreover a reboot of most of the cluster.
We don't have anything special; our DC is online without issues and was up when the error occurred. It had been running for around a month (I upgraded v6.4->v7 and rebooted each host).
We faced something like this a few months ago due to a power failure; it took a few days for the system to stabilize (had...
I have checked the switch for errors and there were none (all nodes are connected to the same switch).
The only thing I was able to think of is that there was some load\freeze on the switch (but nothing special was logged). I am waiting for some hardware in order to migrate the corosync ring...
For some reason most of the cluster crashed (servers rebooted). It became stable after the reboot, but there was a small downtime.
I tried to find the reason in the logs but could not understand what caused it.
Here are the logs of the cluster from one of the nodes that was not rebooted (on...
I am in the process of automating offline backups (it will be based on a weekly change of a drive connected via USB3 to one of the Proxmox servers).
My goal is to have offline backups stored at a secure offsite location (most of our IP is stored in the same server room).
I have already set up already have...
We upgraded Proxmox to 7 with the latest Ceph, and almost everything is back to normal.
Currently our main network is the same network as the corosync one and consists of 11 nodes (4 of them with Ceph).
We plan to double the server count in the near future and I am thinking of moving the corosync...
I found the problem that caused the random reboots:
the chrony service was not installed on any of the upgraded Proxmox servers, so I had to install it manually on all of them.
Since then there have been no random crashes, but the image from the post above still bothers me: some servers are out of sync with...
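To make the check repeatable across nodes, here is a rough sketch of the two things I would look at: whether chrony is present at all, and which nodes drift away from the rest. The function names, the `clock_readings` input (node name mapped to a Unix timestamp you collected yourself) and the 0.5 s tolerance are all my own assumptions, not anything Proxmox or chrony provides.

```python
import shutil
from statistics import median
from typing import Dict, List


def chrony_installed() -> bool:
    """Rough proxy: True if the chronyd binary is found on PATH.

    Assumption: run as root, since chronyd usually lives in /usr/sbin.
    """
    return shutil.which("chronyd") is not None


def nodes_out_of_sync(clock_readings: Dict[str, float],
                      tolerance_s: float = 0.5) -> List[str]:
    """Return the nodes whose clock differs from the median by more than tolerance_s.

    clock_readings is hypothetical input: node name -> Unix timestamp,
    all sampled at (roughly) the same moment.
    """
    if not clock_readings:
        return []
    reference = median(clock_readings.values())
    return sorted(
        node for node, ts in clock_readings.items()
        if abs(ts - reference) > tolerance_s
    )
```

Comparing against the median rather than a single "master" node means one badly drifting host cannot make every other host look wrong.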
I added nesting=1 and rebooted, and it is still not working:
INFO[2021-10-13T13:16:34.123990881Z] Starting up
DEBU[2021-10-13T13:16:34.124551737Z] Listener created for HTTP on unix (/var/run/docker.sock)
DEBU[2021-10-13T13:16:34.125297020Z] Golang's threads...
I upgraded the entire cluster from Proxmox 6.4 -> 7 with Ceph Nautilus->Octopus->Pacific.
The problem: we have a few CentOS 7 LXC containers that we cannot upgrade to 8.
I used the workaround from one of the posts in this forum and added
systemd.unified_cgroup_hierarchy=0 to the GRUB kernel command line.
I applied it to two hosts in...
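Since the flag only matters if the running kernel was actually booted with it, a small check against `/proc/cmdline` after the reboot can save some guessing. This is just a sketch; the helper names are made up for illustration, only the flag itself and `/proc/cmdline` come from the system.

```python
from pathlib import Path

LEGACY_FLAG = "systemd.unified_cgroup_hierarchy=0"


def uses_legacy_cgroups(cmdline: str) -> bool:
    """True if the kernel command line contains the cgroup v1 workaround flag."""
    # Split on whitespace so that e.g. "...hierarchy=01" does not match by accident.
    return LEGACY_FLAG in cmdline.split()


def running_kernel_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the currently running kernel (Linux only)."""
    return Path(path).read_text()
```

On a host, `uses_legacy_cgroups(running_kernel_cmdline())` should return True only after the GRUB config was regenerated and the host was rebooted.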
Well, is 'aufs' loaded on the host kernel (lsmod)? Have you tried with overlay2?
The LXC host is Ubuntu 18.04.
This happened after the long upgrade of Proxmox and Ceph.
This is the output of dockerd -D:
DEBU[2021-10-12T12:59:20.229834269Z] [graphdriver] priority list: [btrfs zfs overlay2 aufs overlay devicemapper vfs]
ERRO[2021-10-12T12:59:20.230967397Z] AUFS was not found in /proc/filesystems storage-driver=aufs...
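The AUFS error above comes from a lookup in /proc/filesystems, so the same lookup can be done by hand inside the container to see what the kernel offers. A minimal sketch, assuming the usual layout of that file (optional `nodev` column, filesystem name last); the function names and the default candidate list are my own. As far as I know the overlay2 driver relies on the kernel's `overlay` filesystem, which is why that name is checked rather than "overlay2".

```python
from typing import List, Sequence, Set


def supported_filesystems(proc_filesystems: str) -> Set[str]:
    """Parse the text of /proc/filesystems into a set of filesystem names."""
    names = set()
    for line in proc_filesystems.splitlines():
        fields = line.split()
        if fields:
            # Lines look like "nodev<TAB>sysfs" or "<TAB>ext4"; the name is last.
            names.add(fields[-1])
    return names


def available_backing_filesystems(
    proc_filesystems: str,
    candidates: Sequence[str] = ("overlay", "aufs"),
) -> List[str]:
    """Return the candidate filesystems the kernel reports, in candidate order."""
    supported = supported_filesystems(proc_filesystems)
    return [name for name in candidates if name in supported]
```

Feeding it the contents of `/proc/filesystems` and getting back an empty list would suggest neither module is loaded on the host kernel.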
As part of upgrading to Proxmox 7 I did the following steps:
upgraded all nodes to the latest 6.4-13
upgraded Ceph to Octopus, based on (https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus)
After running (ceph osd pool set POOLNAME pg_autoscale_mode on) on a pool it...