Search results

  1.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    Sure, I'll post it. To be clear, you need journalctl -u corosync -u pve-cluster --since "XXXXX" --until "YYYYYYY" > log_$(hostname) from all servers? Do I need to add anything else?
  2.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    I tried replacing the switch and the cable but could not find any improvement. It happens about once an hour, usually at 50 minutes past the hour (01:50, 02:50, etc.): Dec 26 04:25:57 pve-blade-102 corosync[2238]: [KNET ] link: host: 1 link: 0 is down Dec 26 04:50:56 pve-blade-102...
  3.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    Another event occurred after debug was enabled: Dec 23 15:50:00 pve-blade-102 corosync[2238]: [KNET ] link: host: 1 link: 0 is down
  4.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    Done, I'll post again once we have another error on host 1.
  5.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    How can I enable debug logging? I think the problematic host is pve-srv-102. I'll try to inspect the network cable and card on Sunday and replace them.
  6.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    Looks like it, host1 all the time. The logs you asked for yesterday: journalctl -u corosync -u pve-cluster --since yesterday >/mnt/pve/nfs_home/pve_logs/log_$(hostname) I see that host 1 has errors; sometimes it recovers, and when it does not, the cluster crash is initiated. host1 had issues in...
  7.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    pve-srv2 (ceph) Dec 22 15:20:56 pve-srv2 pmxcfs[3940]: [status] notice: received log Dec 22 15:30:45 pve-srv2 pmxcfs[3940]: [status] notice: received log Dec 22 15:31:10 pve-srv2 pmxcfs[3940]: [status] notice: received log Dec 22 15:31:51 pve-srv2 pmxcfs[3940]: [status] notice: received log Dec...
  8.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    Here are the logs for some servers (one of each type of hardware configuration). I can provide the logs of the rest, but they look the same. pve-ws2: Dec 22 15:16:51 pve-ws2 pmxcfs[1146]: [status] notice: received log Dec 22 15:18:21 pve-ws2 pmxcfs[1146]: [status] notice: received log Dec 22...
  9.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    Each server has 3 interfaces: 1GB used for corosync, ssh and basic network access; 10/40GB for internal ceph; 10/40GB for ceph clients. pve version from servers with ceph: proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve) pve-manager: 7.1-8 (running version: 7.1-8/5b267f33) pve-kernel-helper: 7.1-6...
  10.

    cluster crash with unknown reason. logs attached

    I have read it already. I don't understand why the entire cluster reboots. For now I'll try disabling HA to isolate the issue, but I don't think that is the cause.
  11.

    Unexplained cluster crash after upgrade from 7. -> 7.1-8

    Our system was stable for the last few months, but after upgrading from 7 to 7.1-8 we have 3-4 random crashes every day (we had an issue two months ago with corosync stability, but after replacing the switch the cluster worked very well under high load, without any issue). This Monday I took advantage of...
  12.

    cluster crash with unknown reason. logs attached

    I tried to give all the logs without exposing some of our internal info (IP\MAC). This just occurred again today; I'll try to investigate it further. What I don't understand is why a network issue (if that is the case) triggers a server reboot, and moreover a reboot of most of the cluster.
  13.

    cluster crash with unknown reason. logs attached

    We don't have anything special. Our DC is online without issues and was up when the error occurred, and it had been running for around a month (I upgraded v6.4->v7 and rebooted each host). We faced something like this a few months ago due to a power failure; it took a few days for the system to stabilize (had...
  14.

    cluster crash with unknown reason. logs attached

    I have checked the switch for errors and there were none (all nodes are connected to the same switch). The only thing I can think of is that there was some load\freeze on the switch (but nothing special was logged). I am waiting for some hardware in order to migrate the corosync ring...
  15.

    cluster crash with unknown reason. logs attached

    For some reason most of the cluster crashed (servers rebooted). It became stable after the reboot, but there was a short downtime. I tried to find the reason in the logs but could not understand what caused it. Here are the logs of the cluster from one of the nodes that was not rebooted (on...
  16.

    backup entire cluster

    I am in the process of automating offline backup (it will be based on a weekly drive change, with the drive connected via USB3 to one of the proxmox servers). My goal is to make an offline backup to be stored at a secure offsite location (most of our IP is stored in the same server room). I have already set up already have...
  17.

    Adding second corosync ring best practice

    We upgraded proxmox to 7. with the latest ceph, and almost everything is back to normal. Currently our main network is the same network as corosync and consists of 11 nodes (4 of them with ceph). We plan to double the server count in the near future and I am thinking of moving the corosync...
  18.

    random proxmox crash after upgrade to proxmox 7

    I found the problem that caused the random reboots: the chrony service was not installed on any of the upgraded proxmox servers, and I had to install it manually on all of them. Since then there have been no random crashes, but the image from the post above still bothers me: some servers are out of sync with...
  19.

    random proxmox crash after upgrade to proxmox 7

    I found this. Look at the first and last lrm; can it be the reason?
  20.

    lxc with docker have issues on proxmox 7 (aufs failed: driver not supported)

    Added nesting 1, rebooted, and it is still not working: dockerd -D INFO[2021-10-13T13:16:34.123990881Z] Starting up DEBU[2021-10-13T13:16:34.124551737Z] Listener created for HTTP on unix (/var/run/docker.sock) DEBU[2021-10-13T13:16:34.125297020Z] Golang's threads...