Cluster reboot

devis

We have a cluster of 24 servers.
One of the nodes in the cluster came under a DDoS attack targeting port 80 of one of its machines. During the attack we shut the machine down, waited for it to stop completely, and moved it to another node. The original node nevertheless kept showing a high load average (LA) of over 200 until the machine's IP was blocked on the external switch. After the IP was blocked on the switch, the entire cluster rebooted for unknown reasons.

How can we find out what caused the whole cluster to reboot, when logically only the node that was under the DDoS attack should have rebooted?

Bash:
Cluster information
-------------------
Name:             Cluster-1
Config Version:   32
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jan  3 12:05:11 2024
Quorum provider:  corosync_votequorum
Nodes:            24
Node ID:          0x00000004
Ring ID:          1.e20
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   24
Highest expected: 24
Total votes:      24
Quorum:           13
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.12.73
0x00000002          1 10.0.12.72
0x00000003          1 10.0.12.75
0x00000004          1 10.0.12.74 (local)
0x00000005          1 10.0.12.35
0x00000006          1 10.0.12.36
0x00000007          1 10.0.12.37
0x00000008          1 10.0.12.38
0x00000009          1 10.0.12.44
0x0000000a          1 10.0.12.45
0x0000000b          1 10.0.12.46
0x0000000c          1 10.0.12.47
0x0000000d          1 10.0.12.49
0x0000000e          1 10.0.12.48
0x0000000f          1 10.0.12.50
0x00000010          1 10.0.12.51
0x00000011          1 10.0.12.60
0x00000012          1 10.0.12.61
0x00000013          1 10.0.12.62
0x00000014          1 10.0.12.63
0x00000015          1 10.0.12.68
0x00000016          1 10.0.12.69
0x00000017          1 10.0.12.70
0x00000018          1 10.0.12.71
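
For reference, the "Quorum: 13" line in the output above follows from the 24 expected votes. A minimal sketch of that arithmetic, assuming the default votequorum majority rule with one vote per node (as the membership list shows); `quorum_for` and `max_lost_nodes` are hypothetical helper names, not part of any Proxmox or corosync tool:

```shell
#!/bin/sh
# Hypothetical helpers illustrating the majority rule; not a corosync API.

# Votes needed for quorum: more than half of the expected votes.
quorum_for() {
    expected_votes="$1"
    echo $(( expected_votes / 2 + 1 ))
}

# How many single-vote nodes can drop out before quorum is lost.
max_lost_nodes() {
    expected_votes="$1"
    echo $(( expected_votes - (expected_votes / 2 + 1) ))
}

quorum_for 24       # prints 13, matching "Quorum: 13" above
max_lost_nodes 24   # prints 11
```

So with 24 votes, the cluster stays quorate only while at least 13 nodes can still see each other.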
 
Hello, please post:

  • /etc/network/interfaces
  • /etc/pve/corosync.conf
  • and the log: journalctl -u corosync
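
The three items above could be gathered in one go with something like the following. This is only a sketch: `collect_cluster_info`, its arguments, and the output file names are hypothetical, and it assumes it is run as root on a node. The optional second argument is passed to `journalctl --since` to keep the log short.

```shell
#!/bin/sh
# Sketch: copy the requested files and the corosync log into one directory.
# collect_cluster_info and the file names it writes are hypothetical.
collect_cluster_info() {
    out_dir="$1"
    since="${2:-}"   # optional start time for the log, e.g. "YYYY-MM-DD HH:MM"

    mkdir -p "$out_dir" || return 1

    # The two config files requested above.
    for f in /etc/network/interfaces /etc/pve/corosync.conf; do
        if [ -r "$f" ]; then
            cp "$f" "$out_dir/" || echo "failed to copy $f" >> "$out_dir/NOTES.txt"
        else
            echo "skipped: $f not readable" >> "$out_dir/NOTES.txt"
        fi
    done

    # The corosync journal, optionally limited to entries after $since.
    if command -v journalctl >/dev/null 2>&1; then
        if [ -n "$since" ]; then
            journalctl -u corosync --no-pager --since "$since" \
                > "$out_dir/corosync.log" 2>&1 \
                || echo "journalctl returned an error" >> "$out_dir/NOTES.txt"
        else
            journalctl -u corosync --no-pager \
                > "$out_dir/corosync.log" 2>&1 \
                || echo "journalctl returned an error" >> "$out_dir/NOTES.txt"
        fi
    else
        echo "skipped: journalctl not found" >> "$out_dir/NOTES.txt"
    fi
    return 0
}
```

For example, `collect_cluster_info "/tmp/cluster-info-$(hostname)"` would write everything to a per-node directory. Adding a start time shortly before the reboot as the second argument keeps the log to a postable size.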
 
Hello, from all nodes?
The corosync config is global, so one copy is enough; the rest can be limited to the node that rebooted unexpectedly. (24 nodes is a bit much for a first look.)
 
