Cluster reboot

devis · Jan 3, 2024

We have a cluster of 24 servers.
A DDoS attack hit port 80 of a machine running on one of the cluster nodes. During the attack we shut the machine down, waited for it to stop completely, and moved it to another node. The original node nevertheless kept showing a high load average (LA) of over 200 until the machine's IP was blocked on the external switch. After the IP was blocked on the switch, the entire cluster rebooted for unknown reasons. Could you tell me how we can find out what caused the whole cluster to reboot? Logically, only the node that was under the DDoS attack should have rebooted.

Bash:

Cluster information
-------------------
Name:             Cluster-1
Config Version:   32
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jan  3 12:05:11 2024
Quorum provider:  corosync_votequorum
Nodes:            24
Node ID:          0x00000004
Ring ID:          1.e20
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   24
Highest expected: 24
Total votes:      24
Quorum:           13
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.12.73
0x00000002          1 10.0.12.72
0x00000003          1 10.0.12.75
0x00000004          1 10.0.12.74 (local)
0x00000005          1 10.0.12.35
0x00000006          1 10.0.12.36
0x00000007          1 10.0.12.37
0x00000008          1 10.0.12.38
0x00000009          1 10.0.12.44
0x0000000a          1 10.0.12.45
0x0000000b          1 10.0.12.46
0x0000000c          1 10.0.12.47
0x0000000d          1 10.0.12.49
0x0000000e          1 10.0.12.48
0x0000000f          1 10.0.12.50
0x00000010          1 10.0.12.51
0x00000011          1 10.0.12.60
0x00000012          1 10.0.12.61
0x00000013          1 10.0.12.62
0x00000014          1 10.0.12.63
0x00000015          1 10.0.12.68
0x00000016          1 10.0.12.69
0x00000017          1 10.0.12.70
0x00000018          1 10.0.12.71
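
For reference, the "Quorum: 13" value in the output above is consistent with a simple majority of the 24 expected votes. A minimal sketch of that arithmetic, assuming default votequorum behaviour with no special options (the function name `quorum_for` is hypothetical, used only for illustration):

```shell
# Hypothetical helper: majority quorum for a given number of expected votes.
# Assumes plain majority voting (integer division, then add one).
quorum_for() {
    expected_votes="$1"
    echo $(( expected_votes / 2 + 1 ))
}

quorum_for 24   # prints 13
```

With these numbers, the cluster stays quorate as long as at least 13 of the 24 nodes can still see each other.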

jsterr · Jan 11, 2024

Hello, please post:

/etc/network/interfaces
/etc/pve/corosync.conf
and the log: journalctl -u corosync

devis · Jan 18, 2024

jsterr said:
Hello please post:

/etc/network/interfaces

/etc/pve/corosync.conf

and the log: journalctl -u corosync

Hello, from all nodes?

jsterr · Jan 18, 2024

devis said:
Hello, from all nodes?

The corosync config is global, so the rest can be limited to the node that rebooted unexpectedly. (24 nodes is a little too much for a first look.)
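
One possible way to gather the requested items on a single node is sketched below. The function name `collect_diag` and the output directory are assumptions made for illustration; the file paths and the `journalctl -u corosync` command are the ones requested in this thread.

```shell
# Hypothetical helper: copies the two config files and dumps the corosync
# journal into one directory so they can be attached to a post.
collect_diag() {
    out_dir="${1:-./diag-$(hostname)}"   # output location is an assumption
    mkdir -p "$out_dir" || return 1

    for f in /etc/network/interfaces /etc/pve/corosync.conf; do
        if [ -r "$f" ]; then
            cp "$f" "$out_dir/"
        else
            # Record files that could not be read instead of failing.
            echo "missing: $f" >> "$out_dir/missing.txt"
        fi
    done

    # Dump the corosync unit log only if journalctl is available.
    if command -v journalctl >/dev/null 2>&1; then
        journalctl -u corosync --no-pager > "$out_dir/corosync.log" 2>&1
    fi
    return 0
}
```

Calling `collect_diag /tmp/diag` on the node in question would leave the files in `/tmp/diag`. Check the network config for anything sensitive before posting it.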
