[SOLVED] Killed my Cluster (Splitbrain)

Mischmosch

Jul 5, 2024
Hello everyone,

I broke my cluster while doing some cleanup work.
In the German section of the forum, I picked up the term "split-brain" for this.

The cluster now consists of two servers. Each server is reachable on its own, but each one shows the other as offline.
Bildschirmfoto 2024-07-10 um 14.29.22.png
Bildschirmfoto 2024-07-10 um 14.29.48.png

Code:
root@vault1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-07-07 23:00:13 CEST; 13h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1134 (corosync)
      Tasks: 9 (limit: 18930)
     Memory: 130.2M
        CPU: 3min 39.905s
     CGroup: /system.slice/corosync.service
             └─1134 /usr/sbin/corosync -f

Jul 07 23:00:13 vault1 corosync[1134]:   [QB    ] server name: quorum
Jul 07 23:00:13 vault1 corosync[1134]:   [TOTEM ] Configuring link 0
Jul 07 23:00:13 vault1 corosync[1134]:   [TOTEM ] Configured link number 0: local addr: 192.168.1.200, port=5405
Jul 07 23:00:13 vault1 corosync[1134]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Jul 07 23:00:13 vault1 corosync[1134]:   [QUORUM] Sync members[1]: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [QUORUM] Sync joined[1]: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [TOTEM ] A new membership (1.a) was formed. Members joined: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [QUORUM] Members[1]: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 07 23:00:13 vault1 systemd[1]: Started corosync.service - Corosync Cluster Engine.

Code:
root@vault2:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-07-07 23:01:16 CEST; 13h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 964 (corosync)
      Tasks: 9 (limit: 18730)
     Memory: 132.6M
        CPU: 4min 39.472s
     CGroup: /system.slice/corosync.service
             └─964 /usr/sbin/corosync -f

Jul 07 23:01:16 vault2 corosync[964]:   [KNET  ] host: host: 3 has no active links
Jul 07 23:01:16 vault2 corosync[964]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Jul 07 23:01:16 vault2 corosync[964]:   [QUORUM] Sync members[1]: 2
Jul 07 23:01:16 vault2 corosync[964]:   [QUORUM] Sync joined[1]: 2
Jul 07 23:01:16 vault2 corosync[964]:   [TOTEM ] A new membership (2.9e) was formed. Members joined: 2
Jul 07 23:01:16 vault2 corosync[964]:   [QUORUM] Members[1]: 2
Jul 07 23:01:16 vault2 corosync[964]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 07 23:01:16 vault2 systemd[1]: Started corosync.service - Corosync Cluster Engine.
Jul 07 23:03:44 vault2 corosync[964]:   [QUORUM] This node is within the primary component and will provide service.
Jul 07 23:03:44 vault2 corosync[964]:   [QUORUM] Members[1]: 2

Can anyone help me?

Thanks and best regards
 

Make sure that the corosync links are configured correctly and that the nodes can reach each other over that network.
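
To put that into concrete steps, here is a rough sketch of what could be run on each node. It assumes the standard corosync tools (corosync-cfgtool, corosync-quorumtool) are installed alongside the daemon. PEER is a placeholder for the address of the other node as listed in your corosync.conf; vault2's address is not in the post, so nothing is filled in. last_members is a made-up helper name: it pulls the most recent "[QUORUM] Members[...]" line out of the log so you can see who each node thinks is in the cluster. In the logs above, vault1 reports only node 1 and vault2 only node 2, so neither sees the other. vault2 also logs "host: 3 has no active links", which may point at a leftover node entry in the config, but that is only a guess.

```shell
#!/bin/sh
# Hypothetical helper (not part of corosync): read log lines on stdin and
# print the member list from the last "[QUORUM] Members[N]: ..." line.
last_members() {
    sed -n 's/.*\[QUORUM] Members\[[0-9]*]: *//p' | tail -n 1
}

# Placeholder: address of the OTHER node, taken from your corosync.conf.
PEER="${PEER:-}"

if command -v corosync-cfgtool >/dev/null 2>&1; then
    # Link status as the local node sees it.
    corosync-cfgtool -s
    # Quorum state and current membership.
    corosync-quorumtool -s
    # Who does this node currently count as a cluster member?
    journalctl -u corosync --no-pager | last_members
fi

# Basic reachability over the cluster network (skipped if PEER is unset).
if [ -n "$PEER" ]; then
    ping -c 3 "$PEER"
fi
```

Run it on both nodes, e.g. with PEER=<address of the other node> set in the environment. If the ping fails or the link shows as down, the problem is on the network side; if the network is fine but each node still lists only itself, compare the nodelist entries in corosync.conf on both machines.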