[SOLVED] Killed my cluster (split brain)

Mischmosch

Jul 5, 2024
Hello everyone,

I broke my cluster while doing some cleanup work.
On the German-language part of the forum, I picked up the term "split brain" for this.

The cluster now consists of two servers. Each server is reachable on its own, but each one shows the other as offline.
[Screenshots: Bildschirmfoto 2024-07-10 um 14.29.22.png, Bildschirmfoto 2024-07-10 um 14.29.48.png]

Code:
root@vault1:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-07-07 23:00:13 CEST; 13h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1134 (corosync)
      Tasks: 9 (limit: 18930)
     Memory: 130.2M
        CPU: 3min 39.905s
     CGroup: /system.slice/corosync.service
             └─1134 /usr/sbin/corosync -f

Jul 07 23:00:13 vault1 corosync[1134]:   [QB    ] server name: quorum
Jul 07 23:00:13 vault1 corosync[1134]:   [TOTEM ] Configuring link 0
Jul 07 23:00:13 vault1 corosync[1134]:   [TOTEM ] Configured link number 0: local addr: 192.168.1.200, port=5405
Jul 07 23:00:13 vault1 corosync[1134]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Jul 07 23:00:13 vault1 corosync[1134]:   [QUORUM] Sync members[1]: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [QUORUM] Sync joined[1]: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [TOTEM ] A new membership (1.a) was formed. Members joined: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [QUORUM] Members[1]: 1
Jul 07 23:00:13 vault1 corosync[1134]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 07 23:00:13 vault1 systemd[1]: Started corosync.service - Corosync Cluster Engine.

Code:
root@vault2:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; preset: enabled)
     Active: active (running) since Sun 2024-07-07 23:01:16 CEST; 13h ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 964 (corosync)
      Tasks: 9 (limit: 18730)
     Memory: 132.6M
        CPU: 4min 39.472s
     CGroup: /system.slice/corosync.service
             └─964 /usr/sbin/corosync -f

Jul 07 23:01:16 vault2 corosync[964]:   [KNET  ] host: host: 3 has no active links
Jul 07 23:01:16 vault2 corosync[964]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Jul 07 23:01:16 vault2 corosync[964]:   [QUORUM] Sync members[1]: 2
Jul 07 23:01:16 vault2 corosync[964]:   [QUORUM] Sync joined[1]: 2
Jul 07 23:01:16 vault2 corosync[964]:   [TOTEM ] A new membership (2.9e) was formed. Members joined: 2
Jul 07 23:01:16 vault2 corosync[964]:   [QUORUM] Members[1]: 2
Jul 07 23:01:16 vault2 corosync[964]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 07 23:01:16 vault2 systemd[1]: Started corosync.service - Corosync Cluster Engine.
Jul 07 23:03:44 vault2 corosync[964]:   [QUORUM] This node is within the primary component and will provide service.
Jul 07 23:03:44 vault2 corosync[964]:   [QUORUM] Members[1]: 2

Can anyone help me?

Thanks and best regards
 

Make sure that the corosync links are configured correctly and that the nodes can talk to each other over that network.
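One way to act on that advice is to compare the corosync configuration that each node is actually running with, i.e. `/etc/corosync/corosync.conf` copied from vault1 and from vault2. The vault2 log excerpt above reports that host 3 has no active links even though the cluster is described as having two servers, which hints that the two nodelists may no longer match. Below is a minimal Python sketch, assuming the usual layout (sections in braces, `key: value` pairs, `#` comments, `nodelist { node { ... } }` and `totem { config_version: ... }`). The function names `parse_corosync_conf`, `nodelist`, `config_version` and `compare_configs` are hypothetical helpers, not part of Proxmox or corosync, and the script only reports differences; it changes nothing.

```python
"""Hypothetical helper: compare the corosync nodelist of two nodes.

Assumes the standard corosync.conf layout (sections opened with
"name {" and closed with "}", "key: value" pairs, '#' comments).
Not an official Proxmox or corosync tool.
"""
import sys


def parse_corosync_conf(text):
    """Parse corosync.conf text into nested dicts.

    Every section name maps to a list of dicts, so repeated sections
    (e.g. several 'node' blocks) are all kept.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            section = {}
            stack[-1].setdefault(line[:-1].strip(), []).append(section)
            stack.append(section)
        elif line == "}":
            if len(stack) > 1:  # ignore stray closing braces
                stack.pop()
        elif ":" in line:
            # split only on the first colon so IPv6 addresses survive
            key, value = line.split(":", 1)
            stack[-1][key.strip()] = value.strip()
    return root


def nodelist(conf):
    """Return {nodeid: (name, ring0_addr)} from a parsed config."""
    nodes = {}
    for nl in conf.get("nodelist", []):
        for node in nl.get("node", []):
            nodes[node.get("nodeid")] = (node.get("name"), node.get("ring0_addr"))
    return nodes


def config_version(conf):
    """Return totem.config_version as a string, or None if missing."""
    totems = conf.get("totem", [])
    return totems[0].get("config_version") if totems else None


def compare_configs(text_a, text_b):
    """Return human-readable differences between two configs."""
    a, b = parse_corosync_conf(text_a), parse_corosync_conf(text_b)
    problems = []
    if config_version(a) != config_version(b):
        problems.append(
            f"config_version differs: {config_version(a)} vs {config_version(b)}"
        )
    nodes_a, nodes_b = nodelist(a), nodelist(b)
    for nodeid in sorted(set(nodes_a) | set(nodes_b), key=str):
        if nodes_a.get(nodeid) != nodes_b.get(nodeid):
            problems.append(
                f"nodeid {nodeid}: {nodes_a.get(nodeid)} vs {nodes_b.get(nodeid)}"
            )
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 compare_corosync.py vault1-corosync.conf vault2-corosync.conf
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        diffs = compare_configs(fa.read(), fb.read())
    print("\n".join(diffs) if diffs else "nodelist and config_version match")
```

Copy the file from each node to one machine first and pass both paths on the command line. An empty result only means the nodelists agree; the nodes must still be able to reach each other on the corosync network (the logs show port 5405 on link 0).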
 
