Cluster / Corosync won't come back up

TUBEOF (Member, joined Jan 18, 2022)
Hi everyone,

last night I looked at my 2-host cluster and it was completely red; nothing worked anymore.
Both hosts are running again for now via pvecm expected 1, but I simply cannot get the cluster itself back up.
I know that at least 3 hosts are recommended; right now there are two, but a third is coming soon.

The obvious first cause, connectivity, is already ruled out: both hosts can reach each other, and SSH access works fine in both directions.

I restarted pve-cluster as well as corosync once, but without success.
The following did not help either:
Bash:
systemctl stop pve-cluster
systemctl stop corosync
sleep 30
pmxcfs -l               # start the cluster filesystem in local mode, ignoring quorum
sleep 30
killall -9 pmxcfs
systemctl start pve-cluster

In my /etc/pve/.members I find the following content:

Host 1:
Code:
{
"nodename": "v-pve01",
"version": 3,
"cluster": { "name": "pve-nbg01", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
  "v-pve01": { "id": 1, "online": 1, "ip": "185.XXX.XXX.235"},
  "v-pve02": { "id": 2, "online": 0}
  }
}

Host 2:
Code:
{
"nodename": "v-pve02",
"version": 3,
"cluster": { "name": "pve-nbg01", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
  "v-pve01": { "id": 1, "online": 0},
  "v-pve02": { "id": 2, "online": 1, "ip": "159.XXX.XXX.49"}
  }
}

My question here: is it normal that only the local host's IP appears there? I would have thought the IPs of all hosts have to be listed, don't they?
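For what it's worth, a quick way to see which fields are present per node is to parse the file, since .members is plain JSON. A small sketch (a sample copy is written to /tmp for illustration; on a real node you would read /etc/pve/.members directly, which is generated by pmxcfs and not meant to be edited by hand):

```shell
# Sample copy of the .members content from host 1 (IPs anonymized as in the post).
cat > /tmp/members.json <<'EOF'
{
"nodename": "v-pve01",
"version": 3,
"cluster": { "name": "pve-nbg01", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
  "v-pve01": { "id": 1, "online": 1, "ip": "185.XXX.XXX.235"},
  "v-pve02": { "id": 2, "online": 0}
  }
}
EOF

# List each node in the nodelist with its online flag and IP (if any).
python3 -c '
import json
with open("/tmp/members.json") as f:
    members = json.load(f)
for name, info in sorted(members["nodelist"].items()):
    print(name, "online=%d" % info["online"], "ip=%s" % info.get("ip", "MISSING"))
'
```

In this sample the ip field is missing exactly for the node that is marked offline.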

My corosync.conf looks fine to me, I would say, and it is also identical on both hosts:

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: v-pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 185.XXX.XXX.235
  }
  node {
    name: v-pve02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 159.XXX.XXX.49
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-nbg01
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
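Beyond eyeballing the file: on Proxmox VE the authoritative copy lives in /etc/pve/corosync.conf and is synced by pmxcfs to /etc/corosync/corosync.conf on each node, and config_version must match everywhere. A minimal sketch of a byte-for-byte comparison (two local sample files stand in for the two hosts here; in practice you would fetch the peer's copy, e.g. via ssh v-pve02 cat /etc/corosync/corosync.conf):

```shell
# Stand-ins for the two hosts' configs; in practice compare the real files.
printf 'totem {\n  config_version: 2\n}\n' > /tmp/corosync-v-pve01.conf
printf 'totem {\n  config_version: 2\n}\n' > /tmp/corosync-v-pve02.conf

# Byte-for-byte comparison; any drift (including config_version) shows up here.
if cmp -s /tmp/corosync-v-pve01.conf /tmp/corosync-v-pve02.conf; then
    echo "configs identical"
else
    echo "configs differ"
    diff /tmp/corosync-v-pve01.conf /tmp/corosync-v-pve02.conf
fi
```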


What can I do to get the cluster running and green again? Unfortunately, even after a lot of research I have not found a solution that gets me there.
 
Oh, right, something else that puzzled me was pvecm status.
The list there also shows only the respective local node, as in this example:

Code:
Cluster information
-------------------
Name:             pve-nbg01
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Dec 30 18:02:11 2025
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.60a
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   1
Highest expected: 1
Total votes:      1
Quorum:           1 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 185.XXX.XXX.235 (local)

Is it normal that, given the current cluster state, only the local node is shown here?
 
That only one node is listed is because you set the cluster to expected 1 :). The membership information shows just one node because Corosync has not formed a membership with any other node yet, which is why only one entry appears.
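For context on why expected 1 makes a lone node quorate: votequorum requires a strict majority of the expected votes, i.e. floor(expected/2) + 1. A quick sketch of that arithmetic (standard majority formula; assumes no two_node or last_man_standing options are set):

```shell
# Majority quorum as used by corosync votequorum: floor(expected/2) + 1.
for expected in 1 2 3; do
    quorum=$(( expected / 2 + 1 ))
    echo "expected=$expected -> quorum=$quorum"
done
```

With expected 2, a single node's 1 vote is below the quorum of 2, so the node is not quorate; with expected 1 it is, which matches the pvecm status output above (Expected votes: 1, Quorum: 1, Flags: Quorate).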

Please post the following information (each in [CODE] blocks):
  • journalctl -u corosync -n 50 for both nodes separately
In /etc/pve/.members, the nodelist should contain all the information for every node (ID, online status, and IP).
 
Here is host 1:

Code:
Dec 30 19:28:46 v-pve01 systemd[1]: Starting corosync.service - Corosync Cluster Engine...
Dec 30 19:28:46 v-pve01 (corosync)[1343407]: corosync.service: Referenced but unset environment variable evaluates to an empty string: COROSYNC_OPTIONS
Dec 30 19:28:46 v-pve01 corosync[1343407]:   [MAIN  ] Corosync Cluster Engine  starting up
Dec 30 19:28:46 v-pve01 corosync[1343407]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
Dec 30 19:28:46 v-pve01 corosync[1343407]:   [TOTEM ] Initializing transport (Kronosnet).
Dec 30 19:28:46 v-pve01 corosync[1343407]:   [TOTEM ] totemknet initialized
Dec 30 19:28:46 v-pve01 corosync[1343407]:   [KNET  ] pmtud: MTU manually set to: 0
Dec 30 19:28:46 v-pve01 corosync[1343407]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QB    ] server name: cmap
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QB    ] server name: cfg
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QB    ] server name: cpg
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [WD    ] Watchdog not enabled by configuration
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [WD    ] resource load_15min missing a recovery key.
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [WD    ] resource memory_used missing a recovery key.
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [WD    ] no resources configured.
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QUORUM] Using quorum provider corosync_votequorum
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QB    ] server name: votequorum
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QB    ] server name: quorum
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [TOTEM ] Configuring link 0
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [TOTEM ] Configured link number 0: local addr: 185.XXX.XXX.235, port=5405
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 0)
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [KNET  ] host: host: 2 has no active links
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [KNET  ] host: host: 2 has no active links
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [KNET  ] host: host: 2 has no active links
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QUORUM] Sync members[1]: 1
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QUORUM] Sync joined[1]: 1
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [TOTEM ] A new membership (1.60f) was formed. Members joined: 1
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [QUORUM] Members[1]: 1
Dec 30 19:28:47 v-pve01 corosync[1343407]:   [MAIN  ] Completed service synchronization, ready to provide service.
Dec 30 19:28:47 v-pve01 systemd[1]: Started corosync.service - Corosync Cluster Engine.

And here is host 2:

Code:
Dec 30 19:28:50 v-pve02 systemd[1]: Starting corosync.service - Corosync Cluster Engine...
Dec 30 19:28:50 v-pve02 (corosync)[413702]: corosync.service: Referenced but unset environment variable evaluates to an empty string: COROSYNC_OPTIONS
Dec 30 19:28:50 v-pve02 corosync[413702]:   [MAIN  ] Corosync Cluster Engine  starting up
Dec 30 19:28:50 v-pve02 corosync[413702]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
Dec 30 19:28:50 v-pve02 corosync[413702]:   [TOTEM ] Initializing transport (Kronosnet).
Dec 30 19:28:50 v-pve02 corosync[413702]:   [TOTEM ] totemknet initialized
Dec 30 19:28:50 v-pve02 corosync[413702]:   [KNET  ] pmtud: MTU manually set to: 0
Dec 30 19:28:50 v-pve02 corosync[413702]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QB    ] server name: cmap
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QB    ] server name: cfg
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QB    ] server name: cpg
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [WD    ] Watchdog not enabled by configuration
Dec 30 19:28:51 v-pve02 corosync[413702]:   [WD    ] resource load_15min missing a recovery key.
Dec 30 19:28:51 v-pve02 corosync[413702]:   [WD    ] resource memory_used missing a recovery key.
Dec 30 19:28:51 v-pve02 corosync[413702]:   [WD    ] no resources configured.
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QUORUM] Using quorum provider corosync_votequorum
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QB    ] server name: votequorum
Dec 30 19:28:51 v-pve02 corosync[413702]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QB    ] server name: quorum
Dec 30 19:28:51 v-pve02 corosync[413702]:   [TOTEM ] Configuring link 0
Dec 30 19:28:51 v-pve02 corosync[413702]:   [TOTEM ] Configured link number 0: local addr: 159.XXX.XXX.49, port=5405
Dec 30 19:28:51 v-pve02 corosync[413702]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Dec 30 19:28:51 v-pve02 corosync[413702]:   [KNET  ] host: host: 1 has no active links
Dec 30 19:28:51 v-pve02 corosync[413702]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Dec 30 19:28:51 v-pve02 corosync[413702]:   [KNET  ] host: host: 1 has no active links
Dec 30 19:28:51 v-pve02 corosync[413702]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Dec 30 19:28:51 v-pve02 corosync[413702]:   [KNET  ] host: host: 1 has no active links
Dec 30 19:28:51 v-pve02 corosync[413702]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QUORUM] Sync members[1]: 2
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QUORUM] Sync joined[1]: 2
Dec 30 19:28:51 v-pve02 corosync[413702]:   [TOTEM ] A new membership (2.615) was formed. Members joined: 2
Dec 30 19:28:51 v-pve02 corosync[413702]:   [QUORUM] Members[1]: 2
Dec 30 19:28:51 v-pve02 corosync[413702]:   [MAIN  ] Completed service synchronization, ready to provide service.
Dec 30 19:28:51 v-pve02 systemd[1]: Started corosync.service - Corosync Cluster Engine.
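The repeated "host: ... has no active links" lines on both sides are worth noting: each node starts up fine and forms a membership of one (1.60f vs. 2.615), but kronosnet never sees traffic from its peer. SSH working only proves TCP port 22 is open; corosync/knet uses UDP (port 5405 for link 0, as the logs show), which a firewall can silently drop. corosync-cfgtool -s on each node shows the per-link state; below is a hedged loopback sketch of a raw UDP probe (in a real test you would run the listener on one node and point the sender at the peer's ring0 address, with corosync stopped on the listener so port 5405 is free):

```shell
# Loopback sketch of a UDP reachability probe on corosync's port for link 0.
# Listener (on a real cluster: run this on v-pve02):
python3 - <<'EOF' &
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("127.0.0.1", 5405))      # knet port for link 0, per the logs above
s.settimeout(5)
data, addr = s.recvfrom(1024)    # blocks until a datagram arrives
print("received:", data.decode())
EOF
sleep 1
# Sender (on a real cluster: run this on v-pve01, targeting the peer's ring0 address):
python3 -c '
import socket
socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(b"probe", ("127.0.0.1", 5405))
'
wait
```

If the listener prints received: probe the UDP path works; if it times out instead, something between the nodes is filtering UDP 5405 even though SSH succeeds.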

In /etc/pve/.members, the nodelist should contain all the information for every node (ID, online status, and IP).
OK, maybe that already explains the problem? What surprises me is how it is even possible for the IP to suddenly disappear.
So that means: enter the IP there manually, on both hosts, correct? Are any commands needed beforehand to get the host into the right state?
Or would the current pvecm expected 1 already be enough for that?
 