Cluster: the nodes no longer see each other

josch12

Well-Known Member
Oct 20, 2017
Hello,

yesterday evening we noticed that our 7-node cluster is unhealthy.

When I log in to node 1, for example, I see the VMs on node 1 but not those on the other nodes.

proxmox.png


When I SSH into a node, I can ping the other nodes and reach them via SSH without any problems.

In the GUI I can also see the load of the other nodes.

prox_auslastung.png


The cluster status also looks normal, except that I get no values.

prox_cluster.png


Does anyone have an idea where I could start?

Nothing was changed on the cluster itself yesterday, only a VPN connection from the data center to our office. That is unrelated, though (those are different networks to begin with).
 
Is there anything in the syslog?

These symptoms can occur if pvestatd hangs for some reason; the question then is why it hangs.
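As a rough first check on each node, the state of the relevant services can be queried with `systemctl is-active`. The sketch below assumes the standard Proxmox VE unit names `pvestatd`, `pve-cluster`, and `corosync`; the helper names `unit_state` and `report` are made up for illustration. Note that a daemon that hangs can still be reported as `active`, so this only rules out the simple case of a stopped or failed service.

```python
import subprocess


def unit_state(unit, run=subprocess.run):
    """Return the state systemctl reports for a unit, e.g. 'active' or 'failed'."""
    result = run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # is-active exits non-zero for inactive units; not an error here
    )
    return result.stdout.strip()


def report(units, run=subprocess.run):
    """Map each unit name to its reported state."""
    return {unit: unit_state(unit, run=run) for unit in units}
```

Calling `report(["pvestatd", "pve-cluster", "corosync"])` on a node returns a dictionary of unit name to state; anything other than `active` is worth a closer look in the syslog.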
 
Hello,

good lead ...

I found a number of corosync alerts there:

Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:07 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:07 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:07 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:07 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:07 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:07 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:10 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:10 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:10 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:10 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:10 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:10 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:11 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:11 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:11 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:12 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:12 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:12 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:13 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:13 prox3 pveproxy[14053]: worker exit
Aug 10 06:25:13 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:13 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:14 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:14 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Invalid packet data
Aug 10 06:25:14 prox3 corosync[2040]: error [TOTEM ] Digest does not match
Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Digest does not match
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring.
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Invalid packet data
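To get a feeling for how often these messages occur, and whether they come in bursts, a log excerpt like the one above can be summarized with a few lines of Python. This is only a sketch: the regular expression is written against the line format shown in this post, and the names `LINE_RE` and `summarize_corosync` are made up for illustration.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match
#   Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Digest does not match
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"corosync\[\d+\]:\s+(?:(?P<level>\w+)\s+)?"
    r"\[(?P<subsys>\w+)\s*\]\s+(?P<msg>.*)$"
)


def summarize_corosync(lines):
    """Count corosync messages by text and by timestamp (one-second buckets)."""
    by_message = Counter()
    by_second = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a corosync line (e.g. pveproxy), skip it
        by_message[match["msg"]] += 1
        by_second[match["ts"]] += 1
    return by_message, by_second
```

Fed with the lines of the syslog, `summarize_corosync` returns two counters: one keyed by message text and one keyed by timestamp. Each message shows up twice in the excerpt, once with and once without a severity prefix, so the raw counts are roughly double the number of rejected packets.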
 
Rebooting all nodes completely gets it working again. It may be down to the current version or kernel.
 
I hope this gets fixed in an update. But I'm not allowed to say anything, since the product is supposedly 100% reliable.
 
"Rebooting all nodes completely gets it working again. It may be down to the current version or kernel."
That is not a solution, though. Problems like this should, indeed must, always be investigated. Almost always the cause is a network or hardware problem. We have been running several clusters for years. So far it has always been defective network cards, switches, or cables. Once we had a network card driver problem with a NetXen card; back then that was a kernel problem.

Which network cards do you have installed? Do you see anything useful in dmesg around that time?
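For the dmesg check, a small filter can help to pull out lines that hint at link or driver trouble. The patterns below are assumptions based on common kernel messages (link down, NETDEV WATCHDOG, transmit queue timeouts, adapter resets) and will need adjusting for the driver actually in use; `SUSPICIOUS` and `network_hints` are made-up names.

```python
import re

# Assumed patterns only; extend them for the NIC driver actually in use.
SUSPICIOUS = re.compile(
    r"link is down|link down|netdev watchdog|"
    r"transmit queue \d+ timed out|reset adapter",
    re.IGNORECASE,
)


def network_hints(dmesg_text):
    """Return the dmesg lines that match one of the assumed patterns."""
    return [line for line in dmesg_text.splitlines() if SUSPICIOUS.search(line)]
```

The function takes the text output of `dmesg` as a string, for example saved to a file on the affected node, and returns only the matching lines. An empty result does not prove the network is fine; it only means none of the assumed patterns occurred.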
 
It can also be the software, as may be the case here. We have also been running a cluster for years, and faulty software updates simply lead to more problems of this kind.
 
