Hello, yesterday evening we noticed that our 7-node cluster is unhealthy. When I log in on node 1, for example, I see the VMs on node 1 but not those on the other nodes. When I SSH into a node, I can ping the other nodes and SSH into them without any problems. In the GUI I can also see the load of the other nodes. The cluster status is normal as well, except that I don't get any values. Does anyone have an idea where I could start? Nothing was changed on the cluster itself yesterday, only a VPN connection from the data center to our office, but that is unrelated (those are entirely different networks).
Is there anything in the syslog? These symptoms can occur when pvestatd hangs for some reason; the question then is why it hangs.
Hallo, guter Ansatz ... ich habe dort einige corosync Alerts gefunden Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:07 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:07 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:07 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:07 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:07 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:07 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:08 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. 
Aug 10 06:25:08 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:10 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:10 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:10 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:10 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:10 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:10 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:11 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:11 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:11 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:11 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. 
Aug 10 06:25:11 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:12 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:12 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:12 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:12 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:12 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:13 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. 
Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:13 prox3 pveproxy[14053]: worker exit Aug 10 06:25:13 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:13 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:13 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:13 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:14 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:14 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. 
Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Invalid packet data Aug 10 06:25:14 prox3 corosync[2040]: error [TOTEM ] Digest does not match Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:14 prox3 corosync[2040]: alert [TOTEM ] Invalid packet data Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Digest does not match Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Received message has invalid digest... ignoring. Aug 10 06:25:14 prox3 corosync[2040]: [TOTEM ] Invalid packet data
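With this many repeated entries it may help to tally them rather than read them line by line. The following is a minimal sketch, assuming the classic syslog line format shown in the excerpt above; the function name `count_digest_errors` and the regular expression are illustrative and not part of any Proxmox or corosync tooling. It counts the "Digest does not match" messages per host and second, which can show when the errors started and whether they are continuous.

```python
import re
from collections import Counter

# Assumed line format (taken from the excerpt above), e.g.
# "Aug 10 06:25:08 prox3 corosync[2040]: error [TOTEM ] Digest does not match"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"   # month, day, time
    r"(?P<host>\S+)\s+corosync\[\d+\]:\s+"           # host and corosync PID
    r"(?P<msg>.*)$"                                  # rest of the message
)


def count_digest_errors(lines):
    """Count 'Digest does not match' messages per (host, timestamp).

    Lines from other daemons (for example pveproxy) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "Digest does not match" in match.group("msg"):
            counts[(match.group("host"), match.group("ts"))] += 1
    return counts
```

A possible use is `count_digest_errors(open("/var/log/syslog", errors="replace"))`, where the path is an assumption about where the node writes its syslog. Note that in the excerpt every message appears twice, once with a severity prefix and once without, so the counts come out doubled relative to the number of rejected packets.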
I hope this gets fixed in an update. But of course I'm not allowed to say anything, since the product is supposedly 100% reliable.
That is not a solution, though. Problems like this should, in fact must, always be investigated properly. The cause is almost always a network or hardware problem. We have been running several clusters for years. So far it has always been defective network cards, switches, or cables. Once we had a network driver problem with a NetXen card; back then it turned out to be a kernel problem. Which network cards do you have installed? Do you see anything useful in dmesg around that time?
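To narrow down the dmesg output, a small filter can pick out lines that often accompany link or driver trouble. This is only a sketch: the keyword list is an assumption based on common Linux kernel messages and is neither complete nor specific to any particular card, and the function name `suspicious_nic_lines` is hypothetical.

```python
import re

# Assumed keywords; extend or replace them for the driver actually in use.
NIC_PATTERNS = re.compile(
    r"link is down|link is not ready|netdev watchdog|tx timeout|"
    r"transmit queue \d+ timed out",
    re.IGNORECASE,
)


def suspicious_nic_lines(dmesg_lines):
    """Return the dmesg lines that match one of the assumed NIC keywords."""
    return [
        line.rstrip("\n")
        for line in dmesg_lines
        if NIC_PATTERNS.search(line)
    ]
```

It could be fed from a saved dump, for example `suspicious_nic_lines(open("dmesg.txt"))`, where `dmesg.txt` is a hypothetical file created with `dmesg > dmesg.txt`. An empty result does not rule out a hardware fault; it only means none of the assumed keywords appeared.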
It can also be down to the software, as may well be the case here. We have also been running a cluster for years, and faulty software updates simply cause more problems of this kind.