Since we had this reset twice, I analyzed the log from one node in more detail:
I see
2024-03-06T10:00:12.396742+01:00 pve52 corosync[1172]: [KNET ] link: host: 9 link: 0 is down
2024-03-06T10:00:12.396906+01:00 pve52 corosync[1172]: [KNET ] host: host: 9 (passive) best link: 0 (pri...
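When going through many such lines, a small parser can help to pick out the link state changes. The following is a minimal sketch, assuming the log format of the "link ... is down" line quoted above. The name `parse_knet_link_event` and the regular expression are illustrative and not part of corosync, and the `up` alternative is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the single "link ... is down" line quoted above.
# The "up" alternative is an assumption; adjust it to your own logs.
KNET_LINK_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<node>\S+)\s+corosync\[\d+\]:\s+"
    r"\[KNET\s*\]\s+link: host: (?P<host>\d+) link: (?P<link>\d+) "
    r"is (?P<state>up|down)"
)


class LinkEvent(NamedTuple):
    timestamp: str  # timestamp exactly as written in the log
    node: str       # node that wrote the log line
    host: int       # corosync ID of the peer host
    link: int       # knet link number
    state: str      # "up" or "down"


def parse_knet_link_event(line: str) -> Optional[LinkEvent]:
    """Return a LinkEvent for a KNET link state line, else None."""
    m = KNET_LINK_RE.match(line)
    if m is None:
        return None
    return LinkEvent(
        m["timestamp"], m["node"], int(m["host"]), int(m["link"]), m["state"]
    )


line = (
    "2024-03-06T10:00:12.396742+01:00 pve52 corosync[1172]: "
    "[KNET ] link: host: 9 link: 0 is down"
)
print(parse_knet_link_event(line))
```

Lines that are not link state changes, such as the "best link" line, return `None` and can simply be skipped.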
Thank you for your reply.
I attached the corosync.conf. It is the same on all nodes: the file in /etc/pve/corosync.conf has the same content as /etc/corosync/corosync.conf, as expected.
On pve58, only link0 being up was expected. This is because it got a new network card on "link1" und...
Thank you for your time, Fabian.
Here are the logs. The node name is in the filename.
Node pve58 was the one that was switched off and rebooted without the ring0 and Ceph networks. I changed some of the storage names
Dear Fabian,
Thank you for your reply.
All of the hosts have these log entries in syslog:
On the [QUORUM] line, member 9 is missing.
2024-03-06T10:15:35.484881+01:00 pve40 corosync[1561]: [QUORUM] Sync members[6]: 1 2 3 6 7 8
2024-03-06T10:15:35.485344+01:00 pve40 corosync[1561]: [TOTEM ]...
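To check many hosts for the same symptom, the member list can be compared against the expected node IDs. This is a minimal sketch, assuming the `Sync members[...]` format of the [QUORUM] line quoted above. The function `missing_members` is hypothetical, and the expected ID set is an assumption inferred from this thread (the six listed members plus member 9).

```python
import re

# Matches e.g. "[QUORUM] Sync members[6]: 1 2 3 6 7 8"
SYNC_MEMBERS_RE = re.compile(
    r"\[QUORUM\s*\]\s+Sync members\[(?P<count>\d+)\]:\s*(?P<ids>[\d ]+)"
)


def missing_members(line, expected_ids):
    """Return the expected node IDs that are absent from a Sync members line."""
    m = SYNC_MEMBERS_RE.search(line)
    if m is None:
        raise ValueError("not a [QUORUM] Sync members line")
    present = {int(node_id) for node_id in m["ids"].split()}
    return set(expected_ids) - present


line = (
    "2024-03-06T10:15:35.484881+01:00 pve40 corosync[1561]: "
    "[QUORUM] Sync members[6]: 1 2 3 6 7 8"
)
# Assumption: the node IDs of this cluster, not taken from corosync.conf.
EXPECTED_IDS = {1, 2, 3, 6, 7, 8, 9}
print(sorted(missing_members(line, EXPECTED_IDS)))
```

In practice the expected IDs would come from the nodelist in corosync.conf rather than being hard-coded.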
Hey All,
I got a complete cluster reset (watchdog-based reset of all nodes) in the following scenario.
Got a cluster of 7 hosts.
corosync has 2 rings:
ring0 network 192.168.xx.n/24 using a dedicated copper switch
ring1 network 192.168.yy.n/24 using a VLAN on 10G fiber.
Here a part of...
Hi all.
We are in the process of upgrading and extending our 4-node cluster.
When we set up a new node, is it possible to add it to the existing version 6.4 cluster?
We have Ceph version 15.2 running on the existing cluster. So first installing Ceph version 15 on the version 7 node should...
Hi Fabian,
Thanks for your reply,
I looked at the bug report, and it likely describes my problem.
I will test the packages when they "arrive" and report back.
Best regards
Lukas
Hey all,
I observed a strange reboot of all my cluster nodes as soon as corosync is restarted on one specific host or that host is rebooted.
I have 7 hosts in one cluster
Corosync has 2 links configured. ring0 is on a separate network on separate switch. ring1 is shared as VLAN over 10G fiber...
Hey All,
So this does seem to be a bug in Check_mk after all.
All clusters can be queried correctly.
But as soon as one host in a cluster is down (even intentionally), the special agent runs into the JSONDecode error. As soon as all hosts are up again, the output is correct.
Regards, Lukas
OK... what I found is that I can reach and query both clusters via curl:
e.g. /nodes
On the "running cluster", which can be queried via the cmk special agent....
curl --insecure --cookie "$(<cookie)" https://10.1.0.11:8006/api2/json/nodes/...
Unfortunately, only a:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Passing --debug additionally gives this trace:
Traceback (most recent call last):
  File "/omd/sites/mc/share/check_mk/agents/special/agent_proxmox_ve", line 10, in <module>
    main()
  File...
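The quoted message is what Python's `json` module reports when its input does not start with a JSON value, for example an empty string. The sketch below reproduces that message and shows one way a caller could guard against it. Whether the special agent actually received an empty body here is an assumption, and the helper `parse_api_body` is hypothetical and not part of check_mk.

```python
import json


def parse_api_body(body: str):
    """Return the decoded JSON, or None if the body is empty or not valid JSON."""
    if not body.strip():
        return None
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None


# An empty string reproduces the message seen in the agent output.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    print(exc)  # Expecting value: line 1 column 1 (char 0)

print(parse_api_body(""))              # None
print(parse_api_body('{"data": []}'))  # {'data': []}
```

Returning `None` lets the caller decide whether to skip the unreachable host or report it, instead of aborting on the first undecodable response.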
Hello Stoiko Ivanov,
Thanks for the feedback. cmk -d hostname runs both the agent check on the host, which works flawlessly, and the "special agent" described above. The latter queries the Proxmox cluster from the cmk host over HTTPS via the Proxmox API.
The call expects...
I am trying to monitor Proxmox VE 6.4 clusters with a check_mk installation that was upgraded to version 2.0.
Check_mk 2.0 ships a special agent that uses the Proxmox API.
On one cluster (both clusters are at the same patch level) I do get usable answers from the API:
On the...