No WebGUI after failed cluster join

proxifoxi

Member
Aug 17, 2021
I was just trying to add another freshly installed server (iwkh-vh03) to our cluster, but apparently something went wrong along the way :(

Logging in to the WebGUI on the cluster (iwkh-vh01) is no longer possible: the "Please wait" spinner runs for a long time and then "Login failed" appears.

I can still get in via shell, and there I see the following in the syslog:

Code:
Jan 12 12:16:29 iwkh-vh01 pmxcfs[9088]: [dcdb] crit: cpg_send_message failed: 6
Jan 12 12:16:29 iwkh-vh01 pvescheduler[1158842]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
Jan 12 12:16:29 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 100
Jan 12 12:16:29 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retried 100 times
Jan 12 12:16:29 iwkh-vh01 pmxcfs[9088]: [status] crit: cpg_send_message failed: 6
Jan 12 12:16:29 iwkh-vh01 pve-firewall[2428]: firewall update time (10.018 seconds)
Jan 12 12:16:30 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 10
Jan 12 12:16:31 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 20
Jan 12 12:16:31 iwkh-vh01 corosync[9093]:   [QUORUM] Sync members[2]: 1 2
Jan 12 12:16:31 iwkh-vh01 corosync[9093]:   [TOTEM ] A new membership (1.9cd) was formed. Members
Jan 12 12:16:32 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 30
Jan 12 12:16:33 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 40
Jan 12 12:16:34 iwkh-vh01 corosync[9093]:   [TOTEM ] Token has not been received in 2737 ms
Jan 12 12:16:34 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 50
Jan 12 12:16:35 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 60
Jan 12 12:16:36 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 70
Jan 12 12:16:37 iwkh-vh01 corosync[9093]:   [TOTEM ] Token has not been received in 5738 ms
Jan 12 12:16:37 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 80
Jan 12 12:16:38 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 90
Jan 12 12:16:39 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 100
Jan 12 12:16:39 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retried 100 times
Jan 12 12:16:39 iwkh-vh01 pmxcfs[9088]: [status] crit: cpg_send_message failed: 6
Jan 12 12:16:39 iwkh-vh01 pve-firewall[2428]: firewall update time (10.008 seconds)
Jan 12 12:16:40 iwkh-vh01 corosync[9093]:   [QUORUM] Sync members[2]: 1 2
Jan 12 12:16:40 iwkh-vh01 corosync[9093]:   [TOTEM ] A new membership (1.9e1) was formed. Members
Jan 12 12:16:40 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 10
Jan 12 12:16:41 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 20
Jan 12 12:16:42 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 30


and in daemon.log:

Code:
Jan 12 12:18:00 iwkh-vh01 pve-firewall[2428]: firewall update time (20.017 seconds)
Jan 12 12:18:01 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 10
Jan 12 12:18:02 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 20
Jan 12 12:18:02 iwkh-vh01 corosync[9093]:   [TOTEM ] Token has not been received in 2738 ms
Jan 12 12:18:03 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 30
Jan 12 12:18:03 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 10
Jan 12 12:18:04 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 40
Jan 12 12:18:04 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 20
Jan 12 12:18:05 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 50
Jan 12 12:18:05 iwkh-vh01 corosync[9093]:   [TOTEM ] Token has not been received in 5739 ms
Jan 12 12:18:05 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 30
Jan 12 12:18:06 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 60
Jan 12 12:18:06 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 40
Jan 12 12:18:07 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 70
Jan 12 12:18:07 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 50
Jan 12 12:18:08 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 80
Jan 12 12:18:08 iwkh-vh01 corosync[9093]:   [QUORUM] Sync members[2]: 1 2
Jan 12 12:18:08 iwkh-vh01 corosync[9093]:   [TOTEM ] A new membership (1.aa9) was formed. Members
Jan 12 12:18:08 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 60
Jan 12 12:18:09 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 90
Jan 12 12:18:09 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 70
Jan 12 12:18:10 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 100
Jan 12 12:18:10 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retried 100 times
Jan 12 12:18:10 iwkh-vh01 pmxcfs[9088]: [status] crit: cpg_send_message failed: 6
Jan 12 12:18:10 iwkh-vh01 pvestatd[2431]: status update time (110.513 seconds)
Jan 12 12:18:10 iwkh-vh01 pve-firewall[2428]: firewall update time (10.008 seconds)
Jan 12 12:18:10 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 80
Jan 12 12:18:10 iwkh-vh01 corosync[9093]:   [TOTEM ] Token has not been received in 2738 ms
Jan 12 12:18:11 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 10
Jan 12 12:18:11 iwkh-vh01 pmxcfs[9088]: [dcdb] notice: cpg_send_message retry 90
Jan 12 12:18:12 iwkh-vh01 pmxcfs[9088]: [status] notice: cpg_send_message retry 20



I can no longer log in to any of the servers via the WebGUI!
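
The recurring pattern in the log excerpts above (token timeouts, new memberships being formed, and `cpg_send_message failed`) can be summarized with a few `grep` counts. This is only a minimal sketch: the function name `count_symptoms` is made up for illustration, the search strings are copied from the excerpts, and the log path in the usage note is an assumption.

```shell
#!/bin/sh
# Sketch: count corosync/pmxcfs symptom lines in a log file.
# count_symptoms is a hypothetical helper name; the patterns are taken
# verbatim from the log excerpts in this post.
count_symptoms() {
    log="$1"
    # grep -c prints the number of matching lines (0 if none match)
    printf 'token timeouts: %s\n' "$(grep -c 'Token has not been received' "$log")"
    printf 'cpg send failures: %s\n' "$(grep -c 'cpg_send_message failed' "$log")"
    printf 'membership changes: %s\n' "$(grep -c 'A new membership' "$log")"
}
```

It could be called as `count_symptoms /var/log/syslog` (path assumed). Counts that keep rising would suggest that the nodes still cannot hold a stable corosync membership.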

The problem was probably the following:
iwkh-vh01 (created the cluster "clu01"): here root had the password 123456
iwkh-vh02 (was added to the cluster "clu01"): here root had the password abcdef before joining the cluster
That worked fine until now.

Now I have added iwkh-vh03 to the cluster "clu01" and got the error message "Join Failed". Here root had the password 1234abc! before joining.



What can I do now to regain access to the system?

Please help

from
a very desperate
Foxi
 
The status of pveproxy shows this message:

Code:
iwkh-vh01:~# systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2021-12-21 12:01:27 CET; 3 weeks 1 days ago
    Process: 2470 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
    Process: 2478 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
    Process: 506932 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
   Main PID: 2679 (pveproxy)
      Tasks: 4 (limit: 76693)
     Memory: 206.1M
        CPU: 1h 23min 18.114s
     CGroup: /system.slice/pveproxy.service
             ├─   2679 pveproxy
             ├─ 871648 pveproxy worker
             ├─1004865 pveproxy worker
             └─1614922 pveproxy worker

Jan 12 14:44:30 iwkh-vh01 pveproxy[1614528]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1904.
Jan 12 14:44:35 iwkh-vh01 pveproxy[1614528]: worker exit
Jan 12 14:44:35 iwkh-vh01 pveproxy[2679]: worker 1614528 finished
Jan 12 14:44:35 iwkh-vh01 pveproxy[2679]: starting 1 worker(s)
Jan 12 14:44:35 iwkh-vh01 pveproxy[2679]: worker 1614732 started
Jan 12 14:44:35 iwkh-vh01 pveproxy[1614732]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1904.
Jan 12 14:44:40 iwkh-vh01 pveproxy[1614732]: worker exit
Jan 12 14:44:40 iwkh-vh01 pveproxy[2679]: worker 1614732 finished
Jan 12 14:44:40 iwkh-vh01 pveproxy[2679]: starting 1 worker(s)
Jan 12 14:44:40 iwkh-vh01 pveproxy[2679]: worker 1614922 started



Can this be repaired somehow?

I'm really getting desperate... and I don't want to break more than is already broken. :(

Regards
Foxi
 
OK, I managed to solve it myself *phew* *yay*

Solution:
I logged in to each host via shell and made sure that the /etc/hosts file is identical on all 3.
Then, on each host, I also ran
systemctl stop corosync
systemctl start corosync

and everything is reachable again. ;)
Sometimes the solution is too simple. ;)
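
The check described in the solution can be sketched roughly as follows. This is a minimal sketch, not a tested procedure: the helper names `collect_hosts` and `hosts_files_match` are made up for illustration, and it assumes root SSH access to all three nodes under the host names used in this thread.

```shell
#!/bin/sh
# Sketch: verify that /etc/hosts is identical on all cluster nodes.
# Both function names are hypothetical; root SSH access is assumed.

# Copy /etc/hosts from each node into directory $1 (one file per node).
collect_hosts() {
    dir="$1"
    shift
    for node in "$@"; do
        ssh "root@$node" cat /etc/hosts > "$dir/$node" || return 1
    done
}

# Return 0 if all given files have identical content, 1 otherwise.
hosts_files_match() {
    first="$1"
    shift
    for f in "$@"; do
        cmp -s "$first" "$f" || return 1
    done
    return 0
}
```

A possible use would be `tmp=$(mktemp -d); collect_hosts "$tmp" iwkh-vh01 iwkh-vh02 iwkh-vh03 && hosts_files_match "$tmp"/* && echo "hosts files identical"`. Only once the files match would the corosync stop and start from the solution be run on each host.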


Kind regards,
Foxi
 
