[SOLVED] Proxmox VE cluster node shows unknown

Hivane

Member
Hello all,

I am having an issue with a three-node cluster:
After a disk failure, I reinstalled one of the nodes (using the same hostname/IP).
Before trying to make it join the cluster again, I removed it from the cluster with "pvecm delnode", but something went wrong. I then tried to remove it again, reinstall it with a different hostname, and add it back to the cluster.
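Roughly, the sequence I used was something like this (hostname and IP are just placeholders, not the real ones):

Code:
# on one of the remaining cluster members, with the failed node powered off
pvecm delnode vdg-pve03-par6
# later, on the freshly reinstalled node, joining via an existing member's IP
pvecm add 10.0.0.1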

Now the cluster has all three nodes and pvecm status shows a normal three-node output, but I have issues in the web interface: only the node I connect to appears as green, and all the others appear as unknown.

You can see the pvecm status output and how it appears in the web interface here:
https://twitter.com/acontios_net/status/1069844561944567808
and here:
https://twitter.com/acontios_net/status/1069852107057098752

In the first tweet I am logged in to the vdg-pve01-par6 web interface, and in the second one to the vdg-pvefiler web interface. All the VMs on pve01 are working fine.

How can I get out of this stuck situation?

Thanks for your help!

Clément
 

Stoiko Ivanov

Proxmox Staff Member
Please check the output of `systemctl status -l pvestatd` and restart the service; also check your journal with `journalctl -r`.
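For example, something along these lines on each node:

Code:
# show the status daemon and its most recent log lines
systemctl status -l pvestatd
# restart it
systemctl restart pvestatd
# read the journal, newest entries first
journalctl -r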
 

Hivane

Member
Hi Stoiko,

Node 1:

● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-12-03 18:50:30 CET; 1 day 14h ago
Main PID: 5000 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 91.6M
CPU: 18min 13.846s
CGroup: /system.slice/pvestatd.service
└─5000 pvestatd

Dec 04 06:18:29 vdg-pve01-par6 pvestatd[5000]: status update time (11.070 seconds)
Dec 04 06:18:37 vdg-pve01-par6 pvestatd[5000]: status update time (7.973 seconds)
Dec 04 06:18:49 vdg-pve01-par6 pvestatd[5000]: status update time (10.170 seconds)
Dec 04 06:18:57 vdg-pve01-par6 pvestatd[5000]: status update time (7.846 seconds)
Dec 04 06:19:05 vdg-pve01-par6 pvestatd[5000]: status update time (5.459 seconds)
Dec 04 06:31:05 vdg-pve01-par6 pvestatd[5000]: status update time (6.353 seconds)
Dec 04 06:31:17 vdg-pve01-par6 pvestatd[5000]: status update time (8.037 seconds)
Dec 04 06:31:25 vdg-pve01-par6 pvestatd[5000]: status update time (6.121 seconds)
Dec 04 06:31:41 vdg-pve01-par6 pvestatd[5000]: status update time (11.868 seconds)
Dec 04 09:52:14 vdg-pve01-par6 pvestatd[5000]: status update time (22.730 seconds)


Node 2:
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-12-04 06:18:14 CET; 1 day 2h ago
Main PID: 4753 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 88.0M
CPU: 11min 13.755s
CGroup: /system.slice/pvestatd.service
└─4753 pvestatd

Dec 04 06:18:13 vdg-pve02-par6 systemd[1]: Starting PVE Status Daemon...
Dec 04 06:18:14 vdg-pve02-par6 pvestatd[4753]: starting server
Dec 04 06:18:14 vdg-pve02-par6 systemd[1]: Started PVE Status Daemon.
Dec 04 09:52:18 vdg-pve02-par6 pvestatd[4753]: status update time (24.216 seconds)


Node 3:
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-12-05 09:10:28 CET; 21s ago
Process: 17624 ExecStop=/usr/bin/pvestatd stop (code=exited, status=0/SUCCESS)
Process: 17631 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
Main PID: 17651 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 72.4M
CPU: 1.266s
CGroup: /system.slice/pvestatd.service
└─17651 pvestatd

Dec 05 09:10:27 vdg-pvefiler systemd[1]: Starting PVE Status Daemon...
Dec 05 09:10:28 vdg-pvefiler pvestatd[17651]: starting server
Dec 05 09:10:28 vdg-pvefiler systemd[1]: Started PVE Status Daemon.
Unfortunately, restarting it (on each of the nodes) did not solve anything.
The journal does not show anything relevant.
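For completeness, what I ran on each node was essentially this (the exact journal filters may have varied):

Code:
systemctl restart pvestatd
# newest entries first, limited to the PVE/corosync services
journalctl -r -u pvestatd -u pve-cluster -u corosync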
 

bizzarrone

Member
Good morning.
I just installed Proxmox 5.3, updated only with Debian updates (no pve-no-subscription repository), and created a three-node cluster.
I am using a separate 10 Gb network for the cluster, and the hosts files are correct.
Immediately after creation of the cluster:

Code:
Quorum information
------------------
Date:             Wed Dec  5 10:15:03 2018
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/376
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.1.1.1 (local)
0x00000002          1 10.1.1.2
0x00000003          1 10.1.1.3
Only 2 minutes later...

Code:
Quorum information
------------------
Date:             Wed Dec  5 10:03:15 2018
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1/16
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:
Then I restarted corosync:
Code:
systemctl restart corosync
Then it is OK again... but after 2 minutes, the same error...
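To see exactly when it drops again, I keep an eye on it with something like:

Code:
# refresh the quorum view every 2 seconds
watch -n 2 pvecm status
# and follow corosync's log in another terminal
journalctl -fu corosync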
 

bizzarrone

Member
Good morning again.
I discovered that multicast traffic gets blocked after 2 minutes...
Code:
prox03 : multicast, seq=180, size=69 bytes, dist=0, time=0.385ms
prox02 : multicast, seq=180, size=69 bytes, dist=0, time=0.421ms
prox03 :   unicast, seq=181, size=69 bytes, dist=0, time=0.222ms
prox02 :   unicast, seq=181, size=69 bytes, dist=0, time=0.360ms
prox03 : multicast, seq=181, size=69 bytes, dist=0, time=0.394ms
prox02 : multicast, seq=181, size=69 bytes, dist=0, time=0.436ms
prox02 :   unicast, seq=182, size=69 bytes, dist=0, time=0.403ms
prox03 :   unicast, seq=182, size=69 bytes, dist=0, time=0.384ms
prox03 :   unicast, seq=183, size=69 bytes, dist=0, time=0.214ms
prox02 :   unicast, seq=183, size=69 bytes, dist=0, time=0.345ms
then, at the end of the run:

Code:
prox02 :   unicast, xmt/rcv/%loss = 378/378/0%, min/avg/max/std-dev = 0.151/0.390/0.604/0.043
prox02 : multicast, xmt/rcv/%loss = 378/180/52% (seq>=2 52%), min/avg/max/std-dev = 0.277/0.437/0.594/0.035
prox03 :   unicast, xmt/rcv/%loss = 378/378/0%, min/avg/max/std-dev = 0.188/0.342/0.491/0.065
prox03 : multicast, xmt/rcv/%loss = 378/180/52% (seq>=2 52%), min/avg/max/std-dev = 0.311/0.400/0.492/0.028
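For reference, that output comes from omping started in parallel on all three nodes, something like the following (prox01 stands in for the third hostname, which is a guess):

Code:
omping prox01 prox02 prox03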
So I am switching to unicast traffic...
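For anyone else doing this on PVE 5.x / corosync 2.x: the switch to unicast is done by adding transport: udpu to the totem section of /etc/pve/corosync.conf, bumping config_version, and restarting corosync on all nodes. A rough sketch, with example values:

Code:
totem {
  cluster_name: mycluster
  config_version: 4
  interface {
    bindnetaddr: 10.1.1.0
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  transport: udpu
  version: 2
}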
 

Hivane

Member
Solved the problem with the following steps on each node:

Code:
# force-kill corosync
killall -9 corosync
# lazily unmount the stuck pmxcfs mountpoint
umount -l /etc/pve
# stop the cluster filesystem service
service pve-cluster stop
... then I waited a bit, reinstalled the same pve-cluster package on each node, and restarted everything.
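The reinstall-and-restart part was roughly this (treat it as a sketch, the exact order may have differed):

Code:
apt install --reinstall pve-cluster
systemctl start pve-cluster corosync
systemctl restart pvedaemon pveproxy pvestatd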
 

Stoiko Ivanov

Proxmox Staff Member
Glad both problems were resolved! Please mark the thread as solved :)
 
