[SOLVED] Proxmox VE cluster node shows unknown

Hivane

Hello all,

I am having an issue with a 3-node cluster:
After a disk failure, I reinstalled one of the nodes (using the same hostname/IP).
Before trying to make it join the cluster again, I removed it from the cluster using "pvecm delnode", but something went wrong. I then tried to remove it again, reinstall it using a different hostname, and add it to the cluster.

Now the cluster has the three nodes and pvecm status shows a normal 3-node output, but I have issues when I log in to the web interface: only the node I connect to appears as green, and all the others appear as unknown.

You can see the pvecm status output and how it appears in the web interface here:
https://twitter.com/acontios_net/status/1069844561944567808
and here:
https://twitter.com/acontios_net/status/1069852107057098752

In the first tweet I am logged in to the vdg-pve01-par6 web interface, and in the second one to the vdg-pvefiler web interface. All the VMs on pve01 are working fine.

How can I get out of this stuck situation?

Thanks for your help!

Clément
 
Hi Stoiko,

Node 1:

● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-12-03 18:50:30 CET; 1 day 14h ago
Main PID: 5000 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 91.6M
CPU: 18min 13.846s
CGroup: /system.slice/pvestatd.service
└─5000 pvestatd

Dec 04 06:18:29 vdg-pve01-par6 pvestatd[5000]: status update time (11.070 seconds)
Dec 04 06:18:37 vdg-pve01-par6 pvestatd[5000]: status update time (7.973 seconds)
Dec 04 06:18:49 vdg-pve01-par6 pvestatd[5000]: status update time (10.170 seconds)
Dec 04 06:18:57 vdg-pve01-par6 pvestatd[5000]: status update time (7.846 seconds)
Dec 04 06:19:05 vdg-pve01-par6 pvestatd[5000]: status update time (5.459 seconds)
Dec 04 06:31:05 vdg-pve01-par6 pvestatd[5000]: status update time (6.353 seconds)
Dec 04 06:31:17 vdg-pve01-par6 pvestatd[5000]: status update time (8.037 seconds)
Dec 04 06:31:25 vdg-pve01-par6 pvestatd[5000]: status update time (6.121 seconds)
Dec 04 06:31:41 vdg-pve01-par6 pvestatd[5000]: status update time (11.868 seconds)
Dec 04 09:52:14 vdg-pve01-par6 pvestatd[5000]: status update time (22.730 seconds)


Node 2:
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-12-04 06:18:14 CET; 1 day 2h ago
Main PID: 4753 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 88.0M
CPU: 11min 13.755s
CGroup: /system.slice/pvestatd.service
└─4753 pvestatd

Dec 04 06:18:13 vdg-pve02-par6 systemd[1]: Starting PVE Status Daemon...
Dec 04 06:18:14 vdg-pve02-par6 pvestatd[4753]: starting server
Dec 04 06:18:14 vdg-pve02-par6 systemd[1]: Started PVE Status Daemon.
Dec 04 09:52:18 vdg-pve02-par6 pvestatd[4753]: status update time (24.216 seconds)


Node 3:
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-12-05 09:10:28 CET; 21s ago
Process: 17624 ExecStop=/usr/bin/pvestatd stop (code=exited, status=0/SUCCESS)
Process: 17631 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
Main PID: 17651 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 72.4M
CPU: 1.266s
CGroup: /system.slice/pvestatd.service
└─17651 pvestatd

Dec 05 09:10:27 vdg-pvefiler systemd[1]: Starting PVE Status Daemon...
Dec 05 09:10:28 vdg-pvefiler pvestatd[17651]: starting server
Dec 05 09:10:28 vdg-pvefiler systemd[1]: Started PVE Status Daemon.

Unfortunately, restarting it (on each of the nodes) did not solve the problem.
The journal does not show anything relevant.
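
For reference, the status output above and the restarts come from plain systemd commands; roughly (a sketch, only the unit name is Proxmox-specific):

Code:
systemctl status pvestatd                     # daemon state plus the most recent journal lines
systemctl restart pvestatd                    # restart the status daemon on a node
journalctl -u pvestatd --since "1 hour ago"   # look for relevant journal entries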
 
Good morning.
I just installed Proxmox 5.3, updated only with Debian updates, no pve-no-subscription repository. I created a 3-node cluster.
I am using a separate 10 Gb network for the cluster; the hosts file is correct.
Immediately after creation of the cluster:

Code:
Quorum information
------------------
Date:             Wed Dec  5 10:15:03 2018
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/376
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.1.1.1 (local)
0x00000002          1 10.1.1.2
0x00000003          1 10.1.1.3

Only 2 minutes later...

Code:
Quorum information
------------------
Date:             Wed Dec  5 10:03:15 2018
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1/16
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Then I restarted corosync:
Code:
systemctl restart corosync

Then it is OK again... but after 2 minutes, the same error...
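
On PVE 5.x, where corosync uses multicast by default, a typical next check is whether multicast actually works between the nodes, e.g. with omping (a sketch; the hostnames are examples, and the command should be started on all nodes at roughly the same time):

Code:
omping -c 600 -i 1 -q prox01 prox02 prox03   # ~10 minute test; catches multicast that dies after a few minutes (e.g. IGMP snooping without a querier)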
 
Good morning again.
I discovered that multicast traffic is blocked after 2 minutes...
Code:
prox03 : multicast, seq=180, size=69 bytes, dist=0, time=0.385ms
prox02 : multicast, seq=180, size=69 bytes, dist=0, time=0.421ms
prox03 :   unicast, seq=181, size=69 bytes, dist=0, time=0.222ms
prox02 :   unicast, seq=181, size=69 bytes, dist=0, time=0.360ms
prox03 : multicast, seq=181, size=69 bytes, dist=0, time=0.394ms
prox02 : multicast, seq=181, size=69 bytes, dist=0, time=0.436ms
prox02 :   unicast, seq=182, size=69 bytes, dist=0, time=0.403ms
prox03 :   unicast, seq=182, size=69 bytes, dist=0, time=0.384ms
prox03 :   unicast, seq=183, size=69 bytes, dist=0, time=0.214ms
prox02 :   unicast, seq=183, size=69 bytes, dist=0, time=0.345ms

Then:

Code:
prox02 :   unicast, xmt/rcv/%loss = 378/378/0%, min/avg/max/std-dev = 0.151/0.390/0.604/0.043
prox02 : multicast, xmt/rcv/%loss = 378/180/52% (seq>=2 52%), min/avg/max/std-dev = 0.277/0.437/0.594/0.035
prox03 :   unicast, xmt/rcv/%loss = 378/378/0%, min/avg/max/std-dev = 0.188/0.342/0.491/0.065
prox03 : multicast, xmt/rcv/%loss = 378/180/52% (seq>=2 52%), min/avg/max/std-dev = 0.311/0.400/0.492/0.028

So I am switching to unicast traffic...
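
For completeness: on PVE 5.x (corosync 2) the switch to unicast is done by setting the transport to udpu in the totem section of corosync.conf. A minimal sketch (edit the copy under /etc/pve so it gets propagated, increase config_version, then restart corosync on every node):

Code:
# /etc/pve/corosync.conf, inside the totem { ... } section:
totem {
  ...
  transport: udpu   # UDP unicast instead of multicast
}

# after saving (with config_version bumped), on each node:
systemctl restart corosync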
 

Solved the problem with the following steps on each node:

Code:
killall corosync -9 
umount /etc/pve -l 
service pve-cluster stop

... waited a bit, reinstalled the same pve-cluster package on each node, and restarted everything.
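
For anyone following along, the reinstall and restart were not shown as commands; a minimal sketch of what that could look like (assuming the standard package and unit names, and that the same pve-cluster version is still available from the configured repository):

Code:
apt install --reinstall pve-cluster      # reinstall the same pve-cluster package
systemctl start pve-cluster corosync     # bring the cluster stack back up
systemctl restart pvestatd pveproxy      # refresh the status daemon and the web interface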
 
Glad (both) problems were resolved! Please mark the thread as solved :)
 
I had the same problem and solved it by removing the folder named after the node:

/etc/pve/nodes/NAME-OF-NODE

When I removed this folder, the problem was resolved.

I am working with Proxmox 6.1.3
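
In case it is useful to others, a minimal sketch of that cleanup (NAME-OF-NODE stands for the stale node's directory; only remove directories of nodes that are no longer cluster members, and do it on a node that is quorate):

Code:
ls /etc/pve/nodes/                   # list the node directories the cluster knows about
rm -r /etc/pve/nodes/NAME-OF-NODE    # remove the leftover directory of the stale node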
 
Hi,

maybe I have the same problem. But which solution is the right one? Delete /etc/pve/nodes/NAME-OF-NODE or learn how to change to unicast?

I have 4 nodes. All are new PVE 6 (6.2-4) installations with the latest updates. 3 nodes have been running in cluster mode for some days, and today I added node 4. After maybe an hour the new node 4 shows as normal and green in the GUI, but the other (older) 3 nodes are shown as unknown. For a short time everything was OK with the 4-node cluster. Where can I check the real issue and fix it without destroying my new, small but nice prod system? I hope my English is not too bad. Thanks.
 
Please open a new thread instead of replying to one related to a much older version (PVE 5.x used a different version of corosync; PVE 6 uses corosync 3, which uses unicast by default).

When opening a new thread, please provide the logs from corosync, pve-cluster, and pvestatd.
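
For reference, those logs can be collected in one go with journalctl, e.g. (a sketch; adjust the time window as needed):

Code:
journalctl -u corosync -u pve-cluster -u pvestatd --since "1 hour ago" > cluster-logs.txt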
 
Thanks for your fast answer.
 
Problem solved. No new thread needed. It was a problem with an autofs mount on my nodes. After a restart of each node, all is fine. Thanks again for the fast reply and for a wonderful Proxmox VE. Have a nice day.
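
As a side note, a hung automount can make pvestatd's storage checks block, which is one way nodes end up shown as unknown; before rebooting, it may be worth checking the automounter and its mounts, roughly (a sketch, assuming autofs runs as a systemd service):

Code:
systemctl status autofs          # state of the automounter
findmnt -t autofs,nfs,nfs4       # list automount points and NFS mounts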
 
Thanks for sharing your solution :) You too!
 