[SOLVED] Proxmox VE cluster node shows unknown

Hivane

Member
Hello all,

I am having an issue with a three-node cluster:
After a disk failure, I reinstalled one of the nodes (using the same hostname/IP).
Before trying to make it join the cluster again, I removed it from the cluster with "pvecm delnode", but something went wrong. I then tried to remove it again, reinstall it with a different hostname, and add it back to the cluster.
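Roughly, the sequence I used was something like this (hostname and IP are just placeholders, not the real ones):

Code:
# on one of the remaining cluster members, with the failed node powered off
pvecm delnode vdg-pve03-par6
# later, on the freshly reinstalled node, joining via an existing member's IP
pvecm add 10.0.0.1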

Now the cluster has all three nodes and pvecm status shows a normal three-node output, but I have issues in the web interface: only the node I connect to appears as green, and all the others appear as unknown.

You can see the pvecm status output and how it appears in the web interface here:
https://twitter.com/acontios_net/status/1069844561944567808
and here:
https://twitter.com/acontios_net/status/1069852107057098752

In the first tweet I am logged in to the vdg-pve01-par6 web interface, and in the second one to the vdg-pvefiler web interface. All the VMs on pve01 are working fine.

How can I get out of this stuck situation?

Thanks for your help!

Clément
 

Stoiko Ivanov

Proxmox Staff Member
Please check the output of `systemctl status -l pvestatd` and restart the service; also check your journal with `journalctl -r`.
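For example, something along these lines on each node:

Code:
# show the status daemon and its most recent log lines
systemctl status -l pvestatd
# restart it
systemctl restart pvestatd
# read the journal, newest entries first
journalctl -r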
 

Hivane

Member
Hi Stoiko,

Node 1:

● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2018-12-03 18:50:30 CET; 1 day 14h ago
Main PID: 5000 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 91.6M
CPU: 18min 13.846s
CGroup: /system.slice/pvestatd.service
└─5000 pvestatd

Dec 04 06:18:29 vdg-pve01-par6 pvestatd[5000]: status update time (11.070 seconds)
Dec 04 06:18:37 vdg-pve01-par6 pvestatd[5000]: status update time (7.973 seconds)
Dec 04 06:18:49 vdg-pve01-par6 pvestatd[5000]: status update time (10.170 seconds)
Dec 04 06:18:57 vdg-pve01-par6 pvestatd[5000]: status update time (7.846 seconds)
Dec 04 06:19:05 vdg-pve01-par6 pvestatd[5000]: status update time (5.459 seconds)
Dec 04 06:31:05 vdg-pve01-par6 pvestatd[5000]: status update time (6.353 seconds)
Dec 04 06:31:17 vdg-pve01-par6 pvestatd[5000]: status update time (8.037 seconds)
Dec 04 06:31:25 vdg-pve01-par6 pvestatd[5000]: status update time (6.121 seconds)
Dec 04 06:31:41 vdg-pve01-par6 pvestatd[5000]: status update time (11.868 seconds)
Dec 04 09:52:14 vdg-pve01-par6 pvestatd[5000]: status update time (22.730 seconds)


Node 2:
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-12-04 06:18:14 CET; 1 day 2h ago
Main PID: 4753 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 88.0M
CPU: 11min 13.755s
CGroup: /system.slice/pvestatd.service
└─4753 pvestatd

Dec 04 06:18:13 vdg-pve02-par6 systemd[1]: Starting PVE Status Daemon...
Dec 04 06:18:14 vdg-pve02-par6 pvestatd[4753]: starting server
Dec 04 06:18:14 vdg-pve02-par6 systemd[1]: Started PVE Status Daemon.
Dec 04 09:52:18 vdg-pve02-par6 pvestatd[4753]: status update time (24.216 seconds)


Node 3:
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2018-12-05 09:10:28 CET; 21s ago
Process: 17624 ExecStop=/usr/bin/pvestatd stop (code=exited, status=0/SUCCESS)
Process: 17631 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
Main PID: 17651 (pvestatd)
Tasks: 1 (limit: 4915)
Memory: 72.4M
CPU: 1.266s
CGroup: /system.slice/pvestatd.service
└─17651 pvestatd

Dec 05 09:10:27 vdg-pvefiler systemd[1]: Starting PVE Status Daemon...
Dec 05 09:10:28 vdg-pvefiler pvestatd[17651]: starting server
Dec 05 09:10:28 vdg-pvefiler systemd[1]: Started PVE Status Daemon.
Unfortunately, restarting it (on each of the nodes) did not solve anything.
The journal does not show anything relevant.
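For completeness, what I ran on each node was essentially this (the exact journal filters may have varied):

Code:
systemctl restart pvestatd
# newest entries first, limited to the PVE/corosync services
journalctl -r -u pvestatd -u pve-cluster -u corosync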
 

bizzarrone

Member
Good morning.
I just installed Proxmox 5.3, updated only with Debian updates (no pve-no-subscription repository), and created a three-node cluster.
I am using a separate 10 Gb network for the cluster, and the hosts files are correct.
Immediately after creation of the cluster:

Code:
Quorum information
------------------
Date:             Wed Dec  5 10:15:03 2018
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1/376
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.1.1.1 (local)
0x00000002          1 10.1.1.2
0x00000003          1 10.1.1.3
Only 2 minutes later...

Code:
Quorum information
------------------
Date:             Wed Dec  5 10:03:15 2018
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1/16
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:
Then I restarted corosync:
Code:
systemctl restart corosync
Then it is OK again... but after 2 minutes, the same error...
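To see exactly when it drops again, I keep an eye on it with something like:

Code:
# refresh the quorum view every 2 seconds
watch -n 2 pvecm status
# and follow corosync's log in another terminal
journalctl -fu corosync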
 

bizzarrone

Member
Good morning again.
I discovered that multicast traffic gets blocked after 2 minutes...
Code:
prox03 : multicast, seq=180, size=69 bytes, dist=0, time=0.385ms
prox02 : multicast, seq=180, size=69 bytes, dist=0, time=0.421ms
prox03 :   unicast, seq=181, size=69 bytes, dist=0, time=0.222ms
prox02 :   unicast, seq=181, size=69 bytes, dist=0, time=0.360ms
prox03 : multicast, seq=181, size=69 bytes, dist=0, time=0.394ms
prox02 : multicast, seq=181, size=69 bytes, dist=0, time=0.436ms
prox02 :   unicast, seq=182, size=69 bytes, dist=0, time=0.403ms
prox03 :   unicast, seq=182, size=69 bytes, dist=0, time=0.384ms
prox03 :   unicast, seq=183, size=69 bytes, dist=0, time=0.214ms
prox02 :   unicast, seq=183, size=69 bytes, dist=0, time=0.345ms
then, at the end of the run:

Code:
prox02 :   unicast, xmt/rcv/%loss = 378/378/0%, min/avg/max/std-dev = 0.151/0.390/0.604/0.043
prox02 : multicast, xmt/rcv/%loss = 378/180/52% (seq>=2 52%), min/avg/max/std-dev = 0.277/0.437/0.594/0.035
prox03 :   unicast, xmt/rcv/%loss = 378/378/0%, min/avg/max/std-dev = 0.188/0.342/0.491/0.065
prox03 : multicast, xmt/rcv/%loss = 378/180/52% (seq>=2 52%), min/avg/max/std-dev = 0.311/0.400/0.492/0.028
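For reference, that output comes from omping started in parallel on all three nodes, something like the following (prox01 stands in for the third hostname, which is a guess):

Code:
omping prox01 prox02 prox03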
So I am switching to unicast traffic...
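For anyone else doing this on PVE 5.x / corosync 2.x: the switch to unicast is done by adding transport: udpu to the totem section of /etc/pve/corosync.conf, bumping config_version, and restarting corosync on all nodes. A rough sketch, with example values:

Code:
totem {
  cluster_name: mycluster
  config_version: 4
  interface {
    bindnetaddr: 10.1.1.0
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  transport: udpu
  version: 2
}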
 

Hivane

Member
Solved the problem with the following steps on each node:

Code:
# force-kill corosync
killall -9 corosync
# lazily unmount the stuck pmxcfs mountpoint
umount -l /etc/pve
# stop the cluster filesystem service
service pve-cluster stop
... then I waited a bit, reinstalled the same pve-cluster package on each node, and restarted everything.
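The reinstall-and-restart part was roughly this (treat it as a sketch, the exact order may have differed):

Code:
apt install --reinstall pve-cluster
systemctl start pve-cluster corosync
systemctl restart pvedaemon pveproxy pvestatd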
 

Stoiko Ivanov

Proxmox Staff Member
Glad both problems were resolved! Please mark the thread as solved :)
 
