3 Node Cluster

Feb 29, 2016
Hi.

We had a 2-node cluster and have now added a third node. To add node3 to the cluster we ran pvecm add node1-IP-address. Now node1 and node3 see all 3 nodes in the GUI, but node2 sees only itself and node1. pvecm status on node1, node2, and node3 shows all three nodes, yet in the GUI node2 shows only node1 and node2.
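Roughly, this is the sequence we used (a sketch; <node1-IP> stands in for node1's actual address):
Code:
# run on node3 to join it to the existing cluster
pvecm add <node1-IP>

# verify membership afterwards on each node
pvecm status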

What can be wrong?

Please find the attached GUI and pvecm status screenshots.

Thanks for help.
 

Attachments

  • pmcluster.png (9.7 KB)
  • pvecmstatus.png (13.2 KB)
Did you try clearing your browser cache?
 
Yes, we cleared the browser cache and even tried connecting to the GUI from another browser we had never used with it before. But today things changed slightly: node2 now sees node3, but its status is RED (see attached file). Machines that are powered on on node3 are shown as powered off, yet I can see information in all tabs (Syslog, Summary, Services, etc.).
I tried to migrate a test VM from node2 to node3 while connected to the GUI via node3 and got the error "no such cluster node 'pmvi3' (500)". When I try to migrate while connected to the GUI via node2, node3 does not appear in the "Migrate" window's drop-down menu.
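If it is useful, I understand the node list the web interface works from can be inspected directly from the cluster filesystem (a sketch, assuming a standard PVE setup; paths may vary by version):
Code:
# what the GUI/API believes about cluster membership (pmxcfs view)
cat /etc/pve/.members

# corosync's own view, for comparison
pvecm status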
 

Attachments

  • poxmoxmachines2.png (5.5 KB)
From the command line on node 3, can you run
Code:
ssh -o "BatchMode yes" node2 ls

and also for all other host-to-host combinations (3->1, 2->1, 2->3, 1->2, 1->3)? It may be that your keys are not set up to permit password-less SSH.
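If any of those combinations prompts for a password or fails, one commonly suggested repair (an assumption on my part, adjust to your setup) is to re-sync the cluster's keys and certificates and restart the affected services:
Code:
# re-distribute SSH keys and SSL certificates across the cluster
pvecm updatecerts

# then restart the cluster filesystem and web proxy on the affected node
systemctl restart pve-cluster pveproxy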
 
So here are the results:

node1 -> node2
root@pmvi1:~# ssh -o "BatchMode yes" 192.168.1.172 ls
apctest.output
pve-cluster-backup.tar.gz
ssh-backup.tar.gz

node1 -> node3
root@pmvi1:~# ssh -o "BatchMode yes" 192.168.1.174 ls

node2 -> node1
root@pmvi2:~# ssh -o "BatchMode yes" 192.168.1.170 ls
apctest.output

node2 -> node3
root@pmvi2:~# ssh -o "BatchMode yes" 192.168.1.174 ls

node3 -> node2
root@pmvi3:~# ssh -o "BatchMode yes" 192.168.1.172 ls
apctest.output
pve-cluster-backup.tar.gz
ssh-backup.tar.gz

node3 -> node1
root@pmvi3:~# ssh -o "BatchMode yes" 192.168.1.170 ls
apctest.output
 
Thanks for your advice, but our subscription plan only includes support via the Community Forum.
Thanks once more! We will wait until someone from the Proxmox staff replies.
 
Did you check the system logs for corosync / cluster-related messages, especially regarding time synchronization?
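For example (a sketch; exact unit names may differ by version, and the IPs are taken from earlier in this thread):
Code:
# recent corosync / cluster filesystem messages
journalctl -u corosync -u pve-cluster --since "1 hour ago"

# quick clock comparison across the nodes
for h in 192.168.1.170 192.168.1.172 192.168.1.174; do ssh "$h" date; done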
 
Here is where you check the time sync:
(screenshot: upload_2016-3-3_13-15-2.png)
I have found that if a node's time is not within ±2 minutes of the quorate part of the cluster, the SSH pipes become inactive. As far as I know, though, that only causes an online node to display as 'offline' with a red dot in the GUI instead of a green one, so your issue is interesting. I suspect there are multiple issues at play here, and isolating them will be difficult.

For example, in the cluster in this screenshot, "pvecm status" shows me 3 active nodes, but I ran "pvecm delnode pve5" (and the same for pve6) while both nodes were offline, which should have removed them completely. Frustratingly, the GUI tells me they are still expected as part of the cluster. How can this be? Corosync knows the real state of the cluster, but a bug exists in the web GUI.
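For anyone hitting the same stale-node problem, this is the workaround sketch I would try (my own notes, not official guidance; back up /etc/pve first, and the node name pve5 is from my screenshot):
Code:
# deleted nodes can linger as directories here, which the GUI still renders
ls /etc/pve/nodes/

# remove the stale entry only after pvecm delnode has succeeded and you are
# sure the node is permanently gone
rm -r /etc/pve/nodes/pve5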
 
