Node unable to join cluster

komdat

Hi,

I have had a Proxmox 2.1 cluster up and running for 2 months. Over the last few days I added a few nodes, and now the last one (the 11th) is not able to join the cluster. The network is fine, the system clocks match, and the other ten nodes are running flawlessly. They're all blades divided between 2 bladecenters, which are connected to the same managed switches. I've already tried rebooting, removing the node and rejoining, joining via another node, etc.

When I'm trying to add the node I get this:
Code:
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...
I've waited for a really long time (>30 min), but nothing happens any more at this point.

Here is what pveversion -v says:
Code:
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1
The only useful lines in the syslog are:
Code:
Aug 30 12:28:56 proxmox-14 pmxcfs[1930]: [main] notice: teardown filesystem
Aug 30 12:28:58 proxmox-14 pmxcfs[1930]: [main] notice: exit proxmox configuration filesystem (0)
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [status] notice: update cluster info (cluster name  xyz, version = 19)
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: members: 11/2367
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: all data is up to date
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: members: 11/2367
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: all data is up to date
Aug 30 12:29:05 proxmox-14 pvestatd[1762]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Maybe someone has an idea; Google and I don't :-(

Best regards
 
Hard to say - you only posted fragments of the logs.

Well, OK, so here is the log from directly after the startup: two attempts to join, a pvedaemon restart, and another attempt to join:
Code:
Aug 30 12:01:51 proxmox-14 pmxcfs[1429]: [main] notice: teardown filesystem
Aug 30 12:01:52 proxmox-14 pmxcfs[1429]: [main] notice: exit proxmox configuration filesystem (0)
Aug 30 12:01:53 proxmox-14 pmxcfs[1930]: [status] notice: update cluster info (cluster name  xyz, version = 19)
Aug 30 12:01:53 proxmox-14 pmxcfs[1930]: [dcdb] notice: members: 11/1930
Aug 30 12:01:53 proxmox-14 pmxcfs[1930]: [dcdb] notice: all data is up to date
Aug 30 12:01:53 proxmox-14 pmxcfs[1930]: [dcdb] notice: members: 11/1930
Aug 30 12:01:53 proxmox-14 pmxcfs[1930]: [dcdb] notice: all data is up to date
Aug 30 12:01:55 proxmox-14 pvestatd[1762]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Aug 30 12:17:01 proxmox-14 /USR/SBIN/CRON[2167]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 30 12:28:56 proxmox-14 pmxcfs[1930]: [main] notice: teardown filesystem
Aug 30 12:28:58 proxmox-14 pmxcfs[1930]: [main] notice: exit proxmox configuration filesystem (0)
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [status] notice: update cluster info (cluster name  xyz, version = 19)
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: members: 11/2367
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: all data is up to date
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: members: 11/2367
Aug 30 12:28:59 proxmox-14 pmxcfs[2367]: [dcdb] notice: all data is up to date
Aug 30 12:29:05 proxmox-14 pvestatd[1762]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Aug 30 12:51:10 proxmox-14 pvedaemon[1734]: received terminate request
Aug 30 12:51:10 proxmox-14 pvedaemon[1734]: worker 1742 finished
Aug 30 12:51:10 proxmox-14 pvedaemon[1734]: worker 1740 finished
Aug 30 12:51:10 proxmox-14 pvedaemon[1734]: worker 1743 finished
Aug 30 12:51:10 proxmox-14 pvedaemon[1734]: server closing
Aug 30 12:51:13 proxmox-14 pvedaemon[2717]: starting server
Aug 30 12:51:13 proxmox-14 pvedaemon[2717]: starting 3 worker(s)
Aug 30 12:51:13 proxmox-14 pvedaemon[2717]: worker 2719 started
Aug 30 12:51:13 proxmox-14 pvedaemon[2717]: worker 2721 started
Aug 30 12:51:13 proxmox-14 pvedaemon[2717]: worker 2724 started
Aug 30 12:51:18 proxmox-14 pmxcfs[2367]: [main] notice: teardown filesystem
Aug 30 12:51:20 proxmox-14 pmxcfs[2367]: [main] notice: exit proxmox configuration filesystem (0)
Aug 30 12:51:20 proxmox-14 pmxcfs[2771]: [status] notice: update cluster info (cluster name  xyz, version = 19)
Aug 30 12:51:20 proxmox-14 pmxcfs[2771]: [dcdb] notice: members: 11/2771
Aug 30 12:51:20 proxmox-14 pmxcfs[2771]: [dcdb] notice: all data is up to date
Aug 30 12:51:20 proxmox-14 pmxcfs[2771]: [dcdb] notice: members: 11/2771
Aug 30 12:51:20 proxmox-14 pmxcfs[2771]: [dcdb] notice: all data is up to date
Aug 30 12:51:25 proxmox-14 pvestatd[1762]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
I also tried to post the startup log, but it was too much for the forum software.

Best regards
 
Hello,

I have the same problem here, except that it is my first cluster installation with version 2 of Proxmox.

I have 2 newly installed servers with all the latest updates installed. On the first server I created the cluster as described in the wiki.

But when I add the second server with "pvecm add IP-ADDRESS-CLUSTER" I get the same failure:

Code:
Starting cluster:
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
waiting for quorum...

Greetings,
Marcel
 
I can't see any attempt to join in that log - it does not contain a single cman entry.
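
One quick way to double-check this on the joining node is to count the syslog lines that mention cman or corosync. The following is only a sketch: count_cluster_lines is a made-up helper name, and /var/log/syslog is assumed to be the file the node logs to.

```shell
# Hypothetical helper (not part of Proxmox): count the lines in a log
# file that mention cman or corosync, ignoring case.
count_cluster_lines() {
    # $1: path to the log file, assumed here to be /var/log/syslog
    # grep -c prints the count; "|| true" keeps the exit status at 0
    # even when the count is 0.
    grep -c -i -E 'cman|corosync' "$1" || true
}

# Example call on the joining node (path is an assumption):
#   count_cluster_lines /var/log/syslog
```

A result of 0 right after a join attempt would match what is described above: the cluster stack wrote nothing to that log.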


Believe me, I did. Maybe it didn't reach the point where something would be written?

I think these log entries:
Aug 30 12:01:51 proxmox-14 pmxcfs[1429]: [main] notice: teardown filesystem

belong to that line in the shell output:
Stopping pve cluster filesystem: pve-cluster.

Then cman should be started:
Starting cman... [ OK ]

And this line:
Waiting for quorum... Timed-out waiting for cluster

should produce that error:
Aug 30 12:29:05 proxmox-14 pvestatd[1762]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected

I just don't have a clue what is the reason or how to fix it. And Google couldn't tell me either.
 
I just tried it again, and before the pvestatd error appeared, ps aux | grep cman showed me these three processes:

root 6222 0.0 0.0 10884 1740 pts/0 S+ 16:59 0:00 /bin/bash /etc/init.d/cman start
root 6242 0.0 0.0 10884 892 pts/0 S+ 16:59 0:00 /bin/bash /etc/init.d/cman start
root 6243 0.0 0.0 24532 904 pts/0 S+ 16:59 0:00 cman_tool -t 45 -q wait
 
Test IP multicast again - see http://pve.proxmox.com/wiki/Multicast_notes#test_if_multicast_is_working_between_two_nodes

I did read that you already did it, but double-checking is sometimes a good idea.
I checked multicasting as described, and everything is fine. But the problem persists.

As it is a fresh installation, I'll now reinstall it. That should be the better way.

Thanks for your support, I hope the reinstallation does the trick. If not I'll post here again.

Best regards
 
I believe this can happen if you are in the /etc/pve/ folder when you try to join the cluster. The system is unable to unmount and remount the folder, so the error occurs.
Reboot and try to join again with the "pvecm add 10.10.11.101 --force" option.
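
If that is the cause, a small check before running pvecm add could help. This is just a sketch under that assumption: pve_cwd_status is a made-up helper name, and it only looks at the path string you pass in, not at what is actually mounted.

```shell
# Hypothetical helper (not part of Proxmox): report whether a directory
# path is /etc/pve itself or somewhere below it.
pve_cwd_status() {
    # $1: directory to check, typically "$PWD"
    case "$1" in
        /etc/pve|/etc/pve/*) echo inside ;;
        *) echo outside ;;
    esac
}

# Example call before joining:
#   pve_cwd_status "$PWD"
```

If it prints "inside", change to another directory first (for example cd /) and then retry the join.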
 