/etc/pve/local for node missing

bread-baker

I added the 2nd node to a cluster, and it is missing the /etc/pve/nodes directory.

Both nodes use:
Code:
pve-manager: 2.0-12 (pve-manager/2.0/784729f4)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 2.0-53
pve-kernel-2.6.32-6-pve: 2.6.32-53
lvm2: 2.02.86-1pve2
clvm: 2.02.86-1pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-1
libqb: 0.6.0-1
redhat-cluster-pve: 3.1.7-1
pve-cluster: 1.0-12
qemu-server: 2.0-9
pve-firmware: 1.0-13
libpve-common-perl: 1.0-8
libpve-access-control: 1.0-2
libpve-storage-perl: 2.0-8
vncterm: 1.0-2
vzctl: 3.0.29-3pve3
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.1-1


(qemu-server is one release behind on one of the nodes, but that probably has nothing to do with the issue.)


This is what I did to create the cluster:
Code:
root@fbc1 /usr/bin # pvecm create fbc
Restarting pve cluster filesystem: pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... [  OK  ]
   Starting fenced... [  OK  ]
   Starting dlm_controld... [  OK  ]
   Unfencing self... [  OK  ]
   Joining fence domain... [  OK  ]
root@fbc1 /usr/bin # pvecm status
Version: 6.2.0
Config Version: 1
Cluster Name: fbc
Cluster Id: 703
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 1
Total votes: 1
Node votes: 1
Quorum: 1  
Active subsystems: 5
Flags: 
Ports Bound: 0  
Node name: fbc1
Node ID: 1
Multicast addresses: 239.192.2.193 
Node addresses: 10.100.100.1

Then, to add a node (note this line: 'Waiting for quorum... Timed-out waiting for cluster'):
Code:
root@fbc186 ~ # pvecm add 10.100.100.1
root@10.100.100.1's password: 
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum... Timed-out waiting for cluster
[FAILED]
cluster not ready - no quorum?
root@fbc186 ~ # pvecm status
Version: 6.2.0
Config Version: 2
Cluster Name: fbc
Cluster Id: 703
Cluster Member: Yes
Cluster Generation: 4
Membership state: Cluster-Member
Nodes: 1
Expected votes: 2
Total votes: 1
Node votes: 1
Quorum: 2 Activity blocked
Active subsystems: 1
Flags: 
Ports Bound: 0  
Node name: fbc186
Node ID: 2
Multicast addresses: 239.192.2.193 
Node addresses: 10.100.100.186

This did not look correct on either node:
Code:
root@fbc186 ~ # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   X      0                        fbc1
   2   M      4   2011-12-01 21:03:48  fbc186

root@fbc1 ~ # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-01 21:01:24  fbc1
   2   X      0                        fbc186

So I rebooted fbc186, and then this part looks OK:
Code:
root@fbc1 ~ # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-01 21:01:24  fbc1
   2   M     12   2011-12-01 21:14:27  fbc186

Here is 'ls -la /etc/pve' on both nodes:
Code:
root@fbc186 /etc/pve # ls -la /etc/pve
total 5
drwxr-x---  2 root www-data    0 Dec 31  1969 .
drwxr-xr-x 84 root root     4096 Dec  1 21:13 ..
-rw-r-----  1 root www-data  277 Dec  1 21:03 cluster.conf
-r--r-----  1 root www-data  153 Dec 31  1969 .clusterlog
-rw-r-----  1 root www-data    2 Dec 31  1969 .debug
lrwxr-x---  1 root www-data    0 Dec 31  1969 local -> nodes/fbc186
-r--r-----  1 root www-data  223 Dec 31  1969 .members
lrwxr-x---  1 root www-data    0 Dec 31  1969 openvz -> nodes/fbc186/openvz
lrwxr-x---  1 root www-data    0 Dec 31  1969 qemu-server -> nodes/fbc186/qemu-server
-r--r-----  1 root www-data  200 Dec 31  1969 .rrd
-r--r-----  1 root www-data  230 Dec 31  1969 .version
-r--r-----  1 root www-data   18 Dec 31  1969 .vmlist


root@fbc1 /etc/pve # ls -la /etc/pve
total 16
drwxr-x---   2 root www-data     0 Dec 31  1969 .
drwxr-xr-x 125 root root     12288 Dec  1 21:01 ..
-r--r-----   1 root www-data   451 Oct 31 12:53 authkey.pub
-r--r-----   1 root www-data   277 Dec  1 21:03 cluster.conf
-r--r-----   1 root www-data   228 Dec  1 21:03 cluster.conf.old
-r--r-----   1 root www-data   938 Dec 31  1969 .clusterlog
-rw-r-----   1 root www-data     2 Dec 31  1969 .debug
lr-xr-x---   1 root www-data     0 Dec 31  1969 local -> nodes/fbc1
-r--r-----   1 root www-data   219 Dec 31  1969 .members
dr-xr-x---   2 root www-data     0 Oct 31 12:53 nodes
lr-xr-x---   1 root www-data     0 Dec 31  1969 openvz -> nodes/fbc1/openvz
dr-x------   2 root www-data     0 Oct 31 12:53 priv
-r--r-----   1 root www-data  1533 Oct 31 12:53 pve-root-ca.pem
-r--r-----   1 root www-data  1675 Oct 31 12:53 pve-www.key
lr-xr-x---   1 root www-data     0 Dec 31  1969 qemu-server -> nodes/fbc1/qemu-server
-r--r-----   1 root www-data  1243 Dec 31  1969 .rrd
-r--r-----   1 root www-data   216 Dec  1 21:09 storage.cfg
-r--r-----   1 root www-data   228 Dec 31  1969 .version
-r--r-----   1 root www-data   393 Dec 31  1969 .vmlist
-r--r-----   1 root www-data   281 Nov 26 19:38 vzdump.cron

Note that on fbc186, nodes/fbc186/openvz is missing (there is no nodes directory at all).
 
More info:

fbc186 was created today. Originally it had one test KVM guest.

When I tried to add it to the cluster, there was a message that this could not be done with an existing VM, so I deleted the VM.

Originally I could ssh into fbc186 from fbc1, but I cannot now.

From fbc1 this does not work:
ssh-copy-id fbc186

It results in:
Code:
root@fbc1 ~ # ssh-copy-id fbc186
root@fbc186's password: 
bash: .ssh/authorized_keys: No such file or directory
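
(A guess: on a PVE 2.0 node, /root/.ssh/authorized_keys is normally a symlink into /etc/pve/priv/ (as the fbc1 listing further down shows), so if /etc/pve/priv does not exist on fbc186 the symlink target is missing and ssh-copy-id cannot append to it. Purely as a diagnostic, something like this should show whether that is the case:)
Code:
root@fbc1 ~ # ssh root@fbc186 'ls -la /root/.ssh/ /etc/pve/priv/'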


Let me know if you need more info or tests.
 
Please reinstall fbc186 and try to join again (when there is no existing VM on fbc186).
 
Please reinstall fbc186 and try to join again (when there is no existing VM on fbc186).
Will do that later.

fbc1, which was installed using the 1st beta and has GNOME and other software installed, may be the bigger problem. (I use it as a workstation and backup storage area.)

Before trying to set up a cluster, fbc1 had authorized keys set up to allow password-less login. Now those keys do not work, and I cannot update them:
Code:
root@fbcadmin ~ # ssh-copy-id -v fbc1
OpenSSH_5.5p1 Debian-6+squeeze1, OpenSSL 0.9.8o 01 Jun 2010
Pseudo-terminal will not be allocated because stdin is not a terminal.
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
ssh: Could not resolve hostname umask 077; test -d .ssh || mkdir .ssh ; cat >> .ssh/authorized_keys: Name or service not known

fbc1's .ssh listing:
Code:
total 112
drwx------  2 root root  4096 Dec  1 21:01 .
drwx------ 21 root pro   4096 Dec  1 22:54 ..
lrwxrwxrwx  1 root root    29 Dec  1 21:01 authorized_keys -> /etc/pve/priv/authorized_keys
-rw-------  1 root root  5997 Dec  1 20:58 authorized_keys.org
-rw-------  1 root root  1675 Sep 25 07:57 id_rsa
-rw-------  1 root root   391 Sep 25 07:57 id_rsa.pub
-rw-------  1 root root 83502 Dec  1 20:40 known_hosts

and
Code:
root@fbc1 ~ # ls -l /etc/pve/priv/authorized_keys
-r-------- 1 root www-data 1176 Dec  1 21:14 /etc/pve/priv/authorized_keys

My question: are those permissions normal when a cluster is not operational?


If not, then maybe I should reinstall Proxmox 2.0 onto fbc1.
 
Per http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Remove_a_cluster_node:
Code:
del node :
root@fbc1 /bkup/fbc1-pvebkup/dump # pvecm delnode  fbc186
cluster not ready - no quorum?

root@fbc1 /bkup/fbc1-pvebkup/dump # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-01 21:01:24  fbc1
   2   X     36                        fbc186

Node fbc186 can't be deleted?

If that cannot be solved, I assume it has something to do with fbc1, and that fbc1 needs a fresh reinstall before it can be put into the cluster?
 
You have no quorum, so the Proxmox cluster file system is read-only. Set the expected votes to 1 and try again.

> pvecm e 1
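
(Hedged side note: while a node is inquorate, /etc/pve is mounted read-only, so a quick way to confirm is that any write attempt there fails; the file name below is just an arbitrary example, and the exact error text may differ:)
Code:
root@fbc1 ~ # touch /etc/pve/write-test    # expected to fail without quorum
root@fbc1 ~ # pvecm status | grep -i quorum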
 
That worked:
Code:
root@fbc1 ~ # pvecm e 1
root@fbc1 ~ # pvecm delnode  fbc186

root@fbc1 ~ # pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      4   2011-12-01 21:01:24  fbc1

fbc186 is reinstalled; I'll try to add it to the cluster.
Code:
root@fbc186:~# pvecm add 10.100.100.1
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1f:34:2e:ee:f5:fb:98:ba:81:59:fb:09:13:4e:57:76 root@fbc186
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|          o   o E|
|         o . o . |
|        S * .    |
|       . O =     |
|        + O      |
|       . . * +   |
|        . ooBo.  |
+-----------------+
The authenticity of host '10.100.100.1 (10.100.100.1)' can't be established.
RSA key fingerprint is c3:60:2f:ca:b3:aa:eb:21:24:52:72:02:a7:b0:3c:a8.
Are you sure you want to continue connecting (yes/no)? yes
root@10.100.100.1's password: 
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-cluster.
Starting cluster: 
   Checking if cluster has been disabled at boot... [  OK  ]
   Checking Network Manager... [  OK  ]
   Global setup... [  OK  ]
   Loading kernel modules... [  OK  ]
   Mounting configfs... [  OK  ]
   Starting cman... [  OK  ]
   Waiting for quorum...

It is stuck at 'Waiting for quorum...'

Then after a minute:
Code:
... Timed-out waiting for cluster
[FAILED]
cluster not ready - no quorum?
 
No firewall.

Is there a netstat or telnet command I can use to test ports? SSH works between the systems.
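
(For reference, something like this, assuming corosync's default UDP ports 5404/5405, should at least show whether corosync has its sockets open on each node; plain telnet will not help here, since the cluster traffic is multicast UDP:)
Code:
root@fbc1 ~ # netstat -ulpn | grep corosync
root@fbc186 ~ # netstat -ulpn | grep corosync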
 
Maybe you enabled network-manager when you installed GNOME? What is the output of:

# dpkg -l network-manager

Please remove it if it is installed.
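
(If dpkg lists it with status "ii" (installed), it can be removed with something along the lines of:)
Code:
# apt-get remove --purge network-manager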
 
Code:
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                        Version                     Description
+++-===========================-===========================-======================================================================
un  network-manager             <none>                      (no description available)
 
I just saw this in one of your posts: "ssh: Could not resolve hostname".

Did you change the hostname somehow? Why is that not resolvable (it should be in /etc/hosts)?
 
fbc186 is a newly installed system. The only thing I did besides trying to add it to the cluster was an apt update.

Here is its hosts file:
Code:
root@fbc186:/etc# cat hosts
127.0.0.1 localhost.localdomain localhost
10.100.100.186 fbc186.fantinibakery.com fbc186 pvelocalhost

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

The command to join a cluster uses an IP address.

I think the basis of my problems could be installing Squeeze, then GNOME, then Proxmox 2.0 on fbc1.

That makes for complications.

I have a two-system cluster at home that has worked well since the 1st beta; to improve it and do more tests, I just need to add a third system.

So I'll do a fresh install on fbc1 and fbc186, then set up a 3-node cluster (those three are at the factory).
 
fbc1 has a large hosts file; I put our whole local network in each machine's hosts file.

Its hosts file is the same as what works on our two Proxmox 1.9 production systems.
 
fbc1 has a large hosts file; I put our whole local network in each machine's hosts file.

Its hosts file is the same as what works on our two Proxmox 1.9 production systems.

PVE 2.0 has different requirements than 1.9: the hostname from /etc/hostname must have an entry in /etc/hosts.

So what is the output of:

# grep fbc1 /etc/hosts
 
PVE 2.0 has different requirements than 1.9: the hostname from /etc/hostname must have an entry in /etc/hosts.

So what is the output of:

# grep fbc1 /etc/hosts



10.100.100.1 fbc1.fantinibakery.com fbc1 # backup server.


The issue probably has something to do with the managed switches and multicast, as noted in the other thread.
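
(One way to verify, assuming the ssmping package is available: run the listener on one node and probe it from the other; the multicast group address below is just an example. If only the unicast replies come back, the switch is most likely dropping multicast:)
Code:
root@fbc1 ~ # apt-get install ssmping
root@fbc1 ~ # ssmpingd
root@fbc186 ~ # asmping 224.0.2.1 10.100.100.1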
 