Big cluster problem with a fresh install of Proxmox 2.1

basanisi

Apr 15, 2011
I created a two-node cluster with two Dell R510 servers equipped with eight 1 TB disks in RAID 10 on a PERC 700 controller.

Everything worked with the previous installation. For testing purposes I decided to reinstall from scratch, after backing up all my VMs.

After a fresh install of Proxmox 2.1, which went without problems, I created a new cluster following http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Adding_nodes_to_the_Cluster

But when I try to add the second node, the answer is:

pvecm add 10.165.2.189
authentication key already exists

When I try to force it:

pvecm add 10.165.2.189 -force
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

command 'ccs_tool lsnode -c /etc/pve/cluster.conf' failed: exit code 1
unable to add node: command failed (ssh 10.165.2.189 -o BatchMode=yes pvecm addnode proxmox01 --force 1)

On the first machine, where I created the cluster, ls -la /etc/pve shows a strange date:

ls -la /etc/pve
total 8
drwxr-x--- 2 root www-data 0 1 jan 1970 .
drwxr-xr-x 80 root root 4096 24 jui 08:37 ..
-rw-r----- 1 root www-data 451 24 jui 07:45 authkey.pub
-rw-r----- 1 root www-data 239 24 jui 08:15 cluster.conf
-r--r----- 1 root www-data 554 1 jan 1970 .clusterlog
-rw-r----- 1 root www-data 16 24 jui 07:41 datacenter.cfg
-rw-r----- 1 root www-data 2 1 jan 1970 .debug
lrwxr-x--- 1 root www-data 0 1 jan 1970 local -> nodes/proxmox01
-r--r----- 1 root www-data 198 1 jan 1970 .members
drwxr-x--- 2 root www-data 0 24 jui 07:45 nodes
lrwxr-x--- 1 root www-data 0 1 jan 1970 openvz -> nodes/proxmox01/openvz
drwx------ 2 root www-data 0 24 jui 07:45 priv
-rw-r----- 1 root www-data 1533 24 jui 07:45 pve-root-ca.pem
-rw-r----- 1 root www-data 1675 24 jui 07:45 pve-www.key
lrwxr-x--- 1 root www-data 0 1 jan 1970 qemu-server -> nodes/proxmox01/qemu-server
-r--r----- 1 root www-data 206 1 jan 1970 .rrd
-rw-r----- 1 root www-data 58 24 jui 07:41 user.cfg
-r--r----- 1 root www-data 256 1 jan 1970 .version
-r--r----- 1 root www-data 18 1 jan 1970 .vmlist
-rw-r----- 1 root www-data 119 24 jui 07:45 vzdump.cron
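
A note on the dates: /etc/pve is not an ordinary directory. On Proxmox VE it is the mount point of pmxcfs, a FUSE filesystem, so entries that it generates on the fly (such as .members or .version) showing 1 jan 1970 are probably not a sign of corruption in themselves. A minimal sketch for confirming the mount, assuming the standard Linux /proc/mounts format; the helper name is_fuse_mount is made up for this example:

```shell
#!/bin/sh
# is_fuse_mount MOUNTPOINT [MOUNTS_FILE]
# Hypothetical helper: succeeds if MOUNTPOINT is listed in MOUNTS_FILE
# (default /proc/mounts) with a filesystem type starting with "fuse".
is_fuse_mount() {
    mountpoint=$1
    mounts_file=${2:-/proc/mounts}
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v mp="$mountpoint" \
        '$2 == mp && $3 ~ /^fuse/ { found = 1 } END { exit !found }' \
        "$mounts_file"
}

if is_fuse_mount /etc/pve; then
    echo "/etc/pve is a FUSE mount (expected when pmxcfs is running)"
else
    echo "/etc/pve is not a FUSE mount; check that pve-cluster is running"
fi
```

If the second message appears, the files listed under /etc/pve would be coming from the local disk rather than from the cluster filesystem, which would be worth mentioning when asking for help.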

I tried aptitude update && aptitude full-upgrade on both machines; nothing changed after a reboot.

I tried wiping all partitions with dd if=/dev/zero of=/dev/sdX and reinstalling: same result.

I tried recreating the RAID array: no change.

Help, please.
 
pvecm add 10.165.2.189 -force
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

It seems you have syntax errors in cluster.conf (check it yourself or post the file here).
 
I didn't create cluster.conf myself.

The automatically created cluster.conf is:

cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="ac-boussu" config_version="1">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>

<clusternodes>
<clusternode name="proxmox01" votes="1" nodeid="1"/>
</clusternodes>

</cluster>
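
The file shown here looks well-formed, which suggests the ccs_tool complaint may not be about its syntax. libxml2 prints "failed to load external entity" when it cannot open the file at all, so it may be worth checking that /etc/pve/cluster.conf exists and is readable on the node where the failing command actually runs (the host named after ssh in the error message). A minimal sketch, assuming a hypothetical helper name check_conf and using xmllint only if it happens to be installed:

```shell
#!/bin/sh
# check_conf FILE
# Hypothetical helper: reports whether FILE exists, is readable and
# non-empty, and (only when xmllint is available) whether it is
# well-formed XML. Returns non-zero on the first failed check.
check_conf() {
    conf=$1
    if [ ! -e "$conf" ]; then
        echo "$conf: missing"; return 1
    fi
    if [ ! -r "$conf" ]; then
        echo "$conf: not readable"; return 1
    fi
    if [ ! -s "$conf" ]; then
        echo "$conf: empty"; return 1
    fi
    if command -v xmllint >/dev/null 2>&1; then
        if ! xmllint --noout "$conf" 2>/dev/null; then
            echo "$conf: not well-formed XML"; return 1
        fi
    fi
    echo "$conf: ok"
}

# Run this on each node; "|| true" keeps the script going on failure.
check_conf /etc/pve/cluster.conf || true
```

A "missing" result on the target node would point to the cluster never having been created there, rather than to a typo in the XML.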
 
The result is

/etc/init.d/pve-cluster restart
Restarting pve cluster filesystem: pve-cluster.
root@proxmox01:~#

and the syslog shows:

Jul 24 11:21:51 proxmox01 pmxcfs[2889]: [main] notice: teardown filesystem
Jul 24 11:21:52 proxmox01 pmxcfs[2889]: [main] notice: exit proxmox configuration filesystem (0)
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [status] notice: update cluster info (cluster name cluster01, version = 1)
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [status] notice: node has quorum
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: members: 1/2968
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: all data is up to date
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: members: 1/2968
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: all data is up to date
Jul 24 11:21:59 proxmox01 pvestatd[1966]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Jul 24 11:22:26 proxmox01 pmxcfs[2968]: [main] notice: teardown filesystem
Jul 24 11:22:27 proxmox01 pmxcfs[2968]: [main] notice: exit proxmox configuration filesystem (0)
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [status] notice: update cluster info (cluster name cluster01, version = 1)
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [status] notice: node has quorum
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: members: 1/3003
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: all data is up to date
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: members: 1/3003
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: all data is up to date
Jul 24 11:22:29 proxmox01 pvestatd[1966]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
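
One detail in this log: pmxcfs reports cluster name cluster01, while the cluster.conf posted earlier names the cluster ac-boussu. The two outputs may simply come from different attempts, but comparing them is cheap. A minimal sketch, assuming hypothetical helper names conf_cluster_name and log_cluster_name, plain sed pattern matching rather than a real XML parser, and name being the first attribute of the cluster element as in the file above:

```shell
#!/bin/sh
# conf_cluster_name: read cluster.conf on stdin and print the name
# attribute of the <cluster ...> element (hypothetical helper).
conf_cluster_name() {
    sed -n 's/.*<cluster name="\([^"]*\)".*/\1/p' | head -n 1
}

# log_cluster_name: read syslog lines on stdin and print the cluster name
# from the last pmxcfs "update cluster info" message (hypothetical helper).
log_cluster_name() {
    sed -n 's/.*update cluster info (cluster name \([^,]*\),.*/\1/p' | tail -n 1
}

# Only compare when both files are readable on this machine.
if [ -r /etc/pve/cluster.conf ] && [ -r /var/log/syslog ]; then
    conf_name=$(conf_cluster_name < /etc/pve/cluster.conf)
    log_name=$(log_cluster_name < /var/log/syslog)
    echo "cluster.conf: ${conf_name:-unknown}, syslog: ${log_name:-unknown}"
fi
```

If the two names differ on the same node at the same time, that would be useful information to include in the thread.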



For information, I installed the Proxmox 2.1 ISO on two VMs on another Proxmox server, and on both VMs I ran aptitude update && aptitude full-upgrade && reboot.

After the reboot, cluster creation on the first VM worked without problems,

but pvecm add IP_OF_SECOND_VM gives the same result: authentication key already exists. With the force option I get:
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

What is the problem?
 
I tried both:

pvecm create and pvecm add on the first node

For pvecm create on the first node, the console gave me this:

pvecm create cluster
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
67:a6:31:6b:39:32:f4:a4:d4:6c:df:3a:d9:1e:0d:b3 root@proxmox01
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| o |
| o S + o |
| o = @ . = |
| + B .oE . |
| + .o... |
| .o. |
+-----------------+
Restarting pve cluster filesystem: pve-cluster[dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Unfencing self... [ OK ]



and

pvecm create on the first node and pvecm add on the second node


For pvecm add on the second node, the console gave me this:

pvecm add 10.165.2.189
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1d:28:ea:55:f2:51:38:a5:9d:18:03:9d:54:8d:7a:93 root@proxmox02
The key's randomart image is:
+--[ RSA 2048]----+
| .++=+o |
| =O... |
| o *o+. |
| . =.oE. |
| . . S... |
| . . |
| . |
| |
| |
+-----------------+
The authenticity of host '10.165.2.189 (10.165.2.189)' can't be established.
RSA key fingerprint is ee:d3:97:1a:fd:50:0d:7f:2a:74:a6:8b:e4:01:d5:61.
Are you sure you want to continue connecting (yes/no)? yes
root@10.165.2.189's password:
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

command 'ccs_tool lsnode -c /etc/pve/cluster.conf' failed: exit code 1
unable to add node: command failed (ssh 10.165.2.189 -o BatchMode=yes pvecm addnode proxmox02 --force 1)
 
Hi, I got strange behaviour on my Proxmox cluster.
[Attached screenshot: Selection_003.png]

Suddenly one of my nodes lost contact with the other nodes in my cluster, although ping and SSH still work :(
What actually happened?
 
I just got the answer: stop pve-cluster and cman on all nodes, then start them again. Everything works again after that.
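
The fix above can be sketched as a small script. This is an outline under stated assumptions rather than a tested procedure: the node list is hypothetical, the start order (pve-cluster, then cman) simply mirrors the order seen in the pvecm create output earlier in the thread, and DRY_RUN is a variable invented for this sketch, so by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical node list; replace with the real host names.
NODES="proxmox01 proxmox02"
# DRY_RUN is specific to this sketch: 1 (default) prints, 0 executes via ssh.
DRY_RUN=${DRY_RUN:-1}

# run_on NODE COMMAND: print the command, or run it on NODE over ssh.
run_on() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "ssh root@$1 $2"
    else
        ssh "root@$1" "$2"
    fi
}

# restart_cluster_services NODE...: stop both services on every node first,
# then start them again on every node.
restart_cluster_services() {
    for node in "$@"; do
        run_on "$node" "/etc/init.d/pve-cluster stop"
        run_on "$node" "/etc/init.d/cman stop"
    done
    for node in "$@"; do
        run_on "$node" "/etc/init.d/pve-cluster start"
        run_on "$node" "/etc/init.d/cman start"
    done
}

# NODES is left unquoted on purpose so it splits into one argument per node.
restart_cluster_services $NODES
```

Stopping everything before starting anything matches the wording of the answer ("all node and then restart"); reviewing the printed commands first seems prudent before setting DRY_RUN=0.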