Big cluster problem with fresh install of Proxmox 2.1

basanisi

Apr 15, 2011
I created a two-node cluster with two Dell R510 servers equipped with eight 1 TB disks in RAID 10 on a PERC 700 controller.

Everything worked with the previous installation. For test purposes I decided to reinstall from scratch, after backing up all my VMs.

After a fresh install of Proxmox 2.1, which went without problems, I created a new cluster following http://pve.proxmox.com/wiki/Proxmox_VE_2.0_Cluster#Adding_nodes_to_the_Cluster

But when I try to add the second node, the answer is:

pvecm add 10.165.2.189
authentication key already exists

When I try to force it:

pvecm add 10.165.2.189 -force
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

command 'ccs_tool lsnode -c /etc/pve/cluster.conf' failed: exit code 1
unable to add node: command failed (ssh 10.165.2.189 -o BatchMode=yes pvecm addnode proxmox01 --force 1)

On the first machine, where I created the cluster, ls -la /etc/pve shows strange dates:

ls -la /etc/pve
total 8
drwxr-x--- 2 root www-data 0 1 jan 1970 .
drwxr-xr-x 80 root root 4096 24 jui 08:37 ..
-rw-r----- 1 root www-data 451 24 jui 07:45 authkey.pub
-rw-r----- 1 root www-data 239 24 jui 08:15 cluster.conf
-r--r----- 1 root www-data 554 1 jan 1970 .clusterlog
-rw-r----- 1 root www-data 16 24 jui 07:41 datacenter.cfg
-rw-r----- 1 root www-data 2 1 jan 1970 .debug
lrwxr-x--- 1 root www-data 0 1 jan 1970 local -> nodes/proxmox01
-r--r----- 1 root www-data 198 1 jan 1970 .members
drwxr-x--- 2 root www-data 0 24 jui 07:45 nodes
lrwxr-x--- 1 root www-data 0 1 jan 1970 openvz -> nodes/proxmox01/openvz
drwx------ 2 root www-data 0 24 jui 07:45 priv
-rw-r----- 1 root www-data 1533 24 jui 07:45 pve-root-ca.pem
-rw-r----- 1 root www-data 1675 24 jui 07:45 pve-www.key
lrwxr-x--- 1 root www-data 0 1 jan 1970 qemu-server -> nodes/proxmox01/qemu-server
-r--r----- 1 root www-data 206 1 jan 1970 .rrd
-rw-r----- 1 root www-data 58 24 jui 07:41 user.cfg
-r--r----- 1 root www-data 256 1 jan 1970 .version
-r--r----- 1 root www-data 18 1 jan 1970 .vmlist
-rw-r----- 1 root www-data 119 24 jui 07:45 vzdump.cron
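
A possible reading of the 1 jan 1970 dates: /etc/pve is not a regular directory but a filesystem mounted by pmxcfs (the syslog lines in this thread call it the "proxmox configuration filesystem"), so epoch timestamps on its virtual entries may be expected rather than a symptom. This is an assumption, not something confirmed in the thread. A minimal sketch to check how a path is mounted, assuming a Linux /proc/mounts layout; the function name mount_fstype is hypothetical:

```shell
#!/bin/sh
# Print the filesystem type of an exact mount point, reading a
# /proc/mounts-style table (fields: device mountpoint fstype options ...).
# mount_fstype is a hypothetical helper name, not a Proxmox tool.
mount_fstype() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" '$2 == mp { print $3; found = 1 } END { exit !found }' "$table"
}

# Usage: prints the type if /etc/pve is a mount point, otherwise a note.
mount_fstype /etc/pve || echo "/etc/pve is not listed as a mount point"
```

If the path turns out to be a FUSE-style mount, the dates on the dot-files are probably generated by the filesystem itself and are unlikely to be related to the pvecm error.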

I ran aptitude update && aptitude full-upgrade on both machines; nothing changed after a reboot.

I tried wiping all partitions with dd if=/dev/zero of=/dev/sdX and reinstalling, with the same result.

I tried recreating the RAID array; no change.

Help please
 
pvecm add 10.165.2.189 -force
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

Seems you have syntax errors in cluster.conf (check yourself or post the file here).
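
Before looking for syntax errors, it may help to separate two cases: the file cannot be read at all, or it can be read but is not well-formed XML. The "failed to load external entity" wording usually comes from libxml2 when it cannot open the file, though that is a general observation rather than something established in this thread. A minimal sketch, assuming xmllint (from libxml2-utils) may or may not be installed; check_xml_file is a hypothetical helper name:

```shell
#!/bin/sh
# Classify a config file as: missing, unreadable, malformed, ok, or unchecked.
# check_xml_file is a hypothetical helper, not part of pvecm or ccs_tool.
check_xml_file() {
    file=$1
    if [ ! -e "$file" ]; then
        echo "missing"
    elif [ ! -r "$file" ]; then
        echo "unreadable"
    elif ! command -v xmllint >/dev/null 2>&1; then
        echo "unchecked"            # xmllint not installed, cannot parse
    elif xmllint --noout "$file" 2>/dev/null; then
        echo "ok"                   # well-formed XML
    else
        echo "malformed"
    fi
}

# Usage on the node that pvecm contacts over ssh:
check_xml_file /etc/pve/cluster.conf
```

Running it on the node named in the pvecm add command (here 10.165.2.189) is the relevant check, since that is where ccs_tool reads /etc/pve/cluster.conf.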
 
I didn't create cluster.conf myself.

The automatically created cluster.conf is:

cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster name="ac-boussu" config_version="1">

<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>

<clusternodes>
<clusternode name="proxmox01" votes="1" nodeid="1"/>
</clusternodes>

</cluster>
 
The result is

/etc/init.d/pve-cluster restart
Restarting pve cluster filesystem: pve-cluster.
root@proxmox01:~#

and syslog is

Jul 24 11:21:51 proxmox01 pmxcfs[2889]: [main] notice: teardown filesystem
Jul 24 11:21:52 proxmox01 pmxcfs[2889]: [main] notice: exit proxmox configuration filesystem (0)
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [status] notice: update cluster info (cluster name cluster01, version = 1)
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [status] notice: node has quorum
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: members: 1/2968
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: all data is up to date
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: members: 1/2968
Jul 24 11:21:52 proxmox01 pmxcfs[2968]: [dcdb] notice: all data is up to date
Jul 24 11:21:59 proxmox01 pvestatd[1966]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
Jul 24 11:22:26 proxmox01 pmxcfs[2968]: [main] notice: teardown filesystem
Jul 24 11:22:27 proxmox01 pmxcfs[2968]: [main] notice: exit proxmox configuration filesystem (0)
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [status] notice: update cluster info (cluster name cluster01, version = 1)
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [status] notice: node has quorum
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: members: 1/3003
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: all data is up to date
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: members: 1/3003
Jul 24 11:22:27 proxmox01 pmxcfs[3003]: [dcdb] notice: all data is up to date
Jul 24 11:22:29 proxmox01 pvestatd[1966]: WARNING: ipcc_send_rec failed: Transport endpoint is not connected
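
The syslog above reports cluster name cluster01, while the cluster.conf posted earlier says ac-boussu. Whether that difference matters here is not established in the thread (the two excerpts may come from different attempts), but it is easy to compare the two. A minimal sketch using sed; both function names are hypothetical, and the patterns assume the exact line formats shown in this thread:

```shell
#!/bin/sh
# Extract the cluster name from the <cluster name="..."> element of a cluster.conf.
conf_cluster_name() {
    sed -n 's/.*<cluster name="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Extract the most recent cluster name that pmxcfs logged in a syslog file.
log_cluster_name() {
    sed -n 's/.*update cluster info (cluster name \([^,]*\),.*/\1/p' "$1" | tail -n 1
}

# Usage (paths are the usual locations, adjust as needed):
#   conf_cluster_name /etc/pve/cluster.conf
#   log_cluster_name /var/log/syslog
```

If the two names differ on the same node at the same time, that would suggest the running pmxcfs instance is not using the file being inspected.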



For information, I installed the Proxmox 2.1 ISO on two VMs on another Proxmox server, and on both VMs I ran aptitude update && aptitude full-upgrade && reboot.

After the reboot, cluster creation on the first VM worked without problems.

But pvecm add IP_OF_SECOND_VM gives the same result, "authentication key already exists", and with the force option:

I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

What is the problem?
 
I tried both approaches.

One is pvecm create and pvecm add on the first node.

For pvecm create on the first node, the console gave me this:

pvecm create cluster
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
67:a6:31:6b:39:32:f4:a4:d4:6c:df:3a:d9:1e:0d:b3 root@proxmox01
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| |
| o |
| o S + o |
| o = @ . = |
| + B .oE . |
| + .o... |
| .o. |
+-----------------+
Restarting pve cluster filesystem: pve-cluster[dcdb] notice: wrote new cluster config '/etc/cluster/cluster.conf'
.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Unfencing self... [ OK ]



The other is pvecm create on the first node and pvecm add on the second node.

For pvecm add on the second node, the console gave me this:

pvecm add 10.165.2.189
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
1d:28:ea:55:f2:51:38:a5:9d:18:03:9d:54:8d:7a:93 root@proxmox02
The key's randomart image is:
+--[ RSA 2048]----+
| .++=+o |
| =O... |
| o *o+. |
| . =.oE. |
| . . S... |
| . . |
| . |
| |
| |
+-----------------+
The authenticity of host '10.165.2.189 (10.165.2.189)' can't be established.
RSA key fingerprint is ee:d3:97:1a:fd:50:0d:7f:2a:74:a6:8b:e4:01:d5:61.
Are you sure you want to continue connecting (yes/no)? yes
root@10.165.2.189's password:
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file

command 'ccs_tool lsnode -c /etc/pve/cluster.conf' failed: exit code 1
unable to add node: command failed (ssh 10.165.2.189 -o BatchMode=yes pvecm addnode proxmox02 --force 1)
 
Hi, I got strange behaviour on my Proxmox:
[Attachment: Selection_003.png]

Suddenly one of my nodes lost contact with the other node in my cluster. Ping and SSH still work :(
What actually happened?
 
I just got the answer: stop pve-cluster and cman on all nodes, then restart them. Everything works again after that.
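
The fix described above can be sketched as a loop over the nodes. This is a hedged illustration, not a tested procedure: the node names are placeholders, the /etc/init.d/cman path and the stop/start order are assumptions (only /etc/init.d/pve-cluster appears earlier in this thread), and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Stop the cluster services on every node, then start them again.
# NODES holds placeholder host names; replace them with real ones.
NODES=${NODES:-"proxmox01 proxmox02"}
DRY_RUN=${DRY_RUN:-1}

# Run a command on a node over ssh, or just print it in dry-run mode.
run_on() {
    target=$1; shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh root@$target $*"
    else
        ssh "root@$target" "$@"
    fi
}

restart_cluster_stack() {
    # Assumed order: stop cman before pve-cluster, start in reverse.
    for node in $NODES; do
        run_on "$node" /etc/init.d/cman stop
        run_on "$node" /etc/init.d/pve-cluster stop
    done
    for node in $NODES; do
        run_on "$node" /etc/init.d/pve-cluster start
        run_on "$node" /etc/init.d/cman start
    done
}

restart_cluster_stack
```

Stopping the services everywhere before starting any of them follows the wording of the post ("all node and then restart"); review the printed commands before running with DRY_RUN=0.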
 
