CMAN does nothing

  • Thread starter: flisboac (Guest)
Howdy, I recently downloaded and set up two PCs with Proxmox VE 2.0 8d4f53a0-20. My intention is to keep the virtual machines running on the more powerful machine and use the other for storing scheduled backups. Both machines are installed and running; I tested the VMs on the master and they're working great, but I can't seem to create cluster nodes on the master. I think the current Proxmox 2 interface doesn't offer that functionality (at least I couldn't find it), so I tried to configure everything from the command line.


This is my PVE version:
Code:
root@master:~# uname -a
Linux master 2.6.32-11-pve #1 SMP Tue Apr 3 10:21:21 CEST 2012 x86_64 GNU/Linux


root@master:~# pveversion -v
pve-manager: 2.0-57 (pve-manager/2.0/ff6cd700)
running kernel: 2.6.32-10-pve
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-10-pve: 2.6.32-63
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-37
pve-firmware: 1.0-15
libpve-common-perl: 1.0-25
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1


First, I tried to check the cluster status:
Code:
root@master:~# pvecm status
cman_tool: Cannot open connection to cman, is it running ?


Strange... Let me see the status of the service:
Code:
root@master:~# service cman status


Nothing is output! Maybe I need to start it:
Code:
root@master:~# service cman start
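Neither command prints anything, so it's hard to tell whether cman is actually running or just failing silently. A quick sanity check could be the exit code and the process list (a rough sketch; nothing PVE-specific assumed, just plain shell):
Code:
root@master:~# service cman status; echo "exit code: $?"
root@master:~# ps -e | grep -E 'corosync|fenced|dlm'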


In any case, there must be something in the logs...
Code:
root@master:~# tail -n 20 /var/log/syslog
Apr  9 16:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1333999695.297792
Apr  9 17:17:01 master /USR/SBIN/CRON[3861]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr  9 17:28:15 master rrdcached[1621]: flushing old values
Apr  9 17:28:15 master rrdcached[1621]: rotating journals
Apr  9 17:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1334003295.297782
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333974883.207034
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333978483.207022
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333981291.768981
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333996095.297432
Apr  9 18:17:01 master /USR/SBIN/CRON[4946]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr  9 18:28:15 master rrdcached[1621]: flushing old values
Apr  9 18:28:15 master rrdcached[1621]: rotating journals
Apr  9 18:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1334006895.297798
Apr  9 18:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333999695.297792
Apr  9 19:17:01 master /USR/SBIN/CRON[6032]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr  9 19:28:15 master rrdcached[1621]: flushing old values
Apr  9 19:28:15 master rrdcached[1621]: rotating journals
Apr  9 19:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1334010495.297788
Apr  9 19:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1334003295.297782
Apr  9 19:47:00 master pvedaemon[1744]: <root@pam> successful auth for user 'root@pam'

root@master:~# date
Mon Apr  9 19:50:55 BRT 2012

root@master:~# tail -v /var/log/cluster/*
==> /var/log/cluster/fence_na.log <==
TCP Port: ...... [238].
Node: .......... [00].
Login: ......... [].
Password: ...... [].
Action: ........ [metadata].
Version Request: [no].
Done reading args.
Connection to Node Assassin: [] failed.
Error was: [unknown remote host: ]
Username and/or password invalid. Did you use the command line switches properly?

root@master:~# ls -lR /etc/pve
/etc/pve:
total 4
-rw-r----- 1 root www-data  451 Mar 19 19:53 authkey.pub
-rw-r----- 1 root www-data   16 Mar 19 19:52 datacenter.cfg
lrwxr-x--- 1 root www-data    0 Dec 31  1969 local -> nodes/master
drwxr-x--- 2 root www-data    0 Mar 19 19:53 nodes
lrwxr-x--- 1 root www-data    0 Dec 31  1969 openvz -> nodes/master/openvz
drwx------ 2 root www-data    0 Mar 19 19:53 priv
-rw-r----- 1 root www-data 1533 Mar 19 19:53 pve-root-ca.pem
-rw-r----- 1 root www-data 1675 Mar 19 19:53 pve-www.key
lrwxr-x--- 1 root www-data    0 Dec 31  1969 qemu-server -> nodes/master/qemu-server
-rw-r----- 1 root www-data  129 Apr  4 16:52 storage.cfg
-rw-r----- 1 root www-data   37 Mar 19 19:52 user.cfg
-rw-r----- 1 root www-data  119 Mar 19 19:53 vzdump.cron


/etc/pve/nodes:
total 0
drwxr-x--- 2 root www-data 0 Mar 19 19:53 master


/etc/pve/nodes/master:
total 1
drwxr-x--- 2 root www-data    0 Mar 19 19:53 openvz
drwx------ 2 root www-data    0 Mar 19 19:53 priv
-rw-r----- 1 root www-data 1675 Mar 19 19:53 pve-ssl.key
-rw-r----- 1 root www-data 1354 Mar 19 19:53 pve-ssl.pem
drwxr-x--- 2 root www-data    0 Mar 19 19:53 qemu-server


/etc/pve/nodes/master/openvz:
total 0


/etc/pve/nodes/master/priv:
total 0


/etc/pve/nodes/master/qemu-server:
total 0


/etc/pve/priv:
total 3
-rw------- 1 root www-data 1675 Mar 19 19:53 authkey.key
-rw------- 1 root www-data  396 Apr  9 15:28 authorized_keys
-rw------- 1 root www-data  884 Apr  9 15:28 known_hosts
drwx------ 2 root www-data    0 Mar 20 15:48 lock
-rw------- 1 root www-data 1679 Mar 19 19:53 pve-root-ca.key
-rw------- 1 root www-data    3 Mar 19 19:53 pve-root-ca.srl


/etc/pve/priv/lock:
total 0
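For completeness, grepping the logs for anything cluster-related might be quicker than eyeballing them (a rough sketch; I'm assuming the standard Debian daemon.log is also present):
Code:
root@master:~# grep -iE 'cman|corosync|fence' /var/log/syslog /var/log/daemon.log | tail -n 20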


I guess there's nothing related to cman there. I even tried to add the node, despite the problems:
Code:
root@master:~# pvecm add <ip>
root@<ip>'s password:
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file


command 'ccs_tool lsnode -c /etc/pve/cluster.conf' failed: exit code 1
unable to add node: command failed (ssh <ip> -o BatchMode=yes pvecm addnode master --force 1)

I don't quite get what's happening... Do I need to create that cluster.conf file? I thought it would be created when the first node was added.

I also ran "aptitude update && aptitude full-upgrade" on the backup node, but I got exactly the same results. Can someone point me in the right direction?
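My best guess at this point is that /etc/pve/cluster.conf only appears once the cluster has been created on the first node, roughly like this (just a sketch; <cluster-name> is a placeholder):
Code:
root@master:~# pvecm create <cluster-name>
root@master:~# ls -l /etc/pve/cluster.conf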
 
My mistake, I read through that page and all seems well now. Both nodes are initialized and cman is running, and neither of them has any virtual machines. But now I'm getting this:

Code:
root@backup:~# pvecm add <master_ip>
authentication key already exists

root@backup:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      8   2012-04-10 08:44:59  backup

This is probably because of my previous attempts. How do I go about fixing that?
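My rough idea for cleaning up the half-joined node would be something like the sketch below, but the key and config file paths are guesses on my part, so back everything up first (a clean reinstall is probably the safer route):
Code:
root@backup:~# service cman stop
root@backup:~# service pve-cluster stop
root@backup:~# rm -f /etc/corosync/authkey       # assumed location of the leftover authentication key
root@backup:~# rm -f /etc/cluster/cluster.conf   # assumed leftover cman config, if it exists
root@backup:~# service pve-cluster start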
 
Yes, I did do that... The wiki page doesn't make it very clear that I should not. I guess I'm off to reformatting both machines; it'll be easier. At least I have a better understanding of things now:

- The master node is the one on which I'll run "pvecm create <cluster-name>"
- The slave nodes are the ones on which I'll run "pvecm add <master-ip>"

The only thing I need to sort out is which of them should be the master: the backup machine or the VM host.
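For the record, my understanding of the full flow is roughly this (a sketch pieced together from the above; <cluster-name> and <master-ip> are placeholders):
Code:
# on the chosen master (run once; this creates the cluster configuration)
root@master:~# pvecm create <cluster-name>

# on every other node (with no VMs and no leftover cluster configuration)
root@backup:~# pvecm add <master-ip>

# verify from any node
root@master:~# pvecm status
root@master:~# pvecm nodes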