CMAN does nothing

flisboac

Guest
Howdy, I recently downloaded and installed Proxmox VE 2.0 8d4f53a0-20 on two PCs. My intention is to keep the virtual machines running on the more powerful machine and use the other one for storing scheduled backups. Both machines are installed and running, and the VMs I tested on the master work great, but I can't seem to create cluster nodes on the master. I don't think the current Proxmox 2 interface offers that functionality (at least I couldn't find it), so I tried to configure everything on the command line.


This is my PVE version:
Code:
root@master:~# uname -a
Linux master 2.6.32-11-pve #1 SMP Tue Apr 3 10:21:21 CEST 2012 x86_64 GNU/Linux


root@master:~# pveversion -v
pve-manager: 2.0-57 (pve-manager/2.0/ff6cd700)
running kernel: 2.6.32-10-pve
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-10-pve: 2.6.32-63
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-37
pve-firmware: 1.0-15
libpve-common-perl: 1.0-25
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1


First, I tried to check the cluster status:
Code:
root@master:~# pvecm status
cman_tool: Cannot open connection to cman, is it running ?


Strange... Let me see the status of the service:
Code:
root@master:~# service cman status


Nothing is output! Maybe I need to start it:
Code:
root@master:~# service cman start


Again, no output. There must be something in the logs...
Code:
root@master:~# tail -n 20 /var/log/syslog
Apr  9 16:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1333999695.297792
Apr  9 17:17:01 master /USR/SBIN/CRON[3861]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr  9 17:28:15 master rrdcached[1621]: flushing old values
Apr  9 17:28:15 master rrdcached[1621]: rotating journals
Apr  9 17:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1334003295.297782
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333974883.207034
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333978483.207022
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333981291.768981
Apr  9 17:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333996095.297432
Apr  9 18:17:01 master /USR/SBIN/CRON[4946]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr  9 18:28:15 master rrdcached[1621]: flushing old values
Apr  9 18:28:15 master rrdcached[1621]: rotating journals
Apr  9 18:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1334006895.297798
Apr  9 18:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1333999695.297792
Apr  9 19:17:01 master /USR/SBIN/CRON[6032]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr  9 19:28:15 master rrdcached[1621]: flushing old values
Apr  9 19:28:15 master rrdcached[1621]: rotating journals
Apr  9 19:28:15 master rrdcached[1621]: started new journal /var/lib/rrdcached/journal//rrd.journal.1334010495.297788
Apr  9 19:28:15 master rrdcached[1621]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1334003295.297782
Apr  9 19:47:00 master pvedaemon[1744]: <root@pam> successful auth for user 'root@pam'

root@master:~# date
Mon Apr  9 19:50:55 BRT 2012

root@master:~# tail -v /var/log/cluster/*
==> /var/log/cluster/fence_na.log <==
TCP Port: ...... [238].
Node: .......... [00].
Login: ......... [].
Password: ...... [].
Action: ........ [metadata].
Version Request: [no].
Done reading args.
Connection to Node Assassin: [] failed.
Error was: [unknown remote host: ]
Username and/or password invalid. Did you use the command line switches properly?

root@master:~# ls -lR /etc/pve
/etc/pve:
total 4
-rw-r----- 1 root www-data  451 Mar 19 19:53 authkey.pub
-rw-r----- 1 root www-data   16 Mar 19 19:52 datacenter.cfg
lrwxr-x--- 1 root www-data    0 Dec 31  1969 local -> nodes/master
drwxr-x--- 2 root www-data    0 Mar 19 19:53 nodes
lrwxr-x--- 1 root www-data    0 Dec 31  1969 openvz -> nodes/master/openvz
drwx------ 2 root www-data    0 Mar 19 19:53 priv
-rw-r----- 1 root www-data 1533 Mar 19 19:53 pve-root-ca.pem
-rw-r----- 1 root www-data 1675 Mar 19 19:53 pve-www.key
lrwxr-x--- 1 root www-data    0 Dec 31  1969 qemu-server -> nodes/master/qemu-server
-rw-r----- 1 root www-data  129 Apr  4 16:52 storage.cfg
-rw-r----- 1 root www-data   37 Mar 19 19:52 user.cfg
-rw-r----- 1 root www-data  119 Mar 19 19:53 vzdump.cron


/etc/pve/nodes:
total 0
drwxr-x--- 2 root www-data 0 Mar 19 19:53 master


/etc/pve/nodes/master:
total 1
drwxr-x--- 2 root www-data    0 Mar 19 19:53 openvz
drwx------ 2 root www-data    0 Mar 19 19:53 priv
-rw-r----- 1 root www-data 1675 Mar 19 19:53 pve-ssl.key
-rw-r----- 1 root www-data 1354 Mar 19 19:53 pve-ssl.pem
drwxr-x--- 2 root www-data    0 Mar 19 19:53 qemu-server


/etc/pve/nodes/master/openvz:
total 0


/etc/pve/nodes/master/priv:
total 0


/etc/pve/nodes/master/qemu-server:
total 0


/etc/pve/priv:
total 3
-rw------- 1 root www-data 1675 Mar 19 19:53 authkey.key
-rw------- 1 root www-data  396 Apr  9 15:28 authorized_keys
-rw------- 1 root www-data  884 Apr  9 15:28 known_hosts
drwx------ 2 root www-data    0 Mar 20 15:48 lock
-rw------- 1 root www-data 1679 Mar 19 19:53 pve-root-ca.key
-rw------- 1 root www-data    3 Mar 19 19:53 pve-root-ca.srl


/etc/pve/priv/lock:
total 0


I guess there's nothing related to cman there. I even tried to add the node, despite the problems:
Code:
root@master:~# pvecm add <ip>
root@<ip>'s password:
I/O warning : failed to load external entity "/etc/pve/cluster.conf"
ccs_tool: Error: unable to parse requested configuration file


command 'ccs_tool lsnode -c /etc/pve/cluster.conf' failed: exit code 1
unable to add node: command failed (ssh <ip> -o BatchMode=yes pvecm addnode master --force 1)

I don't quite get what's happening... Do I need to create that cluster.conf file myself? I thought it would be created when the first node was connected.

I also tried "aptitude update && aptitude full-upgrade" on the backup node, but I got exactly the same results. Can someone point me in the right direction?
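Judging from the ccs_tool error, my guess is that the missing step is creating the cluster on the master first, which (if I read the wiki correctly) is what generates /etc/pve/cluster.conf and starts cman. Something along these lines, with the cluster name being just an example:
Code:
root@master:~# pvecm create mycluster
root@master:~# service cman status
root@master:~# pvecm status
But I'm not sure that's right, hence the question.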
 
My mistake, I read through that page and all seems well now. Both nodes were initialized and cman is running, and neither of them has any virtual machines. But now I'm getting this:

Code:
root@backup:~# pvecm add <master_ip>
authentication key already exists

root@backup:~# pvecm nodes
Node  Sts   Inc   Joined               Name
   1   M      8   2012-04-10 08:44:59  backup

This is probably because of my previous attempts. How do I go about fixing that?
 
Yes, I did that... The wiki page doesn't make it clear that I shouldn't have. I guess I'm off to reformatting both of them; it'll be easier. At least I have a better understanding of things now:

- The master node is the one on which I'll execute "pvecm create <cluster-name>"
- The slave nodes are the ones on which I'll execute "pvecm add <master-ip>"

The only thing I still need to sort out is which of them should be the master, the backup machine or the VM host; the rough sequence I have in mind is sketched below.
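For my own notes, this is roughly what I plan to run once both machines are reinstalled. The cluster name, hostnames and IPs are only placeholders, so treat it as a sketch of my understanding rather than a tested recipe:
Code:
# On the machine chosen as master (for example the VM host):
root@master:~# pvecm create mycluster
root@master:~# pvecm status

# On the other machine (for example the backup node), which must not hold any VMs yet:
root@backup:~# pvecm add <master-ip>

# Check the membership from either node:
root@backup:~# pvecm nodes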
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!