Cluster setup communication problem

liane

Hi,

I'm having a hard time setting up a cluster between two Proxmox boxes.

I followed the instructions on the wiki (or so I think), setting up the master first, then the node, which doesn't have any VMs right now.

Code:
master:~# pveca -l
CID----IPADDRESS----ROLE-STATE--------UPTIME---LOAD----MEM---DISK
 1 : xx.xx.17.96    M     S   14 days 10:12   1.32    16%     2%
 2 : xx.xx.27.206   N     ERROR: 500 Server closed connection without sending any data back

node:~# pveca -l
CID----IPADDRESS----ROLE-STATE--------UPTIME---LOAD----MEM---DISK
 1 : xx.xx.17.96    M     ERROR: 500 Server closed connection without sending any data back
 2 : xx.xx.27.206   N     S           02:56   0.00     2%     2%
All storage is local to the boxes, and mounted:

Code:
master:~# pvesm list
backup    dir 0 1  489847224   34883572  7%
local     dir 0 1 1273602652   43964992  3%

node:~# pvesm list
backup    dir 0 1  489847224     202660  0%
local     dir 0 1 1273602652     343360  0%
I can ssh from one box to the other without being asked for a password. (I should mention that I had changed the default ssh port in /etc/ssh/ssh_config and /etc/ssh/sshd_config, but I reverted those changes, to no avail.)

pvedaemon and pvetunnel are running on both boxes:

Code:
anybox:~# ps xaf | grep pve
   2317 ?        Ss     0:15 pvedaemon worker
 630754 ?        S      0:12  \_ pvedaemon worker
 639110 ?        S      0:06  \_ pvedaemon worker
 668191 ?        S      1:35 /usr/bin/perl -w /usr/bin/pvetunnel -p /var/run/pvetunnel.pid
 668201 ?        S      5:21 /usr/bin/perl -w /usr/bin/pvemirror -p /var/run/pvemirror.pid
 651319 pts/0    S+     0:00                  \_ grep pve

in /var/log/auth.log, I have tons of these lines (on master and node):
Code:
Sep 30 00:50:15 master sshd[2738]: error: connect_to localhost port 83: failed.
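When the same sshd error repeats many times, tallying the failing forward targets can make the pattern easier to spot. The following is only a minimal sketch: `failed_forwards` is a hypothetical helper name, and it assumes every relevant line has the same shape as the one quoted above.

```shell
# Hypothetical helper: tally the forward targets sshd failed to reach.
# Assumes log lines shaped like the one quoted above:
#   ... sshd[PID]: error: connect_to HOST port N: failed.
failed_forwards() {
    # $1: path to the auth log (e.g. /var/log/auth.log)
    sed -n 's/.*sshd\[[0-9]*]: error: connect_to \([^ ]*\) port \([0-9]*\): failed\..*/\1:\2/p' "$1" \
        | sort | uniq -c | sort -rn
}
```

Calling `failed_forwards /var/log/auth.log` prints one line per distinct host:port target, most frequent first, each prefixed with its count.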
in /var/log/syslog, I see:
Code:
Sep 30 00:53:33 node pvedaemon[6718]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead
Sep 30 00:53:34 node proxwww[7582]: Starting new child 7582
Sep 30 00:53:35 node proxwww[7535]: 500 Server closed connection without sending any data back
Sep 30 00:53:39 node pvedaemon[7405]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead
Sep 30 00:53:41 node proxwww[7535]: 500 Server closed connection without sending any data back
on master:
Code:
Sep 30 00:53:14 master pvemirror[668201]: starting cluster syncronization
Sep 30 00:53:14 master pvemirror[668201]: syncing vzlist from 'xx.xx.27.206' failed: 500 Server closed connection without sending any data back
Sep 30 00:53:14 master pvemirror[668201]: syncing templates
Sep 30 00:53:14 master pvemirror[668201]: cluster syncronization finished (0.05 seconds (files 0.00, config 0.00))
Sep 30 00:53:18 master pvedaemon[630754]: WARNING: Cannot encode 'meminfo' element as 'hash'. Will be encoded as 'map' instead
Sep 30 00:53:19 master proxwww[649156]: 500 Server closed connection without sending any data back

I searched this forum but didn't find a definite answer to my problem. What could I check next?

config:

Code:
master:~# pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-13
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.28-1pve5
vzdump: 1.2-15
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6


node:~# pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
 
Well, a few days later, I'm still stuck with this problem.

From my tests, only /var/lib/pve-manager/vzlist fails to sync between master and node; templates and backup jobs appear correctly on both web panels.

I tried deleting /var/lib/pve-manager/vzlist on the node. It was recreated, but the problem is still there.

What can I check next to troubleshoot the cluster?
 
I guess not (but you're right, I hadn't checked there yet)

on master & node, a few:
Code:
[Mon Oct 03 17:25:45 2011][error] [client 82.224.60.106] File does not exist: /usr/share/pve-manager/images/favicon.ico

and on master, 1 line for today:
Code:
[Mon Oct 03 17:25:45 2011][error] [4092]ERR:  24:  Error in Perl code: 500 Server closed connection without sending any data back
 
This line was the key:
Code:
Sep 30 00:50:15 master sshd[2738]: error: connect_to localhost port 83: failed.
For some reason (I'll find out later), the /etc/hosts files on both boxes were empty, so localhost didn't resolve to 127.0.0.1 as it should.

Adding this line to both /etc/hosts files solved the problem instantly:
Code:
127.0.0.1       localhost.localdomain localhost
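For anyone hitting the same symptom, a quick check of the hosts file might look like the sketch below. `has_localhost_entry` is a hypothetical helper name, not a Proxmox tool, and it only looks for an IPv4 `127.0.0.1` mapping on a non-comment line.

```shell
# Hypothetical helper: succeed if the given hosts file maps the name
# "localhost" to 127.0.0.1 on a non-comment line.
has_localhost_entry() {
    # $1: hosts file to inspect (defaults to /etc/hosts)
    awk '
        { sub(/#.*/, "") }            # strip comments before matching
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)
                if ($i == "localhost") found = 1
        }
        END { exit !found }           # exit status 0 only when found
    ' "${1:-/etc/hosts}"
}
```

Running `has_localhost_entry || echo "localhost missing from /etc/hosts"` on each box flags the empty-file case described above. On glibc systems, `getent hosts localhost` additionally shows what the resolver actually returns.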