pve-cluster restart

SamTzu

For some reason one of our clusters "breaks up" every couple of days.
We have to restart pve-cluster on several nodes, and then it works again for a few days:
/etc/init.d/pve-cluster restart

What could cause this? I think the "main" node has a faulty NIC that could cause problems during backups.
But if one node gets cut off, that should not break the cluster, should it?
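
As a rough illustration of why losing a single node should not normally break a cluster: a quorum-based cluster stays operational as long as a majority of votes remains. The sketch below assumes one vote per node, which is an assumption and not something stated in this thread. The function names quorum_needed and stays_quorate are hypothetical helpers, not Proxmox commands.

```shell
# Votes needed for quorum in a cluster of N nodes,
# assuming every node carries exactly one vote.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds (exit 0) if the cluster is still quorate after
# LOST nodes drop out of a cluster of TOTAL nodes.
stays_quorate() {
    total=$1
    lost=$2
    [ $(( total - lost )) -ge "$(quorum_needed "$total")" ]
}
```

For example, `stays_quorate 3 1` succeeds because two of three votes are still a majority, while `stays_quorate 2 1` fails. If a larger cluster falls apart when only one node is unreachable, something other than simple vote loss is probably involved.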
 
Please post the output of 'pveversion -v'.
 
I just remembered: this first node had its IP address changed after it was created.

root@proxmox1:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1
 
proxmox1 has the correct IP address in its hosts file and has been rebooted since the change.
The only thing the logs revealed was a loss of connection to the NAS server during one of the backups.
One of the backups always comes out twice as large as it should: it is supposed to be ~30 GB but is always ~60 GB.
But that is a separate matter, I think, and not really related.
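
Since the node's IP address was changed after the cluster was created, it may be worth confirming on every node (not only proxmox1) which address the hosts file maps the node name to. A minimal sketch, assuming the standard hosts file layout of an address followed by one or more names. The helper name hosts_ip_for is hypothetical.

```shell
# Print the first address that a hosts-format file maps NAME to.
# Usage: hosts_ip_for NAME [FILE]   (FILE defaults to /etc/hosts)
# Exits non-zero if NAME is not listed in the file.
hosts_ip_for() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }          # drop trailing comments
        NF >= 2 {
            # field 1 is the address, the remaining fields are names
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1; found = 1; exit }
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

Running `hosts_ip_for proxmox1` on each node and comparing the result with the address the node actually uses now would show whether any node still holds the old entry.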