Proxmox 2 cluster problem

Erwin123

Disclaimer: I know I will get the standard answer 'Please update to the latest version' but PLEASE don't.
We have a 1.9 cluster and a 2.1 cluster.
Keeping nodes updated to the cutting edge with live containers on them is just too dangerous for us.

On the 2.1 cluster the second node (called node10) is red.
I could not create a CT on it, so I tried to migrate a CT from node9 (the master) to it instead.
The migration failed, complaining 'Container config file does not exist', but it still deleted the CT on node9.
Now node10 shows the container in the interface, but if you select it, it says it does not exist.
I cannot start it and cannot delete it.
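If it helps diagnose, I assume the quickest way to see what vzctl itself knows about on node10 (as opposed to what the interface shows) would be:

root@node10:~# vzlist -a   # if CT 104 is not listed here, it only exists in the interface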

I tried service pvestatd restart but this did not change anything.
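I have not dared to restart the whole cluster stack with live containers on the node. If that is the recommended next step, I assume it would be something like this (my guess, I have not run it yet):

service pvestatd stop
service pve-cluster restart   # pmxcfs, which provides /etc/pve
service pvestatd start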

How do I get the non-existing container (which apparently only exists in the admin interface) out of the interface? (My guess at a manual fix is below.)
How do I get node10 to play nice with node9 in their cluster?
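For the first question, my guess is that the interface builds its list from the config files under /etc/pve, so a stale config may be the culprit. Would it be safe to remove it by hand? Something like this (the 104.conf path is only my assumption of where the leftover would live):

ls -l /etc/pve/nodes/node9/openvz/ /etc/pve/nodes/node10/openvz/
rm /etc/pve/nodes/node10/openvz/104.conf   # only if this really is a stale leftover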

root@node10:/etc/pve/nodes/node10/openvz# cat /etc/pve/.members
{
"nodename": "node10",
"version": 6,
"cluster": { "name": "cluster2", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
"node9": { "id": 1, "online": 1, "ip": "xx.x.xx.xx"},
"node10": { "id": 2, "online": 1, "ip": "xx.x.xx.xx"}
}
}

root@node10:/etc/pve/nodes/node10/openvz# service cman status
cluster is running.
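In case it gives more detail than the init script, I assume cman_tool can show the membership as cman sees it:

cman_tool status   # quorum, votes, cluster generation
cman_tool nodes    # per-node membership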

root@node10:/etc/pve/nodes/node10/openvz# pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

Both nodes are running the same version...

Edit: I also see this error: Jun 12 16:51:18 node10 pmxcfs[16472]: [status] crit: cpg_send_message failed: 12
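Could that be a corosync communication problem between the two nodes? If a multicast test is the right way to rule that out, I assume it would be something like running this on both nodes at the same time (omping may need to be installed first):

omping node9 node10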
 
I noticed that when I log in to the interface of node10 directly, it does not show the deleted container.

This is the error message I got when migrating. Scary that it deleted everything even though there was an error:

Jun 12 16:18:38 starting migration of CT 104 to node 'node10' (xx.xx.xx.xx)
Jun 12 16:18:38 starting rsync phase 1
Jun 12 16:18:38 # /usr/bin/rsync -aH --delete --numeric-ids --sparse /var/lib/vz/private/104 root@xx.xx.xx.xx:/var/lib/vz/private
Jun 12 16:19:28 dump 2nd level quota
Jun 12 16:19:28 copy 2nd level quota to target node
Jun 12 16:19:29 initialize container on remote node 'node10'
Jun 12 16:19:29 initializing remote quota
Jun 12 16:19:29 # /usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@xx.xx.xx.xx vzctl quotainit 104
Jun 12 16:19:29 ERROR: Failed to initialize quota: Container config file does not exist
Jun 12 16:19:29 removing container files on local node
Jun 12 16:19:30 start final cleanup
Jun 12 16:19:30 ERROR: migration finished with problems (duration 00:00:52)
TASK ERROR: migration problems
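Since rsync phase 1 apparently finished before the quota step failed, I am hoping the private area itself made it to node10 before the cleanup removed it on node9. I assume I can verify that with something like:

root@node10:~# ls -d /var/lib/vz/private/104

If the data is still there, is it enough to recreate the config file by hand to get the CT back?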
 
