Proxmox 2 cluster problem

Erwin123

Member
Disclaimer: I know I will get the standard answer 'Please update to the latest version', but PLEASE don't.
We have a 1.9 cluster and a 2.1 cluster.
Keeping nodes on the cutting edge with live containers on them is just too dangerous for us.

On the 2.1 cluster the second node (called node10) is red.
I could not create a CT on it, so I tried to migrate a CT from node9 (the master) to it instead.
The migration failed with 'Container config file does not exist', but it still deleted the CT from node9.
Now the interface shows the container on node10, but if you select it, it says it does not exist.
I cannot start it and cannot delete it.

I tried service pvestatd restart, but this did not change anything.

How do I get the non-existent container (which apparently exists only in the admin interface) out of the interface?
How do I get node10 to play nice with node9 in their cluster?
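
For reference, this is roughly what I was planning to try by hand, assuming the ghost entry comes from a stale config file in the cluster filesystem (104.conf is a guess based on the CT ID; I have not actually deleted anything yet):

root@node10:/etc/pve/nodes/node10/openvz# ls
root@node10:/etc/pve/nodes/node10/openvz# rm 104.conf    # only if a stale 104.conf actually shows up

Is that safe to do on the cluster filesystem, or will it just come back?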

root@node10:/etc/pve/nodes/node10/openvz# cat /etc/pve/.members
{
"nodename": "node10",
"version": 6,
"cluster": { "name": "cluster2", "version": 2, "nodes": 2, "quorate": 1 },
"nodelist": {
"node9": { "id": 1, "online": 1, "ip": "xx.x.xx.xx"},
"node10": { "id": 2, "online": 1, "ip": "xx.x.xx.xx"}
}
}

root@node10:/etc/pve/nodes/node10/openvz# service cman status
cluster is running.
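
In case it is useful, I can also post the usual quorum checks, e.g.:

root@node10:~# pvecm status
root@node10:~# pvecm nodes

(.members above already claims quorate: 1, so I did not expect a surprise there.)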

root@node10:/etc/pve/nodes/node10/openvz# pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

Both nodes are running the same version...

Edit: I also see this error: Jun 12 16:51:18 node10 pmxcfs[16472]: [status] crit: cpg_send_message failed: 12
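
Since that looks like pmxcfs losing its connection to corosync, I am tempted to restart the cluster filesystem on node10, something like:

root@node10:~# service pve-cluster restart
root@node10:~# service pvestatd restart

but I am not sure how safe that is with live containers on the node, so I have not tried it yet.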
 
I noticed that when I log in to node10's interface directly, it does not show the deleted container.

This is the error message I got when migrating. It is scary that everything was deleted even though there was an error:

Jun 12 16:18:38 starting migration of CT 104 to node 'node10' (xx.xx.xx.xx)
Jun 12 16:18:38 starting rsync phase 1
Jun 12 16:18:38 # /usr/bin/rsync -aH --delete --numeric-ids --sparse /var/lib/vz/private/104 root@xx.xx.xx.xx:/var/lib/vz/private
Jun 12 16:19:28 dump 2nd level quota
Jun 12 16:19:28 copy 2nd level quota to target node
Jun 12 16:19:29 initialize container on remote node 'node10'
Jun 12 16:19:29 initializing remote quota
Jun 12 16:19:29 # /usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@xx.xx.xx.xx vzctl quotainit 104
Jun 12 16:19:29 ERROR: Failed to initialize quota: Container config file does not exist
Jun 12 16:19:29 removing container files on local node
Jun 12 16:19:30 start final cleanup
Jun 12 16:19:30 ERROR: migration finished with problems (duration 00:00:52)
TASK ERROR: migration problems
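
Side note: since rsync phase 1 had already completed, I assume the container's private area still exists on node10 under /var/lib/vz/private/104. If I can restore the config (from a backup of /etc/pve, say; the backup path below is hypothetical), would something like this bring the CT back?

root@node10:~# ls -d /var/lib/vz/private/104                            # data copied in rsync phase 1
root@node10:~# cp /backup/pve/104.conf /etc/pve/nodes/node10/openvz/    # hypothetical backup location
root@node10:~# vzctl start 104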
 
