Hi all,
I have deployed a 2-node cluster to become familiar with the Proxmox 2 HA features. I'm having some issues running a live migration of an OpenVZ container between cluster nodes using NFS shared storage, via both the GUI and the command line. I had a quick search through the forums here and beyond but couldn't find anything that scratched this particular itch.
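For context, the container's private area sits on the NFS storage, which both nodes mount. I can re-check and post the storage status from each node if that helps:
Code:
root@vz1:~# pvesm status
root@vz1:~# ssh vz2 pvesm status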
The GUI gives me the following succinct tidbit when I attempt the migration:
Code:
Executing HA migrate for CT 109 to node vz2
Trying to migrate pvevm:109 to vz2...Temporary failure; try again
TASK ERROR: command 'clusvcadm -M pvevm:109 -m vz2' failed: exit code 255
This results in a "failed" state, as shown below (the service was previously in the "started" state):
Code:
root@vz1:~# clustat
Cluster Status for digipve @ Wed Sep 12 11:52:30 2012
Member Status: Quorate
 Member Name                             ID   Status
 ------ ----                             ---- ------
 vz1                                        1 Online, Local, rgmanager
 vz2                                        2 Online, rgmanager

 Service Name                   Owner (Last)                   State
 ------- ----                   ----- ------                   -----
 pvevm:109                      (vz1)                          failed
and produces the following in rgmanager.log:
Code:
root@vz1:~# tail -n6 /var/log/cluster/rgmanager.log
Sep 12 11:42:54 rgmanager [pvevm] CT 109 is running
Sep 12 11:43:24 rgmanager [pvevm] CT 109 is running
Sep 12 11:43:34 rgmanager [pvevm] CT 109 is running
Sep 12 11:43:49 rgmanager Migrating pvevm:109 to vz2
Sep 12 11:43:50 rgmanager migrate on pvevm "109" returned 1 (generic error)
Sep 12 11:43:50 rgmanager Migration of pvevm:109 to vz2 failed; return code 1
and the following in user.log:
Code:
root@vz1:~# tail -n 3 /var/log/user.log
Sep 12 14:38:16 vz1 pvevm: <root@pam> starting task UPID:vz1:000E40D5:0AE4AEF2:50509048:vzmigrate:109:root@pam:
Sep 12 14:38:17 vz1 task UPID:vz1:000E40D5:0AE4AEF2:50509048:vzmigrate:109:root@pam:: migration aborted
Sep 12 14:38:17 vz1 pvevm: <root@pam> end task UPID:vz1:000E40D5:0AE4AEF2:50509048:vzmigrate:109:root@pam: migration aborted
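The UPID in user.log should correspond to a full task log; assuming the usual layout under /var/log/pve/tasks (where I believe the web UI reads task output from), something like this should dig it out, and I can post its contents if useful:
Code:
root@vz1:~# find /var/log/pve/tasks/ -name '*vzmigrate:109*' -exec cat {} \;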
I've been disabling and enabling the group in order to reset this "failed" state:
Code:
root@vz1:~# clusvcadm -d pvevm:109 && clusvcadm -e pvevm:109
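While testing, I keep an eye on just this service with clustat (assuming the -s and -i flags, which my build appears to have):
Code:
root@vz1:~# clustat -s pvevm:109 -i 2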
Attempting to run the migration via the CLI with pvectl produces the same command-line and rgmanager.log output:
Code:
root@vz1:~# pvectl migrate 109 vz2 --online
Executing HA migrate for CT 109 to node vz2
Trying to migrate pvevm:109 to vz2...Failure
command 'clusvcadm -M pvevm:109 -m vz2' failed: exit code 255
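Unless someone spots the problem first, my next diagnostic steps are below: trying a plain relocate (stop on vz1, start on vz2) to see whether only the live path is broken, and driving the OpenVZ migration by hand for more verbose output. I'm assuming vzmigrate is what the HA agent ultimately calls, so treat that part as a guess:
Code:
# Plain relocate instead of a live migrate (stops the CT, starts it on vz2)
root@vz1:~# clusvcadm -r pvevm:109 -m vz2
# Drive the OpenVZ migration directly and verbosely
root@vz1:~# vzmigrate -v --online vz2 109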
Stuff that may help in resolving this:
cluster.conf
Code:
root@vz1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="7" name="digipve">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_drac5" ipaddr="<ip>" login="root" module_name="SLOT-1" name="drac-cmc-blade1" passwd="<pass>" secure="1"/>
    <fencedevice agent="fence_drac5" ipaddr="<ip>" login="root" module_name="SLOT-2" name="drac-cmc-blade2" passwd="<pass>" secure="1"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="vz1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="vz2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="drac-cmc-blade2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="109"/>
  </rm>
</cluster>
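If it helps rule out a config problem, I can also run the validator from the redhat-cluster tools and post the result (assuming ccs_config_validate checks the active cluster.conf, as I believe it does):
Code:
root@vz1:~# ccs_config_validate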
Versions
Code:
root@vz1:~# pveversion --verbose
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-39
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
Could anyone point me in the right direction to resolve this? Is there anything else I can provide to help track down my mistake?