Online Migration Failure

philister

Member
Jun 18, 2012
Hello all,

I have a three-node cluster with iSCSI-backed LVM storage for my VMs. Online migration works fine for all but one VM. The error given is:

Dec 03 13:23:20 starting migration of VM 111 to node 'pmx0' (10.0.91.200)
Dec 03 13:23:20 copying disk images
Dec 03 13:23:20 starting VM 111 on remote node 'pmx0'
Dec 03 13:23:21 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@10.0.91.200 qm start 111 --stateuri tcp --skiplock --migratedfrom pmx3' failed: exit code 255
Dec 03 13:23:21 aborting phase 2 - cleanup resources
Dec 03 13:23:22 ERROR: migration finished with problems (duration 00:00:02)
TASK ERROR: migration problems

Does anybody have a clue what that could mean?

Thank you very much.
 
Do you run the same version on all nodes? Check with 'pveversion -v'.
 
post your:


  • /etc/pve/qemu-server/111.conf
  • /etc/pve/storage.cfg
  • pveversion -v
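
It can also help to look at the target node's syslog right after a failed attempt; the underlying error from 'qm start' usually ends up there. A quick sketch (target IP taken from your log):

# run from the source node right after a failed migration attempt
ssh root@10.0.91.200 'tail -n 50 /var/log/syslog'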
 
/etc/pve/qemu-server/111.conf:

bootdisk: ide0
cores: 2
ide0: san1-vdisk001:vm-111-disk-1
ide2: none,media=cdrom
memory: 6144
name: win3
net0: rtl8139=CA:E9:58:05:12:36,bridge=vmbr0
ostype: win7
sockets: 1




/etc/pve/storage.cfg:

dir: local
path /var/lib/vz
content images,iso,vztmpl,rootdir
maxfiles 0

nfs: NFS01
path /mnt/pve/NFS01
server 10.0.90.254
export /vol/esx_ds0
options vers=3
content images,iso,backup
maxfiles 2

lvm: san1-vdisk001
vgname san1-vdisk001
shared
content images
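
Since san1-vdisk001 is the shared LVM volume group holding that disk, I guess the thing to check on each node is whether the VG is visible and whether the LV for VM 111 is active there. Roughly (sketched commands, not output from an actual run):

# run on each node: is the shared VG visible, and is the LV active here?
vgs san1-vdisk001
lvs -o lv_name,lv_attr san1-vdisk001    # an 'a' in the 5th attr character means active
dmsetup ls | grep 'vm--111' || echo "no device-mapper entry for vm-111 on this node"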




pveversion -v:

pve-manager: 2.2-30 (pve-manager/2.2/d3818aa7)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-32
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1
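
To compare versions across all three nodes (node names partly unknown; only pmx0 and pmx3 appear in the logs above), something like this should do:

# collect 'pveversion -v' from each node, then diff the files pairwise
for n in pmx0 pmx3; do
    ssh root@$n pveversion -v > /tmp/pveversion-$n.txt
done
diff /tmp/pveversion-pmx0.txt /tmp/pveversion-pmx3.txt    # repeat with the third node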




Thank you.
 
Hello,

I shut down the VM that wouldn't migrate online in order to migrate it offline, and successfully moved it from node 3 to node 0. Now I can't start it anymore, I can't migrate it back to node 3, and I can't make a backup in order to restore it into a clean configuration.

In all three cases (failing start, failing migration, failing backup) the error message given is:


can't activate LV '/dev/san1-vdisk001/vm-111-disk-1': device-mapper: create ioctl on san1--vdisk001-vm--111--disk--1 failed: Device or resource busy


Any help is appreciated; this is our company's main mail server and I can't bring it up again. Thank you very much.
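
In case it helps with diagnosis, the state of that device-mapper entry on pmx0 can be checked with something like the following (device and LV names taken from the error message above):

# does the mapping already exist, and does anything still hold it open?
dmsetup info san1--vdisk001-vm--111--disk--1
# what LVM itself thinks the LV's status is
lvdisplay /dev/san1-vdisk001/vm-111-disk-1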
 
After rebooting two nodes of our three-node cluster one after the other (the one I migrated from and the one I migrated to), I can now start the VM again. I haven't dared to try migrating it again yet ...

I can't believe such an LV deadlock can't be sorted out without rebooting. I googled a lot but couldn't find anything useful.
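
In case someone finds this thread later: from what I have read, such a stale mapping can sometimes be cleared without a reboot, provided nothing holds the leftover device-mapper entry open anymore. A sketch (untested on my side, names taken from the error above):

# check the leftover mapping; only proceed if it reports "Open count: 0"
dmsetup info san1--vdisk001-vm--111--disk--1
# remove the stale mapping, then re-activate the LV through LVM
dmsetup remove san1--vdisk001-vm--111--disk--1
lvchange -ay /dev/san1-vdisk001/vm-111-disk-1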