Live Migration Fails - Proxmox VE 2.2

woblit

New Member
Nov 14, 2012
Hi All,

Hopefully you can help. I've just installed two new Proxmox VE 2.2 servers, both on E5 processors, each with 64GB of memory. I have set them up talking to a Ceph cluster which I have been running for some time.

I started with one box, which was using Ceph perfectly: VMs would create and run without a problem. I then added the second server to the cluster, and I am able to create VMs on this server and live-migrate a running VM from the original server to the second one (same setup), but I cannot migrate it back from the second server to the original one. If I stop the VM on the second server and choose an offline migration, that works without a problem. The VM runs on the second server without a problem, so I know that my shared storage (Ceph) is working from both systems.

These are the errors I get from the migration process:

Nov 14 15:07:12 starting migration of VM 101 to node 'ihv1' (192.168.0.1)
Nov 14 15:07:12 copying disk images
Nov 14 15:07:12 starting VM 101 on remote node 'ihv1'
Nov 14 15:07:13 ERROR: online migrate failure - command '/usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@192.168.0.1 qm start 101 --stateuri tcp --skiplock --migratedfrom ihv2' failed: exit code 255
Nov 14 15:07:13 aborting phase 2 - cleanup resources
Nov 14 15:07:13 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems

The other error I get when the original server tries to start the machine is:

TASK ERROR: start failed: command '/usr/bin/kvm -id 101 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/101.vnc,x509,password -pidfile /var/run/qemu-server/101.pid -daemonize -name test.tester -smp 'sockets=1,cores=4' -cpu host -nodefaults -boot 'menu=on' -vga cirrus -k en-gb -m 768 -cpuunits 1000 -usbdevice tablet -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=rbd:rbd/vm-101-disk-1:id=admin:auth_supported=cephx\;none:keyring=/etc/pve/priv/ceph/CloudFlex.keyring:mon_host=192.168.10.10\:6789,if=none,id=drive-sata0,cache=writethrough,aio=native' -device 'ide-drive,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' -netdev 'type=user,id=net0,hostname=test.tester' -device 'rtl8139,mac=6E:0C:BE:2B:42:6F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -incoming tcp:localhost:60000 -S' failed: exit code 1
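
(For what it's worth, the two errors appear to be connected: the ssh command in the first log fails because the remote qm start fails, and the TASK ERROR above is that same failed start. The command can be re-run by hand from node 2 to reproduce it on the console:)

# run from node 2 (ihv2); copied verbatim from the migration log above
/usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@192.168.0.1 \
    qm start 101 --stateuri tcp --skiplock --migratedfrom ihv2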

Hopefully someone can offer advice. I had a working 2.1 cluster before, which seemed perfect.

Warren.


 
dietmar

Do not use cpu=host

Hi Dietmar,

I tried that, and unfortunately it gives the same error. I can do an online migration from Node 1 to Node 2, but when trying to migrate back to Node 1 it comes up with the error shown above.
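
For completeness, this is roughly how I switched the CPU type away from host (a sketch; kvm64 is just one generic choice, and VM 101 is the machine from the log above):

# change VM 101 from cpu=host to a generic CPU type, then stop and start it
qm set 101 --cpu kvm64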

Looking forward to hearing back from you.

Warren.
 
Same problem here. When I try to migrate, I get:


Nov 16 12:44:34 starting migration of VM 114 to node 'fmckvm100' (10.12.4.100)
Nov 16 12:44:34 copying disk images
Nov 16 12:44:34 starting VM 114 on remote node 'fmckvm100'
Nov 16 12:44:36 starting migration tunnel
Nov 16 12:44:37 starting online/live migration on port 60000
Nov 16 12:44:39 ERROR: online migrate failure - aborting
Nov 16 12:44:39 aborting phase 2 - cleanup resources
Nov 16 12:44:40 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems


root@fmckvm100:~# pveversion --verbose
pve-manager: 2.2-30 (pve-manager/2.2/d3818aa7)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-32
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1

root@proxmox104:~# pveversion --verbose
pve-manager: 2.2-30 (pve-manager/2.2/d3818aa7)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-32
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1


What could possibly be the reason?
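
The task log only shows "online migrate failure - aborting" with no detail. One thing I can try (just a guess) is to watch syslog on the target node while retrying the migration, since the error from starting the incoming VM should show up there:

# on the target node, while retrying the migration in another session
tail -f /var/log/syslog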
 
Hi.

I don't know whether this will help you, but after a LOT of trial and error I fixed my issue. The problem was that I had a ceph.conf file in /root on my hypervisor. This seemed to conflict with the migration process somehow. In any event, as soon as I removed this config file (which was present on Node 2 but not on Node 1), my problems went away and I was able to do online migrations.
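
For anyone hitting the same thing, it boils down to this (paths as on my systems; I would move the file aside rather than delete it, to be safe):

# check each node for a stray Ceph config in root's home directory
ls -l /root/ceph.conf
# Node 2 had one; moving it out of the way fixed online migration for me
mv /root/ceph.conf /root/ceph.conf.bak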

The only problem I am having now is that backups don't work for Ceph-based storage. But then again, it's all still new and experimental.

Hope your problems get fixed.

Warren.
 
I don't have any ceph.conf.


Can I switch to virtio disks just by editing the config files under /etc/pve/qemu-server/ and rebooting the guest?
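
(To illustrate what I mean, not tested: the edit to the VM's file under /etc/pve/qemu-server/ would look roughly like this, with the disk line borrowed from the VM 101 error message earlier, where the storage seems to be called CloudFlex. I understand the guest also needs virtio drivers installed before the switch, or it won't boot.)

# before: disk attached via the AHCI/SATA bus
#   sata0: CloudFlex:vm-101-disk-1,cache=writethrough
# after: same volume attached as virtio, with the boot disk updated to match
virtio0: CloudFlex:vm-101-disk-1,cache=writethrough
bootdisk: virtio0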