Live Migration Fails - Proxmox VE 2.2

woblit

Nov 14, 2012
Hi All,

Hopefully you can help. I've just installed two new Proxmox VE 2.2 servers, each with E5 processors and 64GB of memory. I have set them up talking to a Ceph cluster which I have been running for some time. I started with one box, which was using Ceph perfectly: VMs would create and run without a problem. I then added the second server to the cluster. I can create VMs on that server and live-migrate a running VM from the original server to the second one, but I cannot migrate it back from the second server to the original one. If I stop the VM on the second server and choose an offline migration, that works without a problem. The VM runs on the second server without issue, so I know that my shared storage (Ceph) is working from both systems.
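(As a sanity check, the RBD pool can be listed directly from each node. The pool name, the admin user and the keyring path below are taken from my VM config and may differ on other setups:

rbd ls rbd --id admin --keyring /etc/pve/priv/ceph/CloudFlex.keyring -m 192.168.10.10

Both nodes list the images fine, which is why I'm confident the storage side is OK.)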

These are the errors I get from the migration process:

Nov 14 15:07:12 starting migration of VM 101 to node 'ihv1' (192.168.0.1)
Nov 14 15:07:12 copying disk images
Nov 14 15:07:12 starting VM 101 on remote node 'ihv1'
Nov 14 15:07:13 ERROR: online migrate failure - command '/usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@192.168.0.1 qm start 101 --stateuri tcp --skiplock --migratedfrom ihv2' failed: exit code 255
Nov 14 15:07:13 aborting phase 2 - cleanup resources
Nov 14 15:07:13 ERROR: migration finished with problems (duration 00:00:03)
TASK ERROR: migration problems

The other error I get when the original server tries to start the machine is:

TASK ERROR: start failed: command '/usr/bin/kvm -id 101 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/101.vnc,x509,password -pidfile /var/run/qemu-server/101.pid -daemonize -name test.tester -smp 'sockets=1,cores=4' -cpu host -nodefaults -boot 'menu=on' -vga cirrus -k en-gb -m 768 -cpuunits 1000 -usbdevice tablet -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' -drive 'file=rbd:rbd/vm-101-disk-1:id=admin:auth_supported=cephx\;none:keyring=/etc/pve/priv/ceph/CloudFlex.keyring:mon_host=192.168.10.10\:6789,if=none,id=drive-sata0,cache=writethrough,aio=native' -device 'ide-drive,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=100' -netdev 'type=user,id=net0,hostname=test.tester' -device 'rtl8139,mac=6E:0C:BE:2B:42:6F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -incoming tcp:localhost:60000 -S' failed: exit code 1
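To dig into that exit code 255, one thing that helps is re-running the ssh command from the first log by hand on the source node (exactly as it appears above), which prints the full error from the target instead of just the exit code:

/usr/bin/ssh -c blowfish -o 'BatchMode=yes' root@192.168.0.1 qm start 101 --stateuri tcp --skiplock --migratedfrom ihv2

That is how I captured the kvm start failure shown above.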

Hopefully someone can offer advice. I had a working 2.1 cluster before which seemed to be perfect.

Warren.


 
Do not use cpu=host
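For example (a sketch only; VM ID 101 and kvm64 are placeholders, pick whatever model fits your hardware), edit the VM config on the node:

/etc/pve/qemu-server/101.conf

and change

cpu: host

to a generic model such as

cpu: kvm64

then stop and start the VM so the new CPU type is used.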

Hi Dietmar,

I tried that, but unfortunately it gives the same error. I can do an online migration from Node 1 to Node 2, but when trying to migrate back to Node 1 it fails with the error shown above.

Looking forward to hearing back from you.

Warren.
 
Same problem here; when I try to migrate I get:


Nov 16 12:44:34 starting migration of VM 114 to node 'fmckvm100' (10.12.4.100)
Nov 16 12:44:34 copying disk images
Nov 16 12:44:34 starting VM 114 on remote node 'fmckvm100'
Nov 16 12:44:36 starting migration tunnel
Nov 16 12:44:37 starting online/live migration on port 60000
Nov 16 12:44:39 ERROR: online migrate failure - aborting
Nov 16 12:44:39 aborting phase 2 - cleanup resources
Nov 16 12:44:40 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems


root@fmckvm100:~# pveversion --verbose
pve-manager: 2.2-30 (pve-manager/2.2/d3818aa7)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-32
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1

root@proxmox104:~# pveversion --verbose
pve-manager: 2.2-30 (pve-manager/2.2/d3818aa7)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-32
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1


What could possibly be the reason?
 
Hi.

I don't know whether this will help you, but after a lot of trial and error I fixed my issue. The problem was that I had a ceph.conf file in /root on my hypervisor. This seemed to conflict with the migration process somehow. As soon as I removed this config file (which was not on Node 1 but was on Node 2), my problems went away and I was able to do online migrations.
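In case it helps, a quick way to check every node for a stray copy (adjust the path if yours lives elsewhere):

ls -l /root/ceph.conf

and if it exists, move it out of the way rather than deleting it:

mv /root/ceph.conf /root/ceph.conf.bak

Proxmox itself reads the Ceph keyring from /etc/pve/priv/ceph/ (as you can see in the kvm command above), so a copy in /root isn't needed for RBD storage.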

The only problem I am having now is that backups don't work for Ceph-based storage. But then again, it's all still new and experimental.

Hope your problems get fixed.

Warren.
 
I don't have any ceph.conf.


Can I switch to virtio disks just by editing the files under /etc/pve/qemu-server/ and rebooting the guest?
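What I have in mind is something like this (a guess based on the first post's config; the storage name and VM ID are just examples): in /etc/pve/qemu-server/101.conf, change a line like

sata0: CloudFlex:vm-101-disk-1,cache=writethrough

to

virtio0: CloudFlex:vm-101-disk-1,cache=writethrough

and update the bootdisk line to point at virtio0, assuming the guest already has virtio drivers installed.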
 
