Live migration issues 4.1 + Ceph

cloudguy

Hello,

Ever since upgrading to 4.1 and moving my VMs to my Ceph cluster, I've been noticing issues with live migration. The Ceph cluster is healthy, available and operational, and all nodes can see the rbd pool without issues. However, VMs fail to migrate, with the following log entries:

Code:
You do not have a valid subscription for this server. Please visit www.proxmox.com to get a list of available options.
Jan 10 22:03:15 starting migration of VM 80010 to node 'b02vm14' (10.20.9.14)
Jan 10 22:03:15 copying disk images
Jan 10 22:03:15 starting VM 80010 on remote node 'b02vm14'
Jan 10 22:03:18 start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 80010 -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 80010 -chardev 'socket,id=qmp,path=/var/run/qemu-server/80010.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/80010.vnc,x509,password -pidfile /var/run/qemu-server/80010.pid -daemonize -smbios 'type=1,uuid=790393c6-404c-44dd-aa68-b0864c5cdfcb' -name fs01.erbus.kupsta.net -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga cirrus -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 16384 -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:2c713a6ec95' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=rbd:vmrbd-T2/vm-80010-disk-1:mon_host=10.20.10.14, 10.20.10.16, 10.20.10.18:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/vmrbd-T2.keyring,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap80010i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=12:22:5C:4F:DC:E1,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -netdev 'type=tap,id=net1,ifname=tap80010i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=CE:7A:8D:21:B3:64,netdev=net1,bus=pci.0,addr=0x13,id=net1,bootindex=301' -incoming tcp:localhost:60000 -S' failed: exit code 1
Jan 10 22:03:18 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@10.20.9.14 qm start 80010 --stateuri tcp --skiplock --migratedfrom b02vm18' failed: exit code 255
Jan 10 22:03:18 aborting phase 2 - cleanup resources
Jan 10 22:03:18 migrate_cancel
Jan 10 22:03:19 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems
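
To narrow it down, the same start can be triggered by hand, outside the live migration path: stop the VM, migrate it offline, then start it on the target node. Roughly (VM ID, target node and IP taken from the log above):

Code:
# with the VM powered off: offline-migrate, then start it on the target node
qm migrate 80010 b02vm14
ssh root@10.20.9.14 qm start 80010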

Ceph status:

Code:
# ceph -s
    cluster 693ea4c1-1f95-a261-ab97-a767b0c0eae7
     health HEALTH_OK
     monmap e21: 6 mons at {b02s08=10.20.10.8:6789/0,b02s12=10.20.10.12:6789/0,b02vm14=10.20.10.14:6789/0,b02vm16=10.20.10.16:6789/0,b02vm18=10.20.10.18:6789/0,smg01=10.20.10.250:6789/0}
            election epoch 7124, quorum 0,1,2,3,4,5 b02s08,b02s12,b02vm14,b02vm16,b02vm18,smg01
     osdmap e61228: 40 osds: 40 up, 40 in
      pgmap v2281359: 5120 pgs, 5 pools, 23925 GB data, 6056 kobjects
            90513 GB used, 65142 GB / 152 TB avail
                5120 active+clean
  client io 7757 kB/s wr, 278 op/s
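
All nodes can also reach the pool directly with the same credentials qemu-server hands to kvm; a check along these lines succeeds everywhere (pool, user, keyring path and monitor addresses taken from the kvm command line above):

Code:
# list RBD images with the id/keyring/monitors qemu-server uses
rbd ls vmrbd-T2 --id admin --keyring /etc/pve/priv/ceph/vmrbd-T2.keyring -m 10.20.10.14,10.20.10.16,10.20.10.18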

The installed versions are identical on all hosts:
Code:
# pveversion
pve-manager/4.1-2/78c5f4a2 (running kernel: 4.2.6-1-pve)
root@b02vm18:~# pveversion -v
proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

Can someone please point me in the right direction?

Thank you.
 
@spirit - yes I'm able to start VMs on other nodes without issues.
@debi@n - I'm running identical versions across 3 nodes.

@spirit -- I just checked and I'm actually UNABLE to start the VM on other nodes (the offline migration itself succeeds). Had a closer look and noticed that KRBD was disabled on the storage. I enabled it and everything seems to be working fine now. Is there a reason why KRBD needs to be enabled? I've read that it has performance issues compared with librbd.

The error message with KRBD disabled:

Code:
Running as unit 8007.scope.
libust[32092/32092]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:375)
libust[32095/32095]: Error: Error cancelling global ust listener thread: No such process (in lttng_ust_exit() at lttng-ust-comm.c:1592)
kvm: -drive file=rbd:vmrbd-T2/vm-8007-disk-1:mon_host=10.20.10.14, 10.20.10.16, 10.20.10.18:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/vmrbd-T2.keyring,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on: Block format 'raw' used by device 'drive-virtio0' doesn't support the option ' 10.20.10.16'
libust[32096/32096]: Error: Error cancelling global ust listener thread: No such process (in lttng_ust_exit() at lttng-ust-comm.c:1592)
TASK ERROR: start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 8007 -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 8007 -chardev 'socket,id=qmp,path=/var/run/qemu-server/8007.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/8007.vnc,x509,password -pidfile /var/run/qemu-server/8007.pid -daemonize -smbios 'type=1,uuid=66591b23-3f46-4ed1-9d32-d90d41a95448' -name forex01.erbus.kupsta.net -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga std -no-hpet -cpu 'kvm64,hv_spinlocks=0x1fff,hv_vapic,hv_time,hv_relaxed,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce' -m 4096 -k en-us -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:2c713a6ec95' -drive 'file=rbd:vmrbd-T2/vm-8007-disk-1:mon_host=10.20.10.14, 10.20.10.16, 10.20.10.18:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/vmrbd-T2.keyring,if=none,id=drive-virtio0,format=raw,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -drive 'if=none,id=drive-ide0,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap8007i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=AA:32:76:D0:33:3C,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -rtc 'driftfix=slew,base=localtime' -global 'kvm-pit.lost_tick_policy=discard'' failed: exit code 1
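
Side note on that error: kvm rejects the option ' 10.20.10.16', i.e. the monitor list from the storage definition ends up inside the -drive string, where qemu treats a bare comma as an option separator. With KRBD the kernel maps the image and qemu never sees an rbd: drive string at all, which would explain why enabling it side-steps the problem. If that reading is right, a monhost list with semicolons and no whitespace might work with KRBD off as well. A sketch of what the /etc/pve/storage.cfg entry would look like (storage name, pool, user and monitors come from the logs above; the semicolon-separated monhost is my guess at the underlying fix, not verified):

Code:
rbd: vmrbd-T2
        monhost 10.20.10.14;10.20.10.16;10.20.10.18
        pool vmrbd-T2
        username admin
        content images
        krbd 1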
 
