[SOLVED] live migration not working after upgrade 5.1->5.3

Knuuut

Hello Community,
after upgrading one node from 5.1 to 5.3 in a 5-node cluster, I can't do live migration of VMs anymore. Now I'm stuck and in a bad situation, because I want to do a node-by-node upgrade of the cluster and I can't get the other nodes free...
Here is the log of a migration from the 5.1 node to the 5.3 node:

Code:
2019-03-13 14:03:50 use dedicated network address for sending migration traffic (x.x.x.x)
2019-03-13 14:03:51 starting migration of VM 256 to node '1901' (x.x.x.x)
2019-03-13 14:03:51 copying disk images
2019-03-13 14:03:51 starting VM 256 on remote node '1901'
2019-03-13 14:03:53 start remote tunnel
2019-03-13 14:03:53 ssh tunnel ver 1
2019-03-13 14:03:53 starting online/live migration on tcp:x.x.x.x:60000
2019-03-13 14:03:53 migrate_set_speed: 8589934592
2019-03-13 14:03:53 migrate_set_downtime: 0.1
2019-03-13 14:03:53 set migration_caps
2019-03-13 14:03:53 set cachesize: 1717986918
2019-03-13 14:03:53 start migrate command to tcp:x.x.x.x:60000
2019-03-13 14:03:54 migration status: active (transferred 750986917, remaining 14287990784), total 17197506560)
2019-03-13 14:03:54 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-13 14:03:55 migration status: active (transferred 1294769337, remaining 11245088768), total 17197506560)
2019-03-13 14:03:55 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-13 14:03:56 migration status: active (transferred 2097557584, remaining 7959379968), total 17197506560)
2019-03-13 14:03:56 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-13 14:03:57 migration status: active (transferred 2858109376, remaining 6582280192), total 17197506560)
2019-03-13 14:03:57 migration xbzrle cachesize: 1073741824 transferred 0 pages 0 cachemiss 0 overflow 0
2019-03-13 14:03:58 migration speed: 3276.80 MB/s - downtime 14 ms
2019-03-13 14:03:58 migration status: completed
2019-03-13 14:03:58 ERROR: tunnel replied 'ERR: resume failed - VM 256 not running' to command 'resume 256'
2019-03-13 14:04:11 ERROR: migration finished with problems (duration 00:00:21)
TASK ERROR: migration problems

The machine is migrated, but ends up powered off.

Here is the migration log from the 5.3 node to the 5.1 node:

Code:
2019-03-13 14:28:59 starting migration of VM 256 to node '1801' (x.x.x.x)
2019-03-13 14:28:59 copying disk images
2019-03-13 14:28:59 starting VM 256 on remote node '1801'
2019-03-13 14:29:00 start failed: command '/usr/bin/kvm -id 256 -chardev 'socket,id=qmp,path=/var/run/qemu-server/256.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/256.pid -daemonize -smbios 'type=1,uuid=5c3cb443-3136-4d7c-852b-fb4a9aefde16' -name pmm-1801 -smp '8,sockets=2,cores=4,maxcpus=8' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/256.vnc,x509,password -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi -m 16384 -object 'memory-backend-ram,id=ram-node0,size=8192M' -numa 'node,nodeid=0,cpus=0-3,memdev=ram-node0' -object 'memory-backend-ram,id=ram-node1,size=8192M' -numa 'node,nodeid=1,cpus=4-7,memdev=ram-node1' -k de -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:e0bda18eac6' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=rbd:Prod660/vm-256-disk-1:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Prod660.keyring,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -drive 'file=rbd:Prod660/vm-256-disk-2:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/Prod660.keyring,if=none,id=drive-scsi1,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' -netdev 'type=tap,id=net0,ifname=tap256i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=DE:AD:0A:1F:1E:0D,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc-i440fx-2.12' -incoming unix:/run/qemu-server/256.migrate -S' failed: exit code 1
2019-03-13 14:29:00 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=1801' root@x.x.x.x qm start 256 --skiplock --migratedfrom pmve-1901 --migration_type secure --stateuri unix --machine pc-i440fx-2.12' failed: exit code 255
2019-03-13 14:29:00 aborting phase 2 - cleanup resources
2019-03-13 14:29:00 migrate_cancel
2019-03-13 14:29:01 ERROR: migration finished with problems (duration 00:00:02)
migration problems

The machine is not migrated and keeps running on the source node.

Any help and suggestions are welcome.

Cheers, Knuuut
 
Can you post your VM config?

of course...

Code:
bootdisk: scsi0
cores: 4
cpu: host
hotplug: disk,network,usb
ide2: none,media=cdrom
memory: 16384
name: pmm-1801
net0: virtio=DE:AD:0A:1F:1E:0D,bridge=vmbr0
numa: 1
ostype: l26
scsi0: bkp-1901:256/vm-256-disk-0.raw,discard=on,size=40G
scsi1: bkp-1901:256/vm-256-disk-1.raw,discard=on,size=400G
scsihw: virtio-scsi-pci
smbios1: uuid=5c3cb443-3136-4d7c-852b-fb4a9aefde16
sockets: 2
 
but different mainboard manufacturers (Intel/Supermicro)...
If they contain different microcode updates, this may very well be the problem. Can you try with a test VM with another CPU type, e.g. kvm64 (ideally otherwise identical to your real VM)?
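
In case the CLI is easier for the test, here is a minimal sketch; the VM ID 999 is a placeholder, and '1901' is the upgraded 5.3 node from your log:

Code:
# switch the test VM to the generic kvm64 CPU model (placeholder VM ID 999)
qm set 999 --cpu kvm64
# the new CPU type only takes effect after a full stop/start, not a reboot inside the guest
qm stop 999 && qm start 999
# then retry the live migration to the upgraded node
qm migrate 999 1901 --online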
 
kvm64 works... :eek:

All (>200) VMs have the CPU type set to host.

Does this mean the only option I've got is to alter the CPU type on every machine and do a stop/start cycle?
 
Does this mean the only option I've got is to alter the CPU type on every machine and do a stop/start cycle?
You can try to fully upgrade the BIOS/firmware of all mainboards, but yes, to migrate them away initially there is not really another way, I'm afraid.
An alternative to using the GUI is to use the CLI tools (e.g. pvesh / qm) to change the CPU type and stop/start them, see the sketch below.
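
Just a sketch of how that could look, assuming every running VM on the node should be switched and that a hard stop/start is acceptable; adjust before running it against production VMs:

Code:
# iterate over all running VMs on this node (qm list prints a header line first, status is column 3)
for vmid in $(qm list | awk 'NR>1 && $3=="running" {print $1}'); do
    # change the CPU type in the VM config
    qm set "$vmid" --cpu kvm64
    # cold stop/start so the new CPU type actually takes effect
    qm stop "$vmid" && qm start "$vmid"
done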
 
