Live migration of linstor-controller VM failed

tawh

Active Member
Mar 26, 2019
I acknowledge that LINSTOR is supported by Linbit, not by Proxmox. However, I am posting here to ask other users for assistance with a linstor-controller issue.

I have a 3-node Proxmox cluster, with each node configured with DRBD storage. DRBD is controlled by a linstor-controller installed on a Debian VM. I followed https://www.linbit.com/en/linstor-controller-proxmox/ and https://medium.com/@kvaps/deploying-linstor-with-proxmox-91c746b4035d to configure HA for the linstor-controller VM (not LXC). I cloned the disk of the linstor-controller VM to the DRBD storage pool, after which I could start the VM successfully and the controller resumed working.

I also set up a Debian VM and a Windows 2016 VM whose disks use the same DRBD storage pool. For these two guests, I can perform live migration (i.e. without a guest reboot or downtime) successfully between the 3 Proxmox servers.

However, I cannot live-migrate the linstor-controller VM. When I click the "Migrate" button and select the target node, it waits for several seconds and then returns an error:

On Running node (A):
Code:
2019-07-06 14:06:13 starting migration of VM 100 to node 'B' (XX.XX.XX.XX)
2019-07-06 14:06:13 copying disk images
2019-07-06 14:06:13 starting VM 100 on remote node 'B'
2019-07-06 14:06:14 Plugin "PVE::Storage::Custom::LINSTORPlugin" is implementing an older storage API, an upgrade is recommended
2019-07-06 14:06:17 start failed: command '/usr/bin/kvm -id 100 -name vmc1-san -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=c1ff005b-f1df-446c-83bc-0c6d5278b46b' -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc unix:/var/run/qemu-server/100.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 512 -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'vmgenid,guid=948ebde9-70ea-4c86-9aa2-1094893f6a45' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -chardev 'socket,id=serial0,path=/var/run/qemu-server/100.serial0,server,nowait' -device 'isa-serial,chardev=serial0' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -chardev 'socket,path=/var/run/qemu-server/100.qga,server,nowait,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:83974f2eb7bf' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/drbd/by-res/vm-100-disk-1/0,if=none,id=drive-scsi0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=92:9D:9A:11:13:F0,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc-i440fx-3.0' -incoming 
unix:/run/qemu-server/100.migrate -S' failed: exit code 1
2019-07-06 14:06:17 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=vm7' root@192.168.159.26 qm start 100 --skiplock --migratedfrom vm5 --migration_type secure --stateuri unix --machine pc-i440fx-3.0' failed: exit code 255
2019-07-06 14:06:17 aborting phase 2 - cleanup resources
2019-07-06 14:06:17 migrate_cancel
2019-07-06 14:06:18 ERROR: migration finished with problems (duration 00:00:05)
TASK ERROR: migration problems

On Receiving Node (B):
Code:
 kvm: -drive file=/dev/drbd/by-res/vm-100-disk-1/0,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on: Could not open '/dev/drbd/by-res/vm-100-disk-1/0': Read-only file system
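
The "Read-only file system" message on node B is the text for errno EROFS, returned when kvm tries to open the device for writing. A minimal Python sketch that attempts the same read-write open and reports the errno name; the helper name `try_open_rw` is hypothetical, the device path is the one from the log above, and nothing here is specific to LINSTOR. Note that on a DRBD setup with auto-promote enabled, a successful read-write open may briefly promote the resource on that node, so treat this as a diagnostic sketch rather than something to run casually:

```python
import errno
import os


def try_open_rw(path):
    """Try to open `path` read-write; return (ok, errno_name_or_None)."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EROFS or ENOENT.
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return True, None


if __name__ == "__main__":
    # Device path taken from the error message on node B.
    device = "/dev/drbd/by-res/vm-100-disk-1/0"
    ok, err = try_open_rw(device)
    print(f"{device}: {'writable' if ok else 'open failed: ' + err}")
```

If this reports EROFS on the target node while the VM is running on the source node, the device cannot currently be opened for writing there, which matches the kvm error.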

Surprisingly, I can migrate the linstor-controller VM when it is stopped.

Is it a restriction of the linstor-controller that it cannot be live-migrated?
 
Did you already ask Linbit? I think they can give you a better answer.
 
I am asking here because Linbit offers little community support (only a mailing list and IRC), so I hope a member here can share their experience.
 
You can't live-migrate the linstor controller: when the VM pauses for the final step, Proxmox can't contact the storage controller for the primary/secondary change.
So you can't install it on a VM whose storage is managed by LINSTOR itself.
Linbit provides a tutorial for creating a DRBD resource that is not managed by LINSTOR, for the controller VM.
Personally, I installed the controller on a Proxmox node and I back up the linstor controller DB. If the controller fails, the VMs will continue to run but Proxmox can't manage the storage. So there is no hurry to restore the controller, I think.
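
A backup like the one mentioned above can be as simple as a timestamped copy of the controller's database directory. A minimal Python sketch, assuming the controller keeps a file-based database under /var/lib/linstor (an assumption; check your own installation) and that the copy is taken while the controller is stopped or idle. The function name `backup_dir` and the backup destination are hypothetical:

```python
import shutil
import time
from pathlib import Path


def backup_dir(src, dest_root):
    """Copy `src` into a new timestamped directory under `dest_root`."""
    src = Path(src)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root) / f"{src.name}-{stamp}"
    # copytree creates missing parent directories and fails if target exists.
    shutil.copytree(src, target)
    return target


if __name__ == "__main__":
    # Both paths are assumptions for illustration; adjust to your setup.
    db_dir = Path("/var/lib/linstor")
    if db_dir.is_dir():
        print(backup_dir(db_dir, "/root/linstor-backups"))
    else:
        print(f"{db_dir} not found; adjust the path")
```

Keeping the copies somewhere other than the DRBD pool itself avoids depending on the storage the controller manages.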
 
Thanks for your explanation. Could you please post the link to the tutorial?
I have converted the VM to an LXC container for migration. The drawback is that the container has to restart during migration, but the downtime is under a minute.
 
