I've set up a 3-node DRBD cluster with server names vm1, vm2 and vm3.
I created DRBD storage with replication set to 2:
Code:
drbd: drbd2
        redundancy 2
        content images,rootdir
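Quick sanity check before going further: the entry can be confirmed from any node with the standard Proxmox tools (generic commands, not output from my cluster):
Code:
# confirm the drbd2 storage is known to the cluster and reports as active
pvesm status
# show the raw storage definition Proxmox picked up
cat /etc/pve/storage.cfg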
I created a DRBD disk for VM 110. The disk was created and is using servers vm1 and vm2 for storage:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Primary
  disk:UpToDate
  vm1 role:Secondary
    peer-disk:UpToDate
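The disk itself was created from the web GUI; for anyone who prefers the CLI, something like this should produce an equivalent volume (name and size here are just examples):
Code:
# allocate a raw volume for VM 110 on the drbd2 storage (size is an example)
pvesm alloc drbd2 110 vm-110-disk-1 32G
# attach it to the VM as its virtio0 disk
qm set 110 -virtio0 drbd2:vm-110-disk-1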
Now I try to live-migrate this VM to server vm3, but it fails:
Code:
Apr 05 14:40:16 starting migration of VM 110 to node 'vm3' (192.168.4.3)
Apr 05 14:40:16 copying disk images
Apr 05 14:40:16 starting VM 110 on remote node 'vm3'
Apr 05 14:40:19 start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 110 -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 110 -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/110.pid -daemonize -name vm3-zabbix.inhouse.kmionline.com -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga qxl -vnc unix:/var/run/qemu-server/110.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -spice 'tls-port=61000,addr=localhost,tls-ciphers=DES-CBC3-SHA,seamless-migration=on' -device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' -chardev 'spicevmc,id=vdagent,name=vdagent' -device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:7ae7e697ca8' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/drbd/by-res/vm-110-disk-1/0,if=none,id=drive-virtio0,cache=none,discard=on,format=raw,aio=native,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=4E:44:84:BE:DE:9A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -netdev 'type=tap,id=net1,ifname=tap110i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=9A:0C:64:12:3C:E5,netdev=net1,bus=pci.0,addr=0x13,id=net1,bootindex=301' -machine 'type=pc-i440fx-2.5' -incoming tcp:localhost:60000 -S' failed: exit code 1
Apr 05 14:40:19 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@192.168.4.3 qm start 110 --stateuri tcp --skiplock --migratedfrom vm2 --machine pc-i440fx-2.5' failed: exit code 255
Apr 05 14:40:19 aborting phase 2 - cleanup resources
Apr 05 14:40:19 migrate_cancel
Apr 05 14:40:20 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems
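The task log never says why kvm exited with code 1. If anyone wants to dig into a reproduction, the first things I'd look at right after the failure are the resource state on the target node and the device node qemu is told to open:
Code:
# on vm3: was the resource brought up, and does the device node exist?
drbdadm status vm-110-disk-1
ls -l /dev/drbd/by-res/vm-110-disk-1/0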
Now the status of this DRBD disk has changed so that vm3 is diskless.
Clearly the failed migration created this diskless resource on vm3, but it was unable to use it for some unknown reason.
Diskless is not ideal, but there is no reason it should not work:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Primary
  disk:UpToDate
  vm1 role:Secondary congested:yes
    peer-disk:UpToDate
  vm3 role:Secondary
    peer-disk:Diskless
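If I understand DRBD 9 correctly, a diskless node still exposes a normal block device (reads and writes are just shipped to the peers), and auto-promote should make it Primary as soon as qemu opens it, so the diskless assignment alone shouldn't be fatal. Watching the resource from vm2 while retrying the migration makes the promotion easy to see:
Code:
# on vm2: follow the resource state while the migration runs (Ctrl-C to stop)
watch -n1 'drbdadm status vm-110-disk-1'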
Next I try to live-migrate to vm3 a second time, and it works just fine:
Code:
Apr 05 14:44:41 starting migration of VM 110 to node 'vm3' (192.168.4.3)
Apr 05 14:44:41 copying disk images
Apr 05 14:44:41 starting VM 110 on remote node 'vm3'
Apr 05 14:44:43 starting ssh migration tunnel
bind: Cannot assign requested address
Apr 05 14:44:43 starting online/live migration on localhost:60000
Apr 05 14:44:43 migrate_set_speed: 8589934592
Apr 05 14:44:43 migrate_set_downtime: 0.1
Apr 05 14:44:43 spice client_migrate_info
Apr 05 14:44:45 migration status: active (transferred 168883099, remaining 2060636160), total 2282831872)
Apr 05 14:44:45 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
..........
Apr 05 14:45:10 migration speed: 75.85 MB/s - downtime 10 ms
Apr 05 14:45:10 migration status: completed
Apr 05 14:45:11 Waiting for spice server migration
Apr 05 14:45:13 migration finished successfully (duration 00:00:32)
TASK OK
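Both attempts were started from the GUI; the equivalent CLI call, for anyone scripting this, is simply:
Code:
# on the source node: live-migrate VM 110 to vm3
qm migrate 110 vm3 --online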
The DRBD status is now correct; vm3 is diskless and Primary:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Secondary
  disk:UpToDate
  vm1 role:Secondary
    peer-disk:UpToDate
  vm3 role:Primary
    peer-disk:Diskless
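If you'd rather have vm3 hold a third full copy instead of staying a diskless client, drbdmanage (which the Proxmox drbd storage plugin uses under the hood) should be able to reassign it. This is from memory, so verify the exact syntax with --help on your version:
Code:
# list current assignments; the diskless node shows up as a client
drbdmanage list-assignments
# reassign vm-110-disk-1 to vm3 with local storage (syntax from memory,
# check: drbdmanage assign-resource --help)
drbdmanage assign-resource vm-110-disk-1 vm3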
I think Proxmox should deal with this situation more gracefully.
Anyone have any opinions on how Proxmox should behave in this situation?