DRBD9 live migration problems

e100

I've set up a 3-node DRBD cluster with servers named vm1, vm2, and vm3.

I created DRBD storage with replication set to 2:
Code:
drbd: drbd2                   
        redundancy 2         
        content images,rootdir

I created a DRBD disk for VM 110. The disk was created and is using servers vm1 and vm2 for storage:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Primary
  disk:UpToDate
  vm1 role:Secondary
    peer-disk:UpToDate

Now I try to live migrate this VM to server vm3, but it fails:
Code:
Apr 05 14:40:16 starting migration of VM 110 to node 'vm3' (192.168.4.3)
Apr 05 14:40:16 copying disk images
Apr 05 14:40:16 starting VM 110 on remote node 'vm3'
Apr 05 14:40:19 start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 110 -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 110 -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/110.pid -daemonize -name vm3-zabbix.inhouse.kmionline.com -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga qxl -vnc unix:/var/run/qemu-server/110.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -spice 'tls-port=61000,addr=localhost,tls-ciphers=DES-CBC3-SHA,seamless-migration=on' -device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' -chardev 'spicevmc,id=vdagent,name=vdagent' -device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:7ae7e697ca8' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/drbd/by-res/vm-110-disk-1/0,if=none,id=drive-virtio0,cache=none,discard=on,format=raw,aio=native,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=4E:44:84:BE:DE:9A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -netdev 'type=tap,id=net1,ifname=tap110i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=9A:0C:64:12:3C:E5,netdev=net1,bus=pci.0,addr=0x13,id=net1,bootindex=301' -machine 'type=pc-i440fx-2.5' -incoming tcp:localhost:60000 -S' failed: exit code 1
Apr 05 14:40:19 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@192.168.4.3 qm start 110 --stateuri tcp --skiplock --migratedfrom vm2 --machine pc-i440fx-2.5' failed: exit code 255
Apr 05 14:40:19 aborting phase 2 - cleanup resources
Apr 05 14:40:19 migrate_cancel
Apr 05 14:40:20 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems

Now the status of this DRBD disk has changed so that vm3 is diskless.
Clearly the failed migration did something to cause this diskless assignment to be created, but it was unable to use it for some unknown reason.
Sure, diskless is not ideal, but there is no reason it should not work:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Primary
  disk:UpToDate
  vm1 role:Secondary congested:yes
    peer-disk:UpToDate
  vm3 role:Secondary
    peer-disk:Diskless

Next I try to live migrate to vm3 a second time, and it works just fine:
Code:
Apr 05 14:44:41 starting migration of VM 110 to node 'vm3' (192.168.4.3)
Apr 05 14:44:41 copying disk images
Apr 05 14:44:41 starting VM 110 on remote node 'vm3'
Apr 05 14:44:43 starting ssh migration tunnel
bind: Cannot assign requested address

Apr 05 14:44:43 starting online/live migration on localhost:60000
Apr 05 14:44:43 migrate_set_speed: 8589934592
Apr 05 14:44:43 migrate_set_downtime: 0.1
Apr 05 14:44:43 spice client_migrate_info
Apr 05 14:44:45 migration status: active (transferred 168883099, remaining 2060636160), total 2282831872)
Apr 05 14:44:45 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
..........
Apr 05 14:45:10 migration speed: 75.85 MB/s - downtime 10 ms
Apr 05 14:45:10 migration status: completed
Apr 05 14:45:11 Waiting for spice server migration
Apr 05 14:45:13 migration finished successfully (duration 00:00:32)
TASK OK

DRBD status is correct; vm3 is diskless and Primary:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Secondary
  disk:UpToDate
  vm1 role:Secondary
    peer-disk:UpToDate
  vm3 role:Primary
    peer-disk:Diskless

I think Proxmox should deal with this situation more gracefully.
Anyone have any opinions on how Proxmox should behave in this situation?
 
Looks like a DRBD bug to me?
Seems like there is a race condition between DRBD making the volume available and KVM trying to use it.

I've not had time to look at the Proxmox code yet, so I'm not really sure what Proxmox is trying to do.
Is Proxmox asking DRBD to do anything, or does it simply assume that the volume is there and start KVM?
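
For anyone hitting the same thing, this is roughly what I would try by hand to take the race out of the picture before migrating. It's only a sketch: the --client flag is an assumption about the drbdmanage version in use (it should create a diskless/client assignment), and the resource, node, and VM names are the ones from this thread:
Code:
# pre-create the diskless (client) assignment on the target node
drbdmanage assign-resource vm-110-disk-1 vm3 --client
# wait until vm3 shows up as a connected Diskless peer
drbdadm status vm-110-disk-1
# only then kick off the live migration
qm migrate 110 vm3 --online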

Which situation are you talking about, exactly?
Migrating a DRBD-backed VM to a node that has no DRBD backing storage for that VM's volume(s).

Seems like there are a few options that all make sense:
1) Don't allow migrating to a diskless node (too limiting).
2) Migrate the DRBD backing storage with the node (too many details, IMHO, for Proxmox to deal with).
3) Allow migrating to a diskless node, but in a manner that does not result in the failure noted above (warning the user that the node is diskless seems like a great idea too).

I think things change a bit for HA DRBD-backed VMs.
In this case I would only want HA to move the VM to a node that has DRBD storage for that VM.
Sure, one could create an HA group to do that.
However, when creating a new VM, drbdmanage could create the storage on any number of nodes, requiring the user to figure out where the storage is, create a new group, and assign the VM to that group.
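
For what it's worth, a minimal sketch of that HA group approach, assuming the standard ha-manager CLI; the group name is made up, and the node list matches where the DRBD data for VM 110 currently lives:
Code:
# restricted group containing only the nodes that actually hold the DRBD data
ha-manager groupadd drbd-vm110 --nodes vm1,vm2 --restricted 1
# put the VM under HA management inside that group
ha-manager add vm:110 --group drbd-vm110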
 
1) Don't allow migrating to a diskless node (too limiting).

you can add such restriction yourself (limit storage to specific nodes)
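
Something like this in /etc/pve/storage.cfg (a sketch only; the storage name and node list are taken from the first post, not from a tested config):
Code:
drbd: drbd2
        content images,rootdir
        redundancy 2
        nodes vm1,vm2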

2) Migrate the DRBD backing storage with the node (too many details, IMHO, for Proxmox to deal with).

This does not really make sense to me

3) Allow migrating to a diskless node, but in a manner that does not result in the failure noted above (warning the user that the node is diskless seems like a great idea too).

DRBD is designed to work with such diskless nodes, so why should we warn the user?
 
DRBD is designed to work with such diskless nodes, so why should we warn the user?
Running in diskless mode could negatively impact read performance.
An example might be a read-IO-heavy database VM; that's the sort of VM I would not want to accidentally live migrate to a diskless node.


you can add such restriction yourself (limit storage to specific nodes)
If I specify a DRBD storage limited to only two nodes of a 5-node DRBD cluster, will Proxmox ensure that DRBD only uses those nodes for storage when creating disks?
 
If I specify a DRBD storage limited to only two nodes of a 5-node DRBD cluster, will Proxmox ensure that DRBD only uses those nodes for storage when creating disks?

You initialize the drbdmanage cluster only on those 2 nodes ...
 
You initialize the drbdmanage cluster only on those 2 nodes ...
So it is perfectly acceptable to create multiple drbdmanage clusters within a single Proxmox cluster, provided one also sets up the proper node restrictions in storage.cfg?
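
If I understand that suggestion correctly, it would look roughly like this (a sketch only; command names are from drbdmanage 0.9x and the IP addresses are made up):
Code:
# on the first storage node, initialise the drbdmanage control volume
drbdmanage init 192.168.4.1
# register only the second storage node; the remaining PVE nodes are left out entirely
drbdmanage add-node vm2 192.168.4.2
# prints the join command to run on vm2
drbdmanage howto-join vm2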


I did have a chance to look at the Proxmox code and it seems proper.
The only thought I have is that dd opening the device for read is not the same as KVM opening the device for read/write.
That difference is the only plausible explanation I can think of for the migration failures I have been seeing.
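
A crude way to compare those two open modes by hand on the target node (assuming the device path from the failed migration log above; the second command opens the device read/write but writes nothing):
Code:
# read-only open, roughly what a plain read or copy does
dd if=/dev/drbd/by-res/vm-110-disk-1/0 of=/dev/null bs=4k count=1
# read/write open with nothing written, closer to how KVM opens the drive
exec 3<>/dev/drbd/by-res/vm-110-disk-1/0 && exec 3>&-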

When I was experiencing this problem numerous times, I had a couple of volumes doing their initial sync, which was going horribly slow.
After applying some DRBD tuning things work much better, and I've not been able to reproduce the problem now that all DRBD volumes are UpToDate.
 
... When I was experiencing this problem numerous times, I had a couple of volumes doing their initial sync, which was going horribly slow.
After applying some DRBD tuning things work much better, and I've not been able to reproduce the problem now that all DRBD volumes are UpToDate.
Do you mind sharing your tunings? I've seen that if I create a VM on one node, it takes a really long time to sync to the second node with default settings, and often this is unacceptable for me. I would love to know exactly how to solve this (possibly without becoming a DRBD9 guru).
Thanks in advance
 
it takes a really long time to sync to the second node with default settings
I was surprised how much slower DRBD9 is compared to 8.x without tuning.

I've only been using DRBD9 for about a week, and applying some of these settings did manage to cause drbdmanage to stop working temporarily due to some hung 'drbdadm adjust' processes. I had to reboot some nodes to get things back to working order. I'm using 10G Infiniband IPoIB for the DRBD network, so not all of these options may be appropriate for other network types.

Code:
drbdmanage net-options --max-epoch-size 20000 --common
drbdmanage net-options --sndbuf-size 0 --common
drbdmanage net-options --max-buffers 80000 --common
drbdmanage peer-device-options --c-plan-ahead 10 --common
drbdmanage peer-device-options --c-min-rate 10240 --common
drbdmanage peer-device-options --c-max-rate 102400 --common
drbdmanage peer-device-options --c-fill-target 131072 --common
drbdmanage peer-device-options --resync-rate 1024 --common
drbdmanage disk-options --unplug-watermark 16001 --common
drbdmanage disk-options --md-flushes no --common
drbdmanage disk-options --disk-flushes no --common

I think the last two options made the most improvement.
 
Hi guys! I've got the same situation, but with this configuration:

3 nodes + 1 satellite;
1 VM (100);

Code:
root@pve-01:~# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  pve-02 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate
  pve-03 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

vm-100-disk-1 role:Secondary
  disk:UpToDate
  pve-02 role:Secondary
    peer-disk:Diskless
  pve-03 role:Primary
    replication:SyncSource peer-disk:Inconsistent done:20.33


As you can see, the replication doesn't work on pve-02. This case is reproducible with 3 nodes too, but not with 2+1 (2 cluster nodes and 1 external).
How can I attach this peer and bring it to UpToDate?

Regards
Claudio
 
