DRBD9 live migration problems

e100
I've set up a 3-node DRBD cluster with server names vm1, vm2 and vm3.

I created DRBD storage with replication set to 2:
Code:
drbd: drbd2                   
        redundancy 2         
        content images,rootdir

I created a DRBD disk for VM 110. The disk was created and is using servers vm1 and vm2 for storage:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Primary
  disk:UpToDate
  vm1 role:Secondary
    peer-disk:UpToDate

Now I try to live migrate this VM to server vm3, but it fails:
Code:
Apr 05 14:40:16 starting migration of VM 110 to node 'vm3' (192.168.4.3)
Apr 05 14:40:16 copying disk images
Apr 05 14:40:16 starting VM 110 on remote node 'vm3'
Apr 05 14:40:19 start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 110 -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 110 -chardev 'socket,id=qmp,path=/var/run/qemu-server/110.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/110.pid -daemonize -name vm3-zabbix.inhouse.kmionline.com -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga qxl -vnc unix:/var/run/qemu-server/110.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -k en-us -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -spice 'tls-port=61000,addr=localhost,tls-ciphers=DES-CBC3-SHA,seamless-migration=on' -device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' -chardev 'spicevmc,id=vdagent,name=vdagent' -device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:7ae7e697ca8' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/drbd/by-res/vm-110-disk-1/0,if=none,id=drive-virtio0,cache=none,discard=on,format=raw,aio=native,detect-zeroes=unmap' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap110i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=4E:44:84:BE:DE:9A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -netdev 'type=tap,id=net1,ifname=tap110i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown' -device 'e1000,mac=9A:0C:64:12:3C:E5,netdev=net1,bus=pci.0,addr=0x13,id=net1,bootindex=301' -machine 'type=pc-i440fx-2.5' -incoming tcp:localhost:60000 -S' failed: exit code 1
Apr 05 14:40:19 ERROR: online migrate failure - command '/usr/bin/ssh -o 'BatchMode=yes' root@192.168.4.3 qm start 110 --stateuri tcp --skiplock --migratedfrom vm2 --machine pc-i440fx-2.5' failed: exit code 255
Apr 05 14:40:19 aborting phase 2 - cleanup resources
Apr 05 14:40:19 migrate_cancel
Apr 05 14:40:20 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems

Now the status of this DRBD disk has changed so that vm3 is diskless.
Clearly the failed migration caused this diskless assignment to be created, but was then unable to use it for some unknown reason.
Diskless is not ideal, but there is no reason it should not work:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Primary
  disk:UpToDate
  vm1 role:Secondary congested:yes
    peer-disk:UpToDate
  vm3 role:Secondary
    peer-disk:Diskless
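For what it's worth, such a diskless (client) assignment can also be created explicitly with drbdmanage. A rough sketch of the roughly equivalent manual command, assuming the assign-resource --client syntax (I did not run this; it only illustrates the kind of assignment that showed up on vm3):
Code:
# create a diskless (client) assignment of the resource on vm3
drbdmanage assign-resource vm-110-disk-1 vm3 --client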

Next I try to live migrate to vm3 a second time and it works just fine:
Code:
Apr 05 14:44:41 starting migration of VM 110 to node 'vm3' (192.168.4.3)
Apr 05 14:44:41 copying disk images
Apr 05 14:44:41 starting VM 110 on remote node 'vm3'
Apr 05 14:44:43 starting ssh migration tunnel
bind: Cannot assign requested address

Apr 05 14:44:43 starting online/live migration on localhost:60000
Apr 05 14:44:43 migrate_set_speed: 8589934592
Apr 05 14:44:43 migrate_set_downtime: 0.1
Apr 05 14:44:43 spice client_migrate_info
Apr 05 14:44:45 migration status: active (transferred 168883099, remaining 2060636160), total 2282831872)
Apr 05 14:44:45 migration xbzrle cachesize: 134217728 transferred 0 pages 0 cachemiss 0 overflow 0
..........
Apr 05 14:45:10 migration speed: 75.85 MB/s - downtime 10 ms
Apr 05 14:45:10 migration status: completed
Apr 05 14:45:11 Waiting for spice server migration
Apr 05 14:45:13 migration finished successfully (duration 00:00:32)
TASK OK

The DRBD status is correct; vm3 is diskless and Primary:
Code:
root@vm2:/# drbdadm status
...
vm-110-disk-1 role:Secondary
  disk:UpToDate
  vm1 role:Secondary
    peer-disk:UpToDate
  vm3 role:Primary
    peer-disk:Diskless

I think Proxmox should deal with this situation more gracefully.
Anyone have any opinions on how Proxmox should behave in this situation?
 
Looks like a DRBD bug to me?
Seems like there is a race condition between DRBD making the volume available and KVM trying to use it.

I've not had time to look at the Proxmox code yet so I'm not really sure what Proxmox is trying to do.
Is Proxmox asking DRBD to do anything or does it simply assume that the volume is there and start KVM?

What situation are you talking about exactly?
Migrating a DRBD-backed VM to a node that has no DRBD backing storage for that VM's volume(s).

Seems like there are a few options that all make sense:
1) Don't allow migrating to a diskless node ( too limiting )
2) Migrate the DRBD backing storage with the node ( too many details, IMHO, for Proxmox to deal with )
3) Allow migrating to a diskless node but in a manner that does not result in the failure noted above ( Warning the user the node is diskless seems like a great idea too )

I think things change a bit for HA-managed, DRBD-backed VMs.
In this case I would only want HA to move the VM to a node that has DRBD storage for that VM.
Sure one could create an HA group to do that.
However, when creating a new VM, drbdmanage could create the storage on any number of nodes, requiring the user to figure out where the storage is, create a new group, and assign the VM to that group.
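A rough sketch of those manual steps (the group name is made up, and I am assuming the ha-manager group syntax here):
Code:
# see which nodes actually hold backing storage for the VM's resource
drbdmanage list-assignments
# restrict HA to those nodes via a group, then pin the VM to it
ha-manager groupadd drbd-vm110 --nodes vm1,vm2
ha-manager add vm:110 --group drbd-vm110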
 
1) Don't allow migrating to a diskless node ( too limiting )

you can add such restriction yourself (limit storage to specific nodes)
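For example, something along these lines in /etc/pve/storage.cfg (a sketch reusing the drbd2 definition and node names from this thread; the generic nodes option limits which nodes the storage is considered available on):
Code:
drbd: drbd2
        redundancy 2
        content images,rootdir
        nodes vm1,vm2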

2) Migrate the DRBD backing storage with the node ( too many details, IMHO, for Proxmox to deal with )

This does not really make sense to me

3) Allow migrating to a diskless node but in a manner that does not result in the failure noted above ( Warning the user the node is diskless seems like a great idea too )

DRBD is designed to work with such diskless nodes, so why should we warn the user?
 
DRBD is designed to work with such diskless nodes, so why should we warn the user?
Running in diskless mode could negatively impact read performance.
An example might be a read-IO-heavy database VM; that's the sort of VM I would not want to accidentally live migrate to a diskless node.


you can add such restriction yourself (limit storage to specific nodes)
If I specify a DRBD storage limited to only two nodes of a 5-node DRBD cluster, will Proxmox ensure that DRBD only uses those nodes for storage when creating disks?
 
If I specify a DRBD storage limited to only two nodes of a 5-node DRBD cluster, will Proxmox ensure that DRBD only uses those nodes for storage when creating disks?

You initialize the drbdmanage cluster only on those 2 nodes ...
 
You initialize the drbdmanage cluster only on those 2 nodes ...
So it is perfectly acceptable to create multiple drbdmanage clusters within a single Proxmox cluster, provided one also sets up the proper node restrictions in storage.cfg?


I did have a chance to look at the Proxmox code and it seems proper.
The only thought I have is that dd opening the device for read is not the same as KVM opening it for read/write.
That difference is the only plausible explanation I can think of to explain the migration failures I have been seeing.
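If it really is a race between the DRBD device becoming usable and KVM opening it read/write, a crude pre-start check along these lines could confirm it. This is only a diagnostic sketch (resource name and retry count are examples), not what Proxmox actually does:
Code:
#!/bin/bash
# Wait until the DRBD device exists and will accept a writer before starting KVM.
RES=vm-110-disk-1
DEV=/dev/drbd/by-res/${RES}/0

for attempt in $(seq 1 10); do
    if [ -b "$DEV" ] && drbdadm status "$RES" >/dev/null 2>&1; then
        # count=0 writes no data but opens the device for writing, which is
        # closer to what KVM does than a read-only open such as dd with if=.
        if dd if=/dev/zero of="$DEV" bs=512 count=0 conv=notrunc 2>/dev/null; then
            echo "resource $RES is ready"
            exit 0
        fi
    fi
    sleep 1
done
echo "resource $RES did not become usable" >&2
exit 1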

When I was experiencing this problem numerous times, I had a couple of volumes still doing their initial sync, which was going horribly slowly.
After applying some DRBD tuning, things work much better, and I've not been able to reproduce the problem now that all DRBD volumes are UpToDate.
 
... When I was experiencing this problem numerous times, I had a couple of volumes still doing their initial sync, which was going horribly slowly.
After applying some DRBD tuning, things work much better, and I've not been able to reproduce the problem now that all DRBD volumes are UpToDate.
Do you mind sharing your tunings? I've seen that if I create a VM on one node, it takes a really long time to sync to the second node with default settings, and often this is unacceptable for me; I would love to know exactly how to solve it (possibly without becoming a DRBD9 guru).
Thanks in advance
 
it takes a really long time to sync to the second node with default settings
I was surprised at how much slower DRBD9 is compared to 8.x without tuning.

I've only been using DRBD9 for about a week, and applying some of these settings did manage to cause drbdmanage to stop working temporarily due to some hung 'drbdadm adjust' processes. I had to reboot some nodes to get things back to working order. I'm using 10G InfiniBand IPoIB for the DRBD network, so not all of these options would be appropriate for other network types.

Code:
# network options: larger epochs and buffers, auto-tuned send buffer
drbdmanage net-options --max-epoch-size 20000 --common
drbdmanage net-options --sndbuf-size 0 --common
drbdmanage net-options --max-buffers 80000 --common
# resync controller: dynamic resync rate between c-min-rate and c-max-rate
drbdmanage peer-device-options --c-plan-ahead 10 --common
drbdmanage peer-device-options --c-min-rate 10240 --common
drbdmanage peer-device-options --c-max-rate 102400 --common
drbdmanage peer-device-options --c-fill-target 131072 --common
drbdmanage peer-device-options --resync-rate 1024 --common
# disk options: skip flushes (only safe with a non-volatile write cache)
drbdmanage disk-options --unplug-watermark 16001 --common
drbdmanage disk-options --md-flushes no --common
drbdmanage disk-options --disk-flushes no --common

I think the last two options made the most improvement.
 
Hi guys!!! I have the same situation, but with this configuration:

3 Nodes + 1 Satellite;
1 VM (100);

Code:
root@pve-01:~# drbdadm status
.drbdctrl role:Secondary
  volume:0 disk:UpToDate
  volume:1 disk:UpToDate
  pve-02 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate
  pve-03 role:Secondary
    volume:0 peer-disk:UpToDate
    volume:1 peer-disk:UpToDate

vm-100-disk-1 role:Secondary
  disk:UpToDate
  pve-02 role:Secondary
    peer-disk:Diskless
  pve-03 role:Primary
    replication:SyncSource peer-disk:Inconsistent done:20.33


As you can see, replication is not working to pve-02. I can reproduce this case with 3 nodes too, but not with 2+1 (2 cluster nodes and 1 external).
How can I attach this peer and bring it to UpToDate?

Regards
Claudio
 
