Problem migrating virtual machines in a cluster

Re: DRBD/LVM migration doesn't work any longer for new machines

Adding the DRBD split-brain handler provides notification of a split-brain condition:
any time a split happens, each node sends a mail to the default mail address.

resource r0 {
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    ...
  }
}
 
Re: DRBD/LVM migration doesn't work any longer for new machines

...
Ok, change HA to "service back as soon as possible".

where?
 
Re: DRBD/LVM migration doesn't work any longer for new machines

The basic problem is what happens when a split brain occurs while LVM/DRBD machines
are running on both nodes.

Now, finally, to the HA -> "as short as possible" change question:
Is there a solution for DRBD/LVM which does not require shutting down machines?
If it is handled at a higher level, the only thing missing would be the ability to migrate a VM
from one shared/local storage to another. I would expect this to be easy compared to the
migration to a different node, but I might be wrong.
I could cross-mount two DRBD volumes primary/primary and convert them to secondary/primary and
primary/secondary in case of a split, migrate the VMs to the primary volumes, and then resync.
Maybe LVM can do this alone, but my understanding is not deep enough here.
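
A rough sketch of what that per-resource split could look like, assuming two DRBD resources r0 and r1 running primary/primary, each carrying its own LVM volume group (vg_r0 and vg_r1; all names here are examples only):

# on node A: keep r0 primary, give up r1
vgchange -an vg_r1    # deactivate the LVs first, DRBD refuses a role change on a busy device
drbdadm secondary r1

# on node B: keep r1 primary, give up r0
vgchange -an vg_r0
drbdadm secondary r0

The VMs would then be migrated onto the remaining primary of each resource before the resync.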

-----

If the split happens while one of the nodes has no VM on LVM/DRBD shared storage running, no
VM has to be stopped for the repair (a command sketch follows the list):

- select the node which has none of these VMs running
- preferably disable the network link used for DRBD
- remove the LVM volumes of these machines (their newer copy is on the other node);
  this is required because DRBD role changes are not possible with active volumes
- change DRBD to secondary
- tell DRBD to discard the local data (discard-my-data)
- re-enable the network link
- DRBD reconnects itself (or do the connect manually)
- look at the cstate in /proc/drbd and wait until the synchronisation is finished
- switch DRBD back to primary/primary
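
A minimal sketch of those steps on the node that gives up its data, assuming a resource named r0 with a volume group vg0 on top of it (both names are examples) and DRBD 8.x command syntax; here the volumes are only deactivated instead of removed, which is normally enough for the role change:

vgchange -an vg0                          # deactivate the LVs sitting on the DRBD device
drbdadm disconnect r0                     # per-resource equivalent of taking the link down
drbdadm secondary r0                      # demote this node
drbdadm -- --discard-my-data connect r0   # reconnect and throw the local changes away
cat /proc/drbd                            # repeat until cs:Connected and ds:UpToDate/UpToDate
drbdadm primary r0                        # back to primary/primary (needs allow-two-primaries)

On the surviving node a plain "drbdadm connect r0" may be needed if its resource is left in StandAlone.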

If shared-storage machines are running on both nodes:

- I have to stop all of them on one node (at least to my understanding), because I would rather
  resynchronise the whole DRBD device.
- I move their config files (/etc/qemu/....conf) to the other node and restart them there
  (see the sketch after this list).
- continue with the steps above
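
Moving the configs could look roughly like this on a current Proxmox VE cluster, assuming the configs live in the pmxcfs tree under /etc/pve/nodes/<node>/qemu-server/ (VM ID 100 and the node names nodeA/nodeB are placeholders):

# VM 100 must be stopped on nodeA first
mv /etc/pve/nodes/nodeA/qemu-server/100.conf /etc/pve/nodes/nodeB/qemu-server/100.conf

# then, on nodeB
qm start 100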

The more critical case is a split that happens while a machine is migrating to the other node.
This will damage the VM, since it moves to inconsistent storage while still holding cached data
in memory. I would love to see a veto possibility in PVE for online migration, maybe a
script which is called before the hop is finally performed (a hypothetical sketch follows).
By the way, I would add the mail-alert handler to your DRBD config examples.
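
Such a hook does not exist in PVE today, so the following is only a sketch of what the requested veto script could check before a migration, assuming a resource named r0:

#!/bin/bash
# hypothetical pre-migration check: refuse to migrate unless DRBD is healthy
RES=r0
CSTATE=$(drbdadm cstate "$RES")
DSTATE=$(drbdadm dstate "$RES")

if [ "$CSTATE" != "Connected" ] || [ "$DSTATE" != "UpToDate/UpToDate" ]; then
    echo "DRBD $RES is $CSTATE / $DSTATE - refusing to migrate" >&2
    exit 1    # a non-zero exit would veto the migration
fi
exit 0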

Many thanks in advance
 