Problem migrating virtual machines in a cluster

Re: DRBD/LVM migration doesn't work any longer for new machines

Adding the DRBD split-brain handler provides notification of a split-brain condition:
any time a split happens, each node sends a mail to the default mail address.

resource r0 {
  handlers {
    split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    ...
  }
}
 
Re: DRBD/LVM migration doesn't work any longer for new machines

...
Ok, change HA to "service back as soon as possible".

where?
 
Re: DRBD/LVM migration doesn't work any longer for new machines

The basic problem is what happens when a split brain occurs while LVM/DRBD machines
are running on both nodes.

Now, finally, to the HA -> "as short as possible" change question:
Is there a solution for DRBD/LVM which does not require shutting down machines?
If it is handled at a higher level, the only thing missing would be the ability to migrate a VM
from one shared/local storage to another. I would expect this to be easy compared to the
migration to a different node, but I might be wrong.
I could cross-mount two DRBD volumes primary/primary and convert them to secondary/primary and
primary/secondary in case of a split, migrate the VMs to the primary volumes, and then resync.
Maybe LVM can do this alone, but my understanding is not deep enough here.
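
A rough sketch of what that per-resource split could look like, assuming two DRBD resources r0 and r1 running primary/primary, each carrying its own LVM volume group (vg_r0 and vg_r1; all names here are examples only):

# on node A: keep r0 primary, give up r1
vgchange -an vg_r1    # deactivate the LVs first, DRBD refuses a role change on a busy device
drbdadm secondary r1

# on node B: keep r1 primary, give up r0
vgchange -an vg_r0
drbdadm secondary r0

The VMs would then be migrated onto the remaining primary of each resource before the resync.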

-----

If the split happens while one of the nodes has no VM on LVM/DRBD shared storage running, no
VM has to be stopped for the repair (a command sketch follows the list):

- select the node which has none of these VMs running
- preferably disable the network link used for DRBD
- remove the LVM volumes of these machines (their newer copy is on the other node);
  this is required because DRBD role changes are not possible with active volumes
- change DRBD to secondary
- tell DRBD to discard the local data (discard-my-data)
- re-enable the network link
- DRBD reconnects itself (or do the connect manually)
- look at the cstate in /proc/drbd and wait until the synchronisation is finished
- switch DRBD back to primary/primary
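
A minimal sketch of those steps on the node that gives up its data, assuming a resource named r0 with a volume group vg0 on top of it (both names are examples) and DRBD 8.x command syntax; here the volumes are only deactivated instead of removed, which is normally enough for the role change:

vgchange -an vg0                          # deactivate the LVs sitting on the DRBD device
drbdadm disconnect r0                     # per-resource equivalent of taking the link down
drbdadm secondary r0                      # demote this node
drbdadm -- --discard-my-data connect r0   # reconnect and throw the local changes away
cat /proc/drbd                            # repeat until cs:Connected and ds:UpToDate/UpToDate
drbdadm primary r0                        # back to primary/primary (needs allow-two-primaries)

On the surviving node a plain "drbdadm connect r0" may be needed if its resource is left in StandAlone.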

If shared-storage machines are running on both nodes:

- I have to stop all of them on one node (at least to my understanding), because I would rather
  resynchronise the whole DRBD device.
- I move their config files (/etc/qemu/....conf) to the other node and restart them there
  (see the sketch after this list).
- continue with the steps above
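
Moving the configs could look roughly like this on a current Proxmox VE cluster, assuming the configs live in the pmxcfs tree under /etc/pve/nodes/<node>/qemu-server/ (VM ID 100 and the node names nodeA/nodeB are placeholders):

# VM 100 must be stopped on nodeA first
mv /etc/pve/nodes/nodeA/qemu-server/100.conf /etc/pve/nodes/nodeB/qemu-server/100.conf

# then, on nodeB
qm start 100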

The more critical case is a split that happens while a machine is migrating to the other node.
This will damage the VM, since it moves to inconsistent storage while still holding cached data
in memory. I would love to see a veto possibility in PVE for online migration, maybe a
script which is called before the hop is finally performed (a hypothetical sketch follows).
By the way, I would add the mail-alert handler to your DRBD config examples.
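
Such a hook does not exist in PVE today, so the following is only a sketch of what the requested veto script could check before a migration, assuming a resource named r0:

#!/bin/bash
# hypothetical pre-migration check: refuse to migrate unless DRBD is healthy
RES=r0
CSTATE=$(drbdadm cstate "$RES")
DSTATE=$(drbdadm dstate "$RES")

if [ "$CSTATE" != "Connected" ] || [ "$DSTATE" != "UpToDate/UpToDate" ]; then
    echo "DRBD $RES is $CSTATE / $DSTATE - refusing to migrate" >&2
    exit 1    # a non-zero exit would veto the migration
fi
exit 0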

Many thanks in advance
 