cluster+DRBD: replicated images and cloned VMs

ansanto (Guest)
Assume we create all the (KVM) VMs on the slave node only, i.e. the slave node is the only one responsible for running all the VMs.
In my understanding, in such a scenario a DRBD split-brain (or a cluster split-brain, for that matter) can never cause data inconsistency, simply because all the VMs run on only one side of the cluster(!) while the other side issues no I/O on the replicated disk (only the operations issued by DRBD itself in order to sync the disks).
Is that correct?

That said, assume we also have some simple code:

- clonevm
scans and "clones" the VM definition files from the slave (node2:/etc/qemu-server/*.conf)
to the master (node1:/etc/qemu-server-clone/*.conf). It could also be scheduled for update purposes.

- initvm
copies the [selected|all] conf files from /etc/qemu-server-clone to /etc/qemu-server (on the master node1). The initialized VM(s) are kept down (stopped).

- startvm/stopvm
a SystemV-style rc script which starts|stops the [selected|all] cloned VMs (on the master node1)

'initvm' and 'startvm' may be triggered by DRBD/cluster events or started manually, so that in case of a slave node2 failure we get a fast online migration (as the VM images are kept up to date by DRBD).
In short: replicated images and cloned VMs.
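A minimal bash sketch of the idea behind those three helpers (hostnames, the rsync/qm calls and the file handling are assumptions based on the description above, not the actual scripts):

Code:
#!/bin/bash
# clonevm: copy the VM definition files from the slave to a staging
# directory on the master (could also be run from cron to keep them fresh)
clonevm() {
    rsync -a node2:/etc/qemu-server/ /etc/qemu-server-clone/
}

# initvm: activate the selected (or, with no arguments, all) cloned
# definitions on the master; the VMs are only defined here, not started
initvm() {
    local id ids
    ids="${*:-$(ls /etc/qemu-server-clone | sed 's/\.conf$//')}"
    for id in $ids; do
        cp "/etc/qemu-server-clone/${id}.conf" "/etc/qemu-server/${id}.conf"
    done
}

# startvm/stopvm: start or stop the cloned VMs via the Proxmox CLI
startvm() { local id; for id in "$@"; do qm start "$id"; done; }
stopvm()  { local id; for id in "$@"; do qm stop  "$id"; done; }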

It is in the above scenario that we could experience data inconsistency due to split-brain, if the slave node2 becomes available again and starts all the VMs at boot(!).
In that case, however, the split-brain recovery policy to adopt is quite simple. Once the slave node2 disk is synced (DRBD manual/automatic recovery), we have to stop the cloned VMs on the master node1 and start them (the original ones) on the slave node2.
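For illustration, the fail-back could look roughly like this, assuming a single resource named r0 run single-primary and a cloned VM with ID 101 (both names are placeholders):

Code:
# on the master node1: stop the cloned VM and demote the resource
qm stop 101
drbdadm secondary r0

# on the slave node2: promote the resource and start the original VM
drbdadm primary r0
qm start 101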

Is that correct, or am I missing something? (I'm testing what I posted and it seems to work like a charm...)

Antonio
 
Assume we create all the (KVM) VMs on the slave node only, i.e. the slave node is the only one responsible for running all the VMs.
In my understanding, in such a scenario a DRBD split-brain (or a cluster split-brain, for that matter) can never cause data inconsistency, simply because all the VMs run on only one side of the cluster(!) while the other side issues no I/O on the replicated disk (only the operations issued by DRBD itself in order to sync the disks).
Is that correct?
That sounds correct. Almost the same scenario is explained here: http://pve.proxmox.com/wiki/DRBD#Recovery_from_communication_failure. As explained there, you could actually have two DRBD devices, one for each server. This gives you a DRBD device for the VMs of each server, so neither server sits idle all the time, and you still avoid most split-brain situations.
However, always be aware that in the event your slave server breaks down, your VMs need to be able to survive a "power failure": only data flushed to disk by your VM is guaranteed to be present on both the primary and the secondary DRBD server.
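As a rough illustration of that layout (the resource names r0 and r1 are just placeholders), each node is primary for one resource and acts as replication target for the other:

Code:
# on node1: its own VMs live on r0, while it stays secondary for r1
drbdadm primary r0

# on node2: its own VMs live on r1, while it stays secondary for r0
drbdadm primary r1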


That said, assume we also have some simple code:

- clonevm
scans and "clones" the VM definition files from the slave (node2:/etc/qemu-server/*.conf)
to the master (node1:/etc/qemu-server-clone/*.conf). It could also be scheduled for update purposes.

- initvm
copies the [selected|all] conf files from /etc/qemu-server-clone to /etc/qemu-server (on the master node1). The initialized VM(s) are kept down (stopped).

- startvm/stopvm
a SystemV-style rc script which starts|stops the [selected|all] cloned VMs (on the master node1)

'initvm' and 'startvm' may be triggered by DRBD/cluster events or started manually, so that in case of a slave node2 failure we get a fast online migration (as the VM images are kept up to date by DRBD).
In short: replicated images and cloned VMs.
Interesting, I've been testing a similar setup, although only with manual migration. Do you use a standard cluster application to do the fencing?

It is in the above scenario that we could experience data inconsistency due to split-brain, if the slave node2 becomes available again and starts all the VMs at boot(!).
In that case, however, the split-brain recovery policy to adopt is quite simple. Once the slave node2 disk is synced (DRBD manual/automatic recovery), we have to stop the cloned VMs on the master node1 and start them (the original ones) on the slave node2.

Is that correct, or am I missing something? (I'm testing what I posted and it seems to work like a charm...)

Antonio
Have you found a good way of setting the sb-recovery policy?
In my testing so far I have used:
Code:
                # if one node wrote nothing during the split-brain, sync from the other
                after-sb-0pri discard-zero-changes;
                # take the after-sb-0pri decision even though one node is primary
                after-sb-1pri violently-as0p;
                # same with both nodes primary (may discard data written on one of them)
                after-sb-2pri violently-as0p;
However, I have not had time to finish it; when I left off I was stuck at having e.g. one primary which DRBD says needs to become secondary to get synchronized. I never finished scripting for it. Do you do it in an easier way to get automatic recovery?
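For context, the kind of manual sequence I mean is roughly the following (just a sketch, assuming a resource named r0 and that the node which has to become secondary holds no changes worth keeping; the option placement differs between DRBD 8.3 and 8.4):

Code:
# on the node whose changes are to be discarded
drbdadm secondary r0
drbdadm connect --discard-my-data r0   # DRBD 8.3 syntax: drbdadm -- --discard-my-data connect r0

# on the surviving node, only if it dropped to StandAlone
drbdadm connect r0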

Best regards,
Bo
 
The goal is to be as safe as possible about the peer's working conditions.

The idea is to run on both DRBD nodes a simple DRBD-ssh-key-exchange script (something like what the pveca perl script does), so that one can run commands remotely on the DRBD nodes over SSH, using the DRBD link as well as the pve-cluster link. In short, two SSH channels available between the nodes.

This way we have the chance to run scripts (and/or daemons) in order to get information from the peer about its DRBD status and inject drbdadm commands accordingly... even if one of the two links is down. For example, you might disconnect the resources on the peer if the DRBD link is down/broken (the poor man's fencing), inspect the cluster status over the DRBD link, stop the VMs running on the peer if the cluster is disconnected (vmbr0 down), and so on.
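A minimal sketch of such a check, assuming the peer answers as 'node2' over the cluster link and as 10.0.0.2 over the dedicated DRBD link, with a resource named r0 (all of these names are placeholders):

Code:
#!/bin/bash
# Ask the peer for its DRBD connection state over whichever SSH channel still works.
peer_cstate() {
    ssh root@node2    drbdadm cstate r0 2>/dev/null && return
    ssh root@10.0.0.2 drbdadm cstate r0 2>/dev/null
}

state=$(peer_cstate)
if [ -z "$state" ]; then
    # Peer unreachable over both links: poor man's fencing,
    # cut the local resource loose before taking over.
    drbdadm disconnect r0
elif [ "$state" != "Connected" ]; then
    logger "DRBD peer reports cstate=$state, manual attention may be needed"
fi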
 
