Hello everyone.
My question is: how does a locked VM (locked by a backup or a config change) behave if its node gets fenced or has a hardware error? The HA service is set to "running".
Will it be shown as locked, with an unchanged configuration, on another node? From what I have read about the HA mechanism for locked VMs, the VM will be shown as locked and will not start until we investigate the problem and unlock it manually. Is that right?
We have an open support case with Nakivo Backup about this.
Our cluster uses fast shared NFS storage, and the VMs have qcow2 disks.
Nakivo's backup mechanism freezes the VM's base disk and creates a temporary overlay disk while the base disk is being backed up. After a successful backup, it merges the temporary overlay into the base disk and deletes the overlay.
Our problem: if a backup job gets stuck, the VM stays locked and the merge (commit) never happens automatically. If you are not careful and simply unlock and reboot the VM, it starts without knowing about the temporary overlay image, which is the only place all new data was written to. After the unlock, the VM therefore runs on the base disk, which only contains the data from the moment the backup started. Once the base disk has been changed, a clean merge or commit is no longer possible. You can still force a merge, but the resulting disk has errors and is almost unusable.
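To illustrate why the unlocked VM loses data, here is a toy Python model of a base image with a temporary overlay. It is purely illustrative and assumes a drastically simplified block map (block number to contents); it is not how qcow2 or Nakivo actually implement overlays, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class ToyQcow2Chain:
    """Hypothetical toy model: a base image plus an optional temporary overlay.

    This is NOT how qcow2 works internally; it only illustrates
    copy-on-write layering and what happens when the overlay is dropped.
    """

    def __init__(self, base: Dict[int, str]) -> None:
        self.base = dict(base)
        self.overlay: Optional[Dict[int, str]] = None

    def start_backup(self) -> None:
        # Freeze the base: from now on all guest writes go to the overlay.
        self.overlay = {}

    def write(self, block: int, data: str) -> None:
        target = self.overlay if self.overlay is not None else self.base
        target[block] = data

    def read(self, block: int) -> Optional[str]:
        # The overlay shadows the base for every block it contains.
        if self.overlay is not None and block in self.overlay:
            return self.overlay[block]
        return self.base.get(block)

    def commit(self) -> None:
        # Normal path after a successful backup: merge overlay into base.
        if self.overlay is not None:
            self.base.update(self.overlay)
            self.overlay = None

    def forget_overlay(self) -> Dict[int, str]:
        # Unsafe path: the VM is restarted on the base image only,
        # so the overlay is no longer part of the chain.
        orphan = self.overlay or {}
        self.overlay = None
        return orphan


# Normal backup: the overlay is committed and nothing is lost.
good = ToyQcow2Chain({1: "A", 2: "B"})
good.start_backup()
good.write(2, "B2")
good.commit()
print(good.base)  # {1: 'A', 2: 'B2'}

# Stuck backup, VM unlocked and restarted on the base image only.
bad = ToyQcow2Chain({1: "A", 2: "B"})
bad.start_backup()
bad.write(2, "B2")           # guest data lands only in the overlay
orphan = bad.forget_overlay()
print(bad.read(2))           # B  (the state from the backup start)
bad.write(1, "A3")           # new writes now change the base directly
bad.base.update(orphan)      # a late "merge" of the stale overlay
print(bad.base)              # {1: 'A3', 2: 'B2'}, a mix of two diverged states
```

In the last step, block 1 comes from the restarted VM and block 2 from the orphaned overlay, so the merged disk reflects neither timeline consistently, which matches the broken disks we saw after a forced merge.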
We now know how to get out of this situation, and that the VM must not be started again before the temporary overlay has been merged. We tested that and it works.
But we need to be sure that HA will never start a previously locked VM from a crashed node, so that we can investigate first. We will test this ourselves with a single test VM on one cluster node, by simply powering the node off while a Nakivo backup is running. If the VM were started on another node because it is not treated as locked, HA could cause massive data loss on every VM that was in the middle of a backup when the HA event happened.
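For that test, a small helper to read the lock state from the VM's config after the node has been powered off could look like the sketch below. It assumes the usual Proxmox VE `key: value` config format with a line such as `lock: backup`, and that snapshot or pending sections start with a `[name]` header; `parse_lock` is a hypothetical name, and the config text would come from `/etc/pve/qemu-server/<vmid>.conf` or the output of `qm config <vmid>`.

```python
from typing import Optional


def parse_lock(config_text: str) -> Optional[str]:
    """Return the 'lock:' value from the live (top) section of a
    Proxmox VE VM config, or None if no lock is set there.

    Hypothetical helper; assumes 'key: value' lines and that snapshot
    sections begin with a '[name]' header line.
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Snapshot/pending sections follow; the live config ends here.
            break
        if line.startswith("lock:"):
            return line.split(":", 1)[1].strip()
    return None


# Hypothetical config excerpt, for illustration only.
example = "boot: order=scsi0\nlock: backup\nmemory: 4096\n"
print(parse_lock(example))  # backup
```

Checking this before and after the HA recovery would show whether the lock survived the move to another node.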
So, am I right that the VM will stay locked and will not be started in an HA event?