zsync and recover

nightcrawler

New Member
Apr 15, 2023
Greetings,

I have read the docs and tried hard to understand, but I can't wrap my brain around it, so... I have two PVE servers at the same location; they are not clustered (for reasons), and the primary server has a particular VM that is very important. I'm currently using pve-zsync to copy the VM over to the secondary server, and AFAIK everything is set up correctly. The secondary server has no VMs set up (it's more of a storage and backup box; we have not experienced a failure of the primary server so far).

After the initial sync, subsequent snapshots are sent at a 15-minute interval. pve-zsync replicates the disk under the same name (ID), and that ID is NOT in use on the secondary server. My question is about the steps to take in case of a hardware failure or any other event that takes the primary server offline: I want the VM up and running on the secondary server as of the last snapshot. The docs say:

1) Stop the sync.
2) Copy the VM config.
3) ZFS send latest snapshot.
4) Edit VM config.
5) Run the VM on the secondary server.

I have doubts about the third step. The docs say to ZFS send and receive into a new ID. That would copy the whole thing, right? (That would be very slow: it's around 1 TB on spinning disks, and there's not a lot of space available.) Can I ZFS send and receive into the same ID?

zfs send storage/vm-110-disk-1@<latest> | zfs receive storage/vm-110-disk-1

Is that redundant, or simply wrong? I have read a LOT of documentation; maybe it's the language barrier, or I'm just not smart enough to get all this. Gotta love ZFS though.
 
OK, now I get it: your steps follow the pve-zsync docs! I also worked through each step of the pve-zsync instructions, and when I reviewed what I had done, I realized I never did step "3) ZFS send latest snapshot". The reason, as I remember it, is that I was using pve-zsync to migrate a VM between two hosts, and the last replica contained no updates, I believe because the last replication ran while the VM was powered off. Afterwards I deleted all the replica copies made by pve-zsync on the destination host. The other steps were the same as yours.
As for the "send the VM or Dataset to the selected target" instruction in the "Recovering an VM" section of the pve-zsync docs, I have no idea why it says that. In my opinion, the point of pve-zsync is that it keeps several replica copies on the destination host for us to work with. In most disasters we can't assume the source host is still reachable, so we will need to decide which replica copy to use to restart the VM on the destination host. I'm not sure we could still "send the VM or Dataset to the selected target" at that point.
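
Choosing which replica copy to restart from could look something like the sketch below. It is only an illustration: the helper name `latest_snapshot` is made up, and it assumes the replica snapshots carry a `rep_` prefix in their names, which should be checked against what `zfs list` actually shows on the destination host.

```shell
#!/bin/sh
# Hypothetical helper: reads snapshot names (oldest first) on stdin
# and prints the newest replica snapshot.
latest_snapshot() {
    # Keep only snapshots whose name starts with "rep_" (assumed
    # pve-zsync naming), then take the last line, i.e. the newest.
    grep '@rep_' | tail -n 1
}

# Possible usage on the destination host (dataset name from the first post):
#   zfs list -H -r -t snapshot -o name -s creation storage/vm-110-disk-1 | latest_snapshot
```

Sorting by creation time in `zfs list` is what makes "last line" mean "newest"; manual snapshots without the assumed prefix are ignored.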
 
Let me just start by saying, I don't use ZFS or pve-zsync at all. But having skimmed this post & the docs:

and I want the VM to be up and running as of the last snapshot on the secondary server.
That seems possible according to the docs, as quoted in the introduction section:
By synchronizing, you have a full copy of your virtual machine on the second host and you can start your virtual machines on the second server (in case of data loss on the first server).

But you say:
The docs say:

1) Stop the sync.
2) Copy the VM config.
3) ZFS send latest snapshot.
4) Edit VM config.
5) Run the VM on the secondary server.
It appears you are referring to the Recovering an VM section in the docs.
That section is about recovering the VM from the TARGET that pve-zsync originally synced to, BACK to the ORIGINAL server (or a different one). That is not what you need, as your synced VM is already present on that secondary server.

So stage 3 is definitely irrelevant here. I guess you basically just have to configure the VM on that secondary server to make it run. If you test this while the first server is running, you will probably also need stage 1, to stop the sync from overwriting the data now in use on the secondary server.
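
Configuring the VM on the secondary server mostly means making its disk lines point at the storage that holds the replica. A rough sketch of that edit follows; the function name `retarget_storage` and the storage IDs in the usage line are assumptions for illustration, not anything taken from the docs.

```shell
#!/bin/sh
# Hypothetical helper: rewrites the disk lines of a VM config read from
# stdin so they reference a different storage ID. All other lines pass
# through unchanged.
retarget_storage() {
    old=$1
    new=$2
    # Disk lines look like "scsi0: <storage>:vm-<id>-disk-<n>,..."
    # Note: $old is used as a regex, so a "." in it matches any character.
    sed -E "s/^((scsi|virtio|sata|ide|efidisk)[0-9]+: )${old}:/\1${new}:/"
}

# Possible usage (storage IDs are examples only):
#   retarget_storage local-zfs storage < 110.conf > 110.conf.new
```

Writing to a new file first leaves the saved config untouched until the result has been reviewed.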

Again I can only hope I helped somewhat, as I have no first-hand experience with this.


Good luck.
 
So, after a bit more reading and some help from AI, I think I have figured it out.
Since pve-zsync keeps all changes after each sync, to effectively recover the VM after the loss of the primary server, you have to:

1) Stop the sync (so that, if the primary server comes back for any reason, it does not mess up the data on the secondary server).
2) Copy the VM config (pve-zsync maintains a copy of it too).
3) Roll back to the latest snapshot on the secondary server.

Bash:
zfs rollback -r <zfs>/vm-<id>-disk-0@<latest_snapshot>
(careful with -r, as it destroys any snapshots more recent than the one specified)

4) Edit the VM config (to point to the correct storage).
5) Run the VM on the secondary server.

There's no mention of rollback in the pve-zsync docs because, AFAIK, the main reason for using it is to keep a copy that is as up to date as possible. If the image is already on the secondary server, a simple rollback to the latest snapshot after losing the primary server leaves the latest available data ready for use there. If anyone has a better option or some insight, I'd love to hear it.
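
Since the rollback is destructive, it may help to print the commands for review before running anything. The sketch below does only that. The function name `recovery_plan` and its arguments are hypothetical, the location of the saved config has to be supplied by hand, and step 1 is left out because where the sync job is stopped depends on which host runs it.

```shell
#!/bin/sh
# Hypothetical dry run: prints the recovery commands instead of running them.
# Arguments: <dataset> <snapshot> <vmid> <saved-config-path>
recovery_plan() {
    dataset=$1
    snapshot=$2
    vmid=$3
    saved_conf=$4
    # Step 2: restore the VM config that was saved alongside the replica.
    echo "cp $saved_conf /etc/pve/qemu-server/$vmid.conf"
    # Step 3: roll the disk back; -r destroys any more recent snapshots.
    echo "zfs rollback -r $dataset@$snapshot"
    # Step 5: start the VM, after the config has been edited (step 4).
    echo "qm start $vmid"
}

# Possible usage (all values are placeholders):
#   recovery_plan storage/vm-110-disk-1 <latest_snapshot> 110 <saved-config-path>
```

Each printed line can then be run by hand, one at a time, once it has been checked against the actual dataset and snapshot names.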
 