I would like to suggest some ideas to help guide the development of this while it is still not final. My use case is that I have two PVE nodes I use to run VMs for development support purposes (NFS file share for common objects, image cache servers, Jira, etc.) and to run our general office server and PBX software.
It is not worth the extra cost and maintenance complexity to me to have HA with automatic failover. I can live with short downtime to move the machines around.
I did configure my two nodes with ZFS and named the storage the same on both nodes. This lets me use the UI to move machines around when they're small. But I have two VMs with a significant amount of disk space attached to them, and copying those across machines takes hours. This is where I would like to suggest a change in how you expect recovery to happen with zfs-sync.
Instead of needing yet another send/recv step, I think it would be ideal if the sync could drop the replicated disk images directly into an available ZFS storage on the remote node. Recovery would then simply be moving the QEMU config file to the new node and starting the machine (possibly changing the storage name), without a copy that could take hours. Also, the remote node I have does not necessarily have enough disk space for two whole copies of the disks (one for the zfs-send copy, and another for the VM to use).
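To make the proposal concrete, here is a rough sketch of what that recovery could look like once the disks are already sitting in the remote node's pool. The node names (pve1/pve2) and VM ID 104 are just from my setup; the config path is the standard pmxcfs location, but treat the exact steps as an assumption, not a tested procedure:

```shell
# Assumes VM 104 is stopped and its disks are already replicated into
# PVE2's local "tank". Relocate the VM's config inside the pmxcfs
# cluster filesystem so PVE2 owns the VM (run on any cluster node):
mv /etc/pve/nodes/pve1/qemu-server/104.conf \
   /etc/pve/nodes/pve2/qemu-server/104.conf

# If the storage were named differently on PVE2, the disk references in
# the config would need a matching edit (tank:vm-104-disk-1 -> ...).
# Since both pools are named "tank" here, nothing changes.

# On PVE2: start the VM against the already-present disks.
qm start 104
```

That is the whole point: no multi-hour data copy sits between "node failed" and "VM running again".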
My only question is what confusion, if any, would be caused on the remote node by extra VM disks sitting unused in its ZFS storage.
Here is how I would anticipate using it:
Storage on PVE1 called "tank" holds the disk images vm-104-disk-1, vm-104-disk-2, and vm-104-disk-3, totaling about 1TB of data. I would like to zfs-sync these directly to node PVE2 into its "tank" (each "tank" is private and local to its node, just named the same). I don't have room to sync them to another location on PVE2 and then send/recv into "tank" when I need them; there is only one ZFS data pool on that machine, and it does not have enough spare space for that duplication. And if I sync the disks to a third node (just a file server), the recovery send/recv takes about 6 hours over the LAN. With the disk images already in the ZFS storage that PVE2 wants to use, recovering the machine becomes trivial and fast.
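For reference, this is roughly what I have to script by hand today with the current flat layout — one snapshot and one send per zvol. A sketch, assuming root SSH from PVE1 to PVE2 and that the target datasets don't exist yet; the snapshot name "sync1" is made up:

```shell
# Every disk has to be handled individually.
for disk in vm-104-disk-1 vm-104-disk-2 vm-104-disk-3; do
    zfs snapshot "tank/${disk}@sync1"
    # Full initial send; later runs would use an incremental stream:
    #   zfs send -i @sync0 "tank/${disk}@sync1" | ...
    zfs send "tank/${disk}@sync1" | ssh pve2 zfs receive "tank/${disk}"
done
```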
Also, if PVE made a ZFS dataset per VM inside the pool, the send/recv would be much easier: you would "zfs snapshot -r tank/vm-104@xx" and "zfs send -R" instead of having to handle each disk individually. This would assume a structure like "tank/vm-104/disk-1". It would also simplify setting ZFS properties per VM, like disabling compression when the VM's file system is already compressed, for example.
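Under that hypothetical per-VM layout, the whole loop above collapses to two commands (note that the recursive flag on the send side is capital -R, the replication-stream flag; plain -r only exists for zfs snapshot):

```shell
# Snapshot every dataset under tank/vm-104 atomically.
zfs snapshot -r tank/vm-104@sync1

# -R sends the parent and all descendant datasets (with their
# properties) as one replication stream; -F on the receive side
# rolls the target back to match before applying it.
zfs send -R tank/vm-104@sync1 | ssh pve2 zfs receive -F tank/vm-104

# Per-VM tuning also becomes a single command on the parent dataset,
# inherited by every disk, e.g.:
zfs set compression=off tank/vm-104
```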