replication options

I am a bit overwhelmed by all the replication options. I am trying to set up a simple HA/disaster-recovery environment; I don't need live migration.
Basically we have 2 servers, one of them is offsite and only used in a disaster situation. We have 8 VMs: 2 have frequent changes and the other 6 are more static. The 6 could be backed up and transferred to the 2nd server once a week. The 2 active VMs need to be transferred nightly.

One option I thought of was to rsync /etc/pve/qemu-server and /var/lib/vz/dump, but the backups there are not incremental.
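Roughly what I had in mind (the offsite hostname and destination paths are placeholders, not my real setup):
Code:
# nightly cron on pve1: push the VM configs and the vzdump backups offsite
# <offsite-host> and the target paths are placeholders
rsync -av /etc/pve/qemu-server/ root@<offsite-host>:/root/pve1-qemu-server/
rsync -av /var/lib/vz/dump/ root@<offsite-host>:/var/lib/vz/dump/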

So then I tried replication; that copied the data, but not the config.

So then I thought about pve-zsync, but that seems to take 2x the space.

So I'm looking for some recommendations on which way to go.

Code:
root@pve1:~# cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content iso,vztmpl,backup

zfspool: pool1
    pool pool1
    content images,rootdir
    mountpoint /pool1


root@pve1:~# lsblk -f
NAME         FSTYPE            FSVER    LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
sda                                                                                                 
├─sda1       zfs_member        5000     pool1 13153554493931190882                                 
└─sda9                                                                                             
sdb                                                                                                 
├─sdb1       zfs_member        5000     pool1 13153554493931190882                                 
└─sdb9                                                                                             
sdc                                                                                                 
├─sdc1       zfs_member        5000     pool1 13153554493931190882                                 
└─sdc9                                                                                             
sdd                                                                                                 
├─sdd1       zfs_member        5000     pool1 13153554493931190882                                 
└─sdd9                                                                                             
sde                                                                                                 
├─sde1                                                                                             
├─sde2       vfat              FAT32          65E2-5602                               510.6M     0% /boot/efi
└─sde3       LVM2_member       LVM2 001       SkFS0b-ZoKP-IeWA-iefW-7m3g-8zTc-Oiq9lO               
  ├─pve-swap swap              1              e92f31d4-ab52-4b0e-aafb-a18e1829e266                  [SWAP]
  └─pve-root ext4              1.0            589664f3-4da6-4a9b-b3cf-f0b42e905816    866.5G     2% /
 
Hi,
Basically we have 2 servers, one of them is offsite and only used in a disaster situation.

So then I tried replication; that copied the data, but not the config.
I'd recommend against clustering if the second server is offsite and not always active (and the built-in storage replication only works between nodes in the same cluster). There needs to be low latency for cluster communication, and if one node is down, the other one won't have quorum. You could use a QDevice for vote support, but I'd say that clustering is not designed for your scenario.

So then I thought about pve-zsync, but that seems to take 2x the space.
pve-zsync should also just keep 1 snapshot by default, so that sounds strange. After replicating something, could you share the output of
Code:
zfs list -o space <replicated dataset> -r -t all
on both source and target?

 
I will get you the results of zfs list as soon as I can. However, from the documentation:
pve-zsync create --source 192.168.1.1:100 --dest tank/backup --verbose --maxsnap 2 --name test1 --limit 512 --skip
this creates a snapshot at the backup mount point, but of course that is not usable until you do:
zfs send <pool>/[<path>/]vm-<VMID>-disk-<number>@<last_snapshot> | [ssh root@<destination>] zfs receive <pool>/<path>/vm-<VMID>-disk-<number>

That indicates to me that there are 2 copies of the data, or am I misunderstanding?
 
pve-zsync should take care of creating and sending the dataset/snapshots for you (no need for a manual zfs send/recv). If you use --maxsnap 2 it might take up to two times as much space, depending on how big the delta between the snapshots is, of course. For example, if the guest only rewrites a few hundred MiB between syncs, the older snapshot only pins that delta, not a full second copy of the disk.

EDIT: See also here for how to recover a VM on the second node. If you use the destination where the VM config expects it (rather than tank/backup), things are easier of course.
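In rough outline, recovery comes down to putting the config where the node expects it and making sure it points at the replicated disk. A minimal sketch for VM 100, assuming pve-zsync kept timestamped config copies under /var/lib/pve-zsync/ (the file-name pattern and paths below are assumptions, please verify them on your node):
Code:
# pick the newest config copy pve-zsync kept for VM 100 (naming pattern is an assumption)
latest=$(find /var/lib/pve-zsync -name '100.conf*' -printf '%T@ %p\n' 2>/dev/null \
         | sort -n | tail -n 1 | cut -d' ' -f2-)
# register the VM on this node; edit the disk line if the pool name differs,
# it has to reference the replicated volume, e.g. pool1:vm-100-disk-0
cp "$latest" /etc/pve/qemu-server/100.conf
qm rescan --vmid 100   # optional: update disk sizes / pick up unreferenced disks
qm start 100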
 
OK, this is perfect. I didn't realize I could go directly to the correct disk ID for the VM. Here are the results. They are exactly what I would expect!
Code:
root@pve2:~# pve-zsync sync --source 192.168.10.2:100 --dest pool1 --verbose --maxsnap 2 --name xpen1
...
root@pve2:~# zfs list -o space pool1/vm-100-disk-0 -r -t all
NAME                                               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
pool1/vm-100-disk-0                                2.48T  77.4G        0B   77.4G             0B         0B
pool1/vm-100-disk-0@rep_xpen1_2021-12-17_10:16:20      -     0B         -       -              -          -

30 minutes later:
root@pve2:~# pve-zsync sync --source 192.168.10.2:100 --dest pool1 --verbose --maxsnap 2 --name xpen1
send from pool1/vm-100-disk-0@rep_xpen1_2021-12-17_10:16:20 to pool1/vm-100-disk-0@rep_xpen1_2021-12-17_10:41:24 estimated size is 50.3M
total estimated size is 50.3M
root@pve2:~# zfs list -o space pool1/vm-100-disk-0 -r -t all
NAME                                               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
pool1/vm-100-disk-0                                2.48T  77.5G     38.0M   77.4G             0B         0B
pool1/vm-100-disk-0@rep_xpen1_2021-12-17_10:16:20      -  38.0M         -       -              -          -
pool1/vm-100-disk-0@rep_xpen1_2021-12-17_10:41:24      -     0B         -       -              -          -

30 minutes later with a forced change on the guest (apt upgrade - 4 packages updated):
root@pve2:~# pve-zsync sync --source 192.168.10.2:100 --dest pool1 --verbose --maxsnap 2 --name xpen1
send from pool1/vm-100-disk-0@rep_xpen1_2021-12-17_10:59:31 to pool1/vm-100-disk-0@rep_xpen1_2021-12-17_11:10:24 estimated size is 227M
total estimated size is 227M
TIME        SENT   SNAPSHOT pool1/vm-100-disk-0@rep_xpen1_2021-12-17_11:10:24
11:10:27    113M   pool1/vm-100-disk-0@rep_xpen1_2021-12-17_11:10:24
11:10:28    225M   pool1/vm-100-disk-0@rep_xpen1_2021-12-17_11:10:24
root@pve2:~# zfs list -o space pool1/vm-100-disk-0 -r -t all
NAME                                               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
pool1/vm-100-disk-0                                2.48T  77.7G      229M   77.5G             0B         0B
pool1/vm-100-disk-0@rep_xpen1_2021-12-17_10:59:31      -   229M         -       -              -          -
pool1/vm-100-disk-0@rep_xpen1_2021-12-17_11:10:24      -     0B         -       -              -          -
root@pve2:~#

Now one last question: I am trying to rsync the guest configs. If I rsync directly into /etc/pve/qemu-server it fails, but if I rsync to a scratch directory first and then move the files, it works.
Code:
root@pve2:~# rsync -avP root@192.168.10.2:/etc/pve/qemu-server/* /etc/pve/qemu-server/
receiving incremental file list
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.100.conf.eLHSrX" failed: Operation not permitted (1)
            364 100%  355.47kB/s    0:00:00 (xfr#1, to-chk=7/8)
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.101.conf.ni208X" failed: Operation not permitted (1)
            458 100%  447.27kB/s    0:00:00 (xfr#2, to-chk=6/8)
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.102.conf.XceQl1" failed: Operation not permitted (1)
            407 100%  397.46kB/s    0:00:00 (xfr#3, to-chk=5/8)
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.103.conf.TkuWG0" failed: Operation not permitted (1)
            410 100%  400.39kB/s    0:00:00 (xfr#4, to-chk=4/8)
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.104.conf.QTu6i0" failed: Operation not permitted (1)
            266 100%  259.77kB/s    0:00:00 (xfr#5, to-chk=3/8)
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.105.conf.feUKIZ" failed: Operation not permitted (1)
            279 100%  272.46kB/s    0:00:00 (xfr#6, to-chk=2/8)
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.108.conf.LYGTfY" failed: Operation not permitted (1)
            352 100%  343.75kB/s    0:00:00 (xfr#7, to-chk=1/8)
rsync: [receiver] mkstemp "/etc/pve/qemu-server/.109.conf.1cUXNY" failed: Operation not permitted (1)
            431 100%  420.90kB/s    0:00:00 (xfr#8, to-chk=0/8)

sent 176 bytes  received 3,484 bytes  7,320.00 bytes/sec
total size is 2,967  speedup is 0.81
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1819) [generator=3.2.3]

root@pve2:~# rsync -avP root@192.168.10.2:/etc/pve/qemu-server/* .
receiving incremental file list
100.conf
            364 100%  355.47kB/s    0:00:00 (xfr#1, to-chk=7/8)
101.conf
            458 100%  447.27kB/s    0:00:00 (xfr#2, to-chk=6/8)
102.conf
            407 100%  397.46kB/s    0:00:00 (xfr#3, to-chk=5/8)
103.conf
            410 100%  400.39kB/s    0:00:00 (xfr#4, to-chk=4/8)
104.conf
            266 100%  259.77kB/s    0:00:00 (xfr#5, to-chk=3/8)
105.conf
            279 100%  272.46kB/s    0:00:00 (xfr#6, to-chk=2/8)
108.conf
            352 100%  343.75kB/s    0:00:00 (xfr#7, to-chk=1/8)
109.conf
            431 100%  420.90kB/s    0:00:00 (xfr#8, to-chk=0/8)

sent 176 bytes  received 3,484 bytes  7,320.00 bytes/sec
total size is 2,967  speedup is 0.81
root@pve2:~# ls -l
total 32
-rw-r----- 1 root www-data 364 Dec 16 17:24 100.conf
-rw-r----- 1 root www-data 458 Dec 13 14:36 101.conf
-rw-r----- 1 root www-data 407 Dec 13 16:53 102.conf
-rw-r----- 1 root www-data 410 Dec 13 14:22 103.conf
-rw-r----- 1 root www-data 266 Dec 13 12:11 104.conf
-rw-r----- 1 root www-data 279 Dec 13 12:46 105.conf
-rw-r----- 1 root www-data 352 Dec 13 14:38 108.conf
-rw-r----- 1 root www-data 431 Dec 13 13:55 109.conf
root@pve2:~# mv *.conf /etc/pve/qemu-server/
Or I guess I could write a script that looks for the most recent config file in /var/lib/pve-zsync/ for each VMID and copies it over, though that seems a bit of a hack.
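Roughly what I have in mind, as a sketch only; it assumes the config copies under /var/lib/pve-zsync/ are named <VMID>.conf.* (check the actual layout first), and the VMID list is just the eight VMs from above:
Code:
#!/bin/bash
# Copy the newest pve-zsync config snapshot for each VMID into /etc/pve/qemu-server/.
# The naming under /var/lib/pve-zsync/ is an assumption, verify it before relying on this.
for vmid in 100 101 102 103 104 105 108 109; do
    latest=$(find /var/lib/pve-zsync -name "${vmid}.conf*" -printf '%T@ %p\n' 2>/dev/null \
             | sort -n | tail -n 1 | cut -d' ' -f2-)
    [ -n "$latest" ] && cp "$latest" "/etc/pve/qemu-server/${vmid}.conf"
done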
 
