Replication failed to some node

If you're using Nagios, I've created a small check script (check_storagereplication) to get alerted on replication errors. It's not perfect, but it works fine for us in prod.

Code:
#!/bin/bash
# Script to check Proxmox storage replication
# ExitCode:
# 0 = Ok
# 1 = Warning
# 2 = Critical
# 4 = Ok (no replications configured)

# Column 7 = fail count, column 8 = state in the `pvesr status` output
RESULTS=($(/usr/bin/pvesr status | awk 'NR>1 {print $7}'))
STATE=($(/usr/bin/pvesr status | awk 'NR>1 {print $8}'))
EXITCODE=0

for i in "${RESULTS[@]}"
do
    if [ $i -gt 0 ] && [ $i -le 10 ]
    then
        EXITCODE=1
    elif [ $i -gt 10 ]
    then
        EXITCODE=2
    else
        EXITCODE=2
    fi
done


for i in "${STATE[@]}"
do
    if [ $i ==  "OK" ] || [ $i == "SYNCING" ]
    then
        EXITCODE=0
    else
        EXITCODE=2
    fi
done

# No replication jobs configured at all
if [ ${#RESULTS[@]} -eq 0 ] && [ ${#STATE[@]} -eq 0 ]
then
    EXITCODE=4
fi

if [ $EXITCODE -eq 2 ]
then
    echo "CRITICAL: Some replication jobs failed !"
    exit 2
elif [ $EXITCODE -eq 1 ]
then
    echo "WARNING: There is some errors with some replication jobs"
    exit 1
elif [ $EXITCODE -eq 4 ]
then
    echo "OK: No replication jobs configured"
    exit 0
elif [ $EXITCODE -eq 0 ]
then
    echo "OK: All replication jobs working as intented"
    exit 0
fi
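
For completeness, this is roughly how we hook the script into Nagios via NRPE. The plugin path, command name, and host/template names below are just placeholders for our setup, so adjust them to your environment.
Code:
# on the PVE node, in nrpe.cfg:
command[check_storagereplication]=/usr/local/lib/nagios/plugins/check_storagereplication

# on the Nagios server, a matching service definition:
define service {
    use                   generic-service
    host_name             pve-node1
    service_description   Storage Replication
    check_command         check_nrpe!check_storagereplication
}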
 
@andy77

Do you use an extra snapshot tool like zfsnap?
Can you please send the output of your ZFS snapshots from source and destination?
Code:
zfs list -r -o name -t snap rpool/data
 
@wolfgang

Sorry, I did not see your answer.

No, we do not use anything extra, just pve-zsync.
Code:
rpool/data/vm-101-disk-1@rep_ps11_2017-11-27_17:30:42
rpool/data/vm-101-disk-1@rep_ds11_2017-11-27_17:45:19
rpool/data/vm-103-disk-1@rep_ps11_2017-11-27_17:30:22
rpool/data/vm-103-disk-1@rep_ps11_2017-11-27_17:45:01
rpool/data/vm-104-disk-1@rep_ds12_2017-11-27_17:30:32
rpool/data/vm-104-disk-1@rep_ds12_2017-11-27_17:45:26
rpool/data/vm-105-disk-1@rep_ps12_2017-11-27_17:30:45
rpool/data/vm-105-disk-1@rep_ps12_2017-11-27_17:45:22
rpool/data/vm-106-disk-1@rep_ds13_2017-11-27_17:45:56
rpool/data/vm-106-disk-1@rep_ds13_2017-11-27_18:00:01
rpool/data/vm-107-disk-1@rep_ps13_2017-11-27_17:30:29
rpool/data/vm-107-disk-1@rep_ps13_2017-11-27_17:45:36
rpool/data/vm-108-disk-1@rep_ds14_2017-11-27_17:30:48
rpool/data/vm-108-disk-1@rep_ds14_2017-11-27_17:45:40
rpool/data/vm-109-disk-1@rep_ps14_2017-11-27_17:45:43
rpool/data/vm-109-disk-1@rep_ps14_2017-11-27_18:00:14
rpool/data/vm-111-disk-1@rep_ds15_2017-11-27_17:30:35
rpool/data/vm-111-disk-1@rep_ds15_2017-11-27_17:45:53
rpool/data/vm-112-disk-1@rep_ps15_2017-11-27_17:30:10
rpool/data/vm-112-disk-1@rep_ps15_2017-11-27_17:45:29
rpool/data/vm-113-disk-2@rep_ds16_2017-11-27_17:45:34
rpool/data/vm-113-disk-2@rep_ds16_2017-11-27_18:00:11
rpool/data/vm-114-disk-1@rep_ps16_2017-11-27_17:46:07
rpool/data/vm-114-disk-1@rep_ps16_2017-11-27_18:00:05
rpool/data/vm-115-disk-1@rep_ds17_2017-11-27_17:30:25
rpool/data/vm-115-disk-1@rep_ds17_2017-11-27_17:45:17
rpool/data/vm-116-disk-1@rep_ps17_2017-11-27_17:30:02
rpool/data/vm-116-disk-1@rep_ps17_2017-11-27_17:45:13
rpool/data/vm-150-disk-1@rep_test_2017-11-24_14:36:35
rpool/data/vm-151-disk-1@rep_ps35_2017-11-27_17:30:38
rpool/data/vm-151-disk-1@rep_ps35_2017-11-27_17:45:59
rpool/data/vm-200-disk-1@__replicate_200-0_1511802000__
rpool/data/vm-201-disk-1@__replicate_201-0_1511802002__
rpool/data/vm-202-disk-1@__replicate_202-0_1511802005__
rpool/data/vm-203-disk-1@__replicate_203-0_1511802008__
rpool/data/vm-204-disk-1@__replicate_204-0_1511802012__
rpool/data/vm-205-disk-1@__replicate_205-0_1511802014__
rpool/data/vm-206-disk-1@__replicate_206-0_1511802018__
rpool/data/vm-207-disk-1@__replicate_207-0_1511801126__
rpool/data/vm-208-disk-1@__replicate_208-0_1511801129__
rpool/data/vm-209-disk-1@__replicate_209-0_1511801131__
rpool/data/vm-210-disk-1@__replicate_210-0_1511801136__
rpool/data/vm-211-disk-1@__replicate_211-0_1511801139__
rpool/data/vm-212-disk-1@__replicate_212-0_1511801141__
rpool/data/vm-213-disk-1@__replicate_213-0_1511801144__
rpool/data/vm-500-disk-1@__replicate_500-0_1511802000__
rpool/data/vm-501-disk-1@__replicate_501-0_1511802002__
rpool/data/vm-502-disk-2@__replicate_502-0_1511802006__
rpool/data/vm-503-disk-1@__replicate_503-0_1511802009__
rpool/data/vm-504-disk-1@__replicate_504-0_1511802012__
rpool/data/vm-505-disk-1@__replicate_505-0_1511802015__
rpool/data/vm-506-disk-1@__replicate_506-0_1511802018__
rpool/data/vm-507-disk-1@__replicate_507-0_1511801130__
rpool/data/vm-507-disk-1@__replicate_507-0_1511802021__
rpool/data/vm-508-disk-1@__replicate_508-0_1511801132__
rpool/data/vm-509-disk-1@__replicate_509-0_1511801135__
rpool/data/vm-510-disk-1@__replicate_510-0_1511801140__
rpool/data/vm-511-disk-1@__replicate_511-0_1511801142__
rpool/data/vm-512-disk-1@__replicate_512-0_1511454638__
rpool/data/vm-513-disk-1@__replicate_513-0_1511801145__
rpool/data/vm-514-disk-1@__replicate_514-0_1511801148__
rpool/data/vm-515-disk-1@__replicate_515-0_1511801151__

As you can see, there is only one snapshot of vm-150, and the second one does not work. It just stops with the error I provided before.
 
I'm examining the error, but I think your pool is under heavy load and so something went wrong.
I'm still working on this case.
 
Hmm... I don't think that this is the reason. It does not matter whether the pool is under heavy load or not (I had destroyed all other zsyncs for testing purposes), the error for this one always appears.

So when I delete it and then create it again, the first sync works. And then we get the error that there is already a snapshot available...
 
Hi Wolfgang, I still have the same issues with pve-zsync.
Now the same error occurred on a "new" VM, where the sync first broke because of a node downtime.

Of course, it is clear that the sync will not work if the node where the VM is running is down. But after the node was up again, I was faced with exactly the same error, where pve-zsync always reports "error" because it is not able to delete some old snapshots (even though they do not exist).

Code:
node1:702               VM1                  error     2017-12-16_10:54:11 qemu  ssh

I again tried to destroy the ZFS dataset "rpool/data/vm-702-disk-1" including all snapshots with the "-r" option.
Then I did a "pve-zsync destroy" to delete the job, and also found some files in "/var/lib/pve-zsync/data" with the VM name, like "702.conf.qemu.rep_VM1_2017-12-16_10:54:11", which I deleted too.

Now, after creating a new job with "pve-zsync create ...", the sync works again the first time (with a warning that an old snapshot could not be deleted because it is not available), and then on the second sync the error occurs again, which means this VM is not synced any more. :-(

Where could pve-zsync have saved the info about old snapshots? I mean, why is pve-zsync always trying to delete an old snapshot that does not exist any more? I think the error is caused by "pve-zsync destroy" not deleting this information about the old snapshot for some reason.
 
I now have a temporary workaround:

Because pve-zsync always complains that a snapshot is already available and cannot be overwritten because it is not found (even if the whole ZFS dataset is deleted), I have just chosen a different --name for the job. This way the snapshots also get another name and the job works without errors.
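
For example, instead of re-creating the job under the old name, something like this (host and dataset are placeholders, and the options are as I recall them):
Code:
# old job used --name VM1; pick a new name so the snapshots get a different prefix
pve-zsync create --source 702 --dest backupnode:rpool/data --name VM1_new --maxsnap 2 --verbose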

Anyhow, there must be a "real" solution to this problem. Maybe someone has more info on that.

Regards
Andy
 
Do you have multiple jobs for one VM with the same name (or with no name)?
 
Hi @andy77,

If you do not want to re-create the replication tasks again and again, use pve-zsync. I can tell you that it is rock solid. I set it up and forgot about the problem. I have been using it for many months in several different Proxmox clusters.
Try it and then tell me if I was wrong ;)

Hi :)
Can pve-zsync replace the native 'PVE Storage Replication' for HA (high availability)?
For example, when a node fails, will the VM or container be able to start automatically on the node where the replica was made?
 
@wolfgang
Code:
2017-11-20 09:14:00 506-0: start replication job
2017-11-20 09:14:00 506-0: guest => VM 506, running => 6157
2017-11-20 09:14:00 506-0: volumes => local-zfs:vm-506-disk-1
2017-11-20 09:14:01 506-0: create snapshot '__replicate_506-0_1511165640__' on local-zfs:vm-506-disk-1
2017-11-20 09:14:01 506-0: full sync 'local-zfs:vm-506-disk-1' (__replicate_506-0_1511165640__)
2017-11-20 09:14:01 506-0: full send of rpool/data/vm-506-disk-1@__replicate_506-0_1511165640__ estimated size is 1.69G
2017-11-20 09:14:01 506-0: total estimated size is 1.69G
2017-11-20 09:14:01 506-0: TIME        SENT   SNAPSHOT
2017-11-20 09:14:01 506-0: rpool/data/vm-506-disk-1    name    rpool/data/vm-506-disk-1    -
2017-11-20 09:14:01 506-0: volume 'rpool/data/vm-506-disk-1' already exists
2017-11-20 09:14:01 506-0: warning: cannot send 'rpool/data/vm-506-disk-1@__replicate_506-0_1511165640__': signal received
2017-11-20 09:14:01 506-0: cannot send 'rpool/data/vm-506-disk-1': I/O error
2017-11-20 09:14:01 506-0: command 'zfs send -Rpv -- rpool/data/vm-506-disk-1@__replicate_506-0_1511165640__' failed: exit code 1
2017-11-20 09:14:01 506-0: delete previous replication snapshot '__replicate_506-0_1511165640__' on local-zfs:vm-506-disk-1
2017-11-20 09:14:01 506-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:vm-506-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_506-0_1511165640__ | /usr/bin/ssh -o 'BatchMode=yes' -o 'HostKeyAlias=c2b1px' root@10.0.0.100 -- pvesm import local-zfs:vm-506-disk-1 zfs - -with-snapshots 1' failed: exit code 255

I'm getting a very similar error for two of my VMs in my cluster when using the GUI replication feature. The only difference is that I don't get an "I/O error" for the second "cannot send" line; in my case it is "Broken pipe". And "Broken pipe" repeats for every snapshot of the VM (I have more than one).

When I remove the replication job, destroy the ZFS volumes of the VM on the backup host, and create a new replication job, it works again. However, this is the second time I have encountered this, and already for 2 VMs. It would be great if this could be stabilized (by any sort of workaround or patch). If I should post more information, please say so.
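
For reference, the recovery I do boils down to something like this; the job ID, dataset, and node name are just examples from my case, and the same can of course be done through the GUI:
Code:
# on the source node: remove the broken replication job (ID is <vmid>-<jobnumber>)
pvesr delete 113-0

# on the backup/target node: remove the stale replicated volume including its snapshots
zfs destroy -r rpool/data/vm-113-disk-1

# on the source node: re-create the job, e.g. replicating every 5 minutes
pvesr create-local-job 113-0 targetnode --schedule '*/5'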
 
Hi Asano,

can you please send me the output of these two commands?
Code:
cat /etc/pve/storage.cfg
zfs list -t all
 
Sure, see below. The failed VMs yesterday were 113 and 114. hdds-sb is a CIFS network share for all install ISOs and templates plus an additional qcow2 image for "low I/O" content, but it is only attached to 112, so it shouldn't affect the rest. Everything has been running fine since the recreation of the replication yesterday (in total there are 19 VMs in the cluster which all replicate every 5 minutes, which really works great - below 3 sec for follow-up replications and no measurable impact on performance - as long as there isn't the kind of hiccup like yesterday ;-)).

Code:
~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,iso,vztmpl

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1

dir: hdds-sb
        path /var/lib/vz/template/hdds
        content images
        shared 1

       
~# zfs list -t all
NAME                                                                USED  AVAIL  REFER  MOUNTPOINT
rpool                                                               163G  67.9G   104K  /rpool
rpool/ROOT                                                         1.58G  67.9G    96K  /rpool/ROOT
rpool/ROOT/pve-1                                                   1.58G  67.9G  1.14G  /
rpool/ROOT/pve-1@init                                              2.79M      -   854M  -
rpool/ROOT/pve-1@network-up                                        1.54M      -   854M  -
rpool/ROOT/pve-1@network-up-2                                      1.31M      -   854M  -
rpool/ROOT/pve-1@external-hosts                                    1.38M      -   854M  -
rpool/ROOT/pve-1@init-cluster                                      2.14M      -   855M  -
rpool/ROOT/pve-1@firewall-logrotate-network                        45.5M      -   869M  -
rpool/ROOT/pve-1@subscribed-updated                                41.9M      -  1.13G  -
rpool/ROOT/pve-1@after-px-set                                      44.7M      -  1.13G  -
rpool/ROOT/pve-1@before-snap-cron                                  50.2M      -  1.13G  -
rpool/data                                                          152G  67.9G   104K  /rpool/data
rpool/data/subvol-111-disk-1                                       3.14G  8.50G  1.50G  /rpool/data/subvol-111-disk-1
rpool/data/subvol-111-disk-1@base                                   127M      -   334M  -
rpool/data/subvol-111-disk-1@AfterPxSetup                           761M      -  1.90G  -
rpool/data/subvol-111-disk-1@autoweekly180211000005                   0B      -  1.49G  -
rpool/data/subvol-111-disk-1@autodaily180211000005                    0B      -  1.49G  -
rpool/data/subvol-111-disk-1@autodaily180212000004                 93.1M      -  1.49G  -
rpool/data/subvol-111-disk-1@autodaily180213000003                  135M      -  1.53G  -
rpool/data/subvol-111-disk-1@autodaily180214000004                  104M      -  1.50G  -
rpool/data/subvol-111-disk-1@autohourly180214100003                25.3M      -  1.50G  -
rpool/data/subvol-111-disk-1@autohourly180214110003                19.6M      -  1.50G  -
rpool/data/subvol-111-disk-1@autohourly180214120003                19.6M      -  1.50G  -
rpool/data/subvol-111-disk-1@autohourly180214130004                20.7M      -  1.50G  -
rpool/data/subvol-111-disk-1@autohourly180214140004                22.1M      -  1.50G  -
rpool/data/subvol-111-disk-1@autohourly180214150003                19.7M      -  1.51G  -
rpool/data/subvol-111-disk-1@__replicate_111-0_1518619504__           0B      -  1.50G  -
rpool/data/vm-101-disk-1                                            965M  67.9G   852M  -
rpool/data/vm-101-disk-1@basicDhcpDns                              9.75M      -   844M  -
rpool/data/vm-101-disk-1@base2                                     9.26M      -   845M  -
rpool/data/vm-101-disk-1@AfterPxSetup                              11.5M      -   847M  -
rpool/data/vm-101-disk-1@OpenVPNInstalled                          8.03M      -   849M  -
rpool/data/vm-101-disk-1@OpenVPNIntranet                              7M      -   849M  -
rpool/data/vm-101-disk-1@SomeBackup                                10.7M      -   849M  -
rpool/data/vm-101-disk-1@autoweekly180211000003                    8.28M      -   850M  -
rpool/data/vm-101-disk-1@autodaily180213000004                     5.31M      -   851M  -
rpool/data/vm-101-disk-1@autodaily180214000003                     6.26M      -   852M  -
rpool/data/vm-101-disk-1@autohourly180214100003                    3.44M      -   852M  -
rpool/data/vm-101-disk-1@autohourly180214110003                    2.78M      -   852M  -
rpool/data/vm-101-disk-1@autohourly180214120003                    2.82M      -   852M  -
rpool/data/vm-101-disk-1@autohourly180214130003                    2.82M      -   852M  -
rpool/data/vm-101-disk-1@autohourly180214140003                    2.93M      -   852M  -
rpool/data/vm-101-disk-1@autohourly180214150004                    2.84M      -   852M  -
rpool/data/vm-101-disk-1@__replicate_101-0_1518619500__            2.34M      -   852M  -
rpool/data/vm-102-disk-1                                           6.90G  67.9G  4.97G  -
rpool/data/vm-102-disk-1@AfterLandscapeReg                          550M      -  3.47G  -
rpool/data/vm-102-disk-1@autodaily180211000005                      281M      -  4.67G  -
rpool/data/vm-102-disk-1@autohourly180211140002                       0B      -  4.67G  -
rpool/data/vm-102-disk-1@autohourly180211140003                       0B      -  4.67G  -
rpool/data/vm-102-disk-1@autohourly180211150002                       0B      -  4.67G  -
rpool/data/vm-102-disk-1@autohourly180211150003                       0B      -  4.67G  -
rpool/data/vm-102-disk-1@autohourly180211160003                       0B      -  4.67G  -
rpool/data/vm-102-disk-1@autohourly180211160004                       0B      -  4.67G  -
rpool/data/vm-102-disk-1@__replicate_102-0_1518619500__               0B      -  4.97G  -
rpool/data/vm-103-disk-1                                            921M  67.9G   826M  -
rpool/data/vm-103-disk-1@basicDhcpDns                              8.14M      -   817M  -
rpool/data/vm-103-disk-1@base2                                     9.18M      -   819M  -
rpool/data/vm-103-disk-1@AfterPxSetup                              10.4M      -   820M  -
rpool/data/vm-103-disk-1@OpenVPNIntranet                           9.08M      -   823M  -
rpool/data/vm-103-disk-1@SomeBackup                                8.35M      -   824M  -
rpool/data/vm-103-disk-1@autoweekly180211000005                       0B      -   824M  -
rpool/data/vm-103-disk-1@autodaily180211000005                        0B      -   824M  -
rpool/data/vm-103-disk-1@autodaily180212000004                     7.05M      -   824M  -
rpool/data/vm-103-disk-1@autodaily180214000004                     6.80M      -   825M  -
rpool/data/vm-103-disk-1@autohourly180214100003                    3.04M      -   826M  -
rpool/data/vm-103-disk-1@autohourly180214110003                    2.37M      -   826M  -
rpool/data/vm-103-disk-1@autohourly180214120003                    2.42M      -   826M  -
rpool/data/vm-103-disk-1@autohourly180214130004                    2.42M      -   826M  -
rpool/data/vm-103-disk-1@autohourly180214140004                    2.42M      -   826M  -
rpool/data/vm-103-disk-1@autohourly180214150003                    2.34M      -   826M  -
rpool/data/vm-103-disk-1@__replicate_103-0_1518619502__               0B      -   826M  -
rpool/data/vm-110-disk-1                                           19.4G  67.9G  13.4G  -
rpool/data/vm-110-disk-1@AllRunning                                 919M      -  10.1G  -
rpool/data/vm-110-disk-1@autoweekly180211000003                    1.06G      -  12.4G  -
rpool/data/vm-110-disk-1@autodaily180212000003                      796M      -  12.7G  -
rpool/data/vm-110-disk-1@autodaily180213000004                      847M      -  12.9G  -
rpool/data/vm-110-disk-1@autodaily180214000003                      524M      -  13.2G  -
rpool/data/vm-110-disk-1@autohourly180214100003                    99.3M      -  13.3G  -
rpool/data/vm-110-disk-1@autohourly180214110003                    54.4M      -  13.3G  -
rpool/data/vm-110-disk-1@autohourly180214120003                    39.9M      -  13.3G  -
rpool/data/vm-110-disk-1@autohourly180214130003                    45.7M      -  13.3G  -
rpool/data/vm-110-disk-1@autohourly180214140003                    48.7M      -  13.4G  -
rpool/data/vm-110-disk-1@autohourly180214150004                    44.9M      -  13.4G  -
rpool/data/vm-110-disk-1@__replicate_110-0_1518619800__            11.3M      -  13.4G  -
rpool/data/vm-112-disk-1                                           93.7G  67.9G  26.8G  -
rpool/data/vm-112-disk-1@autodaily180211000003                     17.6G      -  23.3G  -
rpool/data/vm-112-disk-1@autodaily180213000004                     18.2G      -  26.9G  -
rpool/data/vm-112-disk-1@autodaily180214000003                     15.4G      -  26.8G  -
rpool/data/vm-112-disk-1@autohourly180214100003                    2.59G      -  26.8G  -
rpool/data/vm-112-disk-1@autohourly180214110003                    73.3M      -  26.8G  -
rpool/data/vm-112-disk-1@autohourly180214120003                    74.7M      -  26.8G  -
rpool/data/vm-112-disk-1@autohourly180214130003                    75.8M      -  26.8G  -
rpool/data/vm-112-disk-1@autohourly180214140003                    88.7M      -  26.8G  -
rpool/data/vm-112-disk-1@autohourly180214150004                    91.0M      -  26.8G  -
rpool/data/vm-112-disk-1@__replicate_112-0_1518619803__            19.0M      -  26.8G  -
rpool/data/vm-113-disk-1                                           14.3G  67.9G  11.7G  -
rpool/data/vm-113-disk-1@AllRunning                                 762M      -  10.8G  -
rpool/data/vm-113-disk-1@autoweekly180211000003                     216K      -  11.3G  -
rpool/data/vm-113-disk-1@autodaily180211000003                      216K      -  11.3G  -
rpool/data/vm-113-disk-1@autodaily180212000003                      176M      -  11.4G  -
rpool/data/vm-113-disk-1@autodaily180213000004                      331M      -  11.5G  -
rpool/data/vm-113-disk-1@autodaily180214000003                      230M      -  11.6G  -
rpool/data/vm-113-disk-1@autohourly180214100003                    24.3M      -  11.7G  -
rpool/data/vm-113-disk-1@autohourly180214110003                    5.93M      -  11.7G  -
rpool/data/vm-113-disk-1@autohourly180214120003                    6.07M      -  11.7G  -
rpool/data/vm-113-disk-1@autohourly180214130003                    6.11M      -  11.7G  -
rpool/data/vm-113-disk-1@autohourly180214140003                    5.22M      -  11.7G  -
rpool/data/vm-113-disk-1@autohourly180214150004                    5.96M      -  11.7G  -
rpool/data/vm-113-disk-1@__replicate_113-0_1518619808__            5.07M      -  11.7G  -
rpool/data/vm-113-state-AllRunning                                 4.19G  67.9G  4.19G  -
rpool/data/vm-113-state-AllRunning@__replicate_113-0_1518619808__     0B      -  4.19G  -
rpool/data/vm-114-disk-1                                           8.55G  67.9G  6.53G  -
rpool/data/vm-114-disk-1@SomeStuffRunning                           720M      -  4.34G  -
rpool/data/vm-114-disk-1@autoweekly180211000003                       0B      -  5.49G  -
rpool/data/vm-114-disk-1@autodaily180211000003                        0B      -  5.49G  -
rpool/data/vm-114-disk-1@autodaily180212000003                      277M      -  6.08G  -
rpool/data/vm-114-disk-1@autodaily180213000004                      392M      -  6.32G  -
rpool/data/vm-114-disk-1@autodaily180214000003                      174M      -  6.53G  -
rpool/data/vm-114-disk-1@autohourly180214100003                    19.6M      -  6.53G  -
rpool/data/vm-114-disk-1@autohourly180214110003                    1.97M      -  6.53G  -
rpool/data/vm-114-disk-1@autohourly180214120003                    1.96M      -  6.53G  -
rpool/data/vm-114-disk-1@autohourly180214130003                    2.02M      -  6.53G  -
rpool/data/vm-114-disk-1@autohourly180214140003                    2.09M      -  6.53G  -
rpool/data/vm-114-disk-1@autohourly180214150004                    2.05M      -  6.53G  -
rpool/data/vm-114-disk-1@__replicate_114-0_1518619811__            1.19M      -  6.53G  -
rpool/swap                                                         8.50G  76.4G  7.57M  -
 
Which tool makes the "auto*" snapshots?
 
Which tool makes the "auto*" snapshots?
That is eve4pve-autosnap: https://github.com/EnterpriseVE/eve4pve-autosnap/blob/master/eve4pve-autosnap
The code looks quite well-arranged to me, and basically it's not doing more than creating a cron job for each label (hourly, daily, weekly) which then takes a snapshot and applies a retention policy. As far as I can see it's doing everything with `qm`, so it should be well compatible.
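
To illustrate, the whole mechanism is roughly equivalent to a small cron script like the following sketch, using only plain `qm` commands. This is not the eve4pve-autosnap code itself; VM ID, label, and retention count are placeholders, and the snapshot-name parsing is simplified.
Code:
#!/bin/bash
# take one labelled snapshot of a VM and prune old ones - sketch only
VMID=101          # VM to snapshot
LABEL=autohourly  # label prefix, e.g. autohourly / autodaily / autoweekly
KEEP=24           # number of snapshots to keep for this label

# create a new snapshot named <label><timestamp>, e.g. autohourly1802141500
qm snapshot "$VMID" "${LABEL}$(date +%y%m%d%H%M)"

# prune: the timestamp suffix makes names sort chronologically,
# so drop everything except the newest $KEEP snapshots of this label
for SNAP in $(qm listsnapshot "$VMID" | grep -o "${LABEL}[0-9]*" | sort | head -n -"$KEEP"); do
    qm delsnapshot "$VMID" "$SNAP"
done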

Side note: IMHO it would be nice if we could just get a retention policy for the `__replicate_*` snapshots, which are taken anyway and currently seem to have a hard-coded policy of always keeping only the most recent one. This way we could also easily exclude certain images from snapshots ;-)
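
As a stop-gap, one can at least keep an eye on which replication snapshots currently exist, e.g. with something like:
Code:
# list all __replicate_* snapshots with their creation time, oldest first
zfs list -t snapshot -o name,creation -s creation | grep __replicate_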
 
I seem to be having the same problem

Code:
root@pvelaptop:~# zfs list -t all
NAME                                                                           USED  AVAIL     REFER  MOUNTPOINT
rpool                                                                         35.6G   179G      104K  /rpool
rpool/ROOT                                                                    2.10G   179G       96K  /rpool/ROOT
rpool/ROOT/pve-1                                                              2.10G   179G     2.10G  /
rpool/data                                                                    33.4G   179G      104K  /rpool/data
rpool/data/subvol-103-disk-0                                                   776M  7.24G      776M  /rpool/data/subvol-103-disk-0
rpool/data/vm-100-disk-0                                                        56K   179G       56K  -
rpool/data/vm-100-disk-0@MigratedPreDataUpdate                                   0B      -       56K  -
rpool/data/vm-100-disk-0@__replicate_100-1_1662619501__                          0B      -       56K  -
rpool/data/vm-100-disk-1                                                      30.9G   179G     22.8G  -
rpool/data/vm-100-disk-1@MigratedPreDataUpdate                                8.09G      -     13.1G  -
rpool/data/vm-100-disk-1@__replicate_100-1_1662619501__                       2.67M      -     22.8G  -
rpool/data/vm-100-state-MigratedPreDataUpdate                                 1.79G   179G     1.79G  -
rpool/data/vm-100-state-MigratedPreDataUpdate@__replicate_100-1_1662619501__     0B      -     1.79G  -
root@pvelaptop:~#

Code:
root@pvelaptop:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

zfspool: local-zfs
        pool rpool/data
        sparse
        content images,rootdir

 
Code:
2023-12-26 17:19:09 192-0: cannot receive: local origin for clone rpool/data/vm-192-disk-1@S2023_12_18_10_54 does not exist
2023-12-26 17:19:09 192-0: cannot open 'rpool/data/vm-192-disk-1': dataset does not exist
2023-12-26 17:19:09 192-0: command 'zfs recv -F -- rpool/data/vm-192-disk-1' failed: exit code 1
2023-12-26 17:19:09 192-0: warning: cannot send 'rpool/data/vm-192-disk-1@S2023_12_18_10_54': signal received
2023-12-26 17:19:09 192-0: TIME SENT SNAPSHOT rpool/data/vm-192-disk-1@S_2023_12_23_11_58
2023-12-26 17:19:09 192-0: warning: cannot send 'rpool/data/vm-192-disk-1@S_2023_12_23_11_58': Broken pipe
2023-12-26 17:19:09 192-0: TIME SENT SNAPSHOT rpool/data/vm-192-disk-1@__replicate_192-0_1703571544__
2023-12-26 17:19:09 192-0: warning: cannot send 'rpool/data/vm-192-disk-1@__replicate_192-0_1703571544__': Broken pipe
2023-12-26 17:19:09 192-0: cannot send 'rpool/data/vm-192-disk-1': I/O error
2023-12-26 17:19:09 192-0: command 'zfs send -Rpv -- rpool/data/vm-192-disk-1@__replicate_192-0_1703571544__' failed: exit code 1
2023-12-26 17:19:09 192-0: delete previous replication snapshot '__replicate_192-0_1703571544__' on local-zfs:base-9002-disk-0/vm-192-disk-1
2023-12-26 17:19:09 192-0: delete previous replication snapshot '__replicate_192-0_1703571544__' on local-zfs:base-9002-disk-1/vm-192-disk-0
2023-12-26 17:19:09 192-0: delete previous replication snapshot '__replicate_192-0_1703571544__' on local-zfs:vm-192-disk-2
2023-12-26 17:19:09 192-0: delete previous replication snapshot '__replicate_192-0_1703571544__' on local-zfs:vm-192-state-S2023_12_18_10_54
2023-12-26 17:19:09 192-0: delete previous replication snapshot '__replicate_192-0_1703571544__' on local-zfs:vm-192-state-S_2023_12_23_11_58
2023-12-26 17:19:09 192-0: end replication job with error: command 'set -o pipefail && pvesm export local-zfs:base-9002-disk-0/vm-192-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_192-0_1703571544__ | /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=dsco-vm-01' root@192.168.3.61 -- pvesm import local-zfs:base-9002-disk-0/vm-192-disk-1 zfs - -with-snapshots 1 -snapshot __replicate_192-0_1703571544__ -allow-rename 0' failed: exit code 1


Can anyone suggest what I can do?
 
