[SOLVED] Proxmox backup server continuous restore script

EuroDomenii

The use case is a production Proxmox server sending incremental backups via PBS to a remote datastore on another Proxmox Backup Server.

The other Proxmox Backup Server would act as a hot standby (it couldn't be as synchronized as https://pve.proxmox.com/wiki/Storage_Replication, due to longer restore times). In case of downtime on the production server, the failover IPs would be rerouted to the hot standby server.

This is a stub script that is supposed to run on cron.

cat pbs_continuous_restore.pl
Perl:
#!/usr/bin/perl

# by EuroDomenii - MIT -  2020

use strict;
use warnings;

sub uniq {
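    # Keep only the first occurrence of each value, preserving input order.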
    my %seen;
    grep !$seen{$_}++, @_;
}

#ToDo execute only when there's no other restore task running, via proxmox-backup-client task list

#ToDo Setup --repository workflow.
#Temporary manual setup sample: export PBS_REPOSITORY="root@pam@localhost:store2"

my @vmids = `proxmox-backup-client snapshots --output-format=json-pretty | jq '.[] | ."backup-id"'`;

my @filtered = uniq(@vmids);

foreach my $i (@filtered) {
        chomp $i;
        my @timestamps = `proxmox-backup-client snapshots --output-format=json-pretty | jq -r '.[] | select(."backup-id" == $i) | ."backup-time"'`;
        chomp @timestamps;
        my @sorted = sort @timestamps;
        my $latest = pop @sorted;
        my $backup_type = `proxmox-backup-client snapshots --output-format=json-pretty | jq -r '.[] | select(."backup-id" == $i and ."backup-time" == $latest) | ."backup-type"'`;
        chomp $backup_type;
        print "ID $i LATEST timestamp $latest for backup type $backup_type\n";
        #ToDo FeatureRequest: it would be helpful if the json-pretty output format provided a ready-made snapshot field, like the text format does, to avoid reconstructing it

        #WIP restore
        #Todo format date with T and Z
        #my $date = `date -u -d @$latest  +'%Y-%m-%d %H:%M:%S'`;
        #proxmox-backup-client restore $backup_type/$i/$date index.json -
        #or maybe qmrestore
        #Asked here https://forum.proxmox.com/threads/pbs-restore-proxmox-kvm-from-cli.73163/
}
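
The "format date with T and Z" ToDo can be handled with POSIX strftime over gmtime, which is what the working version further down ends up using. A minimal sketch, assuming $latest holds an epoch value as returned by jq:

Perl:
#!/usr/bin/perl

use strict;
use warnings;
use POSIX qw(strftime);

# Epoch value from a "backup-time" field; 1594893129 corresponds to the
# snapshot timestamp 2020-07-16T09:52:09Z seen later in this thread.
my $latest = 1594893129;

# gmtime keeps the timestamp in UTC, matching the trailing Z.
my $datestring = strftime "%Y-%m-%dT%H:%M:%SZ", gmtime($latest);
print "$datestring\n";    # prints 2020-07-16T09:52:09Z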

Any suggestions to move on are more than welcome. Thanks!
 
Further improvement of the program logic…

Since this program is supposed to run on cron every minute, I want to make sure that I loop through all VMs before starting another cycle, so there’s a need for a lock.

This approach works: https://www.perl.com/article/2/2015/11/4/Run-only-one-instance-of-a-program-at-a-time/


Perl:
use Fcntl qw(:flock);
open my $file, ">", "app.lock" or die $!;
flock $file, LOCK_EX|LOCK_NB or die "Unable to lock file $!";
# we have the lock


On the other hand, although simultaneous restores are possible, I think it's better to run only one restore at a time.

Use case

Bash:
root@kevin:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            2.0G     0  2.0G   0% /dev
tmpfs           395M  704K  394M   1% /run
/dev/sda1       191G  108G   84G  57% /
tmpfs           2.0G     0  2.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/sda15      105M  3.6M  101M   4% /boot/efi
/dev/sdb        492G  101G  366G  22% /var/sata
tmpfs           395M     0  395M   0% /run/user/0

The restore is slower here since both /dev/sda and /dev/sdb in this test are on HDDs (usually /dev/sda is on NVMe/SSD).

5 simultaneous KVM restores

restore image complete (bytes=211741048832, duration=5021.59s, speed=40.21MB/s) + restore image complete (bytes=536870912000, duration=5496.60s, speed=93.15MB/s) = 10518 s, 2.92 h, around 3 hours

Single KVM restore

restore image complete (bytes=211741048832, duration=1956.54s, speed=103.21MB/s) + restore image complete (bytes=536870912000, duration=1381.97s, speed=370.49MB/s) = 3339 s, 0.93 h, around 1 hour

From the above example: instead of waiting 3 hours for all the KVMs to restore simultaneously (from a 3-hour-old backup), it's better to restore each VM sequentially from its most recent incremental backup (in this case 1 hour old). Dirty-bitmap incremental backups, tracked in RAM, are so fast that you can afford to refresh the datastore every 15 minutes.

Also, I need to make sure, when looping through @vmids, that only one restore task is running at a time. proxmox-backup-client task list doesn't show the currently running restore tasks.

I could postpone the next restore while /usr/bin/pbs-restore is running, along the lines of https://stackoverflow.com/questions/3844168/how-can-i-check-if-a-unix-process-is-running-in-perl


Bash:
ps -aef | grep pbs-restore
root       557 23643  0 03:44 pts/1    00:00:00 grep pbs-restore
root     31828 31741 32 03:40 ?        00:01:01 /usr/bin/pbs-restore --repository root@pam@localhost:store2 vm/100/2020-07-16T09:52:09Z drive-scsi0.img.fidx /dev/vmdata/vm-102-disk-0 --verbose --format raw --skip-zero
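
A minimal sketch of that idea, assuming the restore binary path stays /usr/bin/pbs-restore as in the ps output above (the helper name is mine, not part of the script yet):

Perl:
#!/usr/bin/perl

use strict;
use warnings;

# True if a pbs-restore process is currently running. `ps -aef` prints
# full command lines, so we can match the binary path directly; this
# script itself never matches the pattern.
sub pbs_restore_running {
    return scalar grep { m{/usr/bin/pbs-restore} } `ps -aef`;
}

if ( pbs_restore_running() ) {
    print "pbs-restore still running, postponing the next restore\n";
    exit;
}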
 
First working version:

Perl:
#!/usr/bin/perl

# by EuroDomenii - MIT -  2020
# Prereq:
# apt install jq

use strict;
use warnings;

use POSIX qw(strftime);

use File::Basename;
my $dirname = dirname(__FILE__);

use Fcntl qw(:flock);
open my $file, ">", "$dirname/app.lock" or die $!;
flock $file, LOCK_EX|LOCK_NB or die "Unable to lock file $!";
# we have the lock

my @par = @ARGV;
my $repository = "";
my $password = "";
my $prefix = "";

for (my $i = 0; $i <= $#par; ++$i) {
    local $_ = $par[$i];
    if (/--password/){
        $password = $par[++$i];
        next;
    }
    if (/--repository/) {
        $repository = $par[++$i];
        next;
    }
    if (/--prefix/) {
        $prefix = $par[++$i];
        next;
    }
}

if ($repository eq "") {
    print q(Please set the --repository parameter. Format sample: "myuser@pbs@localhost:store2");
    print "\n";
    exit;
}

if ($password eq "") {
    print q(Please set the --password parameter. Use single quotes for the password to avoid exclamation mark issues in bash parameters. Format sample: --password 'Zrs$#bVn1aQKLgzA6Lc0OJTB#RMSR**qZ6!MO9KKY');
    print "\n";
    exit;
}

sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}

$ENV{PBS_REPOSITORY} = $repository;
$ENV{PBS_PASSWORD} = $password;

#Credits https://www.perlmonks.org/?node_id=786670
system("proxmox-backup-client login");
#for testing run proxmox-backup-client logout --repository $repository

my @vmids = `proxmox-backup-client snapshots --output-format=json-pretty | jq '.[] | ."backup-id"'`;
my @filtered = uniq(@vmids);

foreach my $i (@filtered) {
    chomp $i;
    #Remove quotes
    my $id = substr $i, 1,-1;
    my $first_digit = substr($id, 0, 1);
    if ( $prefix ne "" and $prefix ne $first_digit ) {
        next;
    }

    my @timestamps = `proxmox-backup-client snapshots --output-format=json-pretty | jq -r '.[] | select(."backup-id" == $i) | ."backup-time"'`;
    chomp @timestamps;
    #String sort is fine here: epoch timestamps currently all have the same number of digits
    my @sorted = sort @timestamps;
    my $latest = pop @sorted;

    my $backup_type = `proxmox-backup-client snapshots --output-format=json-pretty | jq -r '.[] | select(."backup-id" == $i and ."backup-time" == $latest) | ."backup-type"'`;
    chomp $backup_type;

    my $datestring = strftime "%Y-%m-%dT%H:%M:%SZ", gmtime($latest);
    #ToDo FeatureRequest: it would be helpful if the json-pretty output format provided a ready-made snapshot field, like the text format does, to avoid reconstructing it
    my $snapshot = "$backup_type/$id/$datestring";

    #Instead of working with pvesh get /nodes/{node}/qemu/{vmid}, let's go at cluster level
    my $status = `pvesh get /cluster/resources --output-format=json-pretty | jq -r '.[] | select(.vmid == $id) | .status'`;
    chomp $status;
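    #The PVE storage name is the part after the colon in user@realm@host:store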
    my (undef, $storage) = split(':\s*', $repository);

    if( $backup_type  eq  "vm" ) {
        #ToDo: Before stopping/destroying the VM, it would be better to restore to another id. If the production server goes down and the restore on the standby server takes too long, there would still be the option to go online with a previously restored VM. Low priority for the moment, due to the burden of tracking the correlation between different restore stages of the same VM.
        if ($status eq "running") {
            #Play it safe with stop instead of shutdown; the next step is destroy anyway, so consistency doesn't matter
            system("qm stop $id --skiplock true");
        }
        if ($status ne "") {
            system("qm destroy $id --skiplock true --purge true");
        }

        #https://forum.proxmox.com/threads/pbs-restore-proxmox-kvm-from-cli.73163/#post-327076
        #No need to check for a running restore task via "proxmox-backup-client task list": restores run sequentially anyway
        printf "Restoring VM id $id from snapshot $snapshot on storage $storage\n";
        my $qmrestore = `qmrestore --force true $storage:backup/$snapshot $id`;

        printf "Starting VM id $id\n";
        system("qm start $id --skiplock true");

    } elsif( $backup_type  eq  "ct" ) {
        if ($status eq "running") {
            system("pct stop $id --skiplock true");
        }
        if ($status ne "") {
            system("pct destroy $id --force true --purge true");
        }

        printf "Restoring Container id $id from snapshot $snapshot on storage $storage\n";
        my $lxcrestore = `pct restore --force true --unprivileged true $id $storage:backup/$snapshot`;

        printf "Starting Container id $id\n";
        system("pct start $id --skiplock true");
    } else {
        printf "Skipping... incorect backup type $backup_type\n";
    }

}
 
Rationale
  1. True high availability with Proxmox VE can be achieved with network distributed storage, like Ceph. But compared with local storage there's a performance penalty, even with a 10G network.
  2. A better balanced solution is https://pve.proxmox.com/wiki/Storage_Replication (with a COW filesystem like ZFS/BTRFS). The storage is local, but replication uses snapshots to minimize the traffic sent over the network: after the initial full sync, new data is sent only incrementally, using send/receive between snapshots. The real advantage is that the replicated remote node doesn't need to restore the VM; the diff snapshot is ready to go online in case of production downtime, acting as a hot standby synchronized as often as every 15 minutes. From my benchmarks with a hundred-GB VM, the incremental ZFS/BTRFS send/receive is blazing fast (a matter of minutes).
  3. Still, the best performance comes from local non-COW storage (copy-on-write filesystems like ZFS/BTRFS are not the best fit for virtualization or databases). The best choice would be LVM-thin on mdadm, in order to keep the local snapshots feature. See my results from comparative Phoronix test suites: https://openbenchmarking.org/result/2007127-EURO-200712128
    The use case is a production Proxmox server sending incremental backups via Proxmox Backup Server to a remote datastore on another Proxmox Backup Server. The other server would act as a hot standby. In case of downtime of the production server, the failover IPs should be rerouted to the hot standby server.
    There's a trade-off for this maximum-performance non-COW storage: it needs a restore on the remote server.
    "Having incremental and deduplicated backups plus a very flexible pruning schedule available can allow one to make backups more often so one has always a recent state ready." (via t.lamprecht, Proxmox Staff Member, https://forum.proxmox.com/threads/proxmox-backup-server-beta.72677/page-2#post-324884)
    Now comes the need for this continuous restore script. For the Proxmox VE backup server to behave as a hot standby, we need a script that continuously restores on the remote server. It can't be as synchronized as https://pve.proxmox.com/wiki/Storage_Replication (15 min), due to longer restore times. But depending on the size of the VMs and the restore queue, you could have a version of the production VMs up and running that is anywhere from several minutes to a few hours old.
Prerequisites

Proxmox PVE & PBS

Proxmox Backup Server can be installed as a standalone product, but in order to act as a hot standby for another Proxmox VE production server, it must be installed on top of a Proxmox VE server (see https://pbs.proxmox.com/docs/installation.html#install-proxmox-backup-server-on-proxmox-ve). The proxmox-backup-client is already included in Proxmox VE.

Other packages required:
apt install jq

Sample configuration

Production server

Code:
root@melania:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,snippets,vztmpl,rootdir,backup,images
        maxfiles 0
        shared 0

dir: sata
        path /ab/backup/eurodomenii
        content iso,images,backup
        maxfiles 0
        shared 0

lvmthin: local-lvm
        thinpool vmstore
        vgname vmdata
        content rootdir,images

pbs: max
        disable
        datastore melania
        server 192.168.7.158
        content backup
        fingerprint bc:9d:f7:b9:ce:d3:cd:07:2d:8f:d8:e4:99:2a:69:41:11:db:a5:4c:16:5f:5d:de:aa:42:55:ab:f2:65:99:bc
        maxfiles 0
        username melania@pbs

pbs: local_melania
        datastore store_melania
        server localhost
        content backup
        fingerprint 97:9c:cd:5b:a8:0d:67:84:53:fc:93:83:ea:dc:3e:83:d1:24:28:75:70:aa:cf:13:38:da:07:d0:51:be:eb:a4
        maxfiles 0
        username realmelania@pbs

Remote Proxmox VE backup server (hot standby)

Code:
root@max:/# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,rootdir,backup,images,vztmpl,snippets
        maxfiles 0
        shared 0

lvmthin: local-lvm
        thinpool vmstore
        vgname vmdata
        content images,rootdir

pbs: melania
        datastore melania
        server localhost
        content backup
        fingerprint bc:9d:f7:b9:ce:d3:cd:07:2d:8f:d8:e4:99:2a:69:41:11:db:a5:4c:16:5f:5d:de:aa:42:55:ab:f2:65:99:bc
        maxfiles 0
        username melania@pbs

dir: sata
        path /var/eurodomenii/backup
        content vztmpl,snippets,images,rootdir,iso,backup
        maxfiles 0
        shared 0

Usage

Parameters
--repository

Format sample: --repository "myuser@pbs@localhost:store2"

--password

Use single quotes for password to avoid exclamation mark issues in bash parameters.

Format sample: --password 'Zrs$#bVn1aQKLgzA6Lc0OJTB#RMSR**qZ6!MO9KKY'

--prefix

The first digit of the VM/Container id, used to split VMs between multiple script instances (see below).

Format sample: --prefix 4

Sample single instance run

From cli

Code:
chmod +x /var/eurodomenii/scripts/pbs_continuous_restore/pbs_continuous_restore.pl
root@max:/# /var/eurodomenii/scripts/pbs_continuous_restore/pbs_continuous_restore.pl --repository melania@pbs@localhost:melania --password 'Zrs$#bVn1aQKLgzA6Lc0OJTB#RMSR**qZ6!MO9KKY'

From Cron

This is the preferred setup. The script runs every minute, but the app.lock file prevents a new instance from starting before the previous one has finished looping through all the VM ids in the datastore.

Code:
root@max:/var/eurodomenii/scripts/pbs_continuous_restore# cat pbs_bash.sh
#!/bin/bash
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
perl /var/eurodomenii/scripts/pbs_continuous_restore/pbs_continuous_restore.pl --repository melania@pbs@localhost:melania --password 'Zrs$#bVn1aQKLgzA6Lc0OJTB#RMSR**qZ6!MO9KKY'

Code:
root@max:/# crontab -l
* * * * * /var/eurodomenii/scripts/pbs_continuous_restore/pbs_bash.sh > /dev/null 2>&1

Sequential versus simultaneous restores

Normally, the restore process runs sequentially. This has a big advantage: since we always restore the latest version of an incremental backup, during the restore of one virtual machine there's a good chance that the remote sync job will bring a newer incremental version for the next virtual machine in the continuous restore queue. Even if simultaneous restores have a slight time advantage, it's bad architecture to restore too many at once from older incremental versions.

However, depending on the particular use case, 2 or 3 restore threads might be the best balanced solution, prioritizing your clients.

This can be achieved by running 2-3 instances of the script from cron, every minute, from different directories, to avoid conflicts over the app.lock file.

Further, there’s 2 ways of doing it:

  • Either you run each script with a different repository parameter. This isn't possible at the moment: there's no --password command line option, only the PBS_PASSWORD environment variable, which will be overridden when multiple scripts run.
  • Or you run each script with a different prefix parameter (the prefix being the first digit of the VM/Container id). Using the first digit as a prefix is somewhat of a "dummy" solution. Instead, as a Proxmox feature request, some kind of tagging of VMs/Containers would be very useful!
Sample multiple instance run from cron

Code:
root@max:/# cat /var/eurodomenii/scripts/restore1/pbs_bash.sh
#!/bin/bash
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
perl /var/eurodomenii/scripts/restore1/pbs_continuous_restore.pl --repository melania@pbs@localhost:melania --password 'Zrs$#bVn1aQKLgzA6Lc0OJTB#RMSR**qZ6!MO9KKY' --prefix 3

Code:
root@max:/# cat /var/eurodomenii/scripts/restore2/pbs_bash.sh
#!/bin/bash
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
perl /var/eurodomenii/scripts/restore2/pbs_continuous_restore.pl --repository melania@pbs@localhost:melania --password 'Zrs$#bVn1aQKLgzA6Lc0OJTB#RMSR**qZ6!MO9KKY' --prefix 4

Code:
root@max:/# crontab -l
* * * * * /var/eurodomenii/scripts/restore1/pbs_bash.sh > /dev/null 2>&1
* * * * * /var/eurodomenii/scripts/restore2/pbs_bash.sh > /tmp/cronjob2.log 2>&1

Tip: check with ps -aux | grep perl when running simultaneous processes.

Roadmap

Todo
  • Before stopping/destroying the VM, it would be better to restore to another id. If the production server goes down and the restore on the standby server takes too long, there would still be the option to go online with a previously restored VM. Low priority for the moment, due to the burden of tracking the correlation between different restore stages of the same VM.
Proxmox feature requests to improve workflow
  • Multiple instances of the script can't run with different repositories, since at the moment there's no --password command line option, only the PBS_PASSWORD environment variable, which will be overridden.
  • It would be helpful if the json-pretty output format provided a ready-made snapshot field, like the text format does, to avoid reconstructing it.
 