Any Proxmox Ceph Users Interested in helping test Benji

adamb

Hoping this doesn't cause any issues as far as the rules go.

I have been testing this for well over six months now and it's been a really solid Ceph backup solution.

https://github.com/elemental-lf/benji

I back up my Ceph RBD volumes every night via Benji and the overall space usage is still only about equal to one full copy. The deduplication and compression are awesome.

The only issue is it needs more testers! Easily one of the best backup solutions I have worked with to date! Can't wait to see it go stable. The good news is it's a fork of backy2, which has a pretty good following, and Benji adds some great enhancements on top to make it shine.

I was hoping the proxmox community might help bring Benji to a stable release. Anyone interested in testing?
 
I've been running this for about 4 months now and it really has been great. I set up the destination to use Ceph's RADOS Gateway (S3 compatible) with 256-bit AES encryption and compression. Deduplication, plus backing up the source RBD images using fast-diff (now possible with recent Ceph kernel module changes), makes it super fast and efficient. I get about 500 MiB/s on deltas to an erasure-coded HDD pool.
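To use fast-diff the RBD images just need the right features enabled; a rough sketch with plain rbd commands (the pool/image name is only an example):

Code:
# Check which features an image already has
rbd info rbd_hdd/vm-103-disk-0 | grep features
# fast-diff depends on object-map, which depends on exclusive-lock
rbd feature enable rbd_hdd/vm-103-disk-0 exclusive-lock object-map fast-diff
# Rebuild the object map once so diffs are accurate for pre-existing data
rbd object-map rebuild rbd_hdd/vm-103-disk-0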

The ability to export backup images over NBD and then mount the file systems for selective recovery is super useful, and one can even point a VM at the resulting block device without first having to restore the entire image.
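The selective recovery workflow is roughly this (sketch only: the version UID is a placeholder you would take from benji ls, and the exact nbd-client syntax depends on your version):

Code:
# Serve the backup versions over NBD, then attach one as a block device
benji nbd &
nbd-client -N V0000000001 127.0.0.1 /dev/nbd0
# Mount a partition from the backup read-only and copy out what you need,
# or point a VM at /dev/nbd0 without restoring the whole image
mount -o ro /dev/nbd0p1 /mnt/restore
umount /mnt/restore
nbd-client -d /dev/nbd0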
 
Hi, I looked at this some months ago and it was looking great.

But for our Ceph Proxmox backups we have implemented our own solution, because we wanted Ceph storage as the backup storage (ceph -> ceph backup). It's much easier to keep the snapshot history on the backup storage without needing to merge/compare differential backups: we simply import the RBD deltas.
We also needed the ability to restore files from a backup snapshot quickly,
and we needed Proxmox API integration (for a guest-agent fsfreeze before the snapshot).
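Roughly, the delta import boils down to this (simplified: pool, snapshot and cluster names are just examples, and backurne adds the Proxmox API / fsfreeze handling around it):

Code:
# Take a new snapshot on the production cluster, then ship only the changes
# since the previous snapshot to the backup cluster
rbd snap create rbd/vm-100-disk-0@daily-2020-01-02
rbd export-diff --from-snap daily-2020-01-01 rbd/vm-100-disk-0@daily-2020-01-02 - \
  | rbd --cluster backup import-diff - backup/vm-100-disk-0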

If somebody wants to test it, we have released the code on GitHub:
https://github.com/JackSlateur/backurne

(it may be lacking a little bit of documentation)
 

Interesting as well. We had a requirement to keep our backups on a different storage solution. We also needed the ability to restore filesystem-level backups quickly and efficiently to other types of systems with older kernels.

We may be building out a new Ceph cluster with different requirements that your project could fit well, so I have added it to my list!
 

Awesome to hear some are already making use of Benji!
 
Joining the discussion with questions.

Are you using Benji "by itself" or are you using the pve-snapbackup wrapper?

Where/how did you install Benji?
On one of your Proxmox/Ceph cluster nodes? On all of them? On a dedicated node? As a Docker container in a dedicated VM?
 

I am using Benji as a standalone backup solution.

It's set up inside a VM which has connectivity to my Ceph public network. It is a dedicated node because Benji can use quite a bit of CPU. Our target storage is some simple ZFS arrays which I present over NFS to the Benji VM.
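For reference, the setup itself is just a Python virtualenv plus the NFS mount; roughly like this (hostname and paths are examples, and the exact packages should be checked against the Benji README):

Code:
# Benji in its own virtualenv; --system-site-packages so the Ceph python bindings are visible
apt-get install -y git python3-venv python3-rados python3-rbd
python3 -m venv --system-site-packages /usr/local/benji
. /usr/local/benji/bin/activate
pip install git+https://github.com/elemental-lf/benji
# Target storage: a ZFS box exported over NFS
mkdir -p /mnt/benji
mount -t nfs zfs-backup01:/tank/benji /mnt/benji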

We put Benji into production in June of 2019 and it's been going pretty well. We are now backing up 65 RBD volumes every night, ranging in size from 100GB to 3TB, and Benji can back up all of those images in roughly 45-60 minutes. We have a requirement to keep 1.5 years' worth of daily backups, and with Benji that is actually possible because of how well it deduplicates.
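The 1.5 years of dailies is just a retention rule; with Benji's enforce syntax that is something like (numbers are an example):

Code:
# Keep a few most-recent backups plus roughly 1.5 years of dailies,
# then remove blocks no longer referenced by any version
benji enforce latest3,days548
benji cleanup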

It's easily one of the best backup solutions I have ever worked with.

Don't hesitate to ask any questions, I'll do my best to answer them. The developer has also been a fantastic resource.
 
A ZFS array presented over NFS is exactly my idea, over a 10 Gbps network.
60 minutes to back up all of that sounds like a dream.
I'll do some tests with a VM on the cluster, then move it to a dedicated node if needed.

How do you handle the snapshots?
Are you using the Ceph snapshot feature (not Proxmox's)?
 
pve-snapbackup does a Proxmox snapshot and then backs it up with Benji.
That way you can use the usual qemu-guest-agent inside the VM to flush the database before the snapshot.

With a "direct" Ceph snapshot, I don't know.
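I imagine you could wrap a direct Ceph snapshot with the guest agent yourself, something like this (untested sketch, the VMID and image name are placeholders):

Code:
# Flush and freeze the guest filesystems via the agent, snapshot the RBD
# image directly, then thaw
qm guest cmd 100 fsfreeze-freeze
rbd snap create rbd_hdd/vm-100-disk-0@backup-$(date +%Y-%m-%d)
qm guest cmd 100 fsfreeze-thaw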
 
We use Ceph snapshots and so does Benji. To stay efficient, Benji uses the rbd diff command to determine which blocks have changed between snapshots. This way the entire RBD image doesn't need to be read each night.
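To illustrate, this is the kind of call involved between last night's snapshot and tonight's (placeholder names); only the extents it lists have to be read and shipped:

Code:
# List only the extents that changed between two snapshots of an image
rbd diff --from-snap backup-2020-01-01 rbd_hdd/vm-103-disk-0@backup-2020-01-02 --format=json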
 
So you're not interacting with the content of the VM when doing the snapshot (no database flush)?
 
Proxmox VMs using Ceph RBD provide writeback caching with flush support, so a standard Ceph snapshot taken at any point should be transactionally safe; for example, Microsoft SQL Server will flush important transactions whenever it needs to.

We have done one production restore and several test restores with no problems whatsoever. SQL Server rolls unfinished transactions forward perfectly when it restarts, and this is on a 1.5 TiB database with a fair amount of concurrent transactions.

We provide an enterprise cloud backup solution, but customers that order virtual services such as PBXs, core routers or Check Point firewalls often overlook the requirement to back them up, so we cover this by matching on the VM's defined name in Proxmox. The following is an earlier draft backup script which backs up RBD images when the defined name matches a regular expression.

NB: This script performs no error checking. It was subsequently completely rewritten, but the result isn't of much use to others as it ties directly into various internal systems to update reference notes and notify the appropriate personnel to investigate errors.

Code:
#!/bin/bash

# /etc/cron.d/proxmox-network-backup
#  0 18 * * 1-5 root /root/proxmox-network-backup

. /usr/local/benji/bin/activate;
. /usr/local/bin/benji-backup.sh;

get_disk () {
  # Limit to first 20 lines to hopefully avoid including snapshot images
  # Convert template clone names to rbd names   ie: rbd_hdd:base-116-disk-0/vm-117-disk-0 -> rbd_hdd/vm-117-disk-0
  # Convert vm disk names to rbd names          ie: rbd_hdd:vm-103-disk-0                 -> rbd_hdd/vm-103-disk-0
  for vmconf in /etc/pve/nodes/*/qemu-server/$1.conf; do
    head -n 20 $vmconf | grep size | perl -pe 's/^\S+ (.*?):(.*?),.*/\1\/\2/g;s/(.*)\/.*(\/.*)/\1\2/g;' | uniq;
  done
}

network_ceph_backup () {
  name=`grep name /etc/pve/nodes/*/qemu-server/$1.conf | head -n 1 | perl -pe 's/.*name:\s+(.*)/\1/g'`;
  num=0;
  for disk in `get_disk $1`; do
    #[ $disk == "rbd_ssd/vm-130-disk-2" ] && continue;  # vivotek-vast2 - video recordings (skip)
    IFS='/' read pool image <<< $disk;
    benji::backup::ceph "$name""-disk$num" "$pool" "$image" "AutomatedBackup";
    let "num++";
  done
}


# Backup appliances:
for f in /etc/pve/nodes/*/qemu-server/*.conf; do
  if [ `grep -Pc 'name:.*(-mikrotik|zatjnb|sip|unix|checkpoint)' $f` -gt 0 ]; then
    f=${f#/etc/*/qemu-server/};
    f=${f%.conf};
    network_ceph_backup $f;
  fi;
done

# Delete incomplete
benji ls 2> /dev/null | grep incomplete | awk '{print $4}' | xargs -r benji rm -f;

# Scrub backups
benji batch-deep-scrub -P 15;

# Cleanup old backups
benji enforce latest3,hours48,days7,weeks4,months3;
benji cleanup;
 
The following is another legacy script where we back up a statically maintained list of Proxmox VMs:

Code:
#!/bin/bash

# /etc/cron.d/proxmox-network-backup
#  0 18 * * 1-5 root /root/proxmox-network-backup

. /usr/local/benji/bin/activate;
. /usr/local/bin/benji-backup.sh;

get_disk () {
  # Limit to first 20 lines to hopefully avoid including snapshot images
  # Convert template clone names to rbd names   ie: rbd_hdd:base-116-disk-0/vm-117-disk-0 -> rbd_hdd/vm-117-disk-0
  # Convert vm disk names to rbd names          ie: rbd_hdd:vm-103-disk-0                 -> rbd_hdd/vm-103-disk-0
  for vmconf in /etc/pve/nodes/*/qemu-server/$1.conf; do
    head -n 20 $vmconf | grep size | perl -pe 's/^\S+ (.*?):(.*?),.*/\1\/\2/g;s/(.*)\/.*(\/.*)/\1\2/g;' | uniq;
  done
}

network_ceph_backup () {
  name=`grep name /etc/pve/nodes/*/qemu-server/$1.conf | head -n 1 | perl -pe 's/.*name:\s+(.*)/\1/g'`;
  num=0;
  for disk in `get_disk $1`; do
    [ $disk == "rbd_ssd/vm-130-disk-2" ] && continue;   # vivotek-vast2 - video recordings (skip)
    IFS='/' read pool image <<< $disk;
    benji::backup::ceph "$name""-disk$num" "$pool" "$image" "AutomatedBackup";
    let "num++";
  done
}


#                   VMID     name
network_ceph_backup  143;  # dc01
network_ceph_backup  144;  # dc02
network_ceph_backup  113;  # accounts
network_ceph_backup  104;  # connect
network_ceph_backup  103;  # labtech
network_ceph_backup  101;  # eppdns
network_ceph_backup  114;  # webapp
network_ceph_backup  115;  # webruby
network_ceph_backup  127;  # mysql
network_ceph_backup  132;  # postgresql
network_ceph_backup  124;  # nt01
network_ceph_backup  131;  # sip
network_ceph_backup  117;  # dirsync
network_ceph_backup  128;  # netbox
network_ceph_backup  129;  # unifi
network_ceph_backup  102;  # eppdns-ote2
network_ceph_backup  106;  # cptool
network_ceph_backup  133;  # unix01
network_ceph_backup  130;  # vivotek-vast2
network_ceph_backup  107;  # gns3
network_ceph_backup  145;  # rhel8
network_ceph_backup  112;  # rhel7
network_ceph_backup  111;  # rhel6
network_ceph_backup  110;  # rhel5
network_ceph_backup  109;  # rhel4
network_ceph_backup  108;  # rhel3
network_ceph_backup  126;  # os2
network_ceph_backup  125;  # rdp

# Delete incomplete
benji ls 2> /dev/null | grep incomplete | awk '{print $4}' | xargs -r benji rm -f;

# Scrub backups
benji batch-deep-scrub -P 15;

# Cleanup old backups
benji enforce latest3,hours48,days7,weeks4,months3;
benji cleanup;
 
