[SOLVED] Backup fails when LXC has FUSE activated and in use (error code 23)

jsabater

Member
Oct 25, 2021
Palma, Mallorca, Spain
I have a Proxmox 7.1-7 cluster with two nodes running a number of CTs and no VMs. Hosts communicate through one VLAN; guests communicate through a second VLAN. I have configured a PBS 2.1-2 server that communicates with the hosts through the first VLAN. All servers use ext4 (no ZFS, no Ceph). I have configured a local datastore and made it accessible from the PVE hosts.

Backups of LXCs with FUSE active fail, whereas those without FUSE succeed. So I ran a test:
  1. Create an empty LXC (hostname: test) based on Proxmox's Debian 11 template (same as all the others).
  2. Configure locales, APT mirror and update packages. Reboot.
  3. Back it up (storage: pbs1, mode: suspend, compression dropdown is disabled): worked fine. [1]
  4. Shut down the LXC, enable Options: Features: FUSE, start the container.
  5. Back it up (same options): worked fine [2]
  6. apt install gnupg fuse squashfuse snapd
  7. Shut down container. Start container.
  8. Back it up (same options): worked fine [3]
  9. snap install core: error (cannot reload udev rules). [4] https://bugs.launchpad.net/snapd/+bug/1712808
  10. snap install core: installed fine.
  11. snap install --classic certbot: installed fine.
  12. Back it up (same options): error code 23 [5]
  13. Back it up (storage: local): error code 23 (as expected) [6]
  14. Shut down container.
  15. Back it up (storage: local): worked fine [7]
  16. Back it up (storage: pbs1): worked fine [8]
  17. Start container
  18. Back it up (storage: pbs1): error code 23 [9]
  19. Back it up (storage: local): error code 23 [10]
Numbers in square brackets correspond to files attached to this post.
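For anyone who wants to repeat the test, the key steps can be scripted from the PVE host shell roughly as below. This is only a sketch under assumptions: CT 114 and the storage ID pbs1 are taken from the logs in this post, nesting=1 is assumed to be set already, and the PCT/VZDUMP variables are a hypothetical dry-run switch so the commands can be printed instead of executed.

```shell
#!/usr/bin/env bash
# Sketch of the test above, run from the PVE host shell.
# Assumptions (hypothetical): CT 114 exists and storage "pbs1" is configured.
# Dry run: set PCT="echo pct" VZDUMP="echo vzdump" to only print the commands.
PCT=${PCT:-pct}
VZDUMP=${VZDUMP:-vzdump}

backup_ct() {  # $1 = VMID, $2 = storage ID
  $VZDUMP "$1" --storage "$2" --mode suspend
}

reproduce_fuse_test() {
  local vmid=$1 storage=$2

  backup_ct "$vmid" "$storage"            # FUSE off: reported to work

  # Enable FUSE. --features replaces the whole list, so keep existing flags
  # (nesting=1 is assumed to be set already)
  $PCT shutdown "$vmid"
  $PCT set "$vmid" --features nesting=1,fuse=1
  $PCT start "$vmid"
  backup_ct "$vmid" "$storage"            # FUSE on, nothing mounted: reported to work

  $PCT exec "$vmid" -- apt-get install -y gnupg fuse squashfuse snapd
  # The first 'snap install core' failed once (udev reload) and worked on retry
  $PCT exec "$vmid" -- snap install core || $PCT exec "$vmid" -- snap install core
  $PCT exec "$vmid" -- snap install --classic certbot
  backup_ct "$vmid" "$storage"            # snap's FUSE mounts active: exit code 23 reported

  $PCT shutdown "$vmid"
  backup_ct "$vmid" "$storage"            # container stopped: reported to work
}
```

For example, `PCT="echo pct" VZDUMP="echo vzdump" reproduce_fuse_test 114 pbs1` prints the command sequence without touching any container.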

Questions:
  1. Could someone please try this and see if the same happens?
  2. Could someone please try this in snapshot mode (ZFS, Ceph,...) and see if the same happens?
  3. Could someone please provide feedback on how to deal with FUSE being in use causing backups to fail?
Thanks in advance.

P.S. Error log so that you don't have to open the attached files:

Code:
INFO: starting new backup job: vzdump 114 --storage local --node proxmox1 --compress zstd --mode suspend --remove 0
INFO: Starting Backup of VM 114 (lxc)
INFO: Backup started at 2021-12-20 18:40:07
INFO: status = running
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: test
INFO: including mount point rootfs ('/') in backup
INFO: starting first sync /proc/694746/root/ to /var/lib/vz/dump/vzdump-lxc-114-2021_12_20-18_40_07.tmp
ERROR: Backup of VM 114 failed - command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --sparse --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' /proc/694746/root//./ /var/lib/vz/dump/vzdump-lxc-114-2021_12_20-18_40_07.tmp' failed: exit code 23
INFO: Failed at 2021-12-20 18:40:10
INFO: Backup job finished with errors
TASK ERROR: job errors
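vzdump only reports rsync's exit status here. rsync(1) documents exit code 23 as "Partial transfer due to error": the copy ran, but some files or attributes could not be transferred (with -X -A in the command, that could include extended attributes or ACLs). The log does not say which files were affected. The helper below is a small sketch that translates an exit code when rerunning such a copy by hand; it covers only a subset of the table in the rsync man page.

```shell
#!/usr/bin/env bash
# Map an rsync exit status to its description in rsync(1).
# Only a handful of codes are listed; see the man page for the full table.
describe_rsync_exit() {
  case "$1" in
    0)  echo "Success" ;;
    1)  echo "Syntax or usage error" ;;
    11) echo "Error in file I/O" ;;
    23) echo "Partial transfer due to error" ;;
    24) echo "Partial transfer due to vanished source files" ;;
    30) echo "Timeout in data send/receive" ;;
    *)  echo "Unlisted rsync exit code: $1" ;;
  esac
}

# Usage (paths are placeholders):
#   rsync -aHAX --one-file-system /src/ /dst/; describe_rsync_exit $?
```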
 

Attachments

  • 1_backup_no_fuse.txt (2.1 KB)
  • 2_backup_fuse_active.txt (2.1 KB)
  • 3_backup_fuse_active_snap_installed.txt (2.1 KB)
  • 4_snap_install_core_error.txt (202 bytes)
  • 5_snap_installed_backup_error.txt (792 bytes)
  • 6_snap_installed_backup_local_error.txt (863 bytes)
  • 7. snap_installed_lxc_shutdown_backup.txt (863 bytes)
  • 8_snap_installed_lxc_shutdown_backup_pbs1.txt (1.7 KB)
  • 9_snap_installed_lxc_start_backup_pbs1.txt (792 bytes)
  • 10_snap_installed_lxc_started_backup_local.txt (863 bytes)
Thank you very much for your reply, Oguz. From reading the documentation you linked, I see that using snap to install certbot inside an LXC is just not possible:

Because of existing issues in the Linux kernel’s freezer subsystem the usage of FUSE mounts inside a container is strongly advised against, as containers need to be frozen for suspend or snapshot mode backups.

Bind mounts are considered to not be managed by the storage subsystem, so you cannot make snapshots or deal with quotas from inside the container. With unprivileged containers you might run into permission problems caused by the user mapping and cannot use ACLs.

Therefore I am going to have to give up on the idea of using snap to install certbot to have a Let's Encrypt certificate in each container, as I need to be able to back them up.

I am going to try two things and will post the results here for future reference:
  1. Try Certbot from Debian, which may not require FUSE, to keep the current way of doing things (one certificate per LXC).
  2. Install snap and certbot in a (privileged?) container, request a wildcard certificate and deploy it to all containers via scripts, then back that container up manually with the Proxmox Backup client (proxmox-backup-client).
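Option 2 could look roughly like the sketch below, which pushes an existing wildcard certificate into each container with 'pct push'. Assumptions: the certificate and key, named fullchain.pem and privkey.pem as certbot names them, have already been copied to a directory on the PVE host; the destination /etc/ssl/wildcard and the VMIDs are hypothetical; and services inside the containers would still need a reload to pick up the new files.

```shell
#!/usr/bin/env bash
# Sketch: push a wildcard certificate into several containers.
# Assumptions (hypothetical): the cert and key were already copied to a
# directory on the PVE host, and each CT reads them from /etc/ssl/wildcard/.
# Set PCT="echo pct" for a dry run that only prints the commands.
PCT=${PCT:-pct}

deploy_cert() {  # $1 = certificate directory on the host, $2... = VMIDs
  local cert_dir=$1 dest=/etc/ssl/wildcard vmid
  shift
  for vmid in "$@"; do
    $PCT exec "$vmid" -- mkdir -p "$dest"
    # Certificate chain is public; the private key is readable by root only
    $PCT push "$vmid" "$cert_dir/fullchain.pem" "$dest/fullchain.pem" --perms 0644
    $PCT push "$vmid" "$cert_dir/privkey.pem" "$dest/privkey.pem" --perms 0600
  done
}

# Example: deploy_cert /root/wildcard-cert 101 102 103
```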
Again, thanks for pointing me in the right direction, Oguz. I had been banging my head against this problem for a while.
 
jsabater said: "Thank you very much for your reply, Oguz. From reading that documentation you linked I see that using snap to install certbot inside an LXC is just not possible"
It works in an Ubuntu 21.04 container:

Code:
root@PVE:/# pct set 112 -features nesting=1,fuse=1
root@PVE:/# pct start 112 && pct enter 112
root@CT112:/# apt install squashfuse fuse snapd
root@CT112:/# snap install --classic certbot
Warning: /snap/bin was not found in your $PATH. If you've not restarted your session since you
         installed snapd, try doing that. Please see https://forum.snapcraft.io/t/9469 for more
         details.

certbot 1.22.0 from Certbot Project (certbot-eff✓) installed

But beware that you'll have the same problem with the backups.
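To find which containers are exposed to this, one option is to scan each CT's config for the FUSE feature. A sketch, assuming 'pct config' prints the features as a single "features:" line (e.g. "features: fuse=1,nesting=1") and that 'pct list' prints a header followed by one CT per line with the VMID in the first column:

```shell
#!/usr/bin/env bash
# List containers whose config has the FUSE feature enabled.
# Set PCT to a stub command or function for testing.
PCT=${PCT:-pct}

# Reads `pct config` output on stdin; succeeds if "features:" contains fuse=1
has_fuse_feature() {
  grep -Eq '^features:(.*[ ,])?fuse=1(,|$)'
}

list_fuse_cts() {
  local vmid
  # `pct list` prints a header line, then one CT per line with the VMID first
  for vmid in $($PCT list | awk 'NR > 1 { print $1 }'); do
    if $PCT config "$vmid" | has_fuse_feature; then
      echo "$vmid"
    fi
  done
}
```

Running `list_fuse_cts` on a node prints one VMID per line; those are the guests worth checking before relying on suspend or snapshot backups.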

jsabater said: "Try Certbot from Debian, which may not require FUSE, to keep the current way of doing things (one certificate per LXC)."
That should work fine.

jsabater said: "Install snap and certbot in a (privileged?) container, request a wildcard certificate and deploy it to all containers via scripts. And back it up manually via pbs-client."
For that option, you can try my commands above.

Hope this helps!
 
Sorry to resurrect an old thread, but I wasn't able to find an acceptable answer and this is one of the first results when searching for appropriate keywords in a search engine.

I had the same goal of creating a backup of an LXC container with an active FUSE mount, and did not want to shut down the container to take the backup. The goal was to minimize downtime to the extent possible.

My solution was to use a vzdump hook script to stop any FUSE services in the container and start them back up again at a later phase of the same backup job. In my case, I use systemd to manage services and their dependencies in my containers, and your systemd units need to be robust enough for this to work correctly. For example, you might have multiple services using FUSE mounts, so your systemd units need appropriate dependencies between them. Side note: systemd isn't required; you can tailor the script to run anything you want within the container. I chose systemd because it's what I am comfortable with and because it is well suited for the task.
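As an illustration of such a dependency structure, here is a hypothetical pair of units written out by a small shell function: rclone-media.service runs the FUSE mount, and media-app.service stands in for whatever consumes it. The unit names, the rclone remote media:, the mount point /mnt/media and the binary paths are all assumptions. BindsTo= makes stopping the mount also stop the consumer, and WantedBy=rclone-media.service (once the consumer is enabled) makes starting the mount pull the consumer back up, so a single unit name could serve as both STOP_SVC and START_SVC.

```shell
#!/usr/bin/env bash
# Sketch: write two hypothetical systemd units with a stop/start dependency.
# UNIT_DIR defaults to /etc/systemd/system; override it to try this elsewhere.
write_units() {
  local dir=${UNIT_DIR:-/etc/systemd/system}
  mkdir -p "$dir"

  # The FUSE mount itself (rclone reports readiness via sd_notify)
  cat > "$dir/rclone-media.service" <<'EOF'
[Unit]
Description=rclone FUSE mount of media: at /mnt/media
Wants=network-online.target
After=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/rclone mount media: /mnt/media --allow-other
# fusermount3 on systems that only have fuse3
ExecStop=/bin/fusermount -uz /mnt/media

[Install]
WantedBy=multi-user.target
EOF

  # The consumer: follows the mount down and comes back up with it
  cat > "$dir/media-app.service" <<'EOF'
[Unit]
Description=Application that reads from /mnt/media
BindsTo=rclone-media.service
After=rclone-media.service

[Service]
ExecStart=/usr/local/bin/media-app

[Install]
WantedBy=rclone-media.service
EOF
}
```

Inside the container, 'systemctl daemon-reload' followed by 'systemctl enable rclone-media.service media-app.service' would then activate the pair.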

The script uses the hook parameters and the environment variables vzdump exposes to determine which LXC is currently being backed up, and then parses that LXC's config for two predefined variables, STOP_SVC and START_SVC. I place these in the LXC's comments (Notes) field, e.g.:
Code:
STOP_SVC=fuse-mount.service
START_SVC=fuse-mount.service

The script then runs 'systemctl stop' on the STOP_SVC service during the 'backup-start' phase, and 'systemctl start' on the START_SVC service during the 'pre-restart' phase. In my case, only about 5-10 seconds pass between these two phases, which I believe is while vzdump takes the guest snapshot. Moving the snapshot to the backup storage (the time-consuming part) happens after the 'pre-restart' phase.
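To check this timing for your own guests, a throwaway hook that only logs each phase it is called with can help. vzdump calls the hook with the phase as the first argument and, for per-guest phases such as 'backup-start' and 'pre-restart', the mode and VMID as the second and third. The log path below is an assumption; this is a diagnostic sketch, not part of the solution above.

```shell
#!/usr/bin/env bash
# Diagnostic vzdump hook (sketch): append one timestamped line per phase.
# HOOK_LOG is an assumed location; override it via the environment.
HOOK_LOG=${HOOK_LOG:-/var/log/vzdump-hook-phases.log}

log_phase() {
  # $1 = phase; per-guest phases also get $2 = mode and $3 = VMID
  printf '%s phase=%s mode=%s vmid=%s\n' \
    "$(date '+%F %T')" "$1" "${2:-}" "${3:-}" >> "$HOOK_LOG"
}

# vzdump always passes at least the phase, so only log when called with arguments
if [ "$#" -gt 0 ]; then
  log_phase "$@"
fi
```

After one backup run, comparing the timestamps of the backup-start and pre-restart lines for a VMID shows how long the FUSE service would be down.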

Bash:
#!/usr/bin/env bash
# vzdump hook script: stop the CT's FUSE service at 'backup-start' and start it
# again at 'pre-restart'. Service names come from the CT's comments (Notes).

EXECUTION_PHASE=$1

# Only these two phases matter; exit 0 so vzdump carries on for all others
case "$EXECUTION_PHASE" in
  backup-start|pre-restart) ;;
  *) exit 0 ;;
esac

# vzdump exports the guest's hostname as HOSTNAME; look up its VMID by name
VMINFO=$(pvesh get /cluster/resources --output-format json \
  | jq --arg name "$HOSTNAME" '.[] | select(.name == $name)')
VMID=$(echo "$VMINFO" | jq -r '.vmid')

# If we found no VMID (or more than one guest with that name), exit
if [[ -z "$VMID" || "$VMID" == *$'\n'* ]]; then
  exit 0
fi

# Get stop/start services from the LXC config (entered into the comments field)
STOP_SVC=$(pct config "$VMID" | sed -rn 's/.*STOP_SVC=([^.]+\.service).*$/\1/p')
START_SVC=$(pct config "$VMID" | sed -rn 's/.*START_SVC=([^.]+\.service).*$/\1/p')

# If we didn't find start/stop services, exit
if [[ -z "$STOP_SVC" || -z "$START_SVC" ]]; then
  echo "No configuration found for VMID: $VMID"
  exit 0
fi

# Run the specified stop/start command inside the container
if [[ "$EXECUTION_PHASE" == "backup-start" ]]; then
  echo "Stopping $STOP_SVC for $HOSTNAME ($VMID) at $(date '+%F %T')"
  pct exec "$VMID" -- systemctl stop "$STOP_SVC"
else
  echo "Starting $START_SVC for $HOSTNAME ($VMID) at $(date '+%F %T')"
  pct exec "$VMID" -- systemctl start "$START_SVC"
fi

Note: this script requires jq to be installed on the PVE host, as I prefer to parse the output of 'pvesh' as JSON.
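The post does not show how the hook is wired up. One way, sketched below, is a 'script:' line in /etc/vzdump.conf, which sets the default for every backup job on that node; since the file lives outside /etc/pve, it would need to be set on each node of a cluster. Alternatively, a single run can use 'vzdump 114 --script /path/to/hook.sh'. The hook path in the example is hypothetical, and jq can be installed with 'apt install jq'.

```shell
#!/usr/bin/env bash
# Sketch: register a vzdump hook node-wide via /etc/vzdump.conf.
# VZDUMP_CONF can be overridden (e.g. for testing); the hook path is hypothetical.
VZDUMP_CONF=${VZDUMP_CONF:-/etc/vzdump.conf}

register_hook() {  # $1 = absolute path of the hook script
  local hook=$1
  chmod +x "$hook" || return 1           # vzdump runs the script directly
  # Refuse to add a second entry; commented defaults ("#script:") don't match
  if grep -q '^script:' "$VZDUMP_CONF" 2>/dev/null; then
    echo "$VZDUMP_CONF already has a 'script:' entry; edit it by hand" >&2
    return 1
  fi
  echo "script: $hook" >> "$VZDUMP_CONF"
}

# Example: register_hook /usr/local/bin/vzdump-fuse-hook.sh
```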
 
@jsabater: in my particular use case, I am using an rclone mount within an LXC that holds part of a Plex library. I want an LXC container (rather than a VM) so that Intel QSV hardware acceleration is available to Plex. GVT-g is another option here, but the PCI passthrough then limits the ability to migrate the guest.
 
A typical cause of this problem is running ZFS without POSIX ACL support.

The LXC container has ACLs set inside its filesystem, and the backup process that the Proxmox VE host runs can involve an rsync (with -A, which preserves ACLs, as in the log above) to a temporary directory such as /var/tmp. If POSIX ACLs are not enabled on the rpool/ROOT/pve-1 dataset (and they aren't by default, for whatever strange reason; the Proxmox devs should, and hopefully will, change that in the next release), then the rsync will fail.

TEST:

Bash:
$ zfs get acltype rpool/ROOT/pve-1

If it returns:

Bash:
NAME              PROPERTY  VALUE     SOURCE
rpool/ROOT/pve-1  acltype   off       default

That means ACLs are not enabled.

SOLUTION:

Enable ZFS POSIX ACLs:

Bash:
$ zfs set acltype=posixacl rpool/ROOT/pve-1


Check it again:

Bash:
$ zfs get acltype rpool/ROOT/pve-1

If it returns:

Bash:
NAME              PROPERTY  VALUE     SOURCE
rpool/ROOT/pve-1  acltype   posix     local

then success!

Now try the LXC backup again!
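The check and the fix can be combined into a helper that is safe to run repeatedly. A sketch, assuming the dataset name rpool/ROOT/pve-1 from this post (adjust it if your temporary directory lives on another dataset) and that leaving any value other than "off" untouched is the safer default:

```shell
#!/usr/bin/env bash
# Sketch: turn on POSIX ACLs for a dataset only if acltype is currently "off".
# Run as root on the PVE host. Set ZFS to a stub for testing.
ZFS=${ZFS:-zfs}

ensure_posixacl() {
  local dataset=${1:-rpool/ROOT/pve-1} current
  # -H: no header, tab-separated; -o value: print only the property value
  current=$($ZFS get -H -o value acltype "$dataset") || return 1
  case "$current" in
    off)
      echo "$dataset: acltype is off, setting posixacl"
      $ZFS set acltype=posixacl "$dataset"
      ;;
    *)
      echo "$dataset: acltype already $current, nothing to do"
      ;;
  esac
}
```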


Credit goes to @CH.illig (https://forum.proxmox.com/members/ch-illig.36347/) for his post (in German; thank you, Google Translate):
https://forum.proxmox.com/threads/lxc-backup-fehler-wegen-acl.129309/


I hope this helps!
 
