[SOLVED] Error code 23 when backing up LXC from PVE to PBS

jsabater

Hello everyone!

I have a live PVE 7.1 cluster with two nodes (proxmox1 and proxmox2) and a test PBS 2.1 (pbs1) with one datastore named "local" configured at /opt/backups (same disk as the OS, as this is just a test to learn how it works). The filesystem on both PVE nodes, their containers, and the PBS is ext4, as all are installations on top of existing Debian Bullseye systems at Hetzner (I am also learning about ZFS and its virtues, but I am not there yet).

I have configured the PBS "local" datastore as "pbs1" of type "Proxmox Backup Server" in the "Datacenter: Storage" section of PVE, using the user "backupuser@pbs", which has the "Admin" role on the datastore "local". When I go to the LXC "postgresql1" on the node "proxmox1" and try to back it up using the "Suspend" mode and the "pbs1" storage, I get this:

Code:
INFO: starting new backup job: vzdump 100 --remove 0 --node proxmox1 --mode suspend --storage pbs1
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2021-11-30 00:00:42
INFO: status = running
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: postgresql1
INFO: including mount point rootfs ('/') in backup
INFO: starting first sync /proc/1715/root/ to /var/tmp/vzdumptmp3915945_100
ERROR: Backup of VM 100 failed - command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --sparse --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' /proc/1715/root//./ /var/tmp/vzdumptmp3915945_100' failed: exit code 23
INFO: Failed at 2021-11-30 00:01:14
INFO: Backup job finished with errors
TASK ERROR: job errors

I cannot find any log file or anything else to look at for more information. Does anybody have an idea of what could be happening?
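For reference, rsync exit code 23 means "partial transfer due to error", i.e. some files could not be copied. A sketch of what could be tried to surface the per-file errors (the PID and target directory are copied from the log above and change on every run):

Bash:
# Re-run the failing rsync by hand; the per-file errors go to stderr.
# /proc/1715/root/ and the target directory are taken from the task log
# above and will differ for a new backup run.
rsync -aH -X -A --numeric-ids --sparse --one-file-system \
    /proc/1715/root/ /var/tmp/manual-rsync-test_100
echo "rsync exit code: $?"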

Thanks in advance.
 
Your container storage does not support snapshots, so PVE does a double rsync to /var/tmp to get a consistent snapshot of the data.

I guess there is not enough free space on /var/tmp? You can change that location in /etc/vzdump.conf (see man vzdump).
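For example, a minimal sketch (the path below is only an illustration, adjust to your setup):

Code:
# In /etc/vzdump.conf: store the temporary double-rsync copy on a
# filesystem with enough free space (example path)
tmpdir: /mnt/spare-disk/vzdump-tmp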
 
Hi, Dietmar, and thanks for your reply.

As far as I can tell, there is enough disk space left for the operation on the proxmox1 node:

Bash:
~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             63G     0   63G   0% /dev
tmpfs            13G  1.1M   13G   1% /run
/dev/md2        934G  224G  663G  26% /
tmpfs            63G   63M   63G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
/dev/md1        485M  161M  299M  35% /boot
/dev/fuse       128M   52K  128M   1% /etc/pve
tmpfs            13G     0   13G   0% /run/user/1000

Also:

Bash:
~ # df -h /var/tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/md2        934G  224G  663G  26% /

Could it be something else limiting the disk space available to the operation? Or something else entirely?

I tried a local backup to /var/lib/vz/dump and it worked perfectly.
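For reference, that local test was along these lines (a sketch; suspend mode into the default dump directory):

Bash:
# Local-only test backup of CT 100, bypassing PBS entirely:
vzdump 100 --mode suspend --dumpdir /var/lib/vz/dump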

Also, incidentally, if I had ZFS on the proxmox1 node and also on the PBS datastore, would this rsync still be needed? That is, would I still need enough disk space on the host to dump the LXC snapshot before transferring it over the network, or would it all be done in one operation, straight onto the wire?

Thanks in advance.

P.S. I updated some of the host packages to the latest revision of 7.1 on Friday the 26th, but I didn't restart the server or the daemons because I didn't find fixes for any bugs I had been experiencing. Could that be it?
 
Okay, so it's not a matter of disk space. I managed to back up all my other LXCs, including two MySQL servers and two MongoDB servers. For some reason, only my two PostgreSQL servers fail, and they are not the biggest LXCs. All my LXCs are Debian 11 Bullseye with the same basic configuration and differ only in the database server.

Does this other log ring any bells?

Code:
INFO: starting new backup job: vzdump 104 --remove 0 --node proxmox2 --mode snapshot --storage pbs1
INFO: Starting Backup of VM 104 (lxc)
INFO: Backup started at 2021-11-29 23:42:01
INFO: status = running
INFO: CT Name: postgresql2
INFO: including mount point rootfs ('/') in backup
INFO: mode failure - some volumes do not support snapshots
INFO: trying 'suspend' mode instead
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: postgresql2
INFO: including mount point rootfs ('/') in backup
INFO: starting first sync /proc/1944/root/ to /var/tmp/vzdumptmp519118_104
INFO: first sync finished - transferred 15.05G bytes in 27s
INFO: suspending guest
INFO: starting final sync /proc/1944/root/ to /var/tmp/vzdumptmp519118_104
INFO: resume vm
INFO: guest is online again after 1 seconds
ERROR: Backup of VM 104 failed - command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --inplace --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' /proc/1944/root//./ /var/tmp/vzdumptmp519118_104' failed: exit code 23
INFO: Failed at 2021-11-29 23:42:31
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Okay, after a host reboot following the latest set of Proxmox 7.1 package upgrades, CT 104 was backed up successfully. It looks like the upgrades I installed two days ago without restarting the host were indeed messing things up:

Code:
INFO: starting new backup job: vzdump 104 --storage pbs1 --node proxmox2 --mode suspend --remove 0
INFO: Starting Backup of VM 104 (lxc)
INFO: Backup started at 2021-11-30 23:17:22
INFO: status = running
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: postgresql2
INFO: including mount point rootfs ('/') in backup
INFO: starting first sync /proc/12188/root/ to /var/tmp/vzdumptmp52885_104
INFO: first sync finished - transferred 15.09G bytes in 29s
INFO: suspending guest
INFO: starting final sync /proc/12188/root/ to /var/tmp/vzdumptmp52885_104
INFO: final sync finished - transferred 16.78M bytes in 1s
INFO: resuming guest
INFO: guest is online again after 1 seconds
INFO: creating Proxmox Backup Server archive 'ct/104/2021-11-30T22:17:22Z'
INFO: run: lxc-usernsexec -m u:0:100000:65536 -m g:0:100000:65536 -- /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp52885_104/etc/vzdump/pct.conf fw.conf:/var/tmp/vzdumptmp52885_104/etc/vzdump/pct.fw root.pxar:/var/tmp/vzdumptmp52885_104 --include-dev /var/tmp/vzdumptmp52885_104/. --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 104 --backup-time 1638310642 --repository backupuser@pbs@192.168.1.10:local
INFO: Starting backup: ct/104/2021-11-30T22:17:22Z
INFO: Client name: proxmox2
INFO: Starting backup protocol: Tue Nov 30 23:17:52 2021
INFO: No previous manifest available.
INFO: Upload config file '/var/tmp/vzdumptmp52885_104/etc/vzdump/pct.conf' to 'backupuser@pbs@192.168.1.10:8007:local' as pct.conf.blob
INFO: Upload config file '/var/tmp/vzdumptmp52885_104/etc/vzdump/pct.fw' to 'backupuser@pbs@192.168.1.10:8007:local' as fw.conf.blob
INFO: Upload directory '/var/tmp/vzdumptmp52885_104' to 'backupuser@pbs@192.168.1.10:8007:local' as root.pxar.didx
INFO: root.pxar: had to backup 13.996 GiB of 14.054 GiB (compressed 2.311 GiB) in 53.42s
INFO: root.pxar: average backup speed: 268.294 MiB/s
INFO: root.pxar: backup was done incrementally, reused 59.988 MiB (0.4%)
INFO: Uploaded backup catalog (603.418 KiB)
INFO: Duration: 53.46s
INFO: End Time: Tue Nov 30 23:18:45 2021
INFO: Finished Backup of VM 104 (00:01:25)
INFO: Backup finished at 2021-11-30 23:18:47
INFO: Backup job finished successfully
TASK OK
 
The only container not backing up properly is the original one, CT 100. The one difference I have found is that in that LXC I tried to install Let's Encrypt's Certbot from snapd and, in order to do so, I had to follow these steps:

https://github.com/lxc/lxc/issues/1854#issuecomment-606241047

After removing these two lines from /etc/pve/lxc/100.conf, the backup worked correctly:

Code:
lxc.mount.entry = /dev/fuse dev/fuse none bind,create=file,optional
lxc.mount.auto=cgroup:rw

Any idea how to work this out? Certbot was supposed to be installed in all LXCs, but it has not been yet because I ran out of time (a pending task).
 
Never mind, I found it:
  1. Go to the container options.
  2. Go to Features.
  3. Check FUSE.
  4. Shut down the container.
  5. Start it again.
Now I have a container with a working snap installation, but backups still return error code 23 when the FUSE option is enabled and snap is in use.
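For reference, the same toggle should also be settable from the CLI with pct (CT 100 as above):

Bash:
# CLI equivalent of the GUI steps above, for container 100:
pct set 100 --features fuse=1
pct shutdown 100
pct start 100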
 
As per the other thread mentioned in the previous post, and the documentation here and here, it is not possible to back up a container that has FUSE in use.

Because of existing issues in the Linux kernel’s freezer subsystem the usage of FUSE mounts inside a container is strongly advised against, as containers need to be frozen for suspend or snapshot mode backups.

Bind mounts are considered to not be managed by the storage subsystem, so you cannot make snapshots or deal with quotas from inside the container. With unprivileged containers you might run into permission problems caused by the user mapping and cannot use ACLs.

Options for having Certbot in an LXC:
  1. Install the Certbot Debian package (a bit behind in version, but it will hopefully work fine, and it does not seem to use FUSE); see the sketch after this list.
  2. Install snap in a separate, dedicated LXC, then get a wildcard certificate and deploy it everywhere needed.
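For option 1, a minimal sketch, run inside the Debian 11 container:

Bash:
# Option 1: the Certbot package from the Debian repositories,
# which does not involve snapd or FUSE:
apt update
apt install -y certbot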
More when I have been able to test these options.
 
The typical problem is that you are running ZFS without POSIX ACL support.

The LXC container has ACL settings inside its filesystem, and the 'suspend' backup process that the Proxmox VE host runs is an rsync to the /var/tmp directory. If POSIX ACL support is not turned on in the rpool/ROOT/pve-1 dataset (and it is not by default, for whatever strange reason; the Proxmox devs should, and hopefully will, change that in the next release), then the rsync will fail.

TEST:

Bash:
$ zfs get acltype rpool/ROOT/pve-1

If it returns:

Bash:
NAME              PROPERTY  VALUE     SOURCE
rpool/ROOT/pve-1  acltype   off       default

then ACLs are not turned on.

SOLUTION:

Enable ZFS POSIX ACLs:

Bash:
$ zfs set acltype=posixacl rpool/ROOT/pve-1
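As an aside, and going beyond the original post, posixacl on ZFS is often paired with storing extended attributes inline, which avoids extra I/O when reading the ACL xattrs:

Bash:
# Commonly recommended alongside posixacl (an addition beyond this post):
zfs set xattr=sa rpool/ROOT/pve-1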


Check it again:

Bash:
$ zfs get acltype rpool/ROOT/pve-1

If it returns:

Bash:
NAME              PROPERTY  VALUE     SOURCE
rpool/ROOT/pve-1  acltype   posix     local

then: success!

Now try that LXC backup again!


Credit goes to @CH.illig (https://forum.proxmox.com/members/ch-illig.36347/) for his post (in German; thank you, Google Translate):
https://forum.proxmox.com/threads/lxc-backup-fehler-wegen-acl.129309/


I hope this helps!
 
