Trouble backing up one LXC

mshorey

Member
Dec 24, 2021
Proxmox Version 7.1-12
I have 7 LXCs and 2 VMs. All of them back up without issue except for one LXC. When I attempt a suspend-mode backup of it, the job fails with the following:


Task viewer: VM/CT 101 - Backup

INFO: starting new backup job: vzdump 101 --mode suspend
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2022-04-18 18:31:13
INFO: status = running
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: CT Name: PIHOLE2
INFO: including mount point rootfs ('/') in backup
INFO: starting first sync /proc/38670/root/ to /mnt/BACKUP_TEMP/vzdumptmp41667_101/
ERROR: Backup of VM 101 failed - command 'rsync --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --sparse --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' /proc/38670/root//./ /mnt/BACKUP_TEMP/vzdumptmp41667_101/' failed: exit code 23
INFO: Failed at 2022-04-18 18:31:31
INFO: Backup job finished with errors

TASK ERROR: job errors

If I try a stop-mode backup instead, it fails with the following:

INFO: starting new backup job: vzdump 101 --storage local --compress zstd --remove 0 --mode stop --node debian
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2022-04-18 18:45:11
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: PIHOLE2
INFO: including mount point rootfs ('/') in backup
INFO: stopping virtual guest
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-101-2022_04_18-18_45_11.tar.zst'
INFO: tar: ./etc/pihole/pihole-FTL.db: File shrank by 390250496 bytes; padding with zeros
INFO: Total bytes written: 2315591680 (2.2GiB, 120MiB/s)
INFO: restarting vm
INFO: guest is online again after 25 seconds
ERROR: Backup of VM 101 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/BACKUP_TEMP/vzdumptmp48410_101/' ./etc/vzdump/pct.conf ./etc/vzdump/pct.fw '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | zstd --rsyncable '--threads=1' >/var/lib/vz/dump/vzdump-lxc-101-2022_04_18-18_45_11.tar.dat' failed: exit code 1
INFO: Failed at 2022-04-18 18:45:36
INFO: Backup job finished with errors
TASK ERROR: job errors

vzdump config is as follows:

# vzdump default settings

tmpdir: /mnt/BACKUP_TEMP/
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#stdexcludes: BOOLEAN
#mailto: ADDRESSLIST
#prune-backups: keep-INTERVAL=N[,...]
#script: FILENAME
#exclude-path: PATHLIST
#pigz: N
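
A side note on the tmpdir setting: in suspend mode the log above shows rsync copying the whole container root into a vzdumptmp directory under tmpdir, so it may be worth confirming that this directory has enough free space. The following is only a minimal sketch. The helper names vzdump_tmpdir and free_kib are hypothetical, and /etc/vzdump.conf is assumed to be the file shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) tmpdir value
# from a vzdump config file. $1 is the path to that file.
vzdump_tmpdir() {
    sed -n 's/^tmpdir:[[:space:]]*//p' "$1" | head -n 1
}

# Hypothetical helper: print the free space of a directory in KiB,
# taken from the "Available" column of POSIX-format df output.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example, to be run on the Proxmox host (path is an assumption):
#   dir=$(vzdump_tmpdir /etc/vzdump.conf)
#   echo "tmpdir=$dir free=$(free_kib "$dir") KiB"
```

Comparing that number with the container's used disk space gives a rough idea of whether the first sync can fit at all.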

The config of the LXC that fails to back up is as follows:

arch: amd64
cores: 1
features: nesting=1
hostname: PIHOLE2
memory: 2048
net0: name=eth0,bridge=vmbr1,hwaddr=AA:D0:0C:CE:F5:F9,ip=dhcp,type=veth
onboot: 1
ostype: ubuntu
rootfs: VMs:vm-101-disk-0,size=4G
swap: 2048

All LXCs are Ubuntu 20.04 except for the failing one, which is 18.04 because I'm running Pi-hole on it and I believe it still doesn't support 20.04. Correct me if I'm wrong.

Any other information I can provide that could bring more insight? Thanks so much for taking a look.
 
The "failed: exit code 23" in the log means that rsync could not transfer some file(s). Try this:

- Get the current PID of the CT:
Code:
pgrep -f "lxc-start -F -n CT_ID"

- Prepare a destination directory that you can write to, e.g. /mnt/BACKUP_TEMP/test

- Take the rsync command from the error message, point it at the CT's current PID and the test directory, and add -v --progress so rsync prints each file and any errors. The resulting command should look like:

Code:
rsync -v --progress --stats -h -X -A --numeric-ids -aH --delete --no-whole-file --sparse --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' /proc/CT_PID/root//./ /mnt/BACKUP_TEMP/test/

That should show you which files fail to transfer.
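
To make the exit code easier to interpret when reading a saved task log, something like the sketch below could help. The meanings are a subset of the EXIT VALUES section of the rsync man page. The function names rsync_exit_meaning and vzdump_exit_code are hypothetical helpers and are not part of Proxmox or rsync.

```shell
#!/bin/sh
# Subset of the exit codes documented in "man rsync" (EXIT VALUES).
rsync_exit_meaning() {
    case "$1" in
        0)  echo "Success" ;;
        11) echo "Error in file I/O" ;;
        23) echo "Partial transfer due to error" ;;
        24) echo "Partial transfer due to vanished source files" ;;
        *)  echo "See the EXIT VALUES section of 'man rsync'" ;;
    esac
}

# Hypothetical helper: read a vzdump task log on stdin and print the
# exit code from the first "failed: exit code N" line, if any.
vzdump_exit_code() {
    sed -n 's/.*failed: exit code \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example (task.log is a placeholder for a saved copy of the log):
#   code=$(vzdump_exit_code < task.log)
#   echo "exit code $code: $(rsync_exit_meaning "$code")"
```

Note that the mapping only applies when the failing command is rsync. The stop-mode failure in the first post is a tar pipeline, so its exit code 1 has a different meaning.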
 
Thank you! I'll give this a try this morning.
 
So I ran the following and it seemed to run through fine. I didn't see any errors pop up.

Code:
sudo rsync --stats -v --verbose -h -X -A --numeric-ids -aH --delete --no-whole-file --sparse --one-file-system --relative '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' /proc/48708/root//./ /mnt/BACKUP_TEMP/test/101baktest2/

total: matches=0 hash_hits=0 false_alarms=0 data=6557565787

rsync[355975] (sender) heap statistics:
arena: 64704512 (bytes from sbrk)
ordblks: 216 (chunks not in use)
smblks: 2
hblks: 2 (chunks from mmap)
hblkhd: 401408 (bytes from mmap)
allmem: 65105920 (bytes from sbrk + mmap)
usmblks: 0
fsmblks: 192
uordblks: 1456224 (bytes used)
fordblks: 63248288 (bytes free)
keepcost: 132656 (bytes in releasable chunk)

rsync[355977] (server receiver) heap statistics:
arena: 127946752 (bytes from sbrk)
ordblks: 632 (chunks not in use)
smblks: 2
hblks: 2 (chunks from mmap)
hblkhd: 401408 (bytes from mmap)
allmem: 128348160 (bytes from sbrk + mmap)
usmblks: 0
fsmblks: 192
uordblks: 1336032 (bytes used)
fordblks: 126610720 (bytes free)
keepcost: 132784 (bytes in releasable chunk)

rsync[355976] (server generator) heap statistics:
arena: 38948864 (bytes from sbrk)
ordblks: 212 (chunks not in use)
smblks: 2
hblks: 2 (chunks from mmap)
hblkhd: 401408 (bytes from mmap)
allmem: 39350272 (bytes from sbrk + mmap)
usmblks: 0
fsmblks: 192
uordblks: 1842560 (bytes used)
fordblks: 37106304 (bytes free)
keepcost: 133776 (bytes in releasable chunk)

Number of files: 71,959 (reg: 58,835, dir: 8,370, link: 4,721, dev: 2, special: 31)
Number of created files: 71,959 (reg: 58,835, dir: 8,370, link: 4,721, dev: 2, special: 31)
Number of deleted files: 0
Number of regular files transferred: 58,828
Total file size: 6.56G bytes
Total transferred file size: 6.56G bytes
Literal data: 6.56G bytes
Matched data: 0 bytes
File list size: 2.16M
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 6.56G
Total bytes received: 1.19M

sent 6.56G bytes received 1.19M bytes 230.34M bytes/sec
total size is 6.56G speedup is 1.00

So I'm not sure why it runs fine like this but fails when run as a Proxmox backup job from the GUI.
 
For suspend mode you need to run rsync twice ;) It's usually the second copy that fails because it can't overwrite something. Regarding the stop-mode error, I'm not sure what's up with the file mentioned there: the message indicates that its size is "wrong" (less data could be read than should be there), and tar apparently treats this as an error.
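
A rough way to reproduce the two-pass behaviour by hand is sketched below. It reuses the rsync flags from the failing job and adds -v. It is only an approximation: the helper name two_pass_sync is hypothetical, and the sketch does not suspend the container between passes the way the backup job does.

```shell
#!/bin/sh
# Hypothetical helper: copy SRC into DST twice with rsync and report
# the exit code of each pass. The first pass fills an empty DST; the
# second has to update or overwrite what is already there.
two_pass_sync() {
    src=$1
    dst=$2
    for pass in 1 2; do
        rsync -v --stats -h -X -A --numeric-ids -aH --delete --no-whole-file \
            --sparse --one-file-system --relative \
            '--exclude=/tmp/?*' '--exclude=/var/tmp/?*' '--exclude=/var/run/?*.pid' \
            "$src" "$dst"
        rc=$?
        echo "pass $pass: rsync exit code $rc"
        # Stop at the first failing pass and hand its code to the caller.
        [ "$rc" -eq 0 ] || return "$rc"
    done
}

# Example (CT_PID is a placeholder for the container's current PID):
#   two_pass_sync /proc/CT_PID/root//./ /mnt/BACKUP_TEMP/test/
```

If the second pass fails, the -v output just above its "pass 2" line should name the file rsync could not overwrite.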
 
So. After looking closer...the "stop" backup task was barking about my pihole-FTL.db being corrupted. I stopped pihole-FTL, removed that database file, and restarted it so it would build a new one. After that I tried a "suspend" backup and it still failed. But when I tried a "stop" backup, it succeeded. So weird.
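
For anyone hitting the same tar message: "File shrank by N bytes" means tar reached the end of the file before it had read as many bytes as the size it recorded for that file. A quick check along those lines is sketched below. The helper name check_readable_size is hypothetical, and GNU stat is assumed, as found on a Debian-based Proxmox host.

```shell
#!/bin/sh
# Hypothetical helper: compare the size stored in a file's metadata
# with the number of bytes that can actually be read from it.
# Returns 0 if they match, 1 if they differ, 2 if stat fails.
check_readable_size() {
    expected=$(stat -c %s "$1") || return 2    # GNU stat: size in bytes
    actual=$(cat "$1" | wc -c | tr -d ' ')     # cat forces a full read
    echo "expected=$expected readable=$actual"
    [ "$expected" -eq "$actual" ]
}

# Example, run inside the container while pihole-FTL is stopped:
#   check_readable_size /etc/pihole/pihole-FTL.db
```

A mismatch on a file that nothing is writing to points at the storage or filesystem underneath rather than at the backup tool.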
 
Welp...this has uncovered a deeper issue. Looks like that SSD is failing...
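
In case it helps others check their own disks, the overall SMART verdict can be pulled out of smartctl -H output roughly as follows. This is only a sketch: smart_verdict is a hypothetical helper, /dev/sdX is a placeholder, and the matched line is the one smartctl prints for ATA and NVMe devices (SCSI devices use a different line).

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -H" output on stdin and print
# the overall health verdict (for example PASSED).
smart_verdict() {
    sed -n 's/^SMART overall-health self-assessment test result:[[:space:]]*//p'
}

# Example, run as root on the Proxmox host (/dev/sdX is a placeholder):
#   smartctl -H /dev/sdX | smart_verdict
```

A PASSED verdict alone does not rule out a failing drive, so the full attribute table from smartctl -a is still worth reading.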
 

Attachments

  • Screen Shot 2022-04-21 at 11.16.51 AM.png
  • Screen Shot 2022-04-21 at 11.16.37 AM.png
