Container backup problems

Kaboom

Mar 5, 2019
I am a big fan of Proxmox. I run several clusters and have gained a lot of experience with it in the meantime. Only one thing is not working properly: the backups (vzdump) of my containers to local storage on one of my clusters. Backups of the VMs work without any problem!

If I restart a node/server and run the backup of the same container again, it works... but after a while it fails again and I have to restart the node/server to make it work. I also can't restart containers on this node/server; I have to move them to another node to start them again. After the node/server has been restarted, the container starts again.

Configuration:
Running 10 nodes/servers
Containers running CentOS 7 (LXC)
Proxmox latest version
Ceph latest version

Log:
INFO: starting new backup job: vzdump 116 --compress lzo --mode snapshot --remove 0 --storage local --node node004
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp3595163 for temporary files
INFO: Starting Backup of VM 116 (lxc)
INFO: status = running
INFO: CT Name: server01
INFO: found old vzdump snapshot (force removal)
rbd error: error setting snapshot context: (2) No such file or directory
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd0
INFO: creating archive '/var/lib/vz/dump/vzdump-lxc-116-2019_03_05-10_39_40.tar.lzo'

### AT THIS POINT, THE BACKUP HANGS! AND I HAVE TO STOP IT MANUALLY ->

INFO: remove vzdump snapshot
Removing snap: 100% complete...done.
ERROR: Backup of VM 116 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/var/tmp/vzdumptmp3595163' ./etc/vzdump/pct.conf '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | lzop >/var/lib/vz/dump/vzdump-lxc-116-2019_03_05-10_39_40.tar.dat' failed: interrupted by signal
INFO: Backup job finished with errors
TASK ERROR: job errors
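The log shows two warning signs before the hang: vzdump finds an old "vzdump" snapshot left by an earlier run, and its forced removal fails with an rbd error. A minimal Python sketch for flagging these in a saved task log; the function and class names are hypothetical, and the marker strings are simply copied from the log above:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogReport:
    stale_snapshot: bool = False  # "found old vzdump snapshot" was logged
    interrupted: bool = False     # the tar/lzop pipeline was killed by a signal
    rbd_errors: List[str] = field(default_factory=list)


def scan_vzdump_log(text: str) -> LogReport:
    """Flag the warning signs seen in this thread's vzdump task log."""
    report = LogReport()
    for raw in text.splitlines():
        line = raw.strip()
        if "found old vzdump snapshot" in line:
            report.stale_snapshot = True
        if line.startswith("rbd error:"):
            report.rbd_errors.append(line)
        if "interrupted by signal" in line:
            report.interrupted = True
    return report
```

If several failed runs all report stale_snapshot, that suggests each hang leaves the snapshot behind for the next run to trip over.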

Anyone around with the same problems, or any hints/tips on this?

Thanks!
 
Here is the container config:

arch: amd64
cores: 12
hostname: server01
lock: backup
memory: 10240
nameserver: 213.132.xx.xx 213.132.xx.xx
net0: name=eth0,bridge=vmbr1,gw=213.132.xx.x,hwaddr=42:2F:D6:5A:97:D4,ip=213.132.xx.xx/24,type=veth
ostype: centos
parent: vzdump
rootfs: ceph_ssd:vm-116-disk-1,size=140G
searchdomain: domain.nl
swap: 4096
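The config still contains "lock: backup" and "parent: vzdump", which suggests it is the state left behind by the backup that never finished. A small Python sketch to check a config for that state and to split "rootfs" into storage and image name; the function names are hypothetical, and it only reads the lines before the first "[snapshot]" section header:

```python
from typing import Dict, Optional, Tuple


def parse_main_section(config_text: str) -> Dict[str, str]:
    """Return key/value pairs of the current state of a pct config."""
    values: Dict[str, str] = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            break  # the rest of the file belongs to snapshot sections
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        values[key.strip()] = value.strip()
    return values


def leftover_backup_state(values: Dict[str, str]) -> bool:
    """True if the config still carries the backup lock and vzdump parent."""
    return values.get("lock") == "backup" and values.get("parent") == "vzdump"


def rootfs_volume(values: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Split e.g. 'ceph_ssd:vm-116-disk-1,size=140G' into (storage, image)."""
    rootfs = values.get("rootfs")
    if rootfs is None:
        return None
    volume = rootfs.split(",", 1)[0]
    storage, _, image = volume.partition(":")
    return storage, image
```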
 
Did you try to remove the lock manually instead of rebooting?
# pct unlock <CTID>

Please take a look at your syslog and search for errors around the time your backup hangs. Also check the backups that ran before the hanging one for errors. Please grep for: EXT4-fs error or rbd_assert.
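The suggested search can be sketched in Python as a filter over syslog lines, with an optional timestamp prefix to narrow it to the time of the hang. The function name is hypothetical, and the log path in the usage note (/var/log/syslog) is an assumption for a Debian-based node:

```python
import re
from typing import Iterable, List

# The two patterns suggested in the reply above.
SUSPECT = re.compile(r"EXT4-fs error|rbd_assert")


def suspicious_lines(lines: Iterable[str], time_prefix: str = "") -> List[str]:
    """Return lines matching either pattern, optionally limited to a time prefix.

    time_prefix is compared against the start of each line, e.g. "Mar  5 10:"
    for the classic syslog timestamp format; "" matches every line.
    """
    return [
        line.rstrip("\n")
        for line in lines
        if line.startswith(time_prefix) and SUSPECT.search(line)
    ]
```

Usage: with open("/var/log/syslog") as fh: print(*suspicious_lines(fh, "Mar  5 10:"), sep="\n")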

And please post the output of:
rbd snap ls {pool-name}/{image-name}
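A hedged Python sketch that runs the requested command with JSON output and returns the snapshot names, so a leftover "vzdump" snapshot is easy to spot. Assumptions: the JSON output is a list of objects with a "name" field, and the Ceph pool is named like the Proxmox storage ("ceph_ssd"); the storage ID and pool name can differ, so check /etc/pve/storage.cfg. Both function names are hypothetical:

```python
import json
import subprocess
from typing import List


def snapshot_names(rbd_json: str) -> List[str]:
    """Extract snapshot names from `rbd snap ls --format json` output."""
    return [snap["name"] for snap in json.loads(rbd_json)]


def list_snapshots(pool: str, image: str) -> List[str]:
    """Run `rbd snap ls --format json <pool>/<image>` and return the names."""
    result = subprocess.run(
        ["rbd", "snap", "ls", "--format", "json", f"{pool}/{image}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return snapshot_names(result.stdout)
```

Usage: "vzdump" in list_snapshots("ceph_ssd", "vm-116-disk-1")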
 
pct unlock doesn't work. I can reset the node/server and then it works again, but this is not a good solution.

I found a workaround for the container that will not start again: force-unmap its RBD image with: rbd unmap -o force /dev/rbd/ceph_ssd/vm-XXX-disk-X
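The manual steps from this thread can be collected into a small dry-run helper that only builds the commands, so they can be reviewed before being run by hand. This is a sketch, not an official procedure: the function name is hypothetical, the pool and image values come from the config above, and the final "pct start" step is an assumption about what follows the unmap:

```python
import shlex
from typing import List


def recovery_commands(ctid: int, pool: str, image: str) -> List[str]:
    """Build (but do not run) the recovery steps discussed in this thread.

    1. pct unlock: clears the stale 'lock: backup' entry
       (reported above as not sufficient on its own).
    2. rbd unmap -o force: drops the kernel mapping of the container disk.
    3. pct start: starts the container again (assumed follow-up step).
    """
    device = f"/dev/rbd/{pool}/{image}"
    return [
        f"pct unlock {ctid}",
        f"rbd unmap -o force {shlex.quote(device)}",
        f"pct start {ctid}",
    ]
```

Printing instead of executing keeps a human in the loop: force-unmapping a device that something still uses can cause I/O errors, so check that the container is stopped first.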
 
