backup failed: exit code 2

RobFantini

Hello,
I am still getting backup failures; see the details below. When I re-run the same backup manually, using the same mode (snapshot), it succeeds.
On average, two out of roughly 50 backups have failed per week for the last month.
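
For reference, the manual re-run is just a vzdump snapshot-mode backup of the same container to the same storage; a sketch (the storage name 'bkup-nfs' is assumed from the dump path in the log below, and the bandwidth limit matches the scheduled job):
Code:
# Manual snapshot-mode backup of CT 4444 (sketch; adjust storage name and limits as needed)
vzdump 4444 --mode snapshot --compress lzo --storage bkup-nfs --bwlimit 500000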

Around the same time, the logs always show something like this:
Code:
EXT4-fs error (device rbd14): ext4_lookup:1580: inode #131073: comm tar: deleted inode referenced: 131264
EXT4-fs (rbd14): previous I/O error to superblock detected
Buffer I/O error on dev rbd14, logical block 0, lost sync page write
print_req_error: I/O error, dev rbd14, sector 0

rbd14 is not used by any of the LXC containers, and it does not exist now. I assume it was a temporary snapshot file system.
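
If it happens again while a backup is running, the mapping can be checked to confirm what rbd14 belongs to; a sketch (the vzdump snapshot gets its own temporary /dev/rbdN device, and the output columns may vary by Ceph version):
Code:
# List currently mapped RBD devices; during a snapshot backup the temporary
# 'vzdump' snapshot of the container image shows up with its own /dev/rbdN entry.
rbd showmapped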

Here is the failed backup log.
Code:
4444: 2017-10-25 03:01:29 INFO: Starting Backup of VM 4444 (lxc)
  4444: 2017-10-25 03:01:29 INFO: status = running
  4444: 2017-10-25 03:01:29 INFO: CT Name: localhost
  4444: 2017-10-25 03:01:29 INFO: backup mode: snapshot
  4444: 2017-10-25 03:01:29 INFO: bandwidth limit: 500000 KB/s
  4444: 2017-10-25 03:01:29 INFO: ionice priority: 7
  4444: 2017-10-25 03:01:29 INFO: create storage snapshot 'vzdump'
  4444: 2017-10-25 03:01:30 INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-03_01_29.tar.lzo'
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.92645.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.70482.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.51406.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.65811.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/zabbix_agentd.tmp: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/fo: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.85104.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.69189.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.25666.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.53151.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:47 INFO: Total bytes written: 8474265600 (7.9GiB, 106MiB/s)
  4444: 2017-10-25 03:02:47 INFO: tar: Exiting with failure status due to previous errors
  4444: 2017-10-25 03:02:49 INFO: remove vzdump snapshot
  4444: 2017-10-25 03:02:50 ERROR: Backup of VM 4444 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p
--sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability'
'--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored'
'--directory=/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-03_01_29.tmp' ./etc/vzdump/pct.conf '--directory=/mnt/vzsnap0'
--no-anchored '--exclude=lost+found' --anchored ./ | cstream -t 512000000 | lzop
>/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-03_01_29.tar.dat' failed: exit code 2

The PVE host runs:
Code:
# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
ceph: 12.2.1-pve3

This has been going on for a while; see https://forum.proxmox.com/threads/one-of-12-backup-failed-need-advice.37107/

Has anyone else seen these issues? It could be due to some non-optimal settings on our end.
 
The issue is not fixed with pct fsck <vmid>.
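
For reference, this is roughly what was tried (a sketch; pct fsck checks the container's root volume and refuses to run while the container is up, hence the stop/start):
Code:
# Filesystem check of the container's root volume (sketch)
pct stop 4444
pct fsck 4444
pct start 4444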

The temporary snapshot file system has the issue, not the file system of the container itself.
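
One way to verify that would be to snapshot the volume manually, map the snapshot read-only, and run a non-destructive fsck on it; a sketch, assuming pool 'lxc-ceph' (from the /dev/rbd/lxc-ceph/ listing below) and a hypothetical snapshot name 'fscktest'. Note that a crash-consistent snapshot of a live ext4 volume may legitimately report an unclean journal, so this only gives a rough indication:
Code:
# Sketch: create a throwaway snapshot of the container volume and check it read-only
rbd snap create lxc-ceph/vm-4444-disk-1@fscktest
rbd map --read-only lxc-ceph/vm-4444-disk-1@fscktest   # prints the /dev/rbdN it mapped to
fsck.ext4 -fn /dev/rbdN                                # -n: report only, never write (use the device printed above)
rbd unmap /dev/rbdN
rbd snap rm lxc-ceph/vm-4444-disk-1@fscktest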

So, although similar, this post has some differences.

Check here:
Code:
# ls -l /dev/rbd/lxc-ceph/
total 0
lrwxrwxrwx 1 root root 10 Oct 21 10:45 vm-100-disk-1 -> ../../rbd7
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-101-disk-1 -> ../../rbd0
lrwxrwxrwx 1 root root 11 Oct 21 10:58 vm-105-disk-1 -> ../../rbd10
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-107-disk-1 -> ../../rbd1
lrwxrwxrwx 1 root root 10 Oct 21 10:45 vm-113-disk-1 -> ../../rbd8
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-123-disk-1 -> ../../rbd2
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-123-disk-2 -> ../../rbd3
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-127-disk-1 -> ../../rbd4
lrwxrwxrwx 1 root root 10 Oct 21 10:53 vm-129-disk-1 -> ../../rbd9
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-160-disk-1 -> ../../rbd5
lrwxrwxrwx 1 root root 11 Oct 21 11:10 vm-4444-disk-1 -> ../../rbd11
lrwxrwxrwx 1 root root 11 Oct 21 22:32 vm-7101-disk-1 -> ../../rbd12
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-941-disk-1 -> ../../rbd6
lrwxrwxrwx 1 root root 11 Oct 21 22:33 vm-945-disk-1 -> ../../rbd13

The failed backup was for CT 4444, which maps to rbd11.

rbd14 had the file system issues. I assume that was a temporary snapshot.
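
One way to back up that assumption is to correlate the rbd14 errors in the kernel log with the backup window; a sketch using journalctl (adjust the time range to match the failed run):
Code:
# Kernel messages mentioning rbd14 or EXT4 around the failed 03:01 backup (sketch)
journalctl -k --since "2017-10-25 02:55" --until "2017-10-25 03:10" | grep -E 'rbd14|EXT4-fs'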
 
As happens most of the time, the daily backup completed successfully this morning.
Code:
INFO: Starting Backup of VM 4444 (lxc)
INFO: status = running
INFO: CT Name: localhost
INFO: backup mode: snapshot
INFO: bandwidth limit: 500000 KB/s
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd14
INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_26-03_01_24.tar.lzo'
INFO: Total bytes written: 8494684160 (8.0GiB, 114MiB/s)
INFO: archive file size: 2.39GB
INFO: delete old backup '/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-07_29_02.tar.lzo'
INFO: remove vzdump snapshot
Removing snap: 100% complete...done.
INFO: Finished Backup of VM 4444 (00:01:16)
INFO: Backup job finished successfully
TASK OK

There is an ongoing intermittent issue with backups. It is probably due to a non-optimal setting or a network/DNS issue on our part, or possibly some sort of bug. If someone has a suggestion of something to look at, please reply.
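
Since the container volumes live on Ceph RBD, one more thing worth checking is the cluster state around the backup window; a sketch (not claiming this is the cause):
Code:
# General Ceph health and per-OSD latency (sketch)
ceph health detail
ceph -s
ceph osd perf   # commit/apply latency per OSD, useful for spotting a slow OSD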
 
I've seen this once or twice, but have never been able to reproduce it consistently and investigate. Could you please file a bug report linking to this thread? How reliably can you trigger the issue?
 
Two or three VMs are backed up daily at 3 AM. About every third day, the same VM fails its backup.

Note: in our weekly backup there are two VMs that fail backup about half the time. Those are backup LXCs; we use them with rsnapshot to back up data.

rsnapshot uses an enormous number of hard links.
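
To get a sense of the scale, the hard-linked files inside one of those containers can be counted; a sketch (the path is only an example of where the rsnapshot tree might live):
Code:
# Count regular files with more than one hard link under the rsnapshot root (sketch; path is illustrative)
find /srv/rsnapshot -xdev -type f -links +1 | wc -l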

I'll work on trying to recreate the issue. As of now I do not know how to do that. Any suggestions are welcome.
 
