backup failed: exit code 2

RobFantini

Hello,
I am still getting backup failures; see the details below. When I re-run the same backup manually, using the same mode (snapshot), it succeeds.
On average, two out of roughly 50 backups have failed per week for the last month.
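
For reference, the manual re-run is just a vzdump snapshot-mode backup of the same container to the same storage; a sketch (the storage name 'bkup-nfs' is assumed from the dump path in the log below, and the bandwidth limit matches the scheduled job):
Code:
# Manual snapshot-mode backup of CT 4444 (sketch; adjust storage name and limits as needed)
vzdump 4444 --mode snapshot --compress lzo --storage bkup-nfs --bwlimit 500000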

Around the same time, the logs always show something like this:
Code:
EXT4-fs error (device rbd14): ext4_lookup:1580: inode #131073: comm tar: deleted inode referenced: 131264
EXT4-fs (rbd14): previous I/O error to superblock detected
Buffer I/O error on dev rbd14, logical block 0, lost sync page write
print_req_error: I/O error, dev rbd14, sector 0

rbd14 is not used by any of the LXC containers, and it does not exist now. I assume it was a temporary snapshot file system.
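
If it happens again while a backup is running, the mapping can be checked to confirm what rbd14 belongs to; a sketch (the vzdump snapshot gets its own temporary /dev/rbdN device, and the output columns may vary by Ceph version):
Code:
# List currently mapped RBD devices; during a snapshot backup the temporary
# 'vzdump' snapshot of the container image shows up with its own /dev/rbdN entry.
rbd showmapped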

Here is the failed backup log.
Code:
4444: 2017-10-25 03:01:29 INFO: Starting Backup of VM 4444 (lxc)
  4444: 2017-10-25 03:01:29 INFO: status = running
  4444: 2017-10-25 03:01:29 INFO: CT Name: localhost
  4444: 2017-10-25 03:01:29 INFO: backup mode: snapshot
  4444: 2017-10-25 03:01:29 INFO: bandwidth limit: 500000 KB/s
  4444: 2017-10-25 03:01:29 INFO: ionice priority: 7
  4444: 2017-10-25 03:01:29 INFO: create storage snapshot 'vzdump'
  4444: 2017-10-25 03:01:30 INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-03_01_29.tar.lzo'
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.92645.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.70482.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.51406.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.65811.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/zabbix_agentd.tmp: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/fo: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.85104.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.69189.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.25666.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:26 INFO: tar: ./tmp/pico.53151.bak: Cannot stat: Structure needs cleaning
  4444: 2017-10-25 03:02:47 INFO: Total bytes written: 8474265600 (7.9GiB, 106MiB/s)
  4444: 2017-10-25 03:02:47 INFO: tar: Exiting with failure status due to previous errors
  4444: 2017-10-25 03:02:49 INFO: remove vzdump snapshot
  4444: 2017-10-25 03:02:50 ERROR: Backup of VM 4444 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p
--sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability'
'--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored'
'--directory=/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-03_01_29.tmp' ./etc/vzdump/pct.conf '--directory=/mnt/vzsnap0'
--no-anchored '--exclude=lost+found' --anchored ./ | cstream -t 512000000 | lzop
>/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-03_01_29.tar.dat' failed: exit code 2

The PVE host runs:
Code:
# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.10.17-4-pve: 4.10.17-24
pve-kernel-4.10.17-3-pve: 4.10.17-23
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90
ceph: 12.2.1-pve3

This has been going on for a while; see https://forum.proxmox.com/threads/one-of-12-backup-failed-need-advice.37107/

Has anyone else seen these issues? It could be due to some non-optimal settings on our end.
 
The issue is not fixed with pct fsck <vmid>.
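
For reference, this is roughly what was tried (a sketch; pct fsck checks the container's root volume and refuses to run while the container is up, hence the stop/start):
Code:
# Filesystem check of the container's root volume (sketch)
pct stop 4444
pct fsck 4444
pct start 4444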

The temporary snapshot file system has the issue, not the file system of the container itself.
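
One way to verify that would be to snapshot the volume manually, map the snapshot read-only, and run a non-destructive fsck on it; a sketch, assuming pool 'lxc-ceph' (from the /dev/rbd/lxc-ceph/ listing below) and a hypothetical snapshot name 'fscktest'. Note that a crash-consistent snapshot of a live ext4 volume may legitimately report an unclean journal, so this only gives a rough indication:
Code:
# Sketch: create a throwaway snapshot of the container volume and check it read-only
rbd snap create lxc-ceph/vm-4444-disk-1@fscktest
rbd map --read-only lxc-ceph/vm-4444-disk-1@fscktest   # prints the /dev/rbdN it mapped to
fsck.ext4 -fn /dev/rbdN                                # -n: report only, never write (use the device printed above)
rbd unmap /dev/rbdN
rbd snap rm lxc-ceph/vm-4444-disk-1@fscktest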

So, although similar, this post has some differences.

Check here:
Code:
# ls -l /dev/rbd/lxc-ceph/
total 0
lrwxrwxrwx 1 root root 10 Oct 21 10:45 vm-100-disk-1 -> ../../rbd7
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-101-disk-1 -> ../../rbd0
lrwxrwxrwx 1 root root 11 Oct 21 10:58 vm-105-disk-1 -> ../../rbd10
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-107-disk-1 -> ../../rbd1
lrwxrwxrwx 1 root root 10 Oct 21 10:45 vm-113-disk-1 -> ../../rbd8
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-123-disk-1 -> ../../rbd2
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-123-disk-2 -> ../../rbd3
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-127-disk-1 -> ../../rbd4
lrwxrwxrwx 1 root root 10 Oct 21 10:53 vm-129-disk-1 -> ../../rbd9
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-160-disk-1 -> ../../rbd5
lrwxrwxrwx 1 root root 11 Oct 21 11:10 vm-4444-disk-1 -> ../../rbd11
lrwxrwxrwx 1 root root 11 Oct 21 22:32 vm-7101-disk-1 -> ../../rbd12
lrwxrwxrwx 1 root root 10 Oct 21 10:42 vm-941-disk-1 -> ../../rbd6
lrwxrwxrwx 1 root root 11 Oct 21 22:33 vm-945-disk-1 -> ../../rbd13

The failed backup was for CT 4444, which maps to rbd11.

rbd14 had the file system issues. I assume that was a temporary snapshot.
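
One way to back up that assumption is to correlate the rbd14 errors in the kernel log with the backup window; a sketch using journalctl (adjust the time range to match the failed run):
Code:
# Kernel messages mentioning rbd14 or EXT4 around the failed 03:01 backup (sketch)
journalctl -k --since "2017-10-25 02:55" --until "2017-10-25 03:10" | grep -E 'rbd14|EXT4-fs'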
 
As happens most of the time, the daily backup completed successfully this morning.
Code:
INFO: Starting Backup of VM 4444 (lxc)
INFO: status = running
INFO: CT Name: localhost
INFO: backup mode: snapshot
INFO: bandwidth limit: 500000 KB/s
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd14
INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_26-03_01_24.tar.lzo'
INFO: Total bytes written: 8494684160 (8.0GiB, 114MiB/s)
INFO: archive file size: 2.39GB
INFO: delete old backup '/mnt/pve/bkup-nfs/dump/vzdump-lxc-4444-2017_10_25-07_29_02.tar.lzo'
INFO: remove vzdump snapshot
Removing snap: 100% complete...done.
INFO: Finished Backup of VM 4444 (00:01:16)
INFO: Backup job finished successfully
TASK OK

There is an ongoing intermittent issue with backups. It is probably due to a non-optimal setting or a network/DNS issue on our part, or possibly some sort of bug. If someone has a suggestion of something to look at, please reply.
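
Since the container volumes live on Ceph RBD, one more thing worth checking is the cluster state around the backup window; a sketch (not claiming this is the cause):
Code:
# General Ceph health and per-OSD latency (sketch)
ceph health detail
ceph -s
ceph osd perf   # commit/apply latency per OSD, useful for spotting a slow OSD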
 
I've seen this once or twice, but have never been able to reproduce it consistently and investigate. Could you please file a bug report linking to this thread? How reliably can you trigger the issue?
 
Two or three VMs are backed up daily at 3 AM. About every third day, the same VM fails its backup.

Note: in our weekly backup there are two VMs that fail backup about half the time. Those are backup LXCs; we use them with rsnapshot to back up data.

rsnapshot uses an enormous number of hard links.
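
To get a sense of the scale, the hard-linked files inside one of those containers can be counted; a sketch (the path is only an example of where the rsnapshot tree might live):
Code:
# Count regular files with more than one hard link under the rsnapshot root (sketch; path is illustrative)
find /srv/rsnapshot -xdev -type f -links +1 | wc -l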

I'll work on trying to recreate the issue. As of now I do not know how to do that. Any suggestions are welcome.
 
