While running backups, vzdump got stuck on a specific container. There is no outward indication of a fault, but the task isn't moving, and the syslog is being spammed with:
rbd: rbd54: write 1000 at 0 result -30
The vzdump processes are not in a D state, so everything appears normal, but the size of the output file has not changed in half an hour.
Code:
1320536 ? S 0:00 /bin/bash -c set -o pipefail && tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/mnt/pve/altbackup/dump/vzdump-lxc-32427-2018_01_25-14_29_04.tmp' ./etc/vzdump/pct.conf '--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored '--exclude=./tmp/?*' '--exclude=./var/tmp/?*' '--exclude=./var/run/?*.pid' ./ | lzop >/mnt/pve/altbackup/dump/vzdump-lxc-32427-2018_01_25-14_29_04.tar.dat
1320537 ? R 30:51 tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs --xattrs-include=user.* --xattrs-include=security.capability --warning=no-file-ignored --warning=no-xattr-write --one-file-system --warning=no-file-ignored --directory=/mnt/pve/altbackup/dump/vzdump-lxc-32427-2018_01_25-14_29_04.tmp ./etc/vzdump/pct.conf --directory=/mnt/vzsnap0 --no-anchored --exclude=lost+found --anchored --exclude=./tmp/?* --exclude=./var/tmp/?* --exclude=./var/run/?*.pid ./
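A side note on the syslog line: the number after "result" looks like a negated Linux errno, and errno 30 is EROFS (read-only file system), which would fit a write being attempted against a read-only snapshot mapping. Below is a minimal sketch for decoding such lines. The function name `decode_rbd_line` and the small errno table are my own illustration, not part of any rbd or Proxmox tooling, and the table is far from complete.

```shell
#!/bin/sh
# Decode a kernel rbd warning such as "rbd: rbd54: write 1000 at 0 result -30".
# decode_rbd_line is a hypothetical helper; the errno table is illustrative only.
decode_rbd_line() {
    line=$1
    # Device name follows the "rbd: " prefix, e.g. "rbd54".
    dev=$(printf '%s\n' "$line" | sed -n 's/^.*rbd: \(rbd[0-9][0-9]*\): .*$/\1/p')
    # The value after "result" is taken to be a negated errno, e.g. "-30".
    code=$(printf '%s\n' "$line" | sed -n 's/^.* result -\([0-9][0-9]*\)$/\1/p')
    # Give up if the line does not look like an rbd warning.
    [ -n "$dev" ] && [ -n "$code" ] || return 1
    case $code in
        5)   name="EIO (I/O error)" ;;
        16)  name="EBUSY (device or resource busy)" ;;
        30)  name="EROFS (read-only file system)" ;;
        110) name="ETIMEDOUT (connection timed out)" ;;
        *)   name="unknown" ;;
    esac
    printf '%s: errno %s = %s\n' "$dev" "$code" "$name"
}

decode_rbd_line "rbd: rbd54: write 1000 at 0 result -30"
# prints: rbd54: errno 30 = EROFS (read-only file system)
```

This only explains what the message means; it does not explain why something was writing to the snapshot device in the first place.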
I finally killed the vzdump task. The snapshot was NOT deleted and remained mapped; running fsck -n /dev/rbd54 (the snapshot) yielded "fsck.ext4: MMP: open with O_DIRECT failed while reading MMP block." I then unmapped and deleted the snapshot.
Afterwards, a manual backup of the container succeeded without issue.
1. Why did this happen, and how can it be identified?
2. If a snapshot is unreadable, vzdump should fail for that container, perform cleanup, and move on to the next scheduled task. Should I submit this as a bug?
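Until there is a proper answer, one crude way to notice this kind of stall from a script is to check whether the output file is still growing. This is only a sketch: `file_grew` is a hypothetical helper of mine, not something vzdump provides, and it assumes GNU `stat` (`stat -c %s`), as found on a Debian-based host.

```shell
#!/bin/sh
# Return 0 if the file grew during the interval, 1 if its size did not
# increase, and 2 if the file could not be inspected.
# file_grew is a hypothetical helper; it assumes GNU stat.
file_grew() {
    path=$1
    interval=${2:-60}            # seconds to wait between the two size checks
    before=$(stat -c %s "$path") || return 2
    sleep "$interval"
    after=$(stat -c %s "$path") || return 2
    [ "$after" -gt "$before" ]
}
```

For example, `file_grew /mnt/pve/altbackup/dump/vzdump-lxc-32427-2018_01_25-14_29_04.tar.dat 300 || echo "backup output has not grown in 5 minutes"` would have flagged the hang described above. A file that legitimately pauses (for instance while tar walks a large sparse region) could trigger a false alarm, so the interval needs to be generous.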