VZDump backup causes KVM/QEMU VMs to shut down if backup storage is full

TJ101

Dear All,

I think I have discovered an issue with the VZdump backups which can cause a lot of headaches.

If a backup job contains KVM/QEMU guests with the backup mode set to SNAPSHOT and the backup storage runs out of space during the run, the backup process carries on and every KVM/QEMU VM in the schedule ends up shut down.

Container backups that fail because the backup storage is full behave correctly: the containers keep running.
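Until this is fixed, the only protection I can think of is a pre-job free-space check. Below is a rough sketch of a vzdump hook script that aborts the whole job when the backup store is nearly full; the path and threshold are placeholders for your setup, and you wire it in with vzdump's --script option (or a script: line in /etc/vzdump.conf):

Code:
#!/bin/bash
# Sketch of a vzdump hook: abort the whole backup job when the target
# store is nearly full, instead of letting every guest backup fail.
# vzdump calls the hook with the phase name as $1 and exports DUMPDIR
# for the job-start phase when the target is a directory/NFS store.
MIN_FREE_GB=50                      # placeholder: size of your largest archive

if [ "$1" = "job-start" ] && [ -n "$DUMPDIR" ]; then
    free_kb=$(df -Pk "$DUMPDIR" | awk 'NR==2 {print $4}')
    if [ $((free_kb / 1024 / 1024)) -lt "$MIN_FREE_GB" ]; then
        echo "refusing to start: less than ${MIN_FREE_GB}GB free on $DUMPDIR" >&2
        exit 1                      # non-zero exit makes vzdump abort the job
    fi
fi
exit 0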

Here is a sanitised snippet from the backup log.


INFO: gzip: stdout: No space left on device
ERROR: Backup of VM 100 failed - command '(cd /mnt/vzsnap0/private/100;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --one-file-system --null -T -|gzip) >/mnt/pve/Backup-Storage/dump/vzdump-openvz-100-2015_02_14-23_59_02.tar.dat' failed: exit code 1
cp: closing `/mnt/pve/Backup-Storage/dump/vzdump-openvz-100-2015_02_14-23_59_02.log': No space left on device
INFO: Starting Backup of VM 104 (openvz)
INFO: CTID 104 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-host1-0')
INFO: /dev/sdc1: read failed after 0 of 4096 at 103743488: Input/output error
INFO: /dev/sdc1: read failed after 0 of 4096 at 103800832: Input/output error
INFO: /dev/sdc1: read failed after 0 of 4096 at 0: Input/output error
INFO: /dev/sdc1: read failed after 0 of 4096 at 4096: Input/output error
INFO: Logical volume "vzsnap-host1-0" created
INFO: creating archive '/mnt/pve/Backup-Storage/dump/vzdump-openvz-104-2015_02_14-23_59_39.tar.gz'
INFO: gzip: stdout: No space left on device
ERROR: Backup of VM 104 failed - command '(cd /mnt/vzsnap0/private/104;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --one-file-system --null -T -|gzip) >/mnt/pve/Backup-Storage/dump/vzdump-openvz-104-2015_02_14-23_59_39.tar.dat' failed: exit code 1
cp: closing `/mnt/pve/Backup-Storage/dump/vzdump-openvz-104-2015_02_14-23_59_39.log': No space left on device
INFO: Starting Backup of VM 111 (openvz)
INFO: CTID 111 exist mounted running
INFO: status = running
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-host1-0')
INFO: /dev/sdc1: read failed after 0 of 4096 at 103743488: Input/output error
INFO: /dev/sdc1: read failed after 0 of 4096 at 103800832: Input/output error
INFO: /dev/sdc1: read failed after 0 of 4096 at 0: Input/output error
INFO: /dev/sdc1: read failed after 0 of 4096 at 4096: Input/output error
INFO: Logical volume "vzsnap-host1-0" created
INFO: creating archive '/mnt/pve/Backup-Storage/dump/vzdump-openvz-111-2015_02_15-00_00_17.tar.gz'
INFO: gzip: stdout: No space left on device
ERROR: Backup of VM 111 failed - command '(cd /mnt/vzsnap0/private/111;find . '(' -regex '^\.$' ')' -o '(' -type 's' -prune ')' -o -print0|sed 's/\\/\\\\/g'|tar cpf - --totals --sparse --numeric-owner --no-recursion --one-file-system --null -T -|gzip) >/mnt/pve/Backup-Storage/dump/vzdump-openvz-111-2015_02_15-00_00_17.tar.dat' failed: exit code 1
cp: closing `/mnt/pve/Backup-Storage/dump/vzdump-openvz-111-2015_02_15-00_00_17.log': No space left on device
INFO: Starting Backup of VM 113 (qemu)
INFO: status = running
INFO: update VM 113: -lock backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/Backup-Storage/dump/vzdump-qemu-113-2015_02_15-00_00_52.vma.gz'
ERROR: client closed connection
INFO: aborting backup job
ERROR: VM 113 not running
ERROR: Backup of VM 113 failed - client closed connection
 
Hi all,

I had the very same issue!
I don't understand why nobody answered you before; in my opinion this is a very serious bug and should be fixed as soon as possible!


Tonight I ran out of space on my NFS backup storage, and guess what happened? Of course the backup could not run, but the real problem is that ALL MY KVM MACHINES WERE STOPPED BY THE VZDUMP PROCESS! (And of course I was not warned, because my monitoring machine was on the same cluster… stupid me.)


This is a very dangerous bug; at first I did not understand what had happened, until I found this thread.


Here are some technical details (ask me if you need more).


One thing first: I don't know if it's relevant, but my backups run in snapshot mode.


root@vha ~ # pveversion -v
proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-33-pve: 2.6.32-138
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1


Example backup log:
101: Mar 12 22:00:03 INFO: Starting Backup of VM 101 (qemu)
101: Mar 12 22:00:03 INFO: status = running
101: Mar 12 22:00:03 INFO: update VM 101: -lock backup
101: Mar 12 22:00:04 INFO: backup mode: snapshot
101: Mar 12 22:00:04 INFO: ionice priority: 7
101: Mar 12 22:00:05 INFO: creating archive '/mnt/pve/stock/dump/vzdump-qemu-101-2015_03_12-22_00_03.vma.lzo'
101: Mar 12 22:00:05 ERROR: client closed connection
101: Mar 12 22:00:05 INFO: aborting backup job
101: Mar 12 22:00:05 ERROR: VM 101 not running
101: Mar 12 22:00:06 ERROR: Backup of VM 101 failed - client closed connection



Now that I'm aware of this bug I will be more careful and won't be caught out again, but I think it's something every user should be aware of!
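
In the meantime I've put a simple space check in cron so it can't bite me again; something along these lines (the mount point, threshold, and mail address are placeholders for my setup):

Code:
#!/bin/bash
# Hourly cron job: warn before the NFS backup store fills up, so the
# nightly vzdump run never hits "No space left on device".
STORE_MOUNT=/mnt/pve/stock          # placeholder: your backup store mount point
THRESHOLD=90                        # warn above this usage percentage

usage=$(df -P "$STORE_MOUNT" | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$usage" -ge "$THRESHOLD" ]; then
    echo "backup store $STORE_MOUNT is ${usage}% full" \
        | mail -s "backup space warning" admin@example.com
fi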

PS: I sent more or less the same message to the mailing list...
 
Never happened to me with KVM and snapshot backups. I've had a full backup storage for many days, but the VMs were never turned off. Maybe it's something in your setup?

Code:
101: Feb 21 23:13:07 INFO: Starting Backup of VM 101 (qemu)
101: Feb 21 23:13:07 INFO: status = running
101: Feb 21 23:13:07 INFO: update VM 101: -lock backup
101: Feb 21 23:13:08 INFO: backup mode: snapshot
101: Feb 21 23:13:08 INFO: ionice priority: 7
101: Feb 21 23:13:08 INFO: creating archive '/mnt/pve/storbck/dump/vzdump-qemu-101-2015_02_21-23_13_07.vma.gz'
101: Feb 21 23:13:08 INFO: started backup task 'fcc38312-b1e5-4354-9608-2ab5089283da'
[...]
101: Feb 22 08:58:35 INFO: status: 71% (416964149248/584115552256), sparse 10% (63174115328), duration 35127, 11/10 MB/s
101: Feb 22 08:58:35 ERROR: vma_queue_write: write error - Broken pipe
101: Feb 22 08:58:35 INFO: aborting backup job
101: Feb 22 08:59:44 ERROR: Backup of VM 101 failed - vma_queue_write: write error - Broken pipe

104: Feb 22 08:59:44 INFO: Starting Backup of VM 104 (qemu)

PS: Just noticed: you're running out of space on the host, not on the backup device!
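
A quick way to tell the two conditions apart, using the paths from the logs earlier in this thread:

Code:
# Is the backup store full? (the NFS mount vzdump writes archives to)
df -h /mnt/pve/Backup-Storage

# Or is the host side the problem? vzdump carves a temporary LVM snapshot
# out of the "pve" volume group, so check its free space and the snapshot:
vgs pve          # "VFree" column: room left for vzsnap-* snapshots
lvs pve          # "Data%" column: how full an active snapshot is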
 
Thank you for your answer. I don't believe I have anything special in my setup; what do you have in mind?
Now that you mention it, a few weeks ago another Proxmox install of mine ran out of space too and the machines did not stop, but that was a completely different setup: the backup storage was the same storage the machines' disks were on, and all the VMs crashed (which is perfectly understandable) but were not stopped.

What do you mean by "PS: Just noticed: you're running out of space on the host, not on the backup device!"? I don't understand.
My VMs had space, and my virtualization host had space too. It was just the NFS backup-only storage which was full...
 
Hi there.

I had a similar issue, but with worse consequences.

I started a backup (snapshot mode) and after some minutes the server rebooted... Note that by "server" I mean the host itself, not the VM. Load was fine and the VM weighs only a couple of GB.

The backup storage is an NFS share with 5.5 TB free, so space is not the problem here. The NFS mount was fine and I could see the backup file growing while the backup ran.

Rebooting a running server with dozens of running VMs should not be an acceptable outcome, in my opinion.

If some specific log is needed to investigate, let me know.

In the meanwhile I'll do backups manually, just in case, you know :rolleyes:
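
For what it's worth, a manual single-VM run is just one command; "backup-nfs" below is a placeholder for whatever the NFS store is called in storage.cfg:

Code:
# Back up one VM on demand, same snapshot mode as the scheduled job
# (use a VMID that actually exists on your node)
vzdump 101 --mode snapshot --compress lzo --storage backup-nfs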
 
