[SOLVED] VM crash caused by insufficient storage for vzdump

For several days we struggled with a VM that crashed during the night.
Unfortunately this VM is currently a SPOF, so the crashes caused a lot of problems.
The PVE dashboard showed a yellow triangle with the text "IO error".
The VM was frozen: no access was possible via ssh or console until a power off/on.
On the VM itself there was no clue, it just froze.
On the PVE side we noticed this at the end of /var/log/vzdump/qemu-110.log:

110: 2020-02-14 03:38:16 INFO: status: 71% (924994764800/1299227607040), sparse 5% (66562551808), duration 9288, read/write 13/12 MB/s
110: 2020-02-14 03:38:16 ERROR: vma_queue_write: write error - Broken pipe
110: 2020-02-14 03:38:16 INFO: aborting backup job
110: 2020-02-14 03:38:23 ERROR: Backup of VM 110 failed - vma_queue_write: write error - Broken pipe
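
To get a feeling for how far the backup got before the write failed, the status line can be taken apart. This is only a rough sketch: the regular expression is my own guess at the line layout based on the single line quoted above, and "written" here simply means bytes read minus sparse bytes, before any compression.

```python
import re

# Matches the progress part of a vzdump status line such as
# "status: 71% (924994764800/1299227607040), sparse 5% (66562551808)".
# The pattern is an assumption derived from the one log line quoted above.
STATUS_RE = re.compile(
    r"status: (\d+)% \((\d+)/(\d+)\), sparse (\d+)% \((\d+)\)"
)

GIB = 1024 ** 3


def parse_status(line):
    """Return progress figures in GiB, or None if the line does not match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    _, done, total, _, sparse = (int(g) for g in m.groups())
    return {
        "read_gib": done / GIB,
        "total_gib": total / GIB,
        "sparse_gib": sparse / GIB,
        # Rough amount that had to go to the backup target (uncompressed).
        "written_gib": (done - sparse) / GIB,
    }


line = ("110: 2020-02-14 03:38:16 INFO: status: 71% "
        "(924994764800/1299227607040), sparse 5% (66562551808), "
        "duration 9288, read/write 13/12 MB/s")
info = parse_status(line)
print({key: round(value, 1) for key, value in info.items()})
```

The total works out to 1210 GiB, which matches the two disks of the VM.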

From that moment the VM was frozen and not accessible by ssh or console.

Of course we need to expand the storage pool for backups...
The workaround for now is to reduce the number of backups kept from 3 to 2.

The VM has a 10GB and a 1200GB disk; the BACKUP pool simply did not have enough space.
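
A simple pre-check could have warned us before the nightly job ran. The following is only a sketch under assumptions: the mount point /mnt/backup/dump is hypothetical, the estimate is worst case (uncompressed, no sparse savings), and how many copies must fit at the same time depends on when old backups get removed in your setup, so that is a parameter.

```python
import shutil

GIB = 1024 ** 3


def backups_fit(free_bytes, vm_bytes, copies=1):
    """True if `copies` worst-case (uncompressed) backups fit in free_bytes."""
    return free_bytes >= vm_bytes * copies


# Total size reported in the vzdump log (10GB + 1200GB disks = 1210 GiB).
vm_bytes = 1299227607040
# Hypothetical mount point of the backup storage; adjust to your system.
dump_dir = "/mnt/backup/dump"

try:
    free = shutil.disk_usage(dump_dir).free
    print(f"free: {free / GIB:.0f} GiB, "
          f"one more backup fits: {backups_fit(free, vm_bytes)}")
except OSError:
    print(f"cannot check {dump_dir} on this host")
```

Compression and sparse blocks usually make the real backup smaller, so this check errs on the safe side.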

Just wanted to share this, as we really had no clue why the VM was frozen.
 
io-error is most often caused by broken storage, e.g. an overprovisioned and full lvmthin storage
 
But what I wanted to emphasize is that the VM froze, which was not expected.
If the qemu process cannot write, there are several options for what happens to the VM. In the default setup they are:
werror (writes): enospc (pause the VM if the storage is out of space, otherwise report the error to the guest)
and
rerror (reads): report
(both can be configured via CLI/API, but not in the GUI)

For more info see 'man kvm' and search for 'rerror' or 'werror'.
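
Since these are per-drive options, changing them means re-sending the whole drive line with the extra option appended. Below is a small sketch of a helper that builds such a line and prints the matching qm command instead of running it. The drive spec (local-lvm:vm-110-disk-0,size=10G) and the scsi0 slot are made-up examples, and the allowed values are taken from my reading of the qm man page, so check the output against your own 'qm config 110' before using it.

```python
# Allowed values as I understand them from the qm documentation (assumption).
ALLOWED = {
    "werror": {"enospc", "ignore", "report", "stop"},
    "rerror": {"ignore", "report", "stop"},
}


def with_error_policy(drive_spec, **policy):
    """Return drive_spec with werror/rerror set, replacing existing values."""
    for key, value in policy.items():
        if value not in ALLOWED.get(key, ()):
            raise ValueError(f"unsupported {key}={value}")
    # First element is the volume, the rest are key=value options.
    volume, *opts = drive_spec.split(",")
    options = dict(opt.split("=", 1) for opt in opts)
    options.update(policy)
    return ",".join([volume, *(f"{k}={v}" for k, v in options.items())])


# Hypothetical drive line, as it might appear in 'qm config 110'.
spec = "local-lvm:vm-110-disk-0,size=10G"
print(f"qm set 110 --scsi0 {with_error_policy(spec, werror='report')}")
```

Whether report, stop or the default enospc is the better choice depends on whether you prefer the guest to see the I/O error or the VM to pause until the storage problem is fixed.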