Can a full datastore crash a VM while a backup is in progress?

christophe

Hi,

Strange behaviour here (PVE 6.3-6, PBS 1.0-9): a virtual machine crashed at 21:22:38, a few seconds before the first error message from PBS. The VM had no console, no ping, no guest agent, and no more CPU, disk or network graphs (the RAM usage graph stayed flat), yet the PVE GUI still reported it as alive.

This VM was not prone to crashing before this incident.
The PBS datastore is NOT NFS-mounted.

PBS task log:

2021-03-16T21:11:31+01:00: download 'index.json.blob' from previous backup.
2021-03-16T21:11:31+01:00: register chunks in 'drive-virtio0.img.fidx' from previous backup.
2021-03-16T21:11:31+01:00: download 'drive-virtio0.img.fidx' from previous backup.
2021-03-16T21:11:31+01:00: created new fixed index 1 ("vm/313002/2021-03-16T20:11:14Z/drive-virtio0.img.fidx")
2021-03-16T21:11:31+01:00: register chunks in 'drive-virtio1.img.fidx' from previous backup.
2021-03-16T21:11:31+01:00: download 'drive-virtio1.img.fidx' from previous backup.
2021-03-16T21:11:32+01:00: created new fixed index 2 ("vm/313002/2021-03-16T20:11:14Z/drive-virtio1.img.fidx")
2021-03-16T21:11:32+01:00: add blob "/mnt/datastore/data/vm/313002/2021-03-16T20:11:14Z/qemu-server.conf.blob" (1051 bytes, comp: 1051)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-17T09:11:50+01:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-03-17T09:11:50+01:00: removing unfinished backup
2021-03-17T09:11:50+01:00: TASK ERROR: backup ended but finished flag is not set.

The backup task only ended this morning, after I ran systemctl restart pvedaemon and restarted the VM.

Any idea?

Christophe.
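
For readers who want to condense the PBS log above: the repeated "(os error 28)" lines can be grouped by timestamp and mapped to their symbolic errno name. This is only a minimal sketch. The function name summarize_os_errors and the SAMPLE excerpt are illustrative (the excerpt reuses three lines from the log above), and the regular expression assumes every line has the "timestamp: message" layout seen in this thread.

```python
import errno
import re
from collections import Counter

# Matches lines such as:
# 2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
LINE_RE = re.compile(r"^(?P<ts>\S+?): .*\(os error (?P<code>\d+)\)\s*$")


def summarize_os_errors(log_text):
    """Count '(os error N)' lines per (timestamp, errno name)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            code = int(match.group("code"))
            # errno.errorcode maps numeric codes to names, e.g. 28 -> 'ENOSPC'
            name = errno.errorcode.get(code, "UNKNOWN")
            counts[(match.group("ts"), name)] += 1
    return counts


# Short excerpt copied from the PBS log in this thread (not the full log).
SAMPLE = """\
2021-03-16T21:23:05+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-16T21:23:06+01:00: POST /fixed_chunk: 400 Bad Request: No space left on device (os error 28)
2021-03-17T09:11:50+01:00: TASK ERROR: backup ended but finished flag is not set.
"""

if __name__ == "__main__":
    for (timestamp, name), count in sorted(summarize_os_errors(SAMPLE).items()):
        print(f"{timestamp}  {name}  x{count}")
```

OS error 28 is ENOSPC ("No space left on device"), so every rejected chunk upload in the log has the same cause: the filesystem holding the datastore was full.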
 
Can you post 'pveversion -v' as well as the backup task log (and potentially journal) from the PVE node?
 
[root@px3-c:~]# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-7
pve-kernel-helper: 6.3-7
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.101-1-pve: 5.4.101-1
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-6
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-3
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-8
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.3-pve2
[root@px3-c:~]#

And the backup task log from the PVE node:
INFO: Starting Backup of VM 313002 (qemu)
INFO: Backup started at 2021-03-16 21:11:14
INFO: status = running
INFO: VM Name: srv-toto
INFO: include disk 'virtio0' 'Disques-VMs-HA-1:vm-313002-disk-0' 65G
INFO: include disk 'virtio1' 'Disques-VMs-HA-1:vm-313002-disk-2' 1990G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/313002/2021-03-16T20:11:14Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'bf3219c6-0507-4b9d-9ddf-870968200808'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: OK (9.1 GiB of 65.0 GiB dirty)
INFO: virtio1: dirty-bitmap status: OK (32.2 GiB of 1.9 TiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 41.3 GiB dirty of 2.0 TiB total
INFO: 0% (344.0 MiB of 41.3 GiB) in 3s, read: 114.7 MiB/s, write: 114.7 MiB/s
INFO: 1% (456.0 MiB of 41.3 GiB) in 7s, read: 28.0 MiB/s, write: 28.0 MiB/s
INFO: 2% (860.0 MiB of 41.3 GiB) in 20s, read: 31.1 MiB/s, write: 31.1 MiB/s
INFO: 3% (1.3 GiB of 41.3 GiB) in 33s, read: 34.2 MiB/s, write: 34.2 MiB/s
INFO: 4% (1.7 GiB of 41.3 GiB) in 51s, read: 22.0 MiB/s, write: 21.6 MiB/s
INFO: 5% (2.1 GiB of 41.3 GiB) in 1m 2s, read: 38.2 MiB/s, write: 38.2 MiB/s
INFO: 6% (2.5 GiB of 41.3 GiB) in 1m 15s, read: 33.5 MiB/s, write: 33.5 MiB/s
INFO: 7% (2.9 GiB of 41.3 GiB) in 1m 27s, read: 35.3 MiB/s, write: 35.0 MiB/s
INFO: 8% (3.3 GiB of 41.3 GiB) in 1m 45s, read: 23.8 MiB/s, write: 23.8 MiB/s
INFO: 9% (3.7 GiB of 41.3 GiB) in 1m 58s, read: 31.1 MiB/s, write: 30.5 MiB/s
INFO: 10% (4.2 GiB of 41.3 GiB) in 2m 25s, read: 16.3 MiB/s, write: 16.1 MiB/s
INFO: 11% (4.6 GiB of 41.3 GiB) in 2m 37s, read: 35.3 MiB/s, write: 35.3 MiB/s
INFO: 12% (5.0 GiB of 41.3 GiB) in 2m 49s, read: 34.0 MiB/s, write: 34.0 MiB/s
INFO: 13% (5.4 GiB of 41.3 GiB) in 3m 11s, read: 19.5 MiB/s, write: 19.5 MiB/s
INFO: 14% (5.8 GiB of 41.3 GiB) in 3m 23s, read: 35.3 MiB/s, write: 35.3 MiB/s
INFO: 15% (6.2 GiB of 41.3 GiB) in 3m 37s, read: 30.3 MiB/s, write: 30.0 MiB/s
INFO: 16% (6.6 GiB of 41.3 GiB) in 3m 56s, read: 23.4 MiB/s, write: 23.4 MiB/s
INFO: 17% (7.0 GiB of 41.3 GiB) in 4m 8s, read: 34.0 MiB/s, write: 34.0 MiB/s
INFO: 18% (7.5 GiB of 41.3 GiB) in 4m 20s, read: 36.7 MiB/s, write: 36.7 MiB/s
INFO: 19% (7.9 GiB of 41.3 GiB) in 4m 32s, read: 34.3 MiB/s, write: 34.3 MiB/s
INFO: 20% (8.3 GiB of 41.3 GiB) in 4m 54s, read: 19.8 MiB/s, write: 19.8 MiB/s
INFO: 21% (8.7 GiB of 41.3 GiB) in 5m 6s, read: 32.7 MiB/s, write: 32.7 MiB/s
INFO: 22% (9.1 GiB of 41.3 GiB) in 5m 25s, read: 22.3 MiB/s, write: 22.1 MiB/s
INFO: 23% (9.5 GiB of 41.3 GiB) in 5m 41s, read: 27.0 MiB/s, write: 27.0 MiB/s
INFO: 24% (9.9 GiB of 41.3 GiB) in 5m 54s, read: 33.5 MiB/s, write: 33.5 MiB/s
INFO: 25% (10.4 GiB of 41.3 GiB) in 6m 14s, read: 20.8 MiB/s, write: 20.8 MiB/s
INFO: 26% (10.8 GiB of 41.3 GiB) in 6m 26s, read: 35.7 MiB/s, write: 35.7 MiB/s
INFO: 27% (11.2 GiB of 41.3 GiB) in 6m 38s, read: 33.7 MiB/s, write: 33.7 MiB/s
INFO: 28% (11.6 GiB of 41.3 GiB) in 7m 5s, read: 16.1 MiB/s, write: 16.1 MiB/s
INFO: 29% (12.0 GiB of 41.3 GiB) in 7m 18s, read: 33.5 MiB/s, write: 33.5 MiB/s
INFO: 30% (12.4 GiB of 41.3 GiB) in 7m 31s, read: 31.1 MiB/s, write: 31.1 MiB/s
INFO: 31% (12.8 GiB of 41.3 GiB) in 8m 8s, read: 11.2 MiB/s, write: 11.2 MiB/s
INFO: 32% (13.2 GiB of 41.3 GiB) in 8m 23s, read: 29.1 MiB/s, write: 29.1 MiB/s
INFO: 33% (13.7 GiB of 41.3 GiB) in 8m 36s, read: 32.6 MiB/s, write: 32.6 MiB/s
INFO: 34% (14.1 GiB of 41.3 GiB) in 9m 13s, read: 11.6 MiB/s, write: 11.6 MiB/s
INFO: 35% (14.5 GiB of 41.3 GiB) in 9m 26s, read: 31.4 MiB/s, write: 31.4 MiB/s
INFO: 36% (14.9 GiB of 41.3 GiB) in 9m 40s, read: 31.4 MiB/s, write: 31.4 MiB/s
INFO: 37% (15.3 GiB of 41.3 GiB) in 10m 2s, read: 18.4 MiB/s, write: 18.4 MiB/s
INFO: 38% (15.7 GiB of 41.3 GiB) in 10m 25s, read: 19.7 MiB/s, write: 19.7 MiB/s
INFO: 39% (16.1 GiB of 41.3 GiB) in 10m 38s, read: 32.3 MiB/s, write: 32.3 MiB/s
INFO: 40% (16.6 GiB of 41.3 GiB) in 10m 51s, read: 32.6 MiB/s, write: 32.6 MiB/s
ERROR: VM 313002 qmp command 'query-backup' failed - got timeout
INFO: aborting backup job
ERROR: VM 313002 qmp command 'backup-cancel' failed - unable to connect to VM 313002 qmp socket - timeout after 5964 retries
ERROR: Backup of VM 313002 failed - VM 313002 qmp command 'query-backup' failed - got timeout

Christophe.
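
To line up the two logs, the timestamps quoted in this thread can be put on one timeline. The sketch below is only an approximation under two assumptions that the thread does not confirm: the PVE and PBS clocks are in sync, and the elapsed time in the progress lines ("in 10m 51s") is counted from the "Backup started at" line. If it is counted from a slightly later point, the last progress timestamp shifts by the same amount.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# Values taken from the posts above (local time, +01:00).
backup_start = datetime.strptime("2021-03-16 21:11:14", FMT)      # PVE: "Backup started at"
last_progress = backup_start + timedelta(minutes=10, seconds=51)   # PVE: last line, "40% ... in 10m 51s"
vm_unresponsive = datetime.strptime("2021-03-16 21:22:38", FMT)    # crash time given in the first post
first_enospc = datetime.strptime("2021-03-16 21:23:05", FMT)       # PBS: first "(os error 28)" line


def seconds_between(earlier, later):
    """Whole seconds from 'earlier' to 'later'."""
    return int((later - earlier).total_seconds())


if __name__ == "__main__":
    print(last_progress.strftime(FMT))                       # 2021-03-16 21:22:05
    print(seconds_between(last_progress, vm_unresponsive))   # 33
    print(seconds_between(vm_unresponsive, first_enospc))    # 27
```

Under those assumptions, the last progress line was written roughly half a minute before the VM stopped responding, and the first "No space left on device" error on PBS was logged about half a minute after that.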
 
I tested the behaviour with the versions you specified and could not reproduce the issue. By any chance, had this VM been running for a while before the issue occurred? A VM keeps running with whatever QEMU version was installed at the moment it was started, until you shut it down or reboot it (via PVE, not from within the guest). An older version of QEMU might exhibit this behaviour. To be sure, you could try to reproduce the behaviour on your setup too, but with a recently started VM using the newest QEMU version (you can check which version it is running with via qm status <vmid> --verbose).
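
The comparison suggested above (QEMU version the VM was started with versus the version currently installed) could be sketched roughly as follows. The helper names are hypothetical. The installed version is read from the pve-qemu-kvm line of pveversion -v output, as posted earlier in this thread. The running version has to be copied by hand from the qm status <vmid> --verbose output, and it is assumed here to be a plain dotted number such as 5.1.0.

```python
import re


def installed_qemu_version(pveversion_output):
    """Return the pve-qemu-kvm package version from 'pveversion -v' output, or None."""
    match = re.search(r"^pve-qemu-kvm:\s*(\S+)", pveversion_output, re.MULTILINE)
    return match.group(1) if match else None


def upstream_tuple(version):
    """'5.2.0-3' -> (5, 2, 0): keep only the leading dotted numeric part."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def vm_runs_older_qemu(running_version, pveversion_output):
    """True if the VM's running QEMU is older than the installed package."""
    installed = installed_qemu_version(pveversion_output)
    if installed is None:
        raise ValueError("no pve-qemu-kvm line found")
    return upstream_tuple(running_version) < upstream_tuple(installed)


# Excerpt from the 'pveversion -v' output posted in this thread.
PVEVERSION_EXCERPT = """\
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-3
pve-xtermjs: 4.7.0-3
"""

if __name__ == "__main__":
    # "5.1.0" is a made-up running version, used only for illustration.
    print(vm_runs_older_qemu("5.1.0", PVEVERSION_EXCERPT))  # True
    print(vm_runs_older_qemu("5.2.0", PVEVERSION_EXCERPT))  # False
```

This only compares the upstream part of the version. Two package revisions of the same upstream release (for example 5.2.0-2 and 5.2.0-3) are treated as equal, so a VM started on an older package revision would not be flagged.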
 
