VM crash during backup

matoa

New Member
Sep 4, 2023
8
0
1
Hi,

I've recently activated backup on my system thru PBS for some CT/VM after upgrading from 7 to 8. For the last few days, I have a VM that stops at random time during backups (always same VM).
Here is the log output:

Sep 04 08:00:53 prox01 pvescheduler[3146165]: INFO: Starting Backup of VM 201 (qemu)
Sep 04 08:00:54 prox01 QEMU[3747464]: kvm: ../block/block-copy.c:207: block_copy_task_create: Assertion `!reqlist_find_conflict(&s->reqs, offset, bytes)' failed.
Sep 04 08:00:54 prox01 pvescheduler[3146165]: VM 201 qmp command failed - VM 201 qmp command 'cont' failed - client closed connection
Sep 04 08:00:54 prox01 pvescheduler[3146165]: VM 201 qmp command failed - VM 201 not running
Sep 04 08:00:54 prox01 pvescheduler[3146165]: VM 201 qmp command failed - VM 201 not running
Sep 04 08:00:54 prox01 kernel: fwbr201i0: port 2(tap201i0) entered disabled state
Sep 04 08:00:54 prox01 kernel: fwbr201i0: port 2(tap201i0) entered disabled state
Sep 04 08:00:54 prox01 pvescheduler[3146165]: ERROR: Backup of VM 201 failed - VM 201 not running
Sep 04 08:00:54 prox01 pvescheduler[3146165]: INFO: Starting Backup of VM 203 (qemu)
Sep 04 08:00:54 prox01 systemd[1]: 201.scope: Deactivated successfully.
Sep 04 08:00:54 prox01 systemd[1]: 201.scope: Consumed 9h 24min 39.556s CPU time.
Sep 04 08:00:56 prox01 qmeventd[3149359]: Starting cleanup for 201
Sep 04 08:00:56 prox01 kernel: fwbr201i0: port 1(fwln201i0) entered disabled state
Sep 04 08:00:56 prox01 kernel: vmbr0: port 7(fwpr201p0) entered disabled state
Sep 04 08:00:56 prox01 kernel: device fwln201i0 left promiscuous mode
Sep 04 08:00:56 prox01 kernel: fwbr201i0: port 1(fwln201i0) entered disabled state
Sep 04 08:00:56 prox01 kernel: device fwpr201p0 left promiscuous mode
Sep 04 08:00:56 prox01 kernel: vmbr0: port 7(fwpr201p0) entered disabled state
Sep 04 08:00:56 prox01 qmeventd[3149359]: Finished cleanup for 201
Sep 04 08:00:59 prox01 kernel: zd96: p1 p2 < p5 >
Sep 04 08:00:59 prox01 lvm[3150434]: /dev/zd96p5 excluded: device is rejected by filter config.
Sep 04 08:01:36 prox01 pvescheduler[3146165]: INFO: Finished Backup of VM 203 (00:00:42)
Sep 04 08:01:36 prox01 pvescheduler[3146165]: INFO: Backup job finished with errors
Sep 04 08:01:36 prox01 pvescheduler[3146165]: job errors


I've googled and searched this forum for the error on line #2 without luck.

Anybody has a clue?

Thanks!
 
Hi Philipp:

# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2: 6.2.16-10
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
 
Hi,
can you share a backup task log where it failed and the VM configuration?
 
Hi Fiona,

Here is attached the VM config.
Concerning the logs, this is what I have in /var/log/pve/tasks/:

INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2023-09-04 08:00:53
INFO: status = running
INFO: VM Name: vm-crashing
INFO: include disk 'scsi0' 'local-zfs:vm-201-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: pending configuration changes found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/201/2023-09-04T06:00:53Z'
INFO: enabling encryption
INFO: started backup task '6ac4f37c-398d-495d-83c1-XX'
INFO: resuming VM again
ERROR: VM 201 qmp command 'cont' failed - client closed connection
INFO: aborting backup job
ERROR: VM 201 not running
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 not running
INFO: Failed at 2023-09-04 08:00:54
 

Attachments

Great! Thanks for the feedback.
In which production version should we seen the patch included?
 
Great! Thanks for the feedback.
In which production version should we seen the patch included?
If it gets reviewed and applied before the next bump, it will be in pve-qemu-kvm=8.0.2-6.
 
FYI, the package pve-qemu-kvm=8.0.2-6 with the fix is currently available on the pvetest repository. If you'd like to install the package, you can temporarily enable it, install and disable again (e.g. via the Repositories section in the UI).
 
Just installed pve-qemu-kvm=8.0.2-6 also. Let's hope our common boat gets better Cha0s.

Thanks again fiona!
 
Hi Fiona,

VM crashed again during a backup last night.

# pveversion -v | grep qemu-kvm
pve-qemu-kvm: 8.0.2-6

INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2023-09-13 00:31:19
INFO: status = running
INFO: VM Name: VM-XXX
INFO: include disk 'scsi0' 'local-zfs:vm-201-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/201/2023-09-12T22:31:19Z'
INFO: enabling encryption
INFO: started backup task '4597ed3b-0434-4cd5-9625-XXX'
INFO: resuming VM again
ERROR: VM 201 qmp command 'cont' failed - client closed connection
INFO: aborting backup job
ERROR: VM 201 not running
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 not running
INFO: Failed at 2023-09-13 00:31:21
 
Hi Fiona,

VM crashed again during a backup last night.

# pveversion -v | grep qemu-kvm
pve-qemu-kvm: 8.0.2-6

INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2023-09-13 00:31:19
INFO: status = running
INFO: VM Name: VM-XXX
INFO: include disk 'scsi0' 'local-zfs:vm-201-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/201/2023-09-12T22:31:19Z'
INFO: enabling encryption
INFO: started backup task '4597ed3b-0434-4cd5-9625-XXX'
INFO: resuming VM again
ERROR: VM 201 qmp command 'cont' failed - client closed connection
INFO: aborting backup job
ERROR: VM 201 not running
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 not running
INFO: Failed at 2023-09-13 00:31:21

Afaik, you need to fully stop and start again or migrate the VM forth (and back again, if the target node is not already too on the updated version) after a QEMU/KVM update, so that the VM actually runs with the new version; otherwise it will still run on the old version with which it was initially started.

Did you do so with this VM?
 
Afaik, you need to fully stop and start again or migrate the VM forth (and back again, if the target node is not already too on the updated version) after a QEMU/KVM update, so that the VM actually runs with the new version; otherwise it will still run on the old version with which it was initially started.

Did you do so with this VM?

Hum you're totally making sense. The VM was already running and I have just re-enabled backup after installing the new version but didn't stop / restart the VM.
I supposed that now it has crashed and has been restarted, it is running on the new version :)
 
Hum you're totally making sense. The VM was already running and I have just re-enabled backup after installing the new version but didn't stop / restart the VM.
I supposed that now it has crashed and has been restarted, it is running on the new version :)
yes. the normal way would be to do a live migration (onto a node with the update in place), since that also means a new process using the new binary is started ;)
 
Hi Philipp:

# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2: 6.2.16-10
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
I am seeing something similiar - vm crashes during backup
windows 2022 vm - multidisc setup

pveversion -v | grep qemu
libproxmox-backup-qemu0: 1.4.1
pve-qemu-kvm: 8.1.5-2
qemu-server: 8.0.10


Backup log
INFO: starting new backup job: vzdump 102 106 109 --quiet 1 --notes-template '{{guestname}}' --storage devsdb --compress zstd --mailto jared.terrel@simplythebuds.com --mailnotification failure --notification-mode auto --mode snapshot
INFO: Starting Backup of VM 102 (qemu)
INFO: Backup started at 2024-03-23 21:00:00
INFO: status = running
INFO: VM Name: freepbx-new
INFO: include disk 'scsi0' 'biglvmsdb:vm-102-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/devsdb/dump/vzdump-qemu-102-2024_03_23-21_00_00.vma.zst'
INFO: started backup task 'ec410012-723b-4f9f-b69c-67187fc1175b'
INFO: resuming VM again
INFO: 2% (757.2 MiB of 32.0 GiB) in 3s, read: 252.4 MiB/s, write: 22.7 MiB/s
INFO: 4% (1.5 GiB of 32.0 GiB) in 6s, read: 258.7 MiB/s, write: 3.4 MiB/s
INFO: 6% (2.1 GiB of 32.0 GiB) in 9s, read: 218.8 MiB/s, write: 62.4 MiB/s
INFO: 8% (2.8 GiB of 32.0 GiB) in 12s, read: 227.2 MiB/s, write: 220.9 MiB/s
data truncated for brevity -
INFO: 96% (31.0 GiB of 32.0 GiB) in 2m 18s, read: 249.3 MiB/s, write: 0 B/s
INFO: 98% (31.6 GiB of 32.0 GiB) in 2m 21s, read: 217.3 MiB/s, write: 0 B/s
INFO: 100% (32.0 GiB of 32.0 GiB) in 2m 23s, read: 187.3 MiB/s, write: 0 B/s
INFO: backup is sparse: 15.74 GiB (49%) total zero data
INFO: transferred 32.00 GiB in 143 seconds (229.1 MiB/s)
INFO: archive file size: 7.43GB
INFO: adding notes to backup
INFO: Finished Backup of VM 102 (00:02:24)
INFO: Backup finished at 2024-03-23 21:02:24
INFO: Starting Backup of VM 106 (qemu)
INFO: Backup started at 2024-03-23 21:02:25
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: observium
INFO: include disk 'scsi0' 'local-lvm:vm-106-disk-0' 50G
INFO: creating vzdump archive '/mnt/pve/devsdb/dump/vzdump-qemu-106-2024_03_23-21_02_24.vma.zst'
INFO: starting kvm to execute backup task
INFO: started backup task '46da5ab3-a00a-4d97-8969-082e31942f31'
INFO: 1% (583.0 MiB of 50.0 GiB) in 3s, read: 194.3 MiB/s, write: 157.4 MiB/s
INFO: 2% (1.1 GiB of 50.0 GiB) in 6s, read: 174.2 MiB/s, write: 144.8 MiB/s
INFO: 3% (1.6 GiB of 50.0 GiB) in 9s, read: 169.2 MiB/s, write: 140.7 MiB/s
INFO: 4% (2.0 GiB of 50.0 GiB) in 12s, read: 161.9 MiB/s, write: 137.9 MiB/s
INFO: 5% (2.7 GiB of 50.0 GiB) in 15s, read: 230.8 MiB/s, write: 223.1 MiB/s
INFO: 7% (4.0 GiB of 50.0 GiB) in 18s, read: 424.1 MiB/s, write: 36.5 MiB/s
INFO: 9% (4.6 GiB of 50.0 GiB) in 21s, read: 221.1 MiB/s, write: 176.1 MiB/s
INFO: 10% (5.0 GiB of 50.0 GiB) in 24s, read: 145.0 MiB/s, write: 138.9 MiB/s
INFO: 11% (5.5 GiB of 50.0 GiB) in 27s, read: 159.5 MiB/s, write: 141.1 MiB/s
INFO: 12% (6.2 GiB of 50.0 GiB) in 31s, read: 178.5 MiB/s, write: 163.9 MiB/s
INFO: 13% (6.7 GiB of 50.0 GiB) in 34s, read: 171.6 MiB/s, write: 168.0 MiB/s
data truncated for brevity -
INFO: 88% (44.2 GiB of 50.0 GiB) in 2m 7s, read: 484.7 MiB/s, write: 0 B/s
INFO: 91% (45.7 GiB of 50.0 GiB) in 2m 10s, read: 497.7 MiB/s, write: 9.3 KiB/s
INFO: 94% (47.1 GiB of 50.0 GiB) in 2m 13s, read: 464.3 MiB/s, write: 1.3 KiB/s
INFO: 96% (48.3 GiB of 50.0 GiB) in 2m 16s, read: 418.7 MiB/s, write: 0 B/s
INFO: 99% (49.5 GiB of 50.0 GiB) in 2m 19s, read: 429.3 MiB/s, write: 2.7 KiB/s
INFO: 100% (50.0 GiB of 50.0 GiB) in 2m 20s, read: 470.0 MiB/s, write: 0 B/s
INFO: backup is sparse: 41.16 GiB (82%) total zero data
INFO: transferred 50.00 GiB in 140 seconds (365.7 MiB/s)
INFO: stopping kvm after backup task
INFO: archive file size: 3.14GB
INFO: adding notes to backup
INFO: Finished Backup of VM 106 (00:02:24)
INFO: Backup finished at 2024-03-23 21:04:48
INFO: Starting Backup of VM 109 (qemu)
INFO: Backup started at 2024-03-23 21:04:48
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: sightclone2
INFO: include disk 'scsi0' 'biglvmsdb:vm-109-disk-1' 150G
INFO: include disk 'scsi1' 'biglvmsdb:vm-109-disk-3' 3000G
INFO: include disk 'efidisk0' 'biglvmsdb:vm-109-disk-0' 4M
INFO: include disk 'tpmstate0' 'biglvmsdb:vm-109-disk-2' 4M
INFO: creating vzdump archive '/mnt/pve/devsdb/dump/vzdump-qemu-109-2024_03_23-21_04_48.vma.zst'
INFO: starting kvm to execute backup task
swtpm_setup: Not overwriting existing state file.
INFO: attaching TPM drive to QEMU for backup
INFO: started backup task '6dfb90f6-532e-4323-b317-440787f0c96f'
INFO: 0% (754.0 MiB of 3.1 TiB) in 3s, read: 251.3 MiB/s, write: 243.0 MiB/s
INFO: 1% (31.7 GiB of 3.1 TiB) in 2m 18s, read: 234.5 MiB/s, write: 154.0 MiB/s
INFO: 2% (63.0 GiB of 3.1 TiB) in 15m 13s, read: 41.5 MiB/s, write: 41.0 MiB/s
data truncated for brevity -
INFO: 43% (1.3 TiB of 3.1 TiB) in 14h 36m 41s, read: 21.7 MiB/s, write: 21.7 MiB/s
INFO: 44% (1.4 TiB of 3.1 TiB) in 15h 2m 27s, read: 20.9 MiB/s, write: 20.9 MiB/s
INFO: 45% (1.4 TiB of 3.1 TiB) in 15h 30m 3s, read: 19.5 MiB/s, write: 19.5 MiB/s
INFO: 46% (1.4 TiB of 3.1 TiB) in 15h 58m 24s, read: 19.0 MiB/s, write: 19.0 MiB/s
INFO: 47% (1.4 TiB of 3.1 TiB) in 16h 29m 44s, read: 17.2 MiB/s, write: 17.2 MiB/s
 
Hi,
I am seeing something similiar - vm crashes during backup

windows 2022 vm - multidisc setup
Please share the configuration file with qm config 109.
INFO: Starting Backup of VM 109 (qemu)
INFO: Backup started at 2024-03-23 21:04:48
INFO: status = stopped
INFO: backup mode: stop
The VM was already stopped, so I'm not sure what you mean by crash? Or did you resume it after the backup started?
INFO: ionice priority: 7
INFO: VM Name: sightclone2
INFO: include disk 'scsi0' 'biglvmsdb:vm-109-disk-1' 150G
INFO: include disk 'scsi1' 'biglvmsdb:vm-109-disk-3' 3000G
INFO: include disk 'efidisk0' 'biglvmsdb:vm-109-disk-0' 4M
INFO: include disk 'tpmstate0' 'biglvmsdb:vm-109-disk-2' 4M
INFO: creating vzdump archive '/mnt/pve/devsdb/dump/vzdump-qemu-109-2024_03_23-21_04_48.vma.zst'
INFO: starting kvm to execute backup task
swtpm_setup: Not overwriting existing state file.
INFO: attaching TPM drive to QEMU for backup
INFO: started backup task '6dfb90f6-532e-4323-b317-440787f0c96f'
INFO: 0% (754.0 MiB of 3.1 TiB) in 3s, read: 251.3 MiB/s, write: 243.0 MiB/s
INFO: 1% (31.7 GiB of 3.1 TiB) in 2m 18s, read: 234.5 MiB/s, write: 154.0 MiB/s
INFO: 2% (63.0 GiB of 3.1 TiB) in 15m 13s, read: 41.5 MiB/s, write: 41.0 MiB/s
data truncated for brevity -
INFO: 43% (1.3 TiB of 3.1 TiB) in 14h 36m 41s, read: 21.7 MiB/s, write: 21.7 MiB/s
INFO: 44% (1.4 TiB of 3.1 TiB) in 15h 2m 27s, read: 20.9 MiB/s, write: 20.9 MiB/s
INFO: 45% (1.4 TiB of 3.1 TiB) in 15h 30m 3s, read: 19.5 MiB/s, write: 19.5 MiB/s
INFO: 46% (1.4 TiB of 3.1 TiB) in 15h 58m 24s, read: 19.0 MiB/s, write: 19.0 MiB/s
INFO: 47% (1.4 TiB of 3.1 TiB) in 16h 29m 44s, read: 17.2 MiB/s, write: 17.2 MiB/s
Do you mean the backup task got stuck here? That might indicate an issue with the storage or network. Please check the output of ps faxl if a backup-related task is in D state. What do you get when you run qm status 109 --verbose?
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!