VM crash during backup

matoa

New Member
Sep 4, 2023
8
0
1
Hi,

I've recently activated backup on my system thru PBS for some CT/VM after upgrading from 7 to 8. For the last few days, I have a VM that stops at random time during backups (always same VM).
Here is the log output:

Sep 04 08:00:53 prox01 pvescheduler[3146165]: INFO: Starting Backup of VM 201 (qemu)
Sep 04 08:00:54 prox01 QEMU[3747464]: kvm: ../block/block-copy.c:207: block_copy_task_create: Assertion `!reqlist_find_conflict(&s->reqs, offset, bytes)' failed.
Sep 04 08:00:54 prox01 pvescheduler[3146165]: VM 201 qmp command failed - VM 201 qmp command 'cont' failed - client closed connection
Sep 04 08:00:54 prox01 pvescheduler[3146165]: VM 201 qmp command failed - VM 201 not running
Sep 04 08:00:54 prox01 pvescheduler[3146165]: VM 201 qmp command failed - VM 201 not running
Sep 04 08:00:54 prox01 kernel: fwbr201i0: port 2(tap201i0) entered disabled state
Sep 04 08:00:54 prox01 kernel: fwbr201i0: port 2(tap201i0) entered disabled state
Sep 04 08:00:54 prox01 pvescheduler[3146165]: ERROR: Backup of VM 201 failed - VM 201 not running
Sep 04 08:00:54 prox01 pvescheduler[3146165]: INFO: Starting Backup of VM 203 (qemu)
Sep 04 08:00:54 prox01 systemd[1]: 201.scope: Deactivated successfully.
Sep 04 08:00:54 prox01 systemd[1]: 201.scope: Consumed 9h 24min 39.556s CPU time.
Sep 04 08:00:56 prox01 qmeventd[3149359]: Starting cleanup for 201
Sep 04 08:00:56 prox01 kernel: fwbr201i0: port 1(fwln201i0) entered disabled state
Sep 04 08:00:56 prox01 kernel: vmbr0: port 7(fwpr201p0) entered disabled state
Sep 04 08:00:56 prox01 kernel: device fwln201i0 left promiscuous mode
Sep 04 08:00:56 prox01 kernel: fwbr201i0: port 1(fwln201i0) entered disabled state
Sep 04 08:00:56 prox01 kernel: device fwpr201p0 left promiscuous mode
Sep 04 08:00:56 prox01 kernel: vmbr0: port 7(fwpr201p0) entered disabled state
Sep 04 08:00:56 prox01 qmeventd[3149359]: Finished cleanup for 201
Sep 04 08:00:59 prox01 kernel: zd96: p1 p2 < p5 >
Sep 04 08:00:59 prox01 lvm[3150434]: /dev/zd96p5 excluded: device is rejected by filter config.
Sep 04 08:01:36 prox01 pvescheduler[3146165]: INFO: Finished Backup of VM 203 (00:00:42)
Sep 04 08:01:36 prox01 pvescheduler[3146165]: INFO: Backup job finished with errors
Sep 04 08:01:36 prox01 pvescheduler[3146165]: job errors


I've googled and searched this forum for the error on line #2 without luck.

Anybody has a clue?

Thanks!
 
Hello.

Could you tell me your pveversion -v ?
 
Hi Philipp:

# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-6-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2: 6.2.16-10
proxmox-kernel-6.2.16-6-pve: 6.2.16-7
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
 
Hi,
can you share a backup task log where it failed and the VM configuration?
 
Hi Fiona,

Here is attached the VM config.
Concerning the logs, this is what I have in /var/log/pve/tasks/:

INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2023-09-04 08:00:53
INFO: status = running
INFO: VM Name: vm-crashing
INFO: include disk 'scsi0' 'local-zfs:vm-201-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: pending configuration changes found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/201/2023-09-04T06:00:53Z'
INFO: enabling encryption
INFO: started backup task '6ac4f37c-398d-495d-83c1-XX'
INFO: resuming VM again
ERROR: VM 201 qmp command 'cont' failed - client closed connection
INFO: aborting backup job
ERROR: VM 201 not running
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 not running
INFO: Failed at 2023-09-04 08:00:54
 

Attachments

  • 201.conf.txt
    1.8 KB · Views: 1
Great! Thanks for the feedback.
In which production version should we seen the patch included?
 
Great! Thanks for the feedback.
In which production version should we seen the patch included?
If it gets reviewed and applied before the next bump, it will be in pve-qemu-kvm=8.0.2-6.
 
FYI, the package pve-qemu-kvm=8.0.2-6 with the fix is currently available on the pvetest repository. If you'd like to install the package, you can temporarily enable it, install and disable again (e.g. via the Repositories section in the UI).
 
Just installed pve-qemu-kvm=8.0.2-6 also. Let's hope our common boat gets better Cha0s.

Thanks again fiona!
 
Hi Fiona,

VM crashed again during a backup last night.

# pveversion -v | grep qemu-kvm
pve-qemu-kvm: 8.0.2-6

INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2023-09-13 00:31:19
INFO: status = running
INFO: VM Name: VM-XXX
INFO: include disk 'scsi0' 'local-zfs:vm-201-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/201/2023-09-12T22:31:19Z'
INFO: enabling encryption
INFO: started backup task '4597ed3b-0434-4cd5-9625-XXX'
INFO: resuming VM again
ERROR: VM 201 qmp command 'cont' failed - client closed connection
INFO: aborting backup job
ERROR: VM 201 not running
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 not running
INFO: Failed at 2023-09-13 00:31:21
 
Hi Fiona,

VM crashed again during a backup last night.

# pveversion -v | grep qemu-kvm
pve-qemu-kvm: 8.0.2-6

INFO: Starting Backup of VM 201 (qemu)
INFO: Backup started at 2023-09-13 00:31:19
INFO: status = running
INFO: VM Name: VM-XXX
INFO: include disk 'scsi0' 'local-zfs:vm-201-disk-0' 32G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: snapshots found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/201/2023-09-12T22:31:19Z'
INFO: enabling encryption
INFO: started backup task '4597ed3b-0434-4cd5-9625-XXX'
INFO: resuming VM again
ERROR: VM 201 qmp command 'cont' failed - client closed connection
INFO: aborting backup job
ERROR: VM 201 not running
INFO: resuming VM again
ERROR: Backup of VM 201 failed - VM 201 not running
INFO: Failed at 2023-09-13 00:31:21

Afaik, you need to fully stop and start again or migrate the VM forth (and back again, if the target node is not already too on the updated version) after a QEMU/KVM update, so that the VM actually runs with the new version; otherwise it will still run on the old version with which it was initially started.

Did you do so with this VM?
 
Afaik, you need to fully stop and start again or migrate the VM forth (and back again, if the target node is not already too on the updated version) after a QEMU/KVM update, so that the VM actually runs with the new version; otherwise it will still run on the old version with which it was initially started.

Did you do so with this VM?

Hum you're totally making sense. The VM was already running and I have just re-enabled backup after installing the new version but didn't stop / restart the VM.
I supposed that now it has crashed and has been restarted, it is running on the new version :)
 
Hum you're totally making sense. The VM was already running and I have just re-enabled backup after installing the new version but didn't stop / restart the VM.
I supposed that now it has crashed and has been restarted, it is running on the new version :)
yes. the normal way would be to do a live migration (onto a node with the update in place), since that also means a new process using the new binary is started ;)
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!