VMs going down during backup randomly

robhost

Active Member
Jun 15, 2014
Dresden
www.robhost.de
Hi,

With the latest PVE 5.1, VMs (KVM) sometimes go down during the backup process and have to be started manually. It appears to happen at random across our VMs and hosts. We use NFS storage and "snapshot" mode with LZO compression.

Example:

845: 2018-04-11 13:31:40 INFO: status: 74% (397318029312/536870912000), sparse 12% (68007620608), duration 5491, read/write 63/58 MB/s
845: 2018-04-11 13:32:45 INFO: status: 75% (402724487168/536870912000), sparse 12% (68617400320), duration 5556, read/write 83/73 MB/s
845: 2018-04-11 13:33:31 ERROR: VM 845 not running
845: 2018-04-11 13:33:31 INFO: aborting backup job
845: 2018-04-11 13:33:31 ERROR: VM 845 not running
845: 2018-04-11 13:33:43 ERROR: Backup of VM 845 failed - VM 845 not running
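
To correlate a failure like this with the host's own logs, it helps to know the time window in which the VM disappeared. The following is only a rough sketch: it assumes the vzdump log keeps the line layout shown above, and the names `failure_window`, `STATUS_RE` and `ERROR_RE` are made up for this example.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the vzdump log lines quoted above.
TS = r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
STATUS_RE = re.compile(r"^\d+: " + TS + r" INFO: status: (?P<pct>\d+)%")
ERROR_RE = re.compile(r"^\d+: " + TS + r" ERROR: (?P<msg>.*)$")


def parse_ts(text):
    """Convert a log timestamp such as '2018-04-11 13:33:31' to a datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def failure_window(log_text):
    """Return (last progress line, first error line) of a vzdump log.

    Each element is a (timestamp, value) tuple, or None if no such line
    was found. Scanning stops at the first ERROR line.
    """
    last_ok = None
    for line in log_text.splitlines():
        line = line.strip()
        status = STATUS_RE.match(line)
        if status:
            last_ok = (parse_ts(status.group("ts")), int(status.group("pct")))
            continue
        error = ERROR_RE.match(line)
        if error:
            return last_ok, (parse_ts(error.group("ts")), error.group("msg"))
    return last_ok, None
```

For the log above, the last progress line is at 13:32:45 (75%) and the first error is at 13:33:31, so the VM went away somewhere in those 46 seconds. That is the window to look at in the host's syslog or kernel log.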


Any idea what's wrong or how to fix this?
 
Hi,

It is hard to say with so little information.
Network problems?
Is the NFS hanging?
Which OS do the VMs run?
 
Hi,

There are no network problems and no NFS hangs, because other backup jobs (from other nodes) are running fine.

The VMs run Linux (CentOS 7). But the VMs are stopped and no KVM process exists anymore, so it does not look like an OS problem. The QEMU Guest Agent is installed in all VMs.
 
Do the logs show anything out of the ordinary, e.g. a segfaulted kvm process?
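
One way to look for this is to scan the host's syslog or kernel log for crash and out-of-memory messages around the time of the failed backup. The sketch below is only a starting point: the patterns are assumptions about how a Linux kernel typically words such messages, and `find_suspicious` is a hypothetical helper name, not part of any Proxmox tool.

```python
import re

# Assumed wording of typical kernel messages; adjust the patterns to
# whatever your host actually logs.
PATTERNS = {
    "segfault": re.compile(r"segfault at", re.IGNORECASE),
    "oom": re.compile(r"out of memory|oom-killer|oom_reaper", re.IGNORECASE),
}
# Lines that also mention kvm or qemu are the most interesting ones.
PROCESS_HINT = re.compile(r"kvm|qemu", re.IGNORECASE)


def find_suspicious(lines):
    """Return (label, mentions_kvm, line) for every line matching a pattern."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        for label, pattern in PATTERNS.items():
            if pattern.search(text):
                hits.append((label, bool(PROCESS_HINT.search(text)), text))
                break  # one label per line is enough
    return hits
```

It can be fed any iterable of log lines, for example an opened `/var/log/syslog` (assuming the host writes one) or the captured output of `journalctl -k`. An empty result does not prove the process exited cleanly; it only means none of the assumed patterns matched.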
 
If you can reproduce this with a test VM, it might make sense to run it with tracing output and/or under gdb.
 
Is the disk in qcow2 format? If yes, please check that file for errors with 'qemu-img check'
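
If several images need checking, the call can be scripted. This is a minimal sketch, assuming `qemu-img` is on the PATH and that the image is not in use by a running VM while it is checked; `check_image` and `describe_check` are hypothetical helper names. The exit code meanings are taken from the `qemu-img` man page.

```python
import subprocess

# Exit codes of "qemu-img check" as documented in the qemu-img man page.
CHECK_RESULTS = {
    0: "no errors found",
    1: "check could not be completed (internal error)",
    2: "image is corrupted",
    3: "leaked clusters found, but no corruption",
    63: "image format does not support checks",
}


def describe_check(returncode):
    """Translate a 'qemu-img check' exit code into a short description."""
    return CHECK_RESULTS.get(returncode, f"unexpected exit code {returncode}")


def check_image(path):
    """Run 'qemu-img check' on one image file.

    Returns (exit code, description, combined output). Raises
    FileNotFoundError if qemu-img is not installed.
    """
    proc = subprocess.run(
        ["qemu-img", "check", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, describe_check(proc.returncode), proc.stdout
```

Usage would be along the lines of `check_image("/path/to/vm-845-disk-1.qcow2")`, where the path is only a placeholder for wherever the NFS storage keeps the image. An exit code of 3 (leaked clusters) wastes space but does not mean data is damaged, whereas 2 indicates real corruption.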
 
To OP: I am having a similar vzdump backup issue with a legacy XP KVM guest using remote NFS storage. The problem started about 3 weeks ago after running fine for a long time. On a whim, I disabled LZO compression yesterday. I need several more backup runs before deciding whether LZO is the culprit. I don't know if that might help you. :)
 