Why does backing up crash my VM every @#! time?

proxwolfe

Well-Known Member
Jun 20, 2020
Hi,

I have a three node PVE cluster in my home lab on which I am running a couple of VMs. Backups are done hourly/daily/weekly depending on the speed with which the VMs change, how important they are and how difficult it is to recreate them. The backup target is a PBS.

All of my VMs can be backed up easily and without any issues, all but one: the most important one (Nextcloud) with all my valuable data. Backing up this VM used to be easy as well, but over the last few weeks the failures have become so frequent that not a single backup succeeds anymore while the VM is running (with the VM stopped, the backup does succeed):

The backup issues the fs-freeze command and then just stops.

INFO: issuing guest-agent 'fs-freeze' command

Normally, it would then also issue the fs-thaw command, but that does not happen. The VM becomes unresponsive and the only way to get it working again is to reset it (or stop and restart it). Needless to say, that is not great for the integrity of the file system.
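
One way to spot this state in a saved task log is to check whether the last 'fs-freeze' line is ever followed by an 'fs-thaw' line. The following is a minimal Python sketch, not part of Proxmox: the function name is made up for illustration, and it only matches the log wording quoted in this post.

```python
def freeze_left_open(log_lines):
    """Return True if the last fs-freeze in the log has no later fs-thaw."""
    frozen = False
    for line in log_lines:
        # Wording taken from the backup log lines quoted in this thread.
        if "guest-agent 'fs-freeze' command" in line:
            frozen = True
        elif "guest-agent 'fs-thaw' command" in line:
            frozen = False
    return frozen


if __name__ == "__main__":
    hung_log = ["INFO: issuing guest-agent 'fs-freeze' command"]
    print(freeze_left_open(hung_log))
```

A log that ends right after the freeze line, as in the failing backups here, makes the function return True.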

I don't understand why this is happening and why this is happening only with this one VM (the others are fine). The only difference I see is that this VM has a large second virtual disk (1TB) for data while the others either don't have a second virtual disk at all or a much smaller one. Other than that they are more or less the same (the same guest operating system: Debian 11 and a rootless docker for the actual payload).

When I do stop the backup job, the fs-thaw command is issued:

closing with read buffer at /usr/share/perl5/IO/Multiplex.pm line 927.
ERROR: interrupted by signal
INFO: issuing guest-agent 'fs-thaw' command

I then also need to unlock the VM but it won't recover anymore. The console starts showing a bunch of processes that have been blocked for more than 120 seconds and I have to reset/stop-restart the VM.
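
For reference, the manual recovery attempt described here could be written down roughly as follows. This is only a sketch: the VM ID 100 is a placeholder, the helper name is hypothetical, and the script merely prints the qm commands (query freeze status, request a thaw, remove the lock) instead of running them. If the guest agent itself is hung, the first two may simply time out.

```python
import shlex


def recovery_commands(vmid):
    """Build the qm command lines for checking, thawing and unlocking a VM."""
    vmid = str(int(vmid))  # reject anything that is not a numeric VM ID
    return [
        ["qm", "guest", "cmd", vmid, "fsfreeze-status"],
        ["qm", "guest", "cmd", vmid, "fsfreeze-thaw"],
        ["qm", "unlock", vmid],
    ]


if __name__ == "__main__":
    # 100 is a placeholder VM ID; the commands are printed, not executed.
    for argv in recovery_commands(100):
        print(shlex.join(argv))
```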

What is going on there?

Thanks!
 
I don't understand why this is happening and why this is happening only with this one VM (the others are fine).
Likely because it is the biggest?
Do you freeze the memory with the snapshot?
This leads to high disk I/O, which can stall the VM.
 
Just for testing: you may exclude that large disk from backup and then test the behavior: "Edit: hard disk" --> tick "Advanced" and untick "Backup [ ]".

Good luck!
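
The "Backup" checkbox described above is stored as a backup=0 option on the disk's line in the VM config. Here is a small Python sketch of that edit, assuming a made-up disk value; the helper name is hypothetical as well.

```python
def exclude_from_backup(drive_value):
    """Return the drive option string with backup=0 set exactly once."""
    # Drop any existing backup=... option, keep everything else in order.
    parts = [p for p in drive_value.split(",") if not p.startswith("backup=")]
    parts.append("backup=0")
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical value of the large data disk.
    print(exclude_from_backup("local-lvm:vm-100-disk-1,size=1T"))
```

The resulting string could then be applied with something like qm set <vmid> --scsi1 '<value>', assuming the data disk is attached as scsi1.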
 
Do you freeze the memory with the snapshot?
I guess so - is there a switch somewhere to turn this on or off?
Just for testing: you may exclude that large disk from backup and then test the behavior: "Edit: hard disk" --> tick "Advanced" and untick "Backup [ ]".
Tried that - VM still freezes and never wakes up.


I found some other threads with similar problems in the past (some regression in qemu-kvm). But I understand a fix was developed and released, so I would not expect that this is what is happening to me.
 
Okay, so changing the SCSI controller of the affected VM from "Virtio SCSI" to "Virtio SCSI single" made the backup succeed again.
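
For anyone who prefers the command line: in the VM config the controller type is the scsihw option, where the GUI's "VirtIO SCSI" corresponds to virtio-scsi-pci and "VirtIO SCSI single" to virtio-scsi-single. The Python sketch below reads that option from config text (for example the output of qm config) and builds the matching qm set command. The sample config, the VM ID 100 and the function names are assumptions for illustration, and the command is only printed, not executed.

```python
def scsi_controller(config_text):
    """Return the scsihw value from 'key: value' config text, or None."""
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "scsihw":
            return value.strip()
    return None


def switch_to_single_command(vmid):
    """Build the qm command that selects the VirtIO SCSI single controller."""
    return ["qm", "set", str(int(vmid)), "--scsihw", "virtio-scsi-single"]


if __name__ == "__main__":
    # Hypothetical config excerpt, not taken from the affected VM.
    sample = "scsihw: virtio-scsi-pci\nscsi0: local-lvm:vm-100-disk-0,size=32G"
    if scsi_controller(sample) != "virtio-scsi-single":
        print(" ".join(switch_to_single_command(100)))
```

A controller change like this presumably only takes effect after the VM has been fully stopped and started again, not after a reboot from inside the guest.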
 