Why does backing up crash my VM every @#! time?

proxwolfe · Feb 4, 2023

Hi,

I have a three node PVE cluster in my home lab on which I am running a couple of VMs. Backups are done hourly/daily/weekly depending on the speed with which the VMs change, how important they are and how difficult it is to recreate them. The backup target is a PBS.

All of my VMs can be backed up easily and without any issues -- all but one: The most important one (Nextcloud) with all my valuable data. Backing up this VM used to be easy as well but over the last couple of days / few weeks the number of incidents has increased to a level where now not even one backup will succeed (that is done with the VM running - with the VM stopped it does succeed):

The backup issues the fs-freeze command and then just stops.

INFO: issuing guest-agent 'fs-freeze' command

Normally, it would then also issue the fs-thaw command but that does not happen. The VM becomes unresponsive and the only way to get it working again is resetting it (or stopping and restarting it). Needless to say that that is not great for the integrity of the file system.

I don't understand why this is happening and why this is happening only with this one VM (the others are fine). The only difference I see is that this VM has a large second virtual disk (1TB) for data while the others either don't have a second virtual disk at all or a much smaller one. Other than that they are more or less the same (the same guest operating system: Debian 11 and a rootless docker for the actual payload).

When I do stop the backup job, the fs-thaw command is issued:

closing with read buffer at /usr/share/perl5/IO/Multiplex.pm line 927.
ERROR: interrupted by signal
INFO: issuing guest-agent 'fs-thaw' command

I then also need to unlock the VM but it won't recover anymore. The console starts showing a bunch of processes that have been blocked for more than 120 seconds and I have to reset/stop-restart the VM.

What is going on there?

Thanks!

apoc · Feb 4, 2023

proxwolfe said:
don't understand why this is happening and why this is happening only with this one VM (the others are fine).

Likely because it is the biggest?
Do you freeze the memory with the snapshot?
This leads to high disk io which can cause to stun the VM

UdoB · Feb 4, 2023

Just for testing: you may exclude that large disk from backup and then test the behavior: "Edit: hard disk" --> tick "Advanced" and untick "Backup [ ]".

Good luck!

proxwolfe · Feb 4, 2023

apoc said:
Do you freeze the memory with the snapshot?

I guess so - is there a switch somewhere to turn this on or off?

UdoB said:
Just for testing: you may exclude that large disk from backup and then test the behavior: "Edit: hard disk" --> tick "Advanced" and untick "Backup [ ]".

Tried that - VM still freezes und never wakes up.

I found some other threads with similar problems in the past (some regression in qemu-kvm). But I understand a fix was developed and releases so I would not expect that this is was is happening to me.

proxwolfe · Feb 4, 2023

Okay, so changing the SCSI controller of the affected VM from "Virtio SCSI" to "Virtio SCSI single" made the backup succeed again.

Search

Search

Why does backing up crash my VM every @#! time?

proxwolfe

Renowned Member

apoc

Famous Member

UdoB

Distinguished Member

proxwolfe

Renowned Member

proxwolfe

Renowned Member

We value your privacy