Possible bug after upgrading to 7.2: VM freezes when backing up large disks

No cPanel, no MySQL (but MariaDB is installed).

/var/log/syslog has nothing helpful: it shows the freeze command, and the next entry is from after the reset, when the machine started again:

Code:
Feb 5 15:00:02 grouphub-nextcloud-new qemu-ga: info: guest-ping called
Feb 5 15:00:02 grouphub-nextcloud-new qemu-ga: info: guest-fsfreeze called
Feb 5 15:40:49 grouphub-nextcloud-new kernel: [ 0.000000] Linux version 5.10.0-20-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.158-2 (2022-12-13)
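(For reference: the freeze/thaw cycle that vzdump triggers through the agent can also be invoked manually from the host, to test this outside of a backup run. A sketch, assuming a hypothetical VMID of 100:)

Code:
# ask the agent to freeze all guest filesystems (the same call a snapshot backup makes)
qm guest cmd 100 fsfreeze-freeze
# check the state; this should report "frozen"
qm guest cmd 100 fsfreeze-status
# thaw again; if this call never returns, the guest is stuck just like during a backup
qm guest cmd 100 fsfreeze-thaw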

pveversion -v:

Code:
# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-5.15: 7.3-1
pve-kernel-helper: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve2
 
That's interesting, because

- at the qemu bug tracker (your link), people apparently reported this bug several months ago, while my problems only started a couple of days ago (this particular VM is only a few weeks old, but the same payload ran on its predecessor for several months).
- I have other VMs that also run mariadb and that don't experience this issue.
- even on this VM, it does not happen every time (it did work a couple of times).

So maybe this is yet another issue? Although this would seem to be quite the coincidence...

One person in the qemu bug tracker explained that they were working around this issue by replicating the VM, shutting the replica down and then backing up the replica (as a shut-down VM can't freeze). I had considered that too, but as I understand it, replication in Proxmox only works on ZFS, whereas my VM sits on Ceph. So I discarded that idea for my situation...
 
- I have other VMs that also run mariadb and that don't experience this issue.
Damnit, I may have jinxed it... :)

Today, one of those VMs got stuck after fs-freeze, after more than 100 successful backups over the last weeks/months.

After that, I turned on Virtio SCSI single for this VM, too, and for this VM as well as for the first affected VM I also turned on iothread.

I will monitor the situation and report back.
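(For reference, a sketch of the equivalent CLI changes; VMID 100 and the disk's volume spec are placeholders for the real values:)

Code:
# one dedicated virtio-scsi controller per disk
qm set 100 --scsihw virtio-scsi-single
# re-attach the disk with a dedicated I/O thread (keep the existing volume spec)
qm set 100 --scsi0 ceph-pool:vm-100-disk-0,iothread=1
# note: both changes only take effect after the VM has been shut down and started again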
 
After turning on iothread in addition to Virtio SCSI single yesterday, the second VM has been backed up approx. 24 times. So I risked backing up the first VM (with the large second disk), too. But, alas, it did not come back from fs-freeze.

So, unfortunately, I have to conclude that Virtio SCSI single, whether in combination with iothread or not, does *not* mitigate the problem.
 
OK, for us: we have had the exact same issue for a very long time, probably more than a year now.

We are currently testing this:

Running a stop command for the qemu agent just before the backups run, then a start command when the backups are done.

We use a cron job to stop it when a backup is scheduled for a VM:
systemctl stop qemu-guest-agent

Then another cron job when we know the backups aren't running:
systemctl start qemu-guest-agent

As we run backups during the night, we stop it at 11pm and start it again at 4am.

Not ideal, but it seems to work in our case, and we now have basically zero downtime. We used to have 3 to 5 minutes of downtime where servers would freeze when backups ran, and as a hosting provider that is just not acceptable, so the above solution works for us for now.

Will run this for a few weeks and see how it goes, but so far so good.
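(For reference, a sketch of what the guest-side crontab for that window might look like; the exact times and the use of /etc/crontab are assumptions:)

Code:
# /etc/crontab inside the guest:
# stop the agent before the 11pm backup window...
0 23 * * * root systemctl stop qemu-guest-agent
# ...and start it again at 4am, when the backups are done
0 4 * * * root systemctl start qemu-guest-agent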
 
Why don't you just configure the backup process to stop the VM instead of taking a snapshot? That's what I am doing; it would save you the fumbling around with cron jobs...
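(For a one-off run from the CLI, that would be something like the following; the VMID and storage name are placeholders:)

Code:
# stop-mode backup: the VM is shut down cleanly, backed up, then started again
vzdump 100 --mode stop --storage backup-store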
 
Well, that would cause downtime. The VMs are 2 TB in size each; we can't have any downtime.
So far we are having no downtime this way and it's working well.
 
Okay, maybe I misunderstood what you are saying here:

Running a stop command for the qemu agent just before the backups run.

So do you mean you are stopping the VM through qemu agent? In that case: How is this different from having the backup process stop the VM?

Or are you saying that you are stopping only *the qemu agent* and then taking a snapshot backup of the running VM while the qemu agent is turned off? In that case: How exactly do you do that?

Thanks!
 
Inside the VM I am running the command to stop the service (agent):
systemctl stop qemu-guest-agent
From my understanding that is the same as if the QEMU agent option is disabled on the VM, but at least we don't have to stop our VMs at all to get this to work.
Then we run a backup.
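(If I understand the option correctly, the host-side equivalent would be disabling the agent flag in the VM config; a sketch, with VMID 100 as a placeholder:)

Code:
# disable the guest agent option for this VM; vzdump should then
# skip the fs-freeze/thaw calls during snapshot backups
qm set 100 --agent enabled=0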
 
Understood. Thanks for the clarification.

I think I read somewhere that it is not recommended to snapshot a VM that is running a database application without qemu-guest-agent present; I don't remember why exactly (it may have to do with unfinished writes not being flushed to disk). You might want to double-check that if your VM is running a DB.
 
Is there any news on the freeze issue?

This keeps tripping me up: I thought I had found an acceptable compromise by doing only one nightly backup after stopping the VM (and relying on the Ceph cluster to preserve the VM data in the meantime). But I realized that the backups now fail because the VMs have been set up for HA (apparently you can't do a stop-mode backup of an HA-managed VM; you need a snapshot backup, which, of course, does not work because of the freezing issue).

It is getting really annoying.
 
