Problems with Proxmox 8.1.4 and creating snapshots

Jan 26, 2024
Hello everyone,
since yesterday morning (when I upgraded several systems from 8.1.3 to 8.1.4 from the enterprise repository) I have had serious problems creating snapshots.
3 clusters with 3 nodes each, with a CEPH datastore
5 servers with a ZFS datastore

The behavior is the same.

When I create a snapshot (I tried without RAM), if the machine is big enough, the snapshot window stays active and the procedure never finishes. During several attempts I had no problems with small VMs, but with VMs from roughly 300 GB of storage upward it happens every time.
The VM loses connectivity on the AGENT (the IP of the VM is no longer displayed in the Proxmox panel); the agent seems to be working, it is running, but the only way to get back to full functionality is to restart the VM. Verified with Debian, Ubuntu, and Windows Server 2019 VMs.
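For anyone who wants to reproduce the same check from the CLI, a rough sketch (VMID 326 and the snapshot name test-snap are just examples taken from this thread):

# take a snapshot without RAM state, as in the GUI with "Include RAM" unticked
qm snapshot 326 test-snap --vmstate 0

# in another shell, check whether the guest agent still answers
qm agent 326 ping
qm guest cmd 326 network-get-interfaces

If the bug is triggered, the snapshot task never returns and the agent calls time out even though the agent process is still running inside the guest.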


I attach a few screenshots.

[attached screenshots: 1706258849222.png, 1706258885508.png]
 
The "no more storage space" message is a false error (during the snapshot stop process).
The datastore has several Tbytes free and the same problem happens to me with both ZFS and CEPH on different servers.
In fact, many machines use IOTHREAD, but... backups happen correctly without the slightest problem, it's just the snapshots that don't work.
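Just to show how the free space was double-checked on the storage layers themselves (plain status commands, nothing specific to this setup):

# CEPH: pool and cluster usage
ceph df

# ZFS: dataset usage
zfs list -o name,used,avail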
 
This one:

root@iml-host-px03:/etc/pve/qemu-server# cat 326.conf
#SONDA YOROI TYPE 2 NODE 3
#
#- MANAGEMENT
#- DMZ
#- DMZ-2
agent: 1
balloon: 0
boot: order=scsi0;ide2
cores: 4
cpu: Skylake-Server-noTSX-IBRS
ide2: none,media=cdrom
memory: 16384
meta: creation-qemu=7.2.0,ctime=1682428407
name: SRVSNDGENKUT2N3
net0: virtio=72:7C:2D:AD:C1:48,bridge=vmbr8
net1: virtio=AE:03:95:04:B1:F8,bridge=vmbr1
net2: virtio=CE:A4:EA:86:A6:0B,bridge=vmbr2
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: CEPH-RBD:vm-326-disk-0,discard=on,iothread=1,size=300G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=86a1e7d9-b8f8-42ef-9a07-d43ad43ebac3
sockets: 2
tablet: 0
tags: sonda;sonda_genku_06
vmgenid: 78d8c65e-2909-412f-bc7c-445b9f353926

*********************************

... but it's just an example
I might try disabling IOTHREAD...
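For reference, a minimal sketch of how IOTHREAD can be disabled on that disk from the CLI (VMID 326 and the scsi0 line are copied from the config above; the VM needs a full stop/start afterwards for the change to take effect):

qm set 326 --scsi0 CEPH-RBD:vm-326-disk-0,discard=on,iothread=0,size=300G,ssd=1
qm stop 326 && qm start 326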
 
Thank you! I see that you have protection activated. Does each of the affected VMs have this? Theoretically it shouldn't matter, but perhaps an error has crept in somewhere in combination with it. Have you tried it without protection, or with a newly created VM in general?
 
Hmmmm, I tried disabling the IOTHREAD and... now on this VM the snapshots are instantaneous.
Regarding storage with IOTHREAD, I have a long-standing open issue.
I currently use pve-qemu-kvm at patchlevel 5 and not the latest release, 6, because 6, like 4, gave me stuck I/O on datastores with IOTHREAD active.
Patchlevel 5, on the other hand, solves that problem but introduces excessive CPU load. I know, but the VMs run "better".

I know Fiona is working on a final patch for this; it was already posted on the devel list yesterday.

Yes, all the VMs I use have protection enabled. But I don't think the problem is related to that; in fact, on some VMs the snapshot works and on others it doesn't, even on different clusters.

It could really be a problem related to the IOTHREAD...
 
Interesting

A VM that was methodically giving me errors yesterday when activating the snapshot now, restarted with IOTHREAD disabled, activates the snapshot in a few moments.

I don't know if it's size related; I've seen VMs with 50 GB of storage activate the snapshot with no problems, while this one, with 2 disks, one of which is 6 TB, always gave problems.

Now it works great.

I guess I'll have to wait for the patchlevel 7 that Fiona is working on to finally solve the problem.

I will still do more testing, but at the moment the cause seems to be the active IOTHREAD...

PS "sb-jw", ... I didn't mean to be rude in the way I answered you earlier.
 
Hi,
A VM that was methodically giving me errors yesterday when activating the snapshot now, restarted with IOTHREAD disabled, activates the snapshot in a few moments.
so this was also with pve-qemu-kvm=8.1.2-5?

Is the krbd setting active for the CEPH-RBD storage in /etc/pve/storage.cfg? If not, then RBD snapshots will be taken via QEMU -> librbd rather than via the RBD storage plugin in Proxmox VE. That is also something you could test.
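As a sketch of what that test could look like, the storage definition in /etc/pve/storage.cfg would change to the following (names taken from this thread; running VMs have to be stopped and started again to switch from librbd to the kernel client):

rbd: CEPH-RBD
        content rootdir,images
        krbd 1
        pool ceph_pool

The same toggle should also be possible from the CLI with: pvesm set CEPH-RBD --krbd 1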
 
Hi Fiona,

yes, on all the clusters I have (CEPH datastore) and on all the individual servers (ZFS datastore) I use pve-qemu-kvm version 8.1.2-5.
I know it raises CPU load abnormally, but I experienced stuck storage on the -4 version (and, if I understand correctly, -6 is a rollback to -4), so I opted for the most convenient solution for me (until the problem is resolved).
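In case it is useful, a rough sketch of how that patchlevel can be kept pinned with plain apt commands until a fixed build is published:

apt install pve-qemu-kvm=8.1.2-5
apt-mark hold pve-qemu-kvm
# once the fixed build is out: apt-mark unhold pve-qemu-kvm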

In the meantime, I can tell you that I've done more tests on other VMs that were methodically hitting the "snapshot freeze", and with IOTHREAD disabled on their storage they now run smoothly.

And no, KRBD is disabled; this is the configuration of my datastore:

rbd: CEPH-RBD
        content rootdir,images
        krbd 0
        pool ceph_pool
 
After dozens more trials and tests, I can confirm it.
Where yesterday the VM snapshot would freeze, with IOTHREAD disabled it now works perfectly.
Not a single problem after dozens and dozens of tests.
Please note that I use pve-qemu-kvm patchlevel -5; I don't know if -4 would behave differently.

Backups (to PBS), on the other hand, give no problems at all, with or without IOTHREAD (OK, as is well known there is the additional CPU load with patchlevel 5, but other than that...).
 
If only the patches for the rebase onto 8.1.5 are applied, the fixes for the IOthread issue won't be included. But they are not in conflict (except for patch numbering which is easily fixed up), so both series can be applied.
 
OK, so we will have to wait for a subsequent revision of QEMU 8.1.5 for the IOTHREAD-related patches to be included, perhaps after a stabilization process...
 
That depends on when the patch for the iothread issue is applied. If it is applied before the next version bump of the package, it would be in the first revision.
 
Hi @fiona,

I saw that the 8.1.5 release of QEMU was moved to testing; reading the list of patches it introduces, I don't seem to see the one related to IOTHREAD, correct? In that case I will stay on 8.1.2-5, the only one that works for me without problems (apart from the additional CPU load).
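As a side note, the list of patches shipped in a given pve-qemu-kvm build can also be read from its Debian changelog on an installed node; a minimal sketch:

zless /usr/share/doc/pve-qemu-kvm/changelog.Debian.gz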
 
Yes; while the patch has now been applied in git, the version with that fix has not yet been moved to the public repositories. It should be there soonish if no problems pop up during internal testing.
 
FYI, the version with the fix pve-qemu-kvm=8.1.5-2 is now available on the no-subscription repository.
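For anyone upgrading from the no-subscription repository, roughly (repository line for Proxmox VE 8 on Debian bookworm; adjust release names to your own setup):

# /etc/apt/sources.list.d/pve-no-subscription.list
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

apt update
apt install pve-qemu-kvm
# if the package was held earlier: apt-mark unhold pve-qemu-kvm first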
 
