Deleting snapshots extremely slow and VMs freeze

netbela

Member
Jan 18, 2023
Hi there,

I've been using Proxmox for some time now and am quite happy with it; however, I do have a single issue with my VMs and snapshots.
If I delete a snapshot that is even just a day old, it takes quite a long time (up to 2 minutes). In the meantime the VM is 'frozen' and completely unresponsive.
The VMs are all stored on ZFS storage served from TrueNAS over NFS.

Now, I've read somewhere that this can be the result of using slow storage, among other causes, but I don't think that my storage solution is that slow. Storage specs:
Data VDEVs
2 x RAIDZ1 | 4 wide | 3.49 TiB --- (All: Samsung PM1633a MZILS3T8HMLH0D4 3.84TB SAS 12Gb/s or equivalent)
Log VDEVs
1 x DISK | 1 wide | 260.83 GiB --- (Intel 900P Optane (SSDPED1D280GA))


The NAS has a 40Gbit uplink, the hypervisors each have a 25Gbit link, and MTU is set to 9000. The storage speed is quite OK; here is a `yabs` result:
Code:
fio Disk Speed Tests (Mixed R/W 50/50) (Partition 10.0.75.10:/mnt/zfs-01/px-nfs):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 122.97 MB/s  (30.7k) | 441.55 MB/s   (6.8k)
Write      | 123.29 MB/s  (30.8k) | 443.87 MB/s   (6.9k)
Total      | 246.27 MB/s  (61.5k) | 885.43 MB/s  (13.8k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 493.56 MB/s    (964) | 523.89 MB/s    (511)
Write      | 519.79 MB/s   (1.0k) | 558.78 MB/s    (545)
Total      | 1.01 GB/s     (1.9k) | 1.08 GB/s     (1.0k)


So, the storage is quite performant, I would say.
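
For anyone wanting to reproduce this, a direct fio run against the mount should give comparable numbers. The parameters below are my approximation of yabs' mixed 50/50 test, not its exact invocation:
Code:
fio --name=nfs-test --directory=/mnt/pve/px-nfs --rw=randrw --rwmixread=50 \
    --bs=4k --size=2G --iodepth=64 --numjobs=2 --runtime=30 --time_based \
    --ioengine=libaio --direct=1 --group_reporting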

Here is the config of an example VM that is having issues:
Code:
root@am-prm-01:/mnt/pve/px-nfs# qm config 138
agent: 1
balloon: 4096
boot: order=scsi0;ide2;net0
cores: 24
cpu: EPYC
hotplug: disk,network,usb
ide2: none,media=cdrom
memory: 49152
meta: creation-qemu=9.0.2,ctime=1751440419
name: xxx
net0: virtio=BC:24:11:33:5A:8A,bridge=vmbr50,firewall=1
numa: 0
ostype: l26
parent: NTB-pre-update-20260504T061508685003
scsi0: px-nfs:138/vm-138-disk-0.qcow2,discard=on,iothread=1,size=200G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=d474f662-80ec-48a1-8898-ec8aa0933921
sockets: 1
tags: ntb-mgmt
vmgenid: 7e9b01ee-5ccb-4045-ae19-3c5af511b596

Is the combination of qcow2 + NFS my issue here?

I have some automated jobs in place that (on a weekly basis) snapshot all VMs, update them, reboot them, and two days later delete the snapshots. However, the downtime I now have because of the snapshot removal is killing me.
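
For context, the jobs boil down to something like this (VMID and snapshot name are illustrative, not my actual scripts):
Code:
# Weekly: snapshot, then update and reboot the guest via the QEMU agent
qm snapshot 138 pre-update-$(date +%Y%m%d)
qm guest exec 138 -- apt-get update
qm guest exec 138 -- apt-get -y dist-upgrade
qm reboot 138
# Two days later: delete the snapshot -- this is the step that freezes the VM
qm delsnapshot 138 pre-update-20260504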
 

Attachments

  • Screenshot 2026-05-09 at 18.56.28.png (122.3 KB)
Is the combination of qcow2 + NFS my issue here?
Probably: yes.

If I remember correctly, removing a snapshot means integrating the blocks modified since the point in time when the snapshot was taken back into one single file. That is plainly a slow operation. Problems like this are behind the recommendation to use block devices, not files, for VM images.
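
With the VM shut down, you can inspect those internal snapshots directly in the image file (path assumed from your config above):
Code:
# List the internal snapshots embedded in the qcow2 file
qemu-img snapshot -l /mnt/pve/px-nfs/images/138/vm-138-disk-0.qcow2
# Deleting one rewrites data inside that same file -- the slow part
qemu-img snapshot -d <snapname> /mnt/pve/px-nfs/images/138/vm-138-disk-0.qcow2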

Disclaimer: I do not use qcow2 and my memory may be unreliable...
 
Do the snapshots in your TrueNAS ZFS, and use raw files instead of qcow2 for your images over NFS.
 
Do the snapshots in your TrueNAS ZFS, and use raw files instead of qcow2 for your images over NFS.
I already create snapshots from the TrueNAS side, but those are not (easily) restorable from the Proxmox side.

Using raw is not a viable solution, since I would lose thin provisioning and VM snapshots.
 
Thin provisioning is a form of resource oversubscription: specifically, simulating storage that one does not actually possess. Running out of space will give you far more problems than the pros are worth, and it even costs performance. Restoring a VM snapshot is simply a copy operation from the dataset's `.zfs` directory.
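
For example (dataset, snapshot name, and paths are illustrative; stop the VM before copying):
Code:
# On TrueNAS: make the hidden snapshot directory visible, then copy the image back
zfs set snapdir=visible zfs-01/px-nfs
cp /mnt/zfs-01/px-nfs/.zfs/snapshot/auto-2026-05-04/images/138/vm-138-disk-0.qcow2 \
   /mnt/zfs-01/px-nfs/images/138/vm-138-disk-0.qcow2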
 
but those are not (easily) restorable from the Proxmox side
By using ZFS over iSCSI, you can create and restore snapshots via the API without directly accessing TrueNAS, and thin provisioning is also available.

I recommend looking into it and testing to see if it behaves as you expect.
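
A ZFS over iSCSI entry in /etc/pve/storage.cfg looks roughly like this (all values are placeholders; the iscsiprovider depends on the iSCSI target software on the NAS, and sparse enables the thin provisioning):
Code:
zfs: truenas-iscsi
        portal 10.0.75.10
        target iqn.2005-10.org.freenas.ctl:proxmox
        pool zfs-01/px-iscsi
        iscsiprovider LIO
        sparse 1
        content images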
 
If you are using PVE >= 9.0 you can try the new "snapshot-as-volume-chain" feature ..... it should perform better on snapshot creation/deletion than internal qcow2 snapshots .... caveat: still technology preview .... qcow2 on ZFS (CoW on CoW) is also not an ideal setup
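
It's a per-storage flag in storage.cfg if I remember the option name right, something like this (server/export values taken from the benchmark output above):
Code:
nfs: px-nfs
        server 10.0.75.10
        export /mnt/zfs-01/px-nfs
        path /mnt/pve/px-nfs
        content images
        snapshot-as-volume-chain 1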
 
I've tried installing the TrueNAS Proxmox plugin (https://github.com/truenas/truenas-proxmox-plugin) and was able to configure it with a different TrueNAS system for now (I still need to update my primary one). It seems to work quite well, and fast too. I'll update my primary node later this week and see if switching from NFS to ZFS over iSCSI solves my issues.
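
A quick way to verify the new storage (storage ID and VMID are placeholders):
Code:
# Confirm the new storage is online, then move a test disk onto it
pvesm status
qm move-disk <vmid> scsi0 <storage-id>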
 
Alright, so I've updated my TrueNAS system to 25.10 and tested ZFS-over-iSCSI using the plugin described above. I assumed I would get more performance, but the results do not lie.

ZFS-over-ISCSI results:
Code:
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/sda1):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 75.75 MB/s   (18.9k) | 417.57 MB/s   (6.5k)
Write      | 75.95 MB/s   (18.9k) | 419.77 MB/s   (6.5k)
Total      | 151.71 MB/s  (37.9k) | 837.35 MB/s  (13.0k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 537.45 MB/s   (1.0k) | 57.99 MB/s      (56)
Write      | 566.01 MB/s   (1.1k) | 62.62 MB/s      (61)
Total      | 1.10 GB/s     (2.1k) | 120.62 MB/s    (117)

NFS Results (after updating TrueNAS):
Code:
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/sda1):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 159.02 MB/s  (39.7k) | 1.22 GB/s    (19.1k)
Write      | 159.44 MB/s  (39.8k) | 1.23 GB/s    (19.2k)
Total      | 318.47 MB/s  (79.6k) | 2.45 GB/s    (38.3k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 1.80 GB/s     (3.5k) | 1.74 GB/s     (1.7k)
Write      | 1.90 GB/s     (3.7k) | 1.86 GB/s     (1.8k)
Total      | 3.70 GB/s     (7.2k) | 3.60 GB/s     (3.5k)

I do not see any reason to use ZFS-over-iSCSI in this case. I'll see if the issues I had with snapshot deletion are solved, since the NFS-mounted storage seems to be MUCH faster than it was before.
 
Hi,
Is the combination of qcow2 + NFS my issue here?
as others already pointed out, yes (from the docs):
2: On file based storages, snapshots are possible with the qcow2 format, either using the internal snapshot function, or snapshots as volume chains. Creating and deleting internal qcow2 snapshots will block a running VM and is not an efficient operation. The performance is particularly bad with network storages like NFS. On some setups and for large disks (multiple hundred GiB or TiB sized), these operations may take several minutes, or in extreme cases, even hours. If your setup is affected, create and remove snapshots while the VM is shut down, expecting a long task duration.

ZFS over iSCSI is recommended in such a case, rather than adding two more layers with NFS and qcow2.
 