VM freezes during snapshot/backup on NFS (Proxmox 9, NetApp AFF, no errors in logs)

joserosa

New Member
May 19, 2025
I have a Proxmox VE 9.1 production environment with 2 nodes and 1 qdevice (for quorum) using shared NFS storage.

During snapshot or backup operations, VMs (especially Windows) become completely unresponsive:

  • VM freezes completely
  • Loses network connectivity (no ping)
  • Console is frozen
  • VM recovers only after the snapshot/backup finishes

The freeze can last 3–5 minutes, which makes it impossible to work with the VMs (e.g. SQL servers).

My environment:

  • Proxmox VE: 9.1.5
  • Kernel: 6.17.4-2-pve
  • Disk format: qcow2
  • Storage: NFS 4.1
  • Backend: NetApp AFF C190
  • Network: 10Gb SFP+ (Cisco Nexus)
  • MTU: 9000 end-to-end
  • Dedicated VLAN for NFS traffic
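Since MTU 9000 is in play, a quick sanity check is to confirm jumbo frames actually pass end-to-end on the storage VLAN (a single misconfigured hop silently causes fragmentation or drops). `x.x.x.x` below is a placeholder for the NetApp NFS LIF address:

```shell
# 8972 = 9000 bytes minus 20 (IP header) and 8 (ICMP header);
# -M do forbids fragmentation, so oversized frames fail loudly.
ping -M do -s 8972 -c 3 x.x.x.x

# "Frag needed" or 100% packet loss here means some hop in the path
# is not actually passing jumbo frames.
```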

The NFS storage is currently configured as follows in /etc/pve/storage.cfg, following NetApp's recommendations plus a few additional parameters:

https://docs.netapp.com/us-en/netap...ox-ontap-nfs.html#storage-administrator-tasks

Code:
nfs: TEST_DS_PROXMOX
        export /TEST_DS_PROXMOX
        path /mnt/pve/TEST_DS_PROXMOX
        server x.x.x.x
        content images
        options vers=4.1,nconnect=4,timeo=600,retrans=2,_netdev,x-systemd.automount
        prune-backups keep-all=1

The NFS mount also runs over a dedicated VLAN reserved for storage communication.
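It is also worth verifying that the options requested in storage.cfg are what the kernel actually negotiated with the filer (e.g. `vers`, `nconnect`, `wsize`/`rsize`, `timeo` can be silently renegotiated at mount time):

```shell
# Show the effective mount options for this NFS mount:
findmnt -t nfs4 /mnt/pve/TEST_DS_PROXMOX -o TARGET,SOURCE,OPTIONS

# Per-mount NFS client details, including negotiated options:
nfsstat -m
```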

Observed behavior
  • Snapshot starts → VM freezes immediately
  • No response to ping or console
  • No errors in:
    • journalctl
    • dmesg
  • Task finishes successfully
The VM appears to be completely stalled during the operation (no CPU activity, no I/O progress, no network response).
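One way to narrow this down while the snapshot runs is to check whether the KVM process is stuck in uninterruptible sleep (state "D"), which would point at a blocked I/O path in the kernel/NFS client rather than a QEMU-internal pause. `VMID=100` is a placeholder for an affected VM:

```shell
# Proxmox writes the QEMU PID to /run/qemu-server/<VMID>.pid
VMID=100
PID=$(cat /run/qemu-server/${VMID}.pid)

# Watch the process state during the snapshot; "D (disk sleep)"
# suggests the process is blocked on I/O.
watch -n1 "grep -E 'State|Threads' /proc/${PID}/status"

# In a second shell, per-operation NFS latency on the datastore:
nfsiostat 1 /mnt/pve/TEST_DS_PROXMOX
```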

What I tested
  • Network verified (no drops, no saturation)
  • NFS works correctly outside snapshot operations
  • Issue is consistently reproducible
  • Happens mainly on Windows VMs
Additional context

I have been researching similar issues and found multiple discussions and reports related to:
  • VM freezes during snapshot on NFS
  • NFS performance degradation under load
  • possible kernel regressions (6.14 / 6.17) affecting NFS behavior
From what I understand, this could be related to synchronous I/O (fsync) behavior during snapshot operations over NFS, but I am not sure if this is expected or indicates a problem.
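If the fsync theory is right, the stall should correlate with a burst of synchronous WRITE/COMMIT RPCs on the mount. A rough way to check is to sample the per-mount RPC counters immediately before and after taking a snapshot:

```shell
# Per-op RPC statistics (counts and round-trip times) for the mount;
# take one sample before the snapshot and one after, then compare
# the WRITE and COMMIT deltas.
mountstats --rpc /mnt/pve/TEST_DS_PROXMOX | grep -E -A3 'WRITE|COMMIT'
```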

I reviewed several related forum threads and mailing-list discussions, but I have not found a clear root cause or a confirmed solution.

The behavior I observe (VM completely unresponsive during snapshot without any errors in logs) seems more like an I/O stall during synchronous write/flush operations rather than a failure.

Questions
  • Is this expected behavior when using NFS + qcow2 + snapshots?
  • Are there recommended configurations that allow reliable snapshots without VM freeze?
  • Is NFS suitable for this type of workload in production?
  • What storage architecture is typically used in 24/7 production environments where snapshots are mandatory?
Requirements (important)

  • VM snapshots are a mandatory requirement in this environment.
Due to operational and application constraints, it is not possible to:

  • avoid snapshots
  • use stop-mode backups

The expected behavior is that snapshot operations should not cause prolonged VM unresponsiveness, especially in production workloads.
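For what it's worth, settings that are sometimes suggested for qcow2-on-NFS freeze issues are `cache=none`, asynchronous AIO, and a dedicated iothread per disk, so that guest I/O is decoupled from the main QEMU event loop during the snapshot. A sketch, assuming a hypothetical VM 100 with a VirtIO SCSI disk `scsi0` (adjust names to your setup):

```shell
# cache=none avoids double-caching through the NFS client page cache;
# aio=io_uring keeps guest I/O asynchronous; iothread=1 moves disk I/O
# off the main QEMU thread.
qm set 100 --scsi0 TEST_DS_PROXMOX:vm-100-disk-0,cache=none,aio=io_uring,iothread=1

# iothread=1 requires the single-queue VirtIO SCSI controller:
qm set 100 --scsihw virtio-scsi-single
```

Whether this actually helps here is untested; it only changes how I/O is submitted, not the NFS semantics underneath.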

Additional information
pveversion -v

Code:
proxmox-ve: 9.1.0 (running kernel: 6.17.4-2-pve)
pve-manager: 9.1.5 (running version: 9.1.5/80cf92a64bef6889)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.4-2-pve-signed: 6.17.4-2
proxmox-kernel-6.17: 6.17.4-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20251111.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.2
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.7
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.5
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-4
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.2-1
proxmox-backup-file-restore: 4.1.2-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.5
pve-cluster: 9.0.7
pve-container: 6.1.0
pve-docs: 9.1.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.1.0
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-5
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.4
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1
 
Did you test with kernel 6.14 to rule out a possible issue with kernel 6.17? There have been reports of performance issues in kernel 6.17 [3] involving TCP stack behavior with MTU 9000.
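To test this, you can install the 6.14 opt-in kernel and pin it for the next boots (the exact version string below is an example; use one reported by `kernel list` on your node):

```shell
# Install the 6.14 kernel series and list what is available:
apt install proxmox-kernel-6.14
proxmox-boot-tool kernel list

# Pin a specific 6.14 kernel and reboot into it:
proxmox-boot-tool kernel pin 6.14.8-2-pve
reboot

# To return to the default kernel afterwards:
# proxmox-boot-tool kernel unpin
```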
 
Hey @YaZoal, thank you for your response.

From what I have seen, similar issues have also been reported with kernel 6.14, especially related to NFS freezes and I/O stalls.

For example:
- https://forum.proxmox.com/threads/s...-6-14-8-2-pve-when-mounting-nfs-shares.169571
- https://forum.proxmox.com/threads/bad-nfs-performance-with-proxmox-9.174881

This makes me think the issue might not be limited to kernel 6.17 specifically, but could be related more generally to NFS behavior under synchronous I/O workloads (e.g. snapshot/fsync).

I have also seen discussions of this (including some involving Proxmox staff on the forum and mailing-list threads on lore.kernel.org), but I haven't found a clear root cause or a confirmed solution yet.

What makes this more confusing is that the issue does not seem to be related to the backup tool itself, but specifically to the moment when the snapshot is taken.

I also considered whether this could be related to VirtIO drivers or the guest agent (VSS interaction), but I have already tested with the latest stable versions without any change in behavior.
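One further test along those lines, if you haven't tried it: keep the guest agent enabled but skip the filesystem freeze (the VSS path on Windows) during backup, and see whether the stall changes. VMID 100 is a placeholder; note this trades away filesystem consistency of the resulting snapshot, so it is a diagnostic step, not a fix:

```shell
# Leave the agent on, but tell Proxmox not to issue fs-freeze/fs-thaw
# (and hence not to trigger VSS in Windows) during backups:
qm set 100 --agent enabled=1,freeze-fs-on-backup=0
```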

At this point, I am trying to understand whether this is:
- expected behavior under certain storage conditions
- a limitation of NFS with synchronous I/O
- or a kernel/storage interaction issue

Any insights or suggestions would be greatly appreciated.


If anyone from the Proxmox team (e.g. @fiona or @Maximiliano) has any input or guidance on this type of issue, it would be very helpful.