I have a Proxmox VE 9.1 production environment with 2 nodes and 1 qdevice (for quorum) using shared NFS storage.
During snapshot or backup operations, VMs (especially Windows) become completely unresponsive:
- VM freezes completely
- Loses network connectivity (no ping)
- Console is frozen
- VM recovers only after the snapshot/backup finishes
The freeze can last 3–5 minutes, which makes it impossible to work with the affected VMs (e.g. SQL servers).
My environment is:
- Proxmox VE: 9.1.5
- Kernel: 6.17.4-2-pve
- Disk format: qcow2
- Storage: NFS 4.1
- Backend: NetApp AFF C190
- Network: 10Gb SFP+ (Cisco Nexus)
- MTU: 9000 end-to-end
- Dedicated VLAN for NFS traffic
I currently have the NFS storage configured as follows, based on the NetApp recommendations plus a few additional parameters:
https://docs.netapp.com/us-en/netap...ox-ontap-nfs.html#storage-administrator-tasks
Bash:
nfs: TEST_DS_PROXMOX
export /TEST_DS_PROXMOX
path /mnt/pve/TEST_DS_PROXMOX
server x.x.x.x
content images
options vers=4.1,nconnect=4,timeo=600,retrans=2,_netdev,x-systemd.automount
prune-backups keep-all=1
NFS traffic also runs on a dedicated VLAN used only for storage communication.
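To rule out a mismatch between the configured and the effective mount options, the options the kernel actually applied can be read from `/proc/mounts`. The following is only a minimal sketch: the function name `nfs_mount_options` is made up for illustration, and it assumes the usual `/proc/mounts` field layout (device, mount point, file system type, options).

```python
def nfs_mount_options(mounts_text, mount_point):
    """Return the effective options of the NFS mount at mount_point as a dict.

    mounts_text is the content of /proc/mounts. Flag options (e.g. "hard")
    map to True; key=value options map to their string value.
    Returns None if no NFS mount is found at mount_point.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, path, fstype, options = fields[:4]
        # With x-systemd.automount an autofs entry can exist on the same
        # path, so only NFS file systems are considered.
        if path != mount_point or not fstype.startswith("nfs"):
            continue
        parsed = {}
        for opt in options.split(","):
            key, sep, value = opt.partition("=")
            parsed[key] = value if sep else True
        return parsed
    return None
```

On a node this could be called as `nfs_mount_options(open("/proc/mounts").read(), "/mnt/pve/TEST_DS_PROXMOX")` and the result compared against `vers`, `nconnect`, `timeo` and `retrans` from the storage definition.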
Observed behavior
- Snapshot starts → VM freezes immediately
- No response to ping or console
- No errors in:
  - journalctl
  - dmesg
- Task finishes successfully
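To put numbers on the freeze rather than rely on watching ping by hand, the unreachable window can be recorded from another machine while the snapshot runs. This is a rough sketch, not part of my actual setup: the helper names are hypothetical, and it assumes the VM exposes some TCP port that normally answers (for example RDP on 3389 for a Windows guest).

```python
import socket
import time


def probe_tcp(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def outage_windows(samples):
    """Collapse (timestamp, reachable) samples into (start, end) outage windows.

    A window starts at the first failed probe and ends at the next successful
    one. An outage still open at the last sample ends at that sample's time.
    """
    windows = []
    start = None
    last_ts = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, last_ts))
    return windows


def watch(host, port, duration, interval=1.0):
    """Probe host:port every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), probe_tcp(host, port)))
        time.sleep(interval)
    return outage_windows(samples)
```

Starting `watch("<vm-ip>", 3389, 600)` shortly before the snapshot would return the start and end timestamps of each unreachable period, which can then be lined up with the task log.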
What I tested
- Network verified (no drops, no saturation)
- NFS works correctly outside snapshot operations
- Issue is consistently reproducible
- Happens mainly on Windows VMs
I have been researching similar issues and found multiple discussions and reports related to:
- VM freezes during snapshot on NFS
- NFS performance degradation under load
- possible kernel regressions (6.14 / 6.17) affecting NFS behavior
Some of the references I reviewed:
- https://forum.proxmox.com/threads/snapshot-causes-vm-to-become-unresponsive.153483
- https://forum.proxmox.com/threads/bad-nfs-performance-with-proxmox-9.174881
- https://forum.proxmox.com/threads/severe-system-freeze-with-nfs-on-proxmox-9-running-kernel-6-14-8-2-pve-when-mounting-nfs-shares.169571
- https://bugzilla.kernel.org/show_bug.cgi?id=219508
- https://bugzilla.proxmox.com/show_bug.cgi?id=1989
However, I have not found a clear root cause or confirmed solution.
The behavior I observe (VM completely unresponsive during snapshot without any errors in logs) seems more like an I/O stall during synchronous write/flush operations rather than a failure.
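One way to test the flush-stall hypothesis from the host side is to time small write+fsync cycles on the NFS mount while a snapshot is running and compare them with an idle baseline. The sketch below is an assumption about how this could be checked, not a definitive method: `measure_fsync_latency` is a hypothetical helper, and it only measures flush latency as seen by the host, not the exact I/O path QEMU uses for the guest disk.

```python
import os
import time


def measure_fsync_latency(directory, count=10, size=4096):
    """Write `size` bytes and fsync, `count` times; return latencies in seconds.

    A temporary probe file is created in `directory` and removed afterwards.
    """
    path = os.path.join(directory, "fsync_probe.tmp")
    payload = b"\0" * size
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            start = time.monotonic()
            os.write(fd, payload)
            os.fsync(fd)  # forces the data out to the NFS server
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies
```

Running `measure_fsync_latency("/mnt/pve/TEST_DS_PROXMOX", count=100)` once while idle and once during a snapshot, then comparing the maximum values, should show whether synchronous flushes on the mount stall for the duration of the freeze.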
Questions
- Is this expected behavior when using NFS + qcow2 + snapshots?
- Are there recommended configurations that allow reliable snapshots without VM freeze?
- Is NFS suitable for this type of workload in production?
- What storage architecture is typically used in 24/7 production environments where snapshots are mandatory?
VM snapshots are a mandatory requirement in this environment, so the following workarounds are not an option:
- avoid snapshots
- use stop-mode backups
The expected behavior is that snapshot operations should not cause prolonged VM unresponsiveness, especially in production workloads.
Additional information
pveversion -v
Code:
proxmox-ve: 9.1.0 (running kernel: 6.17.4-2-pve)
pve-manager: 9.1.5 (running version: 9.1.5/80cf92a64bef6889)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.4-2-pve-signed: 6.17.4-2
proxmox-kernel-6.17: 6.17.4-2
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.4.1-1+pve1
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20251111.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.2
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.5
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.1.7
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.5
libpve-rs-perl: 0.11.4
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-4
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.1.2-1
proxmox-backup-file-restore: 4.1.2-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.5
pve-cluster: 9.0.7
pve-container: 6.1.0
pve-docs: 9.1.2
pve-edk2-firmware: 4.2025.05-2
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.1.0
pve-i18n: 3.6.6
pve-qemu-kvm: 10.1.2-5
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.4
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1