Severe system freeze on Proxmox 9 running kernel 6.14.8-2-pve when mounting NFS shares

+1 to this. Running PVE 9.1.5 with kernel 6.14. I have an OpenMediaVault VM which has a disk passed through and exposes a few NFS shares that are mounted in various LXCs and on the PVE host itself. During heavy writes, CPUs in the OMV VM become hung, and occasionally the entire host deadlocks.

I've seen similar behavior running PVE 9.1.5 with kernel 6.17 and also with OMV 8.1 kernel 6.18 and OMV 7.4 with kernels 6.1 and 6.12.
 
Has this just "magically" gone away for other people too, or just for me?
I have had a huge media conversion job running for the past 8 hours and it has not hung once (normally it can't do more than 30 mins).
 
What version are you on? 9.1.6 seems to have improved things, but I don't know if it is 100% resolved.

I have a Dell T440 with TrueNAS and Plex in VMs. Yesterday one of my friends got "Please check that this file exists and the necessary drive is mounted.", which has been indicative of this issue, but overall I'm not seeing the kinds of errors or other issues I was seeing before.
 
I am on 9.1.6.
pve-container is 6.1.2 (checked with `dpkg -l | grep pve-container`), which is the package whose update initially started causing me issues.
 
FWIW, I switched my one share that experiences heavy writes to CIFS shortly after posting this and it has been stable for ~1 week so far.
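In case anyone wants to try the same CIFS workaround: a mount replacing the NFS share could look roughly like this. The server name, share path, and credentials file below are placeholders, not details from the post.

```shell
# Hypothetical /etc/fstab entry replacing the NFS share with CIFS/SMB
# (requires the cifs-utils package; adjust server, share, and options):
#   //nas/media  /mnt/media  cifs  credentials=/root/.smbcred,vers=3.1.1,_netdev  0  0

# One-off test mount before committing the change to fstab:
#   mount -t cifs //nas/media /mnt/media -o credentials=/root/.smbcred,vers=3.1.1

# Quick check whether the running kernel currently has CIFS support loaded:
grep -w cifs /proc/filesystems || echo "cifs not loaded yet"
```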
 
I'm also suffering this bug:

Hardware:

Proxmox host CPU: AMD Phenom II X6 1090T (AM3)
Proxmox host RAM: 32GB ECC DDR3
NVMe: WD Black SN7100 1TB (VM/LXC storage)
NFS server: Debian 13 VM running inside the same Proxmox host, ZFS RAIDZ1 4×14TB (disks passed through), 16GB RAM, ZFS ARC limited to 10GB

Software:

Proxmox VE 9.1 / kernel 6.17.13-1-pve
NFS server: NFSv4.2, sync mount
vzdump backing up to NFS storage (backup-NAS)

Symptoms:
During a vzdump backup to NFS storage, the entire Proxmox host freezes and requires a hard reboot. The WebUI becomes unresponsive and SSH dies. Interestingly, the VMs keep running and communicating with each other via the internal bridge (vmbr0): a Home Assistant VM continued receiving sensor data from an ebusd LXC container via MQTT, with no gaps in the historical graphs, showing that internal VM-to-VM traffic was unaffected while host networking was completely frozen.

Note on bandwidth limiting: Limiting vzdump bandwidth via bwlimit in /etc/vzdump.conf delays the freeze but does not prevent it. The deadlock occurs regardless of write speed, confirming it is not a saturation issue but a fundamental problem in the NFS client under sustained write load.
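For reference, a cap like the one described is set in /etc/vzdump.conf; the value below is illustrative, not the poster's actual setting (vzdump takes it in KiB/s).

```shell
# /etc/vzdump.conf fragment (illustrative values, not from the post):
#   bwlimit: 51200    # KiB/s, i.e. a 50 MiB/s cap
#   tmpdir: /mnt/pve/nvme-vms/vzdump-tmp

# Check what is currently configured on the host:
grep -E '^(bwlimit|tmpdir)' /etc/vzdump.conf 2>/dev/null || echo "defaults in use"
```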

The NFS mount options at time of freeze:

Code:
nas:/backup/proxmox-VMs on /mnt/pve/backup-NAS type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.1.40,local_lock=none,addr=192.168.1.250)
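For anyone wanting to compare their own setup against these options, the effective client-side mount options can be dumped like this (the mount point below is the one from this post; the last line works on any host):

```shell
# Dump the effective NFS client options for this specific mount:
#   findmnt -t nfs4 /mnt/pve/backup-NAS
#   nfsstat -m

# Generic variant that lists every NFS mount on the host:
findmnt -t nfs4,nfs -o TARGET,SOURCE,OPTIONS || echo "no NFS mounts found"
```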

dmesg from Proxmox host (kernel 6.17.13-1-pve):

Code:
[ 1107.487583] INFO: task CPU 2/KVM:1947 blocked for more than 245 seconds.
[ 1107.487591]       Tainted: P           O        6.17.13-1-pve #1
...
[ 1107.487626]  __schedule+0x468/0x1310
[ 1107.487660]  schedule+0x27/0xf0
[ 1107.487673]  folio_wait_bit_common+0x124/0x2f0
[ 1107.487708]  folio_wait_writeback+0x2b/0xa0
[ 1107.487721]  nfs_wb_folio+0x94/0x1e0 [nfs]
[ 1107.487893]  nfs_release_folio+0x72/0x110 [nfs]
...
[ 5285.096199] INFO: task iou-wrk-1941:24182 is blocked on an rw-semaphore
[ 5285.096224] task:iou-wrk-1941 state:D ...
[ 5285.096296]  rwsem_down_read_slowpath+0x24e/0x540
[ 5285.096308]  down_read+0x48/0xc0
[ 5285.096328]  do_exit+0x1f2/0xa20
[ 5285.096339]  io_wq_worker+0x2d6/0x390
...
[ 5343.886173] systemd[1]: systemd-journald.service: start operation timed out. Terminating.

Additional evidence isolating the NFS client as the root cause:
The same NFS server exports shares to other physical machines on the network running Debian 13 with kernel 6.12.73+deb13-amd64, all working without any issues under heavy load. This rules out a server-side problem and strongly suggests the issue is specific to the NFSv4.2 client implementation in the Proxmox 6.17-pve kernel, as the standard Debian 6.12 kernel is unaffected.

To further rule out vzdump as a factor, a direct copy of a qcow2 file from the Proxmox host to the NFS mount was performed:

Code:
cp /mnt/pve/nvme-vms/images/202/vm-202-disk-0.qcow2 /mnt/pve/backup-NAS/test.qcow2

This produced identical symptoms: host freeze, unresponsive WebUI, SSH dead, requiring hard reboot. This confirms the issue is purely the NFS client in kernel 6.17-pve under heavy write load, completely unrelated to vzdump.
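The same kind of load can presumably be generated without a qcow2 at hand: a plain sustained sequential write to the mount should exercise the same code path. A sketch (by default it writes a harmless temp file; the NFS target path is the one from this post, and you should only point it there on a host you can afford to hard-reset):

```shell
# Sequential-write stress test. By default this writes a small throwaway
# temp file; set TARGET to a file on the NFS mount (e.g.
# /mnt/pve/backup-NAS/stress.img) to attempt to reproduce the hang.
TARGET="${TARGET:-$(mktemp /tmp/nfs-stress.XXXXXX)}"
SIZE_MB="${SIZE_MB:-64}"
dd if=/dev/zero of="$TARGET" bs=1M count="$SIZE_MB" conv=fdatasync status=progress

# In a second shell, watch for the hung-task reports shown above:
#   dmesg -wT | grep -i 'blocked for more than'
```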


Workaround:
Pinning kernel to 6.8.12-18-pve resolved the issue completely. Backups and direct NFS copies now run at full speed (~120 MB/s) without any freezes:

Code:
proxmox-boot-tool kernel pin 6.8.12-18-pve
proxmox-boot-tool refresh
reboot
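For anyone else pinning the kernel: it may be worth confirming after the reboot that the pinned kernel is actually the one running, and noting how to undo the pin once a fixed kernel ships (proxmox-boot-tool supports `kernel list` and `kernel unpin` alongside `kernel pin`).

```shell
# After rebooting, confirm which kernel is actually running:
uname -r    # on the pinned host this should print 6.8.12-18-pve

# Inspect and, later, undo the pin (run on the PVE host):
#   proxmox-boot-tool kernel list
#   proxmox-boot-tool kernel unpin
#   proxmox-boot-tool refresh && reboot
```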

Maybe Related:

Mainline kernel bug report: https://bugzilla.kernel.org/show_bug.cgi?id=219508 (NFS write lockup introduced in 6.11)

Additional notes:

Reproduced consistently with both async and sync NFS mounts
ZFS ARC on NFS server was also limited to 10GB (from default unlimited) as a precaution, but this alone did not solve the issue
vzdump tmpdir set to local NVMe (/etc/vzdump.conf: tmpdir: /mnt/pve/nvme-vms/vzdump-tmp) for performance reasons, but unrelated to the freeze
 

Thank you for putting this all together!! I will try this.
 
Hi @kouellette,

I'm having the same issue with PVE 9.1.6 (kernel 6.17) and a similar OpenMediaVault setup. For me, the OMV VM usually breaks first, while other VMs may stay available for a day or two. If it hasn't crashed yet, the entire host will definitely go down when I try to stop the OMV VM.

Have you found a solution yet (e.g., pinning one of the proposed kernels)?