Severe system freeze with NFS on Proxmox 9 running kernel 6.14.8-2-pve when mounting NFS shares

I'm a little disappointed that no one from the team or the developers has commented on the problem yet.
Usually, that means there's nothing concrete to say yet.

This is a real issue, but it's also been notably difficult to pin down its causes. Something's going on in 6.14 with NFS, but not for everyone, and it doesn't seem easily repeatable in a way that would make testing easier.

Not to mention NFS setups themselves tend to get heavily tweaked, so incoming reports all have different underlying configurations.
 
Hey all, I've just seen this thread via a reply to a post I made a few days ago, and it seems related (although I'm using CIFS / Synology); some of it does sound the same.

Under normal usage it presents itself as random disconnections (Sonarr processing stalls, or media playback just stops).

I'd replicated my setup from a Pi 4 (Docker with Jellyfin/Sonarr/Nextcloud), but using a nice silent mini PC with better CPU/RAM, and I prefer Proxmox's ability to back up and restore. I have tested restoring the container to a different device and get exactly the same symptoms.

I first noticed it just watching shows/films: some days it's fine, some days there are random freezes. At first I just restarted the NAS/Proxmox, but as it carried on I've been looking into it more, and it appears that any large network transfer (e.g. Proxmox network backup, Sonarr handling files, Tdarr health checks) causes the mounts to vanish.

I've seen some posts mentioning Intel NIC issues and turning TSO/GSO/GRO off, which I tried last night; it made zero difference. I can reliably reproduce this issue within 20-30 seconds just by manually running a Jellyfin media segment scan/keyframe extraction task, so it's easy to test whether anything fixes it. So far nothing has helped and it's still broken.

NAS settings - tried smb2/smb3, oplocks on/off, etc.
Network - tried jumbo frames / different ports / different cables (but I don't suspect the network, as the Pi 4 setup works flawlessly)
Firewall - ensured IDS/DoS protection is bypassed
Proxmox network - tried tso/gso/gro off, tried a different USB3 nic - no change
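
For anyone wanting to repeat the TSO/GSO/GRO test, here is a minimal sketch of how the toggle could be scripted. It assumes the standard `ethtool -K` interface; the interface name `enp1s0` and the helper names are placeholders, not something taken from this setup.

```python
import subprocess

def offload_cmd(iface, features=("tso", "gso", "gro"), state="off"):
    """Build the ethtool command that switches the given offloads on or off."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, state]
    return cmd

def apply_offloads(iface, state="off", dry_run=True):
    """Print the command (dry run) or execute it; executing needs root."""
    cmd = offload_cmd(iface, state=state)
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd

# Hypothetical interface name; with dry_run=True this only prints the command.
apply_offloads("enp1s0")
```

Settings applied this way are runtime-only, so they need re-applying after a reboot while testing.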


It would be nice to get this running reliably. The Pi 4 setup works flawlessly (though obviously with less RAM/CPU and slower processing), so Proxmox on an i7/16 GB/NVMe should theoretically wipe the floor with it (and it does, for a short while!). It's just so unreliable and unstable with this mounts issue.

For now, for my sanity and for reliability/usability, I'll stick with the Pi 4B setup, but if anyone has a solid fix for this, I'd love to hear what fixed it!

thanks
 
Hi everyone,
I’m dealing with persistent I/O stalls in a setup where:
  • Proxmox VE (host) runs TrueNAS SCALE as a VM
  • The Proxmox host mounts NFS/SMB exports from the TrueNAS VM
  • These mounts are bind-mounted into unprivileged LXC containers (Plex, Tdarr, TubeArchivist)
Under heavier read/write workloads, I consistently see I/O freezes, kernel logs with NFS “not responding”, CIFS reconnect loops, and occasional hung tasks (Tdarr_Server, ffprobe, HandBrakeCLI, dmx0:matroska,w) stuck in state D.

Here is my environment:
  • Proxmox VE: kernel 6.14.11-4-pve
  • TrueNAS SCALE: running as a VM (VirtIO NIC + disks via HBA passthrough)
  • TrueNAS VM: 4 vCPUs (tested with 8 vCPUs too), ZFS on raw disks
  • Networking: 10 GbE (LACP bond, MTU 1500, RSTP + flow control disabled, VirtIO interface)
  • LXC containers (unprivileged):
    • 117: Plex – mostly read, rare writes
    • 120: Tdarr – transcodes (heavy RW)
    • 130: TubeArchivist – RW for metadata and thumbnails
SYMPTOMS/LOGS:
NFS (on Proxmox host):
nfs: server 192.168.30.104 not responding, still trying
...
nfs: server 192.168.30.104 OK

Typically repeating in long cycles:
[31958.410247] nfs: server 192.168.30.104 not responding, still trying
[32741.776511] nfs: server 192.168.30.104 OK
[33028.495824] nfs: server 192.168.30.104 not responding, still trying
[33749.397840] nfs: server 192.168.30.104 OK
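
To put numbers on those cycles, here is a rough sketch that pairs each "not responding" line with the next "OK" line for the same server and reports how long the stall lasted. The regex only covers the message format shown above, and the function name is made up for illustration.

```python
import re

# Matches dmesg lines such as:
# [31958.410247] nfs: server 192.168.30.104 not responding, still trying
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nfs: server (?P<server>\S+) "
    r"(?P<event>not responding|OK)"
)

def nfs_stalls(lines):
    """Return (server, start, duration_seconds) for each completed stall."""
    pending = {}  # server -> timestamp of the first unanswered "not responding"
    stalls = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts = float(m["ts"])
        server = m["server"]
        if m["event"] == "not responding":
            pending.setdefault(server, ts)
        elif server in pending:
            start = pending.pop(server)
            stalls.append((server, start, ts - start))
    return stalls

sample = """\
[31958.410247] nfs: server 192.168.30.104 not responding, still trying
[32741.776511] nfs: server 192.168.30.104 OK
[33028.495824] nfs: server 192.168.30.104 not responding, still trying
[33749.397840] nfs: server 192.168.30.104 OK
""".splitlines()

for server, start, duration in nfs_stalls(sample):
    print(f"{server}: stalled at {start:.0f}s for {duration:.0f}s")
```

On the four lines above this gives stalls of roughly 783 s and 721 s, i.e. the mount was unresponsive for about 12 to 13 minutes each time.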

And frequent hung tasks (state D) during flush/close:
INFO: task Tdarr_Server:661925 blocked for more than 122 seconds.
...
nfs_wb_all -> nfs4_file_flush -> filp_flush -> __x64_sys_close

SMB (on Proxmox host):
CIFS: VFS: \\192.168.30.104 has not responded in 45 seconds. Reconnecting...
CIFS: trying to dequeue a deleted mid

What I’ve already tested:
  • NFS and SMB tuning (various option profiles)
  • Mount separation (each container gets its own NFS/SMB mount)
  • Network & kernel tuning (Flow Control + RSTP disabled)

Result: despite all of the above, NFS “not responding” and CIFS “Reconnecting…” persist, especially during concurrent RW (Tdarr) and RO (Plex) workloads.
Occasionally Tdarr processes enter D state due to kernel waits in nfs_wb_all or netfs_write.
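
For catching these while they happen, a small sketch that scans `/proc` for processes in state D may help. It relies on the documented `/proc/<pid>/stat` layout (pid, then the command name in parentheses, then the state letter); the function names are illustrative only.

```python
import os

def parse_stat(stat_text):
    """Extract (pid, comm, state) from the contents of /proc/<pid>/stat.

    comm is wrapped in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state

def d_state_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux-style procfs
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except OSError:
            continue  # process exited (or is unreadable) mid-scan
        if state == "D":
            found.append((pid, comm))
    return found

for pid, comm in d_state_tasks():
    print(pid, comm)
```

This only looks at each process's main thread. Worker threads have their own entries under `/proc/<pid>/task/<tid>/stat`, so a fuller version would walk those too.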

To be clear, on my previous Proxmox version (8), everything worked perfectly, without any problems with NFS or SMB mounts in unprivileged LXC containers.
 
I have NFS stalls as well when doing high-I/O operations like disk moves. I tried NFS version 3 and also applied different tuning parameters to the NFS mount, but nothing seems to help; the stalls and freezes continue to happen.