Severe system freeze with NFS on Proxmox 9 running kernel 6.14.8-2-pve when mounting NFS shares

I've been watching this thread for a while - glad people are making progress.

I'm running an OpenMediaVault VM using normal disk passthrough of a 4TB NVMe and a couple of old SATA hard drives. I'd seen that NFS client activity from an LXC to this VM's NFS server can lock things up, while NFS traffic from a machine external to this VM seems solid. My less-than-ideal workaround was to have the LXC use sshfs to connect to OMV. If I can figure out how to move to PCI passthrough of the NVMe without breaking things, I probably will. I'm curious what people think about doing PCI passthrough of the SATA controller for the two hard drives. I currently don't share these drives over NFS (they are just used for backup of the NVMe drive). This would make the OMV instance look more like a physical machine (it would see SMART, control spindown of all the storage drives, etc.), but is there some downside I don't see?
 
Some more data points.

b550/3700x
NFS exports defined in a truenas vm
NFS client was the pve host itself
"cat /dev/random > filename.ext" in the exported folder to generate NFS load
Monitoring TN pool status with "zpool iostat 1" in the TrueNAS VM
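For anyone wanting to reproduce a similar sustained write load, here is a sketch of the test. It swaps in dd with /dev/urandom (since /dev/random can block on entropy-starved systems) instead of the plain cat redirect above; the TARGET path is hypothetical and defaults to /tmp only so the sketch runs anywhere — point it at a directory inside the NFS export:

```shell
# Hypothetical target -- point this at a directory inside the NFS export.
TARGET="${TARGET:-/tmp}"
# Raise COUNT (e.g. to 20480 for 20 GiB) to match the sustained load
# that triggered the lockups; 64 MiB here is just a smoke test.
COUNT="${COUNT:-64}"
# conv=fsync forces the data out to the server before dd exits,
# which matters when the destination is an NFS mount.
dd if=/dev/urandom of="$TARGET/loadtest.bin" bs=1M count="$COUNT" conv=fsync
ls -lh "$TARGET/loadtest.bin"
```

Remember to delete the test file afterwards; with lookupcache or attribute-cache tweaks in play, rerun the write several times, since the lockups above were intermittent.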

Installed Proxmox 8.2 from scratch and retested: same NFS lockups/disconnects with the virtual disk as well as a 2.5" external HD via USB 3.0, kernel 6.8.4-2. I couldn't test with the HBA as it was unavailable.

Upgrading 8.2 to the latest 8.4 with the latest 6.8.12 (?) kernel was successful. No crashes writing to either the virtual disk or the USB drive. Wrote over 20GB using either method.

Upgraded to the latest PVE 9.1 with the 6.17.4 kernel. Lockup/disconnect with both the external USB and the virtual disk again. The box is still on 9.1, so I added the HBA back in and retested. As expected, it worked with the HBA in passthrough. Retested the virtual disk again too (with the HBA installed and passed through). This time it worked fine! Tested a few more times: writing to the HBA disk was 100% reliable. Writing to the virtual disk with the HBA passed through wasn't as reliable. It didn't crash/lock every time, but did once or twice.

With the HBA removed from passthrough, writing to the virtual disk locked/disconnected each time.

Virtual disks (and physical disks passed through?) appear to be buggy when used in a VM-based NAS over NFS. So if one wants to use PVE 9.1, sticking to an HBA would be my advice. I'm done testing.

Given the above, I don't expect to encounter issues upgrading the production box to 9.1, as it has the SATA controllers (equivalent to the HBA) in passthrough and there are no virtual disks or physical disks in passthrough. However, at this point, I think I will update to the latest 8.4 and leave it there until PVE 9.2 or 9.3 rolls out. It's a single-node server doing modest tasks. NIC pinning would be useful as it is headless, and adding PCI cards sometimes changes PCI enumeration.
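On the NIC-pinning point: one generic way to keep a stable interface name regardless of PCI enumeration is a systemd .link file matching on the MAC address. A sketch (the file name, MAC, and chosen name are all made up; /etc/network/interfaces must then reference the pinned name):

```
# /etc/systemd/network/10-pin-lan0.link  (hypothetical file name)
[Match]
# Replace with the NIC's real MAC address ("ip link" shows it).
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
# Stable name that survives PCI re-enumeration after adding cards.
Name=lan0
```

After adding the file, update the initramfs and reboot so the rename is applied early; a headless box is exactly where getting this wrong hurts, so test with console access available.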
 
After upgrading PVE 8 -> 9, confirming lockups with errors in dmesg like:
```
INFO: task nfsd:546 blocked for more than 122 seconds.
```
Environment:

HW: x370/3700x
nfsd -> only 4.2 enabled (for server-side copy)
NFS clients -> on the same server, inside containers
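For reference, restricting nfsd to version 4.2 as in the environment above can be done in /etc/nfs.conf. A sketch using the version keys from nfs.conf(5) (exact values here are an assumption about this setup, not copied from it):

```
# /etc/nfs.conf -- [nfsd] section: serve NFS 4.2 only
[nfsd]
vers3=n
vers4=y
vers4.0=n
vers4.1=n
vers4.2=y
```

Restart nfs-server after editing for the change to take effect.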

Both kernels, 6.17 and 6.14, lock up after IO-intensive load.

Currently "fixed" this by using the 6.5 kernel from the PVE 8 repo.
So very likely there is a bug in the NFS client or NFS server modules introduced at some point on (or before) the 6.14 kernel.
 
Currently "fixed" this by using 6.5 kernel from PVE8 repo.

I know this is not the topic here, but is there an easy guide for how to do this? I listed all kernels on my 9.1 install but only have these three:

6.14.11-5-pve
6.17.4-1-pve
6.17.4-2-pve

How do I get an older kernel from PVE 8 on my PVE 9 install?

Thanks!!
 
How do I get an older kernel from PVE 8 on my PVE 9 install?
1. Add
Code:
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
to your sources.
2. apt-get update
3. apt-get install proxmox-headers-6.5 proxmox-kernel-6.5
At this stage "proxmox-boot-tool kernel list" will NOT show the new kernel, so you must get the exact version from dpkg: "dpkg --get-selections | grep 6.5"
4. Then "proxmox-boot-tool kernel pin 6.5.13-6-pve"
5. reboot
 
Using the 6.5 kernel didn't work for me: disconnect under heavy load again. I will have to do a fresh install of 8.4... I was so hoping I wouldn't have to...
 
Thanks for sharing. With the issues I ran into above (see previous page), I thought it was related to how disks were passed through to the VM. Yours seems more like a case where the CIFS client (likely NFS too) is broken somehow in 9.x in certain scenarios.
 
Just a heads-up for those trying to fix this (both Proxmox devs and everyone else). I had this issue even before upgrading to 9.1, when I was running the `6.8.12-18` kernel. Perhaps someone here who is having problems can confirm whether that one also has the issue. I did not try `6.8.12-17` since I didn't have it installed.

However, I *can* confirm that `6.8.12-16` does NOT have the issue. When transferring massive files, my "IO Pressure Stall" goes up to about 17% and then comes right back down after the file finishes transferring (a couple of seconds at most). Previously it was stuck between 50% and 60% and never came back down.

I'd love it if someone could test `6.8.12-17` and `6.8.12-18` to see if either of those has the issue for anyone else. I think it would be MUCH better if they did, since the code changes from `6.8.12-16` to those are surely far smaller than to the `6.14` and `6.17` releases.
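For anyone repeating this comparison, the "IO Pressure Stall" figure comes from the kernel's PSI interface in /proc/pressure/io (available on kernels 4.20+ with PSI enabled). A quick sketch of how to watch it during a transfer:

```shell
# "some avg10" = share of the last 10 s in which at least one task was
# stalled on IO; this is the figure that should fall back toward 0 once
# a big transfer finishes. Prints a notice if the kernel lacks PSI.
grep '^some' /proc/pressure/io 2>/dev/null || echo "PSI not available on this kernel"
```

Run it in a loop (or under watch) while the transfer is in flight; a value that stays pinned at 50-60% after the transfer ends matches the bad behaviour described above.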
 
My solution:
nano /etc/fstab

Add vers=3,lookupcache=none to the mount line:
Code:
192.168.2.1:/mnt/folder/ /mnt/folder nfs auto,nofail,noatime,nolock,intr,tcp,vers=3,lookupcache=none,actimeo=1800 0 0

systemctl daemon-reload
mount -a
(daemon-reload first, so systemd picks up the fstab change before remounting)
 