Proxmox NFS Storage Is Slow

jdorny

New Member
May 30, 2024
OK, so after a ton of debugging on Proxmox 8.0.3 with slow NFS storage performance, here is what I found:

Setup: 3 nodes, 10GbE network
TrueNAS server with mirrored NVMe drives
iperf shows full 10GbE between nodes

A Debian bare-metal host on the same network gets full NFS speed to the TrueNAS server's NVMe share, roughly 600 MiB/s.

Observations / Question:
1. When a storage device is set up for an NFS share (on the TrueNAS server) via the web GUI and you then run an fio test from the PVE host shell against /mnt/pve/<mount name>, the maximum speed is about 40 MiB/s.
2. When an NFS mount is created manually on the PVE host via the shell (with a /test mount point) to the same TrueNAS server, the speed is roughly 600 MiB/s.
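
For reference, the two tests might look roughly like this (server address, export path, and the fio parameters are illustrative, not the exact ones used):

    # manual mount for comparison (placeholder server/export)
    mkdir -p /test
    mount -t nfs 192.168.1.10:/mnt/tank/pve /test

    # sequential read test; block size, file size and job count are examples
    fio --name=seqread --rw=read --bs=1M --size=4G --numjobs=1 --direct=1 \
        --ioengine=libaio --filename=/mnt/pve/<mount name>/fio.test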

Any idea what I have not done correctly?
 
Updated tonight as well to the latest PVE version. No change. Also looked at /proc/mounts to see if my temporary NFS mount had any different parameters than the ones created by PVE in /mnt/pve.

Same parameters. I'm stumped: how can two NFS mounts with the same parameters to the same location have different speeds?
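
A quick way to compare the two entries side by side is, for example:

    grep -E ' /mnt/pve/| /test ' /proc/mounts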

Thoughts?
 
I have the same problem. Upgraded from 7.4 to 8.3 (fresh install), and that's when the fun started.
I have each of the 3 hosts export a local 1 TB NVMe drive via NFS, so that VMs can be migrated between hosts without also migrating storage.

In 7.4 this setup had run without problems for a couple of years, but as soon as I switched to 8.4 the performance became incredibly sluggish, with network speed on the NFS share dropping to 40-50 MB/s and sometimes completely stalling the physical host, even though the NFS config did not change.
 
I cannot complain about /mnt/pve mount performance in general, as our (normal XFS) mounts get 300-1000 MB/s. There is one ZFS RAIDZ1 (3x2 TB) mount holding the LXC templates and OS ISOs that only gives about 1 MB/s with peaks up to 40 MB/s, but I only updated to the 6.8.12-5 kernel on Tuesday and there has been no access since the reboot, so the ARC is completely empty; as is well known, ZFS performance is terrible without the help of the ARC, so I'm not surprised by that mount's behaviour yet.
 
What is interesting is that this seems to happen only if I mount the NFS share via the Proxmox Storage menu.
If I mount the share via NFS in fstab and add it as Directory storage, everything runs smoothly; just today I have written and read a couple of terabytes and the 10G link was saturated at 100% utilisation. Strange.
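
For reference, the workaround described above might look roughly like this (server address, export path, and storage ID are placeholders, not the poster's actual values):

    # /etc/fstab on the PVE host
    192.168.1.10:/mnt/tank/pve  /mnt/nfs-manual  nfs  vers=4.2,_netdev,nofail  0  0

    # register the mounted path as Directory storage; is_mountpoint makes PVE
    # use it only while the mount is actually present
    pvesm add dir nfs-manual --path /mnt/nfs-manual --content images,iso --is_mountpoint yes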
 
Yes, that's strange, but I tested exactly your scenario over /mnt/pve mounts instead of a manual mount and can't reproduce your problem behaviour like this.
 
We are on 8.3.1 with 6.8.12-5 now, and last week we were on 8.3.0 with 6.8.12-4, without seeing any performance problems on either.
 
We are on 8.3.1 with 6.8.12-5 now, and last week we were on 8.3.0 with 6.8.12-4, without seeing any performance problems on either.
I am on 8.3.1 and kernel 6.11.0-2-pve

Will keep an eye on it and will try to test again in a couple of days, as I have updated and rebooted all three hosts today.
 
Are there any new insights on this issue yet? I think I might be facing the same problem. The performance on the storage itself is excellent, I would say. DD gives me 3.2 GB/s, and iperf between the node and storage shows ~10 Gbits/sec, but the performance via NFS (mounted through the Proxmox storage menu) leaves much to be desired.

@kovaga: May I ask how you mounted the storage via fstab? Specifically, what options did you use?
 
Are there any new insights on this issue yet? I think I might be facing the same problem. The performance on the storage itself is excellent, I would say. DD gives me 3.2 GB/s, and iperf between the node and storage shows ~10 Gbits/sec, but the performance via NFS (mounted through the Proxmox storage menu) leaves much to be desired.
In /etc/pve/storage.cfg, add the extra line "options nconnect=2" to your NFS storage config block if you are using a 10Gb network.
And after your NFS mount is re-established (remount, e.g. by a systemctl unit), do "echo 8192 > /sys/class/bdi/$(mountpoint -d /my_nfs_mount_path)/read_ahead_kb" (edit this to your own path!)
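
Put together, the storage.cfg block might look something like this (storage ID, server, and export path are made-up examples):

    # /etc/pve/storage.cfg
    nfs: truenas-nvme
            export /mnt/tank/pve
            path /mnt/pve/truenas-nvme
            server 192.168.1.10
            content images,iso
            options vers=4.2,nconnect=2

    # after the share has been remounted, raise read-ahead for that mount
    echo 8192 > /sys/class/bdi/$(mountpoint -d /mnt/pve/truenas-nvme)/read_ahead_kb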
 
Are there any new insights on this issue yet? I think I might be facing the same problem. The performance on the storage itself is excellent, I would say. DD gives me 3.2 GB/s, and iperf between the node and storage shows ~10 Gbits/sec, but the performance via NFS (mounted through the Proxmox storage menu) leaves much to be desired.

@kovaga: May I ask how you mounted the storage via fstab? Specifically, what options did you use?
What I've done is start mounting NFS via /etc/fstab, using the soft rather than hard mode so that the system does not completely halt when there is an issue with the NFS server, and then add the directory as local storage in PVE.
The fstab mount options are:
soft,_netdev,nofail,x-systemd.device-timeout=10
The exportfs options on the NFS server are (rw,no_root_squash,insecure,no_subtree_check,async).
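
Assembled into complete entries, that would look roughly like this (server name and paths are placeholders, not the actual ones used):

    # /etc/fstab on each PVE host
    nfs-server:/export/vmstore  /mnt/nfs-vmstore  nfs  soft,_netdev,nofail,x-systemd.device-timeout=10  0  0

    # /etc/exports on the NFS server
    /export/vmstore  *(rw,no_root_squash,insecure,no_subtree_check,async)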

However, if something happens to the disk behind that share, i.e. it drops out or switches into read-only mode, my 3 PVE hosts just exhibit crazy behaviour: slowness, task hangs, etc., despite the soft NFS mount mode, sometimes even resulting in crashed Ceph OSD nodes running on those PVE servers.
 
The fstab mount options are:
soft,_netdev,nofail,x-systemd.device-timeout=10
The exportfs options on the NFS server are (rw,no_root_squash,insecure,no_subtree_check,async).

However, if something happens to the disk behind that share, i.e. it drops out or switches into read-only mode, my 3 PVE hosts just exhibit crazy behaviour: slowness, task hangs, etc., despite the soft NFS mount mode, sometimes even resulting in crashed Ceph OSD nodes running on those PVE servers.
The NFS mount option "soft" should never (!!) be used, as it is what causes your crazy behaviour when the NFS server "dies", and can even lead to incorrect local writes.
Don't use the export option "async" on the NFS server, as it can result in data loss.
And there is no need to mount via fstab, as it works fine via the datacenter definition (auto-written into /etc/pve/storage.cfg).
The NFS server should be monitored properly, so a service defect is normally known and can be fixed before the share becomes unavailable to the PVE hosts.
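
For completeness, the datacenter definition can also be created from the CLI; a minimal sketch, with storage ID, server address, and export path as placeholders:

    pvesm add nfs truenas-nvme --server 192.168.1.10 --export /mnt/tank/pve \
        --content images,iso --options vers=4.2,nconnect=2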
 
In /etc/pve/storage.cfg, add the extra line "options nconnect=2" to your NFS storage config block if you are using a 10Gb network.
And after your NFS mount is re-established (remount, e.g. by a systemctl unit), do "echo 8192 > /sys/class/bdi/$(mountpoint -d /my_nfs_mount_path)/read_ahead_kb" (edit this to your own path!)
Has anyone on 10Gb tried this solution with an NFS share from TrueNAS (hosted as a VM in the cluster) and ended up increasing their I/O by a good margin? If so, I feel like Proxmox should detect your NIC speed at install time and prepare the NFS share accordingly (or at least add an option when creating NFS shares).
 
The NFS mount option "soft" should never (!!) be used, as it is what causes your crazy behaviour when the NFS server "dies", and can even lead to incorrect local writes.
Don't use the export option "async" on the NFS server, as it can result in data loss.
And there is no need to mount via fstab, as it works fine via the datacenter definition (auto-written into /etc/pve/storage.cfg).
The NFS server should be monitored properly, so a service defect is normally known and can be fixed before the share becomes unavailable to the PVE hosts.
The NFS >>>hard<<< mount option results in completely unacceptable hanging of the client. The soft mount option at least allows you to gracefully unmount and remount the directory. Anyway, for the last month or so NFS has been used only for storing the backups and fetching the ISOs; the VM disks have been moved to Ceph.

You are right, there is absolutely no need to go via the fstab + local storage route, but it is simply a much easier way to debug the issue and experiment with different mount options, independent from PVE, including the async option, which in my case (weekly backups and ISOs) is perfectly sensible.
As I wrote above, the NFS setup via PVE was working fine for me on 7.x for a couple of years straight, no issues whatsoever. As soon as I migrated to 8.x, the slowness kicked in.
 
The NFS >>>hard<<< mount option results in completely unacceptable hanging of the client.
That (hard mount) is exactly the wanted behaviour, so that NFS data transfer can continue once the service is available again and the "app" processes (here, the VMs) that are waiting for it can carry on!!! A graceful unmount is still possible with "umount -l /my_mount_path".
 
OK, so after a ton of debugging on Proxmox 8.0.3 with slow NFS storage performance, here is what I found:

Setup: 3 nodes, 10GbE network
TrueNAS server with mirrored NVMe drives
iperf shows full 10GbE between nodes

A Debian bare-metal host on the same network gets full NFS speed to the TrueNAS server's NVMe share, roughly 600 MiB/s.

Observations / Question:
1. When a storage device is set up for an NFS share (on the TrueNAS server) via the web GUI and you then run an fio test from the PVE host shell against /mnt/pve/<mount name>, the maximum speed is about 40 MiB/s.
2. When an NFS mount is created manually on the PVE host via the shell (with a /test mount point) to the same TrueNAS server, the speed is roughly 600 MiB/s.

Any idea what I have not done correctly?
Have you filed a bug report on this? You shouldn't have to manually edit the storage.cfg to get the desired behavior if it works as expected with fstab; it definitely seems like a bug, or at the very least like the default behavior in storage.cfg might need to be adjusted.

When you say you "created manually" the NFS share, how exactly are you doing that? What options/fstab entry are you using?

@waltar, thanks for the instructions on the tweaks. I wasn't aware of either of these.
In /etc/pve/storage.cfg, add the extra line "options nconnect=2" to your NFS storage config block if you are using a 10Gb network.
And after your NFS mount is re-established (remount, e.g. by a systemctl unit), do "echo 8192 > /sys/class/bdi/$(mountpoint -d /my_nfs_mount_path)/read_ahead_kb" (edit this to your own path!)

For anyone else who had never heard of the nconnect option before, see: https://www.suse.com/support/kb/doc/?id=000019933
Usage: nconnect=<value> should be set to a number from 1 to 16 (inclusive). This will set the number of TCP connections which the client will form between itself and the NFS server, to handle all NFS work for that version of the NFS protocol.

This setting will only take effect during the client's first mount for that particular NFS server/version combination. If the client executes another NFS mount for the same NFS server/version, it will join in sharing whatever connections were established by the first mount. Subsequent mount commands cannot override the nconnect value already established. To set a new nconnect value, all of a client's mounted NFS file systems which point to a certain NFS server/version must be umounted, and then the first NFS file system to be remounted must set the desired nconnect value.

If nconnect is not specified during the first mount, then "nconnect=1" is assumed, which results in the same traditional behavior described in the Situation section above.

"mount" output and /proc/mounts contents will show the real "nconnect=<value>" in effect, with the exception that it will not show "nconnect=1" unless "nconnect=1" was explicitly specified when the mount was executed.

Although nconnect can cause many connections to open initially, they can be closed and reopened later as needed. Thus, the number of active connections which can be confirmed through netstat or other means may sometimes be less than the nconnect value.

@waltar, what is the purpose of the echo command? You'd need to run it after each reboot, or use sysfsutils if you wanted it to be permanent.
 
That (hard mount) is exactly the wanted behaviour, so that NFS data transfer can continue once the service is available again and the "app" processes (here, the VMs) that are waiting for it can carry on!!! A graceful unmount is still possible with "umount -l /my_mount_path".
Please take time to read what I wrote; I don't use the NFS share for VM storage anymore, only for the backups.
In addition to "umount -l" I would also consider adding the "-f" switch.
 
Yes, that doesn't persist across a reboot. The problem is that when a host boots you cannot set the value while the mount isn't there yet, so it can only be done later; but since it is useful for every NFS mount, there could be a systemd unit that runs permanently, checking for new NFS mounts and setting their read_ahead.
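
A minimal sketch of such a unit, assuming a single PVE-managed mount at /mnt/pve/truenas-nvme (mount path, read-ahead value, and unit name are made-up examples):

    # /etc/systemd/system/nfs-readahead.service
    [Unit]
    Description=Raise read_ahead_kb once the NFS storage is mounted
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    # wait until pvestatd has mounted the storage, then set read-ahead;
    # "$$" is how a literal "$" is written inside a systemd unit file
    ExecStart=/bin/sh -c 'until mountpoint -q /mnt/pve/truenas-nvme; do sleep 5; done; \
        echo 8192 > /sys/class/bdi/$$(mountpoint -d /mnt/pve/truenas-nvme)/read_ahead_kb'

    [Install]
    WantedBy=multi-user.target

Enable it with "systemctl enable --now nfs-readahead.service"; it would need to be extended to loop over multiple mounts if more than one NFS storage is defined.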
 