Lessons learned about using NFS shared storage in a Proxmox environment (w/ Synology?)

MrPete

Active Member
Aug 6, 2021
It's possible some of the following is specific to my environment... but having read a ton of threads to solve my issues, I suspect these are more general insights.

  • NFS has had serious reliability problems when it comes to rebuilding broken links... until version 4.1. Before that version, if something (e.g. a crash or VM host reboot) interrupts a session, you get to carefully reboot everything. I've had much better reliability now that I specify 4.1 on both sides of the links.
  • Not sure why, but I initially had link-saturating performance with NFS for a while... then with no config change it suddenly dropped to piss-poor (~1 MB/s instead of over 100!)... until I set explicit options designed for good performance. I now have the following options line in the nfs: sections of /etc/pve/storage.cfg (a full example stanza is sketched after this list) --
    options vers=4.1,nconnect=4,async,rsize=131072,wsize=131072
  • A very handy performance test (and you can swap if/of as needed):
    dd if=/dev/zero of=/mnt/pve/<<name of your nfs share>>/test.img bs=1M count=1000
  • Think carefully about performance and reliability before moving VM images to shared storage. I felt pretty dumb after moving my router/firewall VMs to NFS. Even with static IPs everywhere, the network completely broke when I did that, and it was quite a hassle to recover. I also have a few large VMs... that slow down way too much when the image is on the network. Not running Mellanox Infiniband here LOL.
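For reference, a complete nfs: stanza in /etc/pve/storage.cfg looks roughly like the sketch below. The storage ID, server address and export path here are placeholders for illustration; adjust the content line to whatever you actually keep on the share.
    nfs: mynas
        export /volume1/proxmox
        path /mnt/pve/mynas
        server 192.168.1.10
        content images,iso,backup
        options vers=4.1,nconnect=4,async,rsize=131072,wsize=131072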
 
NFS has had serious reliability problems when it comes to rebuilding broken links... until version 4.1. Before that version, if something (e.g. a crash or VM host reboot) interrupts a session, you get to carefully reboot everything.
This is correct. NFSv3 is stateless; when the session is broken the client is often stuck, and locks, mount handles and file handles change when the host comes back after a crash. NFSv4 is stateful. And if it's serious production and uses hard mounts (as it should), then the only recovery is a reboot.
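As a practical check on this: if you want to see which NFS version a client actually negotiated for a mounted share (as opposed to what was requested), the mount options currently in effect, including the vers= value, can be listed on the Proxmox host with, for example:
    nfsstat -m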
A very handy performance test (and you can swap if/of as needed):
dd of zeroes is rarely a good test; the OS will likely optimize it out. I would recommend "fio".

In general, a lot of NFS performance depends on a) the NFS server implementation and b) the disk, network and CPU speed of both server and client.

That said, your findings are not unusual.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
fio gives interesting results... on write, I can easily saturate the link. Reads? Have not found a way to get anywhere close. I suspect the far end is the culprit somehow... but digging in needs more time than I have right now.

fio --name=fio-test --rw=read --direct=1 --ioengine=libaio --bs=32k --numjobs=4 --iodepth=4 --size=1G --runtime=30 --group_reporting --directory=<<target folder>>
 
fio gives interesting results... on write, I can easily saturate the link. Reads? Have not found a way to get anywhere close.
The usual reasoning is this: on write, the far end lies - it is just buffering your fio data without actually writing it. But on read it has to talk to the disks to really read the data before being able to deliver it to you.

Just my personal interpretation...
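One way to sketch a write test that takes the far end's buffering out of the equation (assuming the same parameters as the read test above; the target folder is a placeholder) is to make fio flush the data before reporting, e.g. with --end_fsync=1:

fio --name=fio-write-test --rw=write --direct=1 --ioengine=libaio --bs=32k --numjobs=4 --iodepth=4 --size=1G --end_fsync=1 --group_reporting --directory=<<target folder>>

With the flush included, the write numbers should move closer to what the disks behind the NFS server can actually sustain.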
 
I've had much better reliability now that I specify 4.1 on both sides of the links.
What backend are you using for the NFS share?
I have TrueNAS Core (may move to Scale soon) and I'm not sure whether it supports NFSv4.1 or just 4.0. I'd like to set it to 4.1 or 4.2 if possible.
 
