It's possible some of the following is specific to my environment... but having read a ton of threads to solve my issues, I suspect these insights apply more generally.
- NFS has had serious reliability problems when it comes to rebuilding broken links... until version 4.1. Before that version, if something (e.g. a crash or VM host reboot) interrupts a session, you get to carefully reboot everything. I've had much better reliability now that I specify 4.1 on both sides of the links.
- Not sure why, but I initially had link-saturating performance with NFS for a while... then with no config change it suddenly dropped to piss-poor (~1 MB/s instead of over 100!)... until I set explicit options designed for good performance. I now have the following options line in the nfs: sections of /etc/pve/storage.cfg:
options vers=4.1,nconnect=4,async,rsize=131072,wsize=131072
- A very handy performance test (and you can swap if/of as needed):
dd if=/dev/zero of=/mnt/pve/<<name of your nfs share>>/test.img bs=1M count=1000
- Think carefully about performance and reliability before moving VM images to shared storage. I felt pretty dumb after moving my router/firewall VMs to NFS. Even with static IPs everywhere, the network completely broke when I did that, and it was quite a hassle to recover. I also have a few large VMs... that slow down wayyy too much when the image is on the network. Not running Mellanox InfiniBand here LOL.
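
For anyone wanting to check the version and dd points above in one go, here is a rough sketch, not a polished tool. The function names `nfs_versions` and `nfs_dd_test` are made up for illustration. It assumes a Linux host where `/proc/mounts` lists the negotiated `vers=` option for NFS mounts and where dd accepts `conv=fsync`. The first helper shows which NFS version each mount actually ended up with, and the second runs the write test from above followed by the swapped if/of read test.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Print "<mount point> <vers>" for every NFS mount listed in a
# /proc/mounts-style file (defaults to /proc/mounts itself).
nfs_versions() {
    mounts_file=${1:-/proc/mounts}
    awk '$3 ~ /^nfs4?$/ {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/) vers = substr(opts[i], 6)
        print $2, vers
    }' "$mounts_file"
}

# Write a test file with dd, read it back, then delete it.
# $1 = directory to test, $2 = size in MiB (default 1000, as above).
nfs_dd_test() {
    test_file="$1/test.img"
    count_mb=${2:-1000}

    # conv=fsync makes dd flush the file before it reports a speed, so
    # the write figure is not just the local page cache.
    dd if=/dev/zero of="$test_file" bs=1M count="$count_mb" conv=fsync || return 1

    # Read test: if/of swapped. The file may still be cached locally, so
    # treat this number as optimistic.
    dd if="$test_file" of=/dev/null bs=1M || return 1

    rm -f "$test_file"
}

# Example (run manually; the path is a placeholder):
#   nfs_versions
#   nfs_dd_test "/mnt/pve/<<name of your nfs share>>" 1000
```

If `nfs_versions` prints something other than 4.1 for a share, the options line in storage.cfg probably has not taken effect on that mount yet.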