Proxmox Cluster - NetApp NFS storage: slow read performance

menk

Renowned Member
Apr 3, 2014
Italy - Forlì
Good morning

We have two Proxmox nodes (a Dell R440 and a Dell R640) connected to a NetApp AFF-A150 via 10 Gbit network cards (2x in an LACP bond, layer 4 hashing).

Both nodes mount an NFS export (version 4.2) as a datastore.

The devices involved (IPs dedicated to the NFS protocol) are:

  • pve01 192.168.98.51 (Proxmox version 9.1)
  • pve02 192.168.98.52 (Proxmox version 9.1)
  • NetApp 192.168.98.101 (NetApp ONTAP Release 9.18.1)
  • 2x HPE Networking Instant On 1960 12p 10GBT 4p SFP+ switches (JL805A), stacked, latest firmware


We have the following problem:

We detect poor read performance (about 110 MB/s), while writes show no problems (about 900 MB/s).

The problem appears roughly 10-15 minutes after the servers come up; until then, read performance is fine as well.

I attach fio tests performed from both servers against the NetApp, both read and write (screenshots attached).
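For reference, a sequential read test of this kind can be run directly against the NFS mount; the invocation below is an illustrative sketch (file size, block size, and runtime are assumptions, not necessarily the parameters used in the attached tests):

fio --name=seqread --directory=/mnt/pve/store01 \
    --rw=read --bs=1M --size=4G --numjobs=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based

The same command with --rw=write exercises the write path.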

We ran iperf3 tests to verify the network connections between the two servers and towards the NetApp, and detected no performance problems on the 10 Gbit network (screenshots attached).
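A typical pair of commands for such a check, shown as an illustrative sketch (stream count and duration are assumptions):

iperf3 -s                            # on the receiving node, e.g. pve02
iperf3 -c 192.168.98.52 -P 4 -t 30   # from pve01: 4 parallel streams for 30 s

Note that with LACP layer 4 hashing, a single TCP stream only ever uses one bond member, so parallel streams (-P) are needed to test the full aggregate bandwidth.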

I have checked the HPE switches and see no errors.

I also did a test connecting one of the two Dell servers via the iSCSI protocol, and the problem presents itself in the same way.

As a further test, connecting another server running Hyper-V to the NetApp does not reproduce the problem.

Does anyone have any ideas?

Thank you
 

Attachments

  • 1769011165589.png (65.8 KB)
  • 1769011195292.png (141.3 KB)
  • 1769011165636.png (141.3 KB)
  • 1769011165616.png (160.8 KB)
  • 1769011165624.png (110.2 KB)
  • 1769011165598.png (100.3 KB)
  • 1769011165606.png (101.5 KB)

cat /etc/pve/storage.cfg

dir: local
        path /var/lib/vz
        content backup,iso,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

nfs: store01
        export /store01
        path /mnt/pve/store01
        server 192.168.98.101
        content iso,images,rootdir
        nodes pve02,pve01
        options nconnect=4,vers=4.2
        prune-backups keep-all=1

esxi: ESX
        disable
        server 192.168.100.31
        username root
        content import
        skip-cert-verification 1
 
NFSv4 uses a single TCP connection by default, which often limits read speeds to around 100–120 MB/s, even on 10 Gb links, while writes appear faster due to caching. LACP doesn’t help unless multiple connections are used. Try mounting the NFS share with nconnect=4 or nconnect=8 and make sure you’re using large I/O sizes (rsize/wsize set to 1 MB). Also verify that MTU settings are consistent end-to-end and that no QoS limits are applied on the NetApp side.
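A quick way to confirm what the NFS client actually negotiated is to inspect the live mount; these are standard Linux tools, and the path matches the storage.cfg above:

nfsstat -m                      # per-mount options: vers, rsize, wsize, ...
grep store01 /proc/mounts       # nconnect shows up here when it is in effect
ip -d link show                 # verify MTU on the NFS-facing interfaces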
 
With 10 Gbit NFS you can easily get 1.1 GB/s read and write without any nconnect settings.
With 100 Gbit you get up to about 3 GB/s without nconnect; with nconnect=4 up to 10 GB/s, and with nconnect=6 the line is saturated at 11.5 GB/s (provided the disk RAID sets behind it are fast enough).
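To see whether parallel streams change the picture on the 10 Gbit link, a multi-job variant of the read test can be run; this is an illustrative sketch (job count, block size, and file size are assumptions):

fio --name=par-read --directory=/mnt/pve/store01 \
    --rw=read --bs=1M --size=2G --numjobs=4 \
    --ioengine=libaio --direct=1 --group_reporting

If four jobs together read much faster than a single job, the limit is per-connection (TCP window, nconnect, LACP hashing) rather than the array behind it.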