Proxmox Host NFS Server over RDMA (RoCE)

v95klima

Hi
Has anyone had success with an NFS server over RDMA (RoCE) directly on the Proxmox host?

I love the low energy consumption of avoiding TrueNAS VMs and Windows Server VMs and running mostly LXCs and host services.
My network card is a ConnectX-4, which works well with SR-IOV VFs, but I'm hoping to activate an NFS server on the host with RoCE.
Thanks in advance.

I found this link for CentOS, which gives me hope that this should be possible on the Proxmox host:
https://enterprise-support.nvidia.com/s/article/howto-configure-nfs-over-rdma--roce-x
 
The Proxmox host has a confirmed active RDMA link, enp2s0f0v4:

root@epyc5:~# rdma link
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev enp2s0f0np0
link mlx5_1/1 state DOWN physical_state DISABLED netdev enp2s0f1np1
link mlx5_3/1 state DOWN physical_state DISABLED netdev enp2s0f0v1
link mlx5_4/1 state DOWN physical_state DISABLED netdev enp2s0f0v2
link mlx5_5/1 state ACTIVE physical_state LINK_UP
link mlx5_6/1 state ACTIVE physical_state LINK_UP netdev enp2s0f0v4
link mlx5_7/1 state DOWN physical_state DISABLED netdev enp2s0f0v5
link mlx5_8/1 state DOWN physical_state DISABLED netdev enp2s0f0v6
link mlx5_9/1 state DOWN physical_state DISABLED netdev enp2s0f0v7
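As an extra sanity check (assuming the rdma-core user-space tools are installed, package ibverbs-utils on Debian), the RoCE device can also be inspected directly:

ibv_devinfo -d mlx5_0

For a working RoCE port this should show state PORT_ACTIVE and link_layer Ethernet.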
 
Got NFS with RDMA to work with ZFS

Still trying to get the regular /etc/exports approach working for non-ZFS folders... but it seems hard for me; any help on that is appreciated.
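For reference, my understanding is that nothing RDMA-specific should be needed in /etc/exports itself, since the rdma listener added to /proc/fs/nfsd/portlist applies to all exports. A plain entry like the one below should in theory be enough (the path is just a placeholder, and this has not worked reliably for me yet):

/srv/plaindata 192.168.3.0/24(rw,no_root_squash,async,no_subtree_check)

followed by reloading the export table:
exportfs -ra
exportfs -v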

FOR ZFS

On Proxmox with NFS Server:
stop nfs server:
systemctl stop nfs-kernel-server.service

enable module and add port:
/sbin/modprobe rpcrdma
echo 'rdma 20049' | tee /proc/fs/nfsd/portlist
echo 'tcp 2049' | tee /proc/fs/nfsd/portlist
confirm with:
cat /proc/fs/nfsd/portlist

restart (not start; start seems to reset the ports; see the persistence note below):
systemctl restart nfs-kernel-server.service

zfs set sharenfs="rw=@192.168.3.3/24,no_root_squash,async" poolname/datafolder
confirm with:
exportfs -v
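Note: the portlist change above does not survive a reboot. Depending on the nfs-utils version shipped with Proxmox, it should be possible to make this persistent; the following is an untested sketch on my side.

Load the module at boot:
echo rpcrdma > /etc/modules-load.d/rpcrdma.conf

Have nfsd open the RDMA listener itself, in /etc/nfs.conf:
[nfsd]
rdma=y
rdma-port=20049

then:
systemctl restart nfs-kernel-server.service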

On Proxmox with NFS Client:
enable module:
/sbin/modprobe rpcrdma
mount 192.168.3.3:/poolname/datafolder /nfsRDMA2 -o rdma,port=20049,async,noatime,nodiratime -vvvv
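To make the client mount persistent, an /etc/fstab line along these lines should work (same mount point and path as above; untested sketch):

192.168.3.3:/poolname/datafolder /nfsRDMA2 nfs rdma,port=20049,noatime,nodiratime,_netdev 0 0

You can confirm the mount actually negotiated RDMA with:
nfsstat -m
which should list proto=rdma among the mount options.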


Credit to:
https://blog.sparktour.me/en/posts/2023/08/24/mount-nfs-via-rdma-on-mlnx-card/
BUT:
To get this working on Proxmox I first followed the full description and downloaded the latest MLNX_OFED package for Debian from NVIDIA, but the install script included in the downloaded TGZ did not accept Proxmox as Debian 12.1.
So I manually installed three packages from inside the downloaded TGZ:
dpkg -i mlnx-tools_24.04.0.2404066-1_amd64.deb
dpkg -i mlnx-ofed-kernel-utils_24.04.OFED.24.04.0.7.0.1-1_amd64.deb
dpkg -i mlnx-ofed-kernel-dkms_24.04.OFED.24.04.0.7.0.1-1_all.deb
I rebooted, and several new tools and functions related to MLNX_OFED appeared,
but it broke
/sbin/modprobe rpcrdma
and NFS RDMA would not work.

So then I decided to roll back with:
dpkg -r mlnx-ofed-kernel-dkms
and rebooted without the MLNX DKMS.
This made
/sbin/modprobe rpcrdma
work again.
The rest of the instructions I followed as described above!
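If anyone else runs into the same thing, a quick way to check which rpcrdma module would be loaded (in-kernel vs. DKMS-built) is:
modinfo -n rpcrdma
The in-kernel module lives under /lib/modules/$(uname -r)/kernel/..., while DKMS-built modules usually end up under .../updates/dkms/.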
 
Speed test with the above NFS RDMA setup = full speed!

fio --name=testfile --directory=/nfsRDMA2 --size=2G --numjobs=10 --rw=write --bs=1000M --ioengine=libaio --fdatasync=1 --runtime=60 --time_based --group_reporting --eta-newline=1s
testfile: (g=0): rw=write, bs=(R) 1000MiB-1000MiB, (W) 1000MiB-1000MiB, (T) 1000MiB-1000MiB, ioengine=libaio, iodepth=1
...
fio-3.33
Starting 10 processes
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
testfile: Laying out IO file (1 file / 2048MiB)
Jobs: 10 (f=10): [W(10)][4.9%][eta 00m:58s]
Jobs: 10 (f=10): [W(10)][6.6%][eta 00m:57s]
Jobs: 10 (f=10): [W(10)][8.3%][w=8008MiB/s][w=8 IOPS][eta 00m:55s]
Jobs: 10 (f=10): [W(10)][11.7%][eta 00m:53s]
Jobs: 10 (f=10): [W(10)][15.0%][w=1001MiB/s][w=1 IOPS][eta 00m:51s]
Jobs: 10 (f=10): [W(10)][18.3%][eta 00m:49s]
Jobs: 10 (f=10): [W(10)][21.7%][eta 00m:47s]
Jobs: 10 (f=10): [W(10)][25.0%][w=3000MiB/s][w=3 IOPS][eta 00m:45s]
Jobs: 10 (f=10): [W(10)][28.3%][eta 00m:43s]
Jobs: 10 (f=10): [W(10)][31.7%][w=1000MiB/s][w=1 IOPS][eta 00m:41s]
Jobs: 10 (f=10): [W(10)][35.0%][w=1000MiB/s][w=1 IOPS][eta 00m:39s]
Jobs: 10 (f=10): [W(10)][39.0%][w=1001MiB/s][w=1 IOPS][eta 00m:36s]
Jobs: 10 (f=10): [W(10)][41.7%][w=9009MiB/s][w=9 IOPS][eta 00m:35s]
Jobs: 10 (f=10): [W(10)][45.0%][w=1000MiB/s][w=1 IOPS][eta 00m:33s]
Jobs: 10 (f=10): [W(10)][48.3%][w=7000MiB/s][w=7 IOPS][eta 00m:31s]
Jobs: 10 (f=10): [W(10)][52.5%][eta 00m:28s]
Jobs: 10 (f=10): [W(10)][55.9%][w=4000MiB/s][w=4 IOPS][eta 00m:26s]
Jobs: 10 (f=10): [W(10)][59.3%][eta 00m:24s]
Jobs: 10 (f=10): [W(10)][61.7%][w=2002MiB/s][w=2 IOPS][eta 00m:23s]
Jobs: 10 (f=10): [W(10)][65.0%][w=6000MiB/s][w=6 IOPS][eta 00m:21s]
Jobs: 10 (f=10): [W(10)][68.3%][eta 00m:19s]
Jobs: 10 (f=10): [W(10)][71.7%][w=3003MiB/s][w=3 IOPS][eta 00m:17s]
Jobs: 10 (f=10): [W(10)][75.0%][w=5000MiB/s][w=5 IOPS][eta 00m:15s]
Jobs: 10 (f=10): [W(10)][78.3%][w=2000MiB/s][w=2 IOPS][eta 00m:13s]
Jobs: 10 (f=10): [W(10)][81.7%][w=1000MiB/s][w=1 IOPS][eta 00m:11s]
Jobs: 10 (f=10): [W(10)][85.0%][w=3003MiB/s][w=3 IOPS][eta 00m:09s]
Jobs: 10 (f=10): [W(10)][88.3%][eta 00m:07s]
Jobs: 10 (f=10): [W(10)][91.7%][w=7000MiB/s][w=7 IOPS][eta 00m:05s]
Jobs: 10 (f=10): [W(10)][95.0%][w=1001MiB/s][w=1 IOPS][eta 00m:03s]
Jobs: 10 (f=10): [W(10)][98.3%][w=5000MiB/s][w=5 IOPS][eta 00m:01s]
Jobs: 10 (f=10): [W(10)][100.0%][w=2000MiB/s][w=2 IOPS][eta 00m:00s]
Jobs: 2 (f=2): [f(2),_(8)][100.0%][w=9.77GiB/s][w=10 IOPS][eta 00m:00s]
testfile: (groupid=0, jobs=10): err= 0: pid=69013: Thu Jul 4 18:45:46 2024
write: IOPS=2, BW=2700MiB/s (2831MB/s)(161GiB/61119msec); 0 zone resets
slat (msec): min=954, max=7524, avg=3553.49, stdev=875.68
clat (usec): min=2, max=51049, avg=453.58, stdev=4013.11
lat (msec): min=954, max=7525, avg=3553.95, stdev=875.72
clat percentiles (usec):
| 1.00th=[ 3], 5.00th=[ 4], 10.00th=[ 5], 20.00th=[ 5],
| 30.00th=[ 6], 40.00th=[ 7], 50.00th=[ 7], 60.00th=[ 8],
| 70.00th=[ 12], 80.00th=[ 61], 90.00th=[ 297], 95.00th=[ 693],
| 99.00th=[ 6390], 99.50th=[51119], 99.90th=[51119], 99.95th=[51119],
| 99.99th=[51119]
bw ( MiB/s): min=20000, max=20004, per=100.00%, avg=20000.86, stdev= 0.50, samples=155
iops : min= 20, max= 20, avg=20.00, stdev= 0.00, samples=155
lat (usec) : 4=9.09%, 10=58.79%, 20=9.09%, 50=0.61%, 100=6.67%
lat (usec) : 250=4.24%, 500=5.45%, 750=1.21%, 1000=1.21%
lat (msec) : 2=1.21%, 4=0.61%, 10=1.21%, 100=0.61%
fsync/fdatasync/sync_file_range:
sync (msec): min=7, max=301, avg=116.38, stdev=63.11
sync percentiles (msec):
| 1.00th=[ 12], 5.00th=[ 31], 10.00th=[ 42], 20.00th=[ 59],
| 30.00th=[ 71], 40.00th=[ 87], 50.00th=[ 113], 60.00th=[ 133],
| 70.00th=[ 148], 80.00th=[ 169], 90.00th=[ 203], 95.00th=[ 234],
| 99.00th=[ 279], 99.50th=[ 292], 99.90th=[ 300], 99.95th=[ 300],
| 99.99th=[ 300]
cpu : usr=1.22%, sys=28.97%, ctx=1180399, majf=755122, minf=5863483
IO depths : 1=240.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,165,0,0 short=231,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: bw=2700MiB/s (2831MB/s), 2700MiB/s-2700MiB/s (2831MB/s-2831MB/s), io=161GiB (173GB), run=61119-61119msec

A second test of only 10 seconds duration gave even better results:

Run status group 0 (all jobs):
WRITE: bw=2889MiB/s (3029MB/s), 2889MiB/s-2889MiB/s (3029MB/s-3029MB/s), io=30.3GiB (32.5GB), run=10732-10732msec
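For anyone who wants to compare read speeds over the same mount, a fio invocation along these lines should work (parameters are just a suggestion, not what I ran above):

fio --name=readtest --directory=/nfsRDMA2 --size=2G --numjobs=10 --rw=read --bs=1M --ioengine=libaio --iodepth=16 --direct=1 --runtime=30 --time_based --group_reporting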
 
Same real-world file transfer speeds as inside an SMB Direct Windows Server session with RDMA.
The ZFS pool is on a PCIe Gen 3 NVMe drive with a 2.5 GHz CPU (boost max 3.0 GHz).
The client has a PCIe Gen 5 NVMe drive and a 6 GHz CPU.
 
