My results are mixed. Load on the storage server is lower and latency is usually better with RDMA, but besides that it's hard to tell whether you gain or lose with RDMA. Running Iometer in a Windows VM I get the following results (14 SATA disks, with two Intel DC S3700 100GB SSDs in RAID1 as flashcache):
NO RDMA:
[TH="bgcolor: #666688"]Test name[/TH][TH="bgcolor: #666688"]Latency (ms)[/TH][TH="bgcolor: #666688"]Avg IOPS[/TH][TH="bgcolor: #666688"]Avg MBps[/TH][TH="bgcolor: #666688"]CPU load[/TH]
[TD="bgcolor: #111144"]Max Throughput-100%Read[/TD][TD="bgcolor: #111144"]3.35[/TD][TD="bgcolor: #111144"]5104[/TD][TD="bgcolor: #111144"]159[/TD][TD="bgcolor: #111144"]95%[/TD]
[TD="bgcolor: #111144"]RealLife-60%Rand-65%Read[/TD][TD="bgcolor: #111144"]3.35[/TD][TD="bgcolor: #111144"]4621[/TD][TD="bgcolor: #111144"]36[/TD][TD="bgcolor: #111144"]84%[/TD]
[TD="bgcolor: #111144"]Max Throughput-50%Read[/TD][TD="bgcolor: #111144"]3.77[/TD][TD="bgcolor: #111144"]3524[/TD][TD="bgcolor: #111144"]110[/TD][TD="bgcolor: #111144"]93%[/TD]
[TD="bgcolor: #111144"]Random-8k-70%Read[/TD][TD="bgcolor: #111144"]2.09[/TD][TD="bgcolor: #111144"]4760[/TD][TD="bgcolor: #111144"]37[/TD][TD="bgcolor: #111144"]91%[/TD]
[TD="bgcolor: #111144"]4k-Max Throu-100%Read-100%Random[/TD][TD="bgcolor: #111144"]1.63[/TD][TD="bgcolor: #111144"]5203[/TD][TD="bgcolor: #111144"]20[/TD][TD="bgcolor: #111144"]89%[/TD]
[TD="bgcolor: #111144"]4k-Max Throu-100%Write-100%Random[/TD][TD="bgcolor: #111144"]2.04[/TD][TD="bgcolor: #111144"]4670[/TD][TD="bgcolor: #111144"]18[/TD][TD="bgcolor: #111144"]92%[/TD]
RDMA:
[TH="bgcolor: #666688"]Test name[/TH][TH="bgcolor: #666688"]Latency (ms)[/TH][TH="bgcolor: #666688"]Avg IOPS[/TH][TH="bgcolor: #666688"]Avg MBps[/TH][TH="bgcolor: #666688"]CPU load[/TH]
[TD="bgcolor: #111144"]Max Throughput-100%Read[/TD][TD="bgcolor: #111144"]2.75[/TD][TD="bgcolor: #111144"]4597[/TD][TD="bgcolor: #111144"]143[/TD][TD="bgcolor: #111144"]94%[/TD]
[TD="bgcolor: #111144"]RealLife-60%Rand-65%Read[/TD][TD="bgcolor: #111144"]2.06[/TD][TD="bgcolor: #111144"]4538[/TD][TD="bgcolor: #111144"]35[/TD][TD="bgcolor: #111144"]92%[/TD]
[TD="bgcolor: #111144"]Max Throughput-50%Read[/TD][TD="bgcolor: #111144"]2.82[/TD][TD="bgcolor: #111144"]4260[/TD][TD="bgcolor: #111144"]133[/TD][TD="bgcolor: #111144"]92%[/TD]
[TD="bgcolor: #111144"]Random-8k-70%Read[/TD][TD="bgcolor: #111144"]1.94[/TD][TD="bgcolor: #111144"]4606[/TD][TD="bgcolor: #111144"]35[/TD][TD="bgcolor: #111144"]90%[/TD]
[TD="bgcolor: #111144"]4k-Max Throu-100%Read-100%Random[/TD][TD="bgcolor: #111144"]1.95[/TD][TD="bgcolor: #111144"]4758[/TD][TD="bgcolor: #111144"]18[/TD][TD="bgcolor: #111144"]92%[/TD]
[TD="bgcolor: #111144"]4k-Max Throu-100%Write-100%Random[/TD][TD="bgcolor: #111144"]2.42[/TD][TD="bgcolor: #111144"]4415[/TD][TD="bgcolor: #111144"]17[/TD][TD="bgcolor: #111144"]91%[/TD]
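To make the comparison easier to read, here is a quick Python sketch (purely illustrative; the numbers are copied straight from the two tables above) that prints the relative change RDMA gives per test:

```python
# Iometer results from the tables above, as (latency_ms, avg_iops):
# first tuple = no RDMA, second tuple = RDMA.
tests = {
    "Max Throughput-100%Read":           ((3.35, 5104), (2.75, 4597)),
    "RealLife-60%Rand-65%Read":          ((3.35, 4621), (2.06, 4538)),
    "Max Throughput-50%Read":            ((3.77, 3524), (2.82, 4260)),
    "Random-8k-70%Read":                 ((2.09, 4760), (1.94, 4606)),
    "4k-Max Throu-100%Read-100%Random":  ((1.63, 5203), (1.95, 4758)),
    "4k-Max Throu-100%Write-100%Random": ((2.04, 4670), (2.42, 4415)),
}

for name, ((lat0, iops0), (lat1, iops1)) in tests.items():
    # Percentage change relative to the no-RDMA run (negative latency = better).
    d_lat = (lat1 - lat0) / lat0 * 100
    d_iops = (iops1 - iops0) / iops0 * 100
    print(f"{name:36s} latency {d_lat:+6.1f}%  IOPS {d_iops:+6.1f}%")
```

On these numbers RDMA improves latency in four of the six tests but costs a few percent of IOPS in five of them, which is why I call the results mixed.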
I use Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE] dual-port cards on the Proxmox servers and Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) dual-port cards on the storage servers.
The results are based on the Iometer test configuration from here:
http://vmblog.pl/OpenPerformanceTest32-4k-Random.icf