Performance of SAN NVMe storage

Spiros Pap

Renowned Member
Aug 1, 2017
Hi all,

I have a Proxmox cluster that uses shared LUNs (LVM) from an IBM FlashSystem 9500. The storage is attached to the hosts via 16G FC for main use and via 100G Ethernet for iSCSI/NVMe over TCP. The storage can supposedly reach 1M+ IOPS.
I was running various tests the other day and realised that the performance was not what I expected. Can you advise whether the numbers I give below are good or bad, and what my expectations should be?

The hosts use 10G adapters for Ethernet storage connections. Proxmox uses a Linux bridge for vmbr2, which is what the VM uses for storage traffic. The VM's storage Ethernet adapter is virtio with 4 queues. The VM uses MTU 1500.
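For what it's worth, it can be worth double-checking inside the guest that the 4 virtio queues are actually active and in use; a sketch (the interface name eth1 is an assumption, substitute your storage NIC):

```shell
# inside the VM: show how many combined channels the virtio NIC exposes vs. uses
ethtool -l eth1

# if the "current" combined count is below 4, raise it to match the virtio queues
ethtool -L eth1 combined 4
```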

This is the command I use:
fio --name=iscsi-test --filename=/dev/xxxx --rw=randrw --bs=4k --iodepth=32 --numjobs=4 --runtime=20s --time_based --group_reporting --direct=1 --ioengine=libaio --rwmixread=70

These are the results:
sdb: disk attached to the VM by the host, which is connected to the storage via 16G Fibre Channel. This is a virtio disk.
read: IOPS=36.0k, BW=140MiB/s (147MB/s)
write: IOPS=15.5k, BW=60.4MiB/s (63.3MB/s)

nvme0n1: disk attached to the VM via NVMe over TCP, single path (NVMe/TCP initiator running inside the VM)
read: IOPS=41.2k, BW=161MiB/s (169MB/s)
write: IOPS=17.7k, BW=69.2MiB/s (72.6MB/s)

sdc: disk attached to the VM via iSCSI, single path (iSCSI initiator running inside the VM)
read: IOPS=40.6k, BW=159MiB/s (166MB/s)
write: IOPS=17.5k, BW=68.2MiB/s (71.5MB/s)

/dev/mapper/mpatha: disk attached to the VM via iSCSI using two paths (iSCSI initiator running inside the VM)
read: IOPS=47.6k, BW=186MiB/s (195MB/s)
write: IOPS=20.5k, BW=80.0MiB/s (83.9MB/s)

nvme0n1: disk attached to the VM via NVMe over TCP using two paths (NVMe/TCP initiator running inside the VM)
read: IOPS=40.0k, BW=156MiB/s (164MB/s)
write: IOPS=17.2k, BW=67.1MiB/s (70.4MB/s)

While fio is doing random read-write, are the above numbers too low? When I do sequential operations I get full 10G line rate with far fewer IOPS.
Is 60-70K total IOPS my limit, when the storage claims millions of IOPS? Is there anything I can do to improve performance?

Regards!
sp

PS: I haven't run tests on the hosts themselves (over FC/iSCSI/NVMe over TCP), but in any case I am mainly interested in the performance of the VMs.
 
The results seem quite low indeed. QEMU itself is able to reach 200-300k IOPS with 1 core. (I'm working on adding support for multiple iothreads to increase performance up to 600k IOPS per disk, but you are far from reaching the current CPU limit.)

(I'm not sure about the performance when doing NVMe/TCP inside the VM directly through virtio, but your host performance is not great either.)
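As a side note, a single per-disk iothread can already be enabled on a Proxmox VM today; a sketch, where the VMID 100, the storage ID, and the volume name are assumptions to adapt to your setup:

```shell
# use one virtio-scsi controller per disk so each disk can get its own iothread
qm set 100 --scsihw virtio-scsi-single

# reattach the disk with iothread enabled (storage ID and volume are placeholders)
qm set 100 --scsi0 san-lvm:vm-100-disk-0,iothread=1
```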
 
  • Like
Reactions: _gabriel
Thanks for sharing all this data.

From what I can tell, one of two things is likely true: either the storage system isn't delivering the expected performance (possibly due to its disk configuration), or the host hardware setup isn't able to fully utilize it. This doesn't appear to be a Linux or Proxmox issue.

As Spirit mentioned, a single VM can typically achieve around 200–300K IOPS with a single QEMU disk. Reaching the higher end of that range does require the right hardware configuration. For reference, we do have customers running Kubernetes/CSI with NVMe/TCP devices from inside the VM, seeing roughly double that performance on a single disk. So yes... virtio networking combined with NVMe/TCP can indeed outperform a native QEMU disk.

I'd recommend running tests directly on the bare metal to establish a baseline for your SAN's capabilities.
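A baseline run on the host could look like the command below; a sketch, with the multipath device name as a placeholder, and --rw=randread plus --readonly so it is safe against a LUN that already holds data:

```shell
# run on the Proxmox host, not in a VM; <mpath-device> is a placeholder
fio --name=baremetal-baseline --filename=/dev/mapper/<mpath-device> \
    --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=20s --time_based --group_reporting \
    --direct=1 --ioengine=libaio --readonly
```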

One quick question: how is the storage attached to the VM? (e.g., LVM, LVM+QCOW, etc.)


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
We are using LVM. The storage presents LUNs, which are initialized by LVM, and Proxmox then carves out an LV for each VM disk. But this applies only to disks attached to the VM by Proxmox. The other tests are ones where the VM itself connects to the storage via iSCSI or NVMe over TCP, so host storage does not really matter (networking does).

Sp
 
we are using LVM. The storage presents LUNs, which are initialized by LVM and then proxmox cuts LVs for each VM disk.
Understood. LVM does have performance limits, but you're well below them. If it were me, I'd start by testing read-only workloads directly on the LV - no infrastructure fiddling required.
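For example, a read-only run against the VM's LV on the host might look like this (volume group and LV names are placeholders):

```shell
# read-only, so the VM's data is not modified; <vg> and <vm-lv> are placeholders
fio --name=lv-readtest --filename=/dev/<vg>/<vm-lv> \
    --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=20s --time_based --group_reporting \
    --direct=1 --ioengine=libaio --readonly
```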


 
4K IOPS or larger ones?


Yes. For comparison, I have a Dell ME5024 over 32 Gb FC with SAS SSDs, 12x 1.92 TB in an ADAPT pool, which yields 229k IOPS at 4K, qd32, with numjobs=16.


Maybe look at this good documentation about storage performance analysis
https://kb.blockbridge.com/technote/proxmox-iscsi-vs-nvmetcp/
They are quoting numbers like 100 GB/s and 8 million IOPS at 4K (warm), and 2.5M IOPS at 4K read-miss... The drives are FCM4 19TB flash modules. The question is how it is possible to be so far from this spec. The hosts are HPE DL380 Gen10 with Xeon(R) Gold 6148 CPU @ 2.40GHz. They are not the best, but I guess they are OK.
 
The storage is attached to the hosts via 16G FC for main use and via 100G Ethernet for iSCSI/NVMe over TCP. The storage can supposedly reach 1M+ IOPS.
That is patently impossible over a single channel. 16 Gbit/s yields a maximum THEORETICAL 409.6 kIOPS at 4k per IO. In PRACTICE your actual performance depends on multiple factors and will never reach the theoretical max.
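The arithmetic behind that figure, assuming ~1600 MiB/s of usable one-way 16GFC throughput and 4 KiB per random IO:

```shell
# theoretical 4 KiB IOPS ceiling of a single 16GFC link
awk 'BEGIN {
  thr_mib = 1600   # usable one-way 16GFC throughput, ~1600 MiB/s
  io_kib  = 4      # 4 KiB per random IO
  printf "%.1f kIOPS theoretical max\n", thr_mib * 1024 / io_kib / 1000
}'
```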

All your IOPS test results appear to be about the same: you are source bound. What is your fio command line? Is this a VM or a container, and how many cores are assigned? (Random operations require RNG, which is CPU bound.)
 
The fio command line arguments are in my initial post. The fio tests are always run from inside a VM on Proxmox. The VM has 4 cores, but I don't think the performance is CPU bound, in the sense that my cores are 2.4 GHz (Xeon Gold 6148), and if I only get 70k IOPS, what CPU would I need to reach 280k IOPS as others have done? A 10 GHz core?
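One way to see why clock speed alone doesn't explain it: at a fixed queue depth, IOPS is bounded by per-IO latency (IOPS ≈ outstanding IOs / average latency). A back-of-envelope sketch using the iodepth and numjobs from the fio command earlier in the thread, and roughly the observed combined IOPS:

```shell
# what average per-IO latency is implied by ~70k IOPS at 128 IOs in flight
awk 'BEGIN {
  qd   = 32 * 4    # iodepth * numjobs = 128 IOs in flight
  iops = 70000     # roughly the observed combined IOPS
  printf "implied average latency: %.2f ms per IO\n", qd / iops * 1000
}'
```

If that latency mostly lives in the network and storage path rather than in the guest CPU, a faster core won't move the number much; more outstanding IOs or a shorter path would.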