Shared access to single hardware RAID pool by multiple VMs. Possible?

starportal

New Member
Aug 14, 2024
Hello,

I have a bit of a niche setup.

I have 8x NVMe drives in a custom-built computer and what I am trying to do is maximize processing performance by eliminating as many bottlenecks as I can.

These 8x NVMe drives are attached to the motherboard by PCIe. (4x NVMe disks in each PCIe ASUS Hyper RAID M2 card)

I have 3x Windows 10 Virtual machines, each running its own processing software and utilizing vGPU profiles for a boost in processing speed.

This software outputs the processed data to a shared E:\ volume, which is an SMB share exported to the virtual machines by a TrueNAS VM on the same Proxmox host. The underlying storage is a software RAID (RAIDZ2) pool built from the 8x 4TB NVMe disks, giving about 25TB of usable capacity.

I don't want to use SMB sharing and I do not want to use software RAID any longer. If possible, I don't want sharing via any networking protocol at all, because the TrueNAS instance is bottlenecking processing performance: all VMs write to the SMB share through its 10Gbps vNIC. Furthermore, the TrueNAS instance can only have 2 CPU cores assigned to it. So this is a less than ideal setup.

Each NVMe disk is a PCIe 4.0 disk with 7,000MB/s read / 7,300MB/s write speeds, which translates to roughly 56Gbps per disk.

A 10Gbps vNIC simply will not cut it.

I instead want to implement a hardware RAID card via PCIe to offload RAID management and I/O to that card, thus eliminating the 10Gbps vNIC bottleneck and taking the software RAID management burden off the CPU.

I found this RAID card called Broadcom MegaRAID 9560-16i: https://www.broadcom.com/products/storage/raid-controllers/megaraid-9560-16i

This RAID card can RAID together up to 32x NVMe disks using RAID-on-Chip technology. I want to install this card in the motherboard and use it to present the NVMe RAID array to Proxmox.

Once I do this, my primary aim is to present the single hardware RAID pool to all 3x Windows 10 virtual machines so that they can access the same data and hopefully read/write the same data without going via any network sharing protocol or vNIC/software RAID/etc.

I'm not sure whether I can create a single virtual disk for the entire hardware RAID pool and have all virtual machines share that same virtual disk. So far it doesn't seem like that is possible?

Otherwise, I can create 3x virtual disks in that RAID pool - e.g. 10TB each - and attach a separate virtual disk to each VM. But that means each VM will only see its own separate volume, and if I want to add extra processing VMs in the future, I would have to divide the RAID pool even further, shrinking the storage available to each VM, which is not something that can be afforded.
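For reference, that second option would look roughly like the sketch below in Proxmox: the RAID virtual drive gets added as an LVM-thin storage and per-VM disks are carved from it. The device path /dev/sda, the storage/VG/thin-pool names and the VMIDs 101-103 are all placeholders.

# The 9560-16i presents its virtual drive as a normal block device, e.g. /dev/sda (placeholder)
pvcreate /dev/sda
vgcreate raidvg /dev/sda
lvcreate -l 95%FREE --thinpool data raidvg   # leave headroom for thin-pool metadata
pvesm add lvmthin raidpool --vgname raidvg --thinpool data
# Allocate a ~10TB virtual disk on that storage for each of the three VMs (size in GB)
qm set 101 --scsi1 raidpool:10000
qm set 102 --scsi1 raidpool:10000
qm set 103 --scsi1 raidpool:10000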

Another idea is to do a 1:1 passthrough of the RAID controller to a single VM and install the controller driver in that primary Windows guest. But then I believe I would still be limited to vNIC access if I want to share the volume with the other VMs?
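The passthrough part itself looks straightforward at least; a sketch, where the PCI address 0000:41:00.0 and VMID 101 are made up:

# On the Proxmox host: find the controller's PCI address
lspci -nn | grep -i megaraid
# IOMMU must be active (enabled in BIOS, e.g. iommu=pt on the kernel command line)
# Pass the whole controller to VM 101; pcie=1 requires the q35 machine type
qm set 101 --hostpci0 0000:41:00.0,pcie=1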

Also, it does not appear that I can virtualize the RAID card with SR-IOV like I can with the GPU, so I won't be able to attach virtual RAID controller functions to the VMs.

Here are the hardware specs of the build:

ASUS WRX80E SAGE WIFI II motherboard
AMD Threadripper 5975WX CPU
384GB DDR4 memory array (forgot the manufacturer)
NVIDIA RTX A5000 GPU
8x Kingston Fury NVMe disks
2x ASUS Hyper-RAID cards

The Proxmox kernel version on this host is 6.5.13-6.

Any ideas would be greatly appreciated!
 
Each NVMe disk is a PCIe 4.0 disk with 7,000MB/s read / 7,300MB/s write speeds
That's only peak performance with multiple threads and a long queue depth.
The vNIC isn't limited to 10 Gb/s; that figure is only cosmetic. It's limited by the CPU, since it's an emulated NIC.
IMO, it's ZFS that is slowing down your process.
Try an Ubuntu LXC as a Samba server, since an LXC uses the bridge directly instead of a vNIC (rough sketch at the end of this post).

I'm not sure whether I can create a single virtual disk for the entire hardware RAID pool and have all virtual machines share that same virtual disk. So far it doesn't seem like that is possible?
No, Windows cannot do this: NTFS is not a cluster-aware filesystem, so two VMs writing to the same virtual disk would corrupt it.

Otherwise, I can create 3x virtual disks in that RAID pool
That's the fastest way, but it's not shared.
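A rough sketch of the LXC + Samba idea, assuming a container with ID 200, the pool mounted on the host at /fastpool/data, and a share/user named "processing"/"processor" (all placeholders):

# On the Proxmox host: bind-mount the pool into the container
pct set 200 -mp0 /fastpool/data,mp=/srv/data

# Inside the Ubuntu container: install Samba and export the directory
apt install samba
cat >> /etc/samba/smb.conf <<'EOF'
[processing]
   path = /srv/data
   browseable = yes
   read only = no
   valid users = processor
EOF
useradd -M processor
smbpasswd -a processor
systemctl restart smbd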
 
Thanks for the insight.

That has definitely cleared up confusion about vNIC for me. Didn't realize the vNIC speed is only cosmetic.

As for using LXC vs vNIC... would I still be using the hardware RAID controller to create the single RAID pool and pass that through to the LXC Ubuntu instance, and then share it out using Samba? Wouldn't there still be a performance impact on the CPU from hosting the Samba share?

Thanks again.
 
Warning: for NVMe drives on a RAID controller you need the newest generation, in the multi-4-digit price range; otherwise you spend a lot of money and will be really disappointed by the write performance of a controller that isn't NVMe-optimized. ZFS works excellently for NVMe (raidz/draid) and is the best value for you - I would take raidz2!!!
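For reference, building that raidz2 pool directly on the host is a one-liner; the pool name and device IDs below are placeholders (use your real /dev/disk/by-id paths):

# raidz2 over the eight NVMe drives, 4K-aligned
zpool create -o ashift=12 fastpool raidz2 \
    /dev/disk/by-id/nvme-DISK1 /dev/disk/by-id/nvme-DISK2 \
    /dev/disk/by-id/nvme-DISK3 /dev/disk/by-id/nvme-DISK4 \
    /dev/disk/by-id/nvme-DISK5 /dev/disk/by-id/nvme-DISK6 \
    /dev/disk/by-id/nvme-DISK7 /dev/disk/by-id/nvme-DISK8
# Tuning that often helps large sequential workloads
zfs set recordsize=1M fastpool
zfs set compression=lz4 fastpool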
 
Would I still be using the hardware RAID controller to create the single RAID pool
I don't think a HW controller will help you.
If you need one big space, I would use regular software RAID with mdadm. If you can split your processing, create one datastore per disk.
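A minimal mdadm sketch, assuming the eight drives show up as /dev/nvme0n1 through /dev/nvme7n1 and a mount point of /fastpool (the RAID level is your choice; RAID10 vs RAID5/6 is discussed further down):

# Create the array (RAID10 shown; use --level=5 or 6 if capacity matters more than IOPS)
mdadm --create /dev/md0 --level=10 --raid-devices=8 /dev/nvme[0-7]n1
mkfs.xfs /dev/md0
mkdir -p /fastpool
mount /dev/md0 /fastpool
# Make it persistent across reboots
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
echo '/dev/md0 /fastpool xfs defaults,noatime 0 0' >> /etc/fstab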
 
Weird, ZFS should be slower than mdadm, but I have no experience with mdadm over NVMe on that kind of hardware.
What numbers do you get for a single NVMe vs mdadm?
 
Dell R750 with 16x 1.92TB NVMe (spec roughly 3.5GB/s read, 2.5GB/s write), attached directly without a RAID controller:
mdadm RAID5 (14+1+1 spare) + XFS: read 27GB/s, write a poor 2.5GB/s; raidz1 (14+1+1 spare) or draid1 (15+1, incl. 1 virtual spare): read 19GB/s, write 18GB/s. No question what to choose in that case without changing the hardware config.
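If you want to reproduce that kind of comparison on your own pool, a sequential fio run along these lines gives comparable throughput figures (the directory, job count and sizes are arbitrary; on ZFS you may need to drop --direct=1):

# Sequential write throughput across 8 parallel jobs, 1M blocks
fio --name=seqwrite --directory=/fastpool --rw=write --bs=1M --ioengine=libaio \
    --iodepth=32 --numjobs=8 --size=10G --direct=1 --group_reporting
# Sequential read of the same files
fio --name=seqread --directory=/fastpool --rw=read --bs=1M --ioengine=libaio \
    --iodepth=32 --numjobs=8 --size=10G --direct=1 --group_reporting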
 
Come to think of it... the RAID controller only has a dual-core 1600MHz ARM processor, vs my 2 Threadripper cores at 3.6GHz, so I suppose software RAID would be better performance-wise computationally. The RAID controller does come with extra features that might increase performance, though I'm not sure.

As for LXC vs vNIC... do you guys reckon the performance benefit of switching to LXC would be drastic?
 
What about mdadm RAID10, which is the recommended option?
RAID5 isn't performance-oriented.
mdadm RAID10 made no sense on that host because it is a BeeGFS server: in the config you just list 1 to x storage targets separated by "," and BeeGFS does the "RAID0" across the servers, so a "local RAID10" would in essence become a RAID100.
RAID5 is, like RAID6, performance-oriented in the sense that it gets faster with every additional disk! In the past there were servers, each with 96 HDDs (4x24) behind a HW RAID controller, running two XFS filesystems brilliantly in "RAID6 as 42+2+4 spares" configs (the hardware was old, so they decided on 4 hot spares per RAID6).
But RAID5/6 are not IOPS-optimized; that can be mitigated with filesystem cache. I never use RAID1/10 for data, as users always need more and more space... :)
 
As for LXC vs vNIC... do you guys reckon the performance benefit of switching to LXC would be drastic?
iperf3 single CPU thread test:

edit: Windows VM to Windows VM on an Epyc 7302: 12.5 Gb/s. Windows VM to LXC: 18 Gb/s
(the same test in the reverse direction with -R: 10 Gb/s)

host > LXC: 45 Gb/s vs host > Windows VM: 10 Gb/s (on a laptop i7-6700HQ)

Some other iperf3 results:
with host > LXC https://forum.proxmox.com/threads/v...th-proxmox-host-10gbps-nic.132157/post-581346
with host > VM https://forum.proxmox.com/threads/pfsense-vm-very-slow-network-throughput.125862/post-588921
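For reference, these are single-stream runs, roughly like the following (the IP is a placeholder):

# On the receiving side (LXC, VM or host)
iperf3 -s
# On the sending side: one TCP stream for 30 s, then the reverse direction
iperf3 -c 10.0.0.10 -P 1 -t 30
iperf3 -c 10.0.0.10 -P 1 -t 30 -R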
 
Maybe. Our PVE cluster (4 nodes), which is just for testing anything, has about 40 VMs and 10 LXCs running, all defined on NFS, and we don't have any I/O problem (iostat) on the fileserver, even though it just has 4 cores in front of 1 RAID6 (8+2), all on 1x1Gb + 1x10Gb networking. So far we haven't hit our storage with a database or a "bad" app... :) Manual VM live migrations take seconds at a few hundred MB/s; LXCs stop/start on migration. We switched from "manual" to "maintenance" mode: just 1 enable command and everything flies over, reboot PVE after the update, 1 disable command and the VMs/LXCs come back, then set the next host into maintenance... PVE works brilliantly :)
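Presumably the enable/disable commands referred to are the ha-manager node maintenance mode available in recent PVE releases; a sketch, with the node name as a placeholder:

# Drain a node before updating/rebooting it; HA-managed guests migrate away automatically
ha-manager crm-command node-maintenance enable pve1
# ...after the reboot, bring it back into service
ha-manager crm-command node-maintenance disable pve1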
 
Your setup sounds good. When you mention RAID6, that's a raidz2 pool you've got configured, right?

You also mention a 1x10Gb network - as in a 10GbE vNIC? I thought vNICs aren't used any longer if you're running an LXC? (Is your file server an LXC?)

I just found that a Windows copy job yesterday, from the TrueNAS VM into the Windows 10 environment, made the Explorer-based copy enter a "not responding" state numerous times. I eventually checked the TrueNAS CPU utilization and found it had spiked to 105% at that time. So I am going to upgrade TrueNAS from 2 Threadripper cores to 4 and hopefully that improves things.

Plus, the TrueNAS share is SMB. I want to switch it to NFS now, as I feel it is a faster protocol than SMB.

Soon enough, I reckon I'll create an LXC container on the Proxmox host with an NFS share through a bridged interface rather than a vNIC. (Or... can I just create an NFS share on the Proxmox host directly, without an LXC container, and serve it straight to the Windows VMs from the Proxmox host itself?)
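Either way, the export itself looks small. A sketch assuming the pool is mounted at /fastpool/data, the VMs sit on 10.0.0.0/24 and the server's IP is 10.0.0.5 (all placeholders); the Windows guests would need the "Client for NFS" feature installed to mount it:

# On the Proxmox host (or inside an LXC that has the pool bind-mounted):
apt install nfs-kernel-server
echo '/fastpool/data 10.0.0.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra
# On each Windows VM, after enabling the "Client for NFS" feature:
#   mount -o anon \\10.0.0.5\fastpool\data E: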
 
iperf3 single CPU thread test:

edit: Windows VM to Windows VM on an Epyc 7302: 12.5 Gb/s. Windows VM to LXC: 18 Gb/s
(the same test in the reverse direction with -R: 10 Gb/s)

host > LXC: 45 Gb/s vs host > Windows VM: 10 Gb/s (on a laptop i7-6700HQ)

Some other iperf3 results:
with host > LXC https://forum.proxmox.com/threads/v...th-proxmox-host-10gbps-nic.132157/post-581346
with host > VM https://forum.proxmox.com/threads/pfsense-vm-very-slow-network-throughput.125862/post-588921
That's good to know. There definitely appears to be a discernible improvement going from vNIC -> LXC vs vNIC -> vNIC.

Btw, what's the point of having a "1Gbps vNIC" / "10Gbps vNIC", etc., if it's just cosmetic anyway? Sounds like it doesn't matter and that speed depends on CPU performance at the end of the day.
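For what it's worth, the advertised speed seems to be just what the emulated NIC model reports to the guest; the practical throughput knob in Proxmox is the virtio model, optionally with multiqueue, e.g. (VMID and queue count are examples):

# virtio-net with 4 queues so several vCPUs can service the network traffic
qm set 101 --net0 virtio,bridge=vmbr0,queues=4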
 
