INFINIBAND ?!

fuas

Hi all,

At the moment we are in the planning stage of a small Proxmox setup, and while digging for enterprise NVMe drives to hold our virtual machines while they are active, I started thinking about InfiniBand:
BTW, I chose KIOXIA CM-V SSDs for budget reasons. So, having several machines which will all use ZFS for storage (one fast pool made of NVMe drives to hold the live machines and one slow RAIDZ2 pool made of spinning SATA disks to hold inactive machines and backups), I thought about sharing the NVMe pool using remote DMA.
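Just to make the plan concrete, the pool layout I have in mind would look roughly like the sketch below (device names are placeholders and the mirror layout for the fast pool is only an assumption, nothing is decided yet):

```python
#!/usr/bin/env python3
# Rough sketch of the two planned pools; all device names are placeholders.
import subprocess

# Fast pool: mirrored NVMe drives for the running VMs.
subprocess.run(["zpool", "create", "fastpool",
                "mirror", "/dev/nvme0n1", "/dev/nvme1n1"], check=True)

# Slow pool: RAIDZ2 over spinning SATA disks for inactive machines and backups.
subprocess.run(["zpool", "create", "slowpool", "raidz2",
                "/dev/sda", "/dev/sdb", "/dev/sdc",
                "/dev/sdd", "/dev/sde", "/dev/sdf"], check=True)
```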
InfiniBand is used for special applications (e.g. the ALICE experiment at CERN uses 200 Gbit InfiniBand to transfer "picture data" between compute clusters) where data is copied directly into the RAM of a remote machine, hence the name remote DMA (RDMA).

An NVMe disk is "just a PCIe device" ...
- It makes me think of the history of disk drives: IDE stands for Integrated Drive Electronics. The IDE disks of the late 1980s had the disk controller integrated into the drive's electronics board (no separate MFM or RLL controller card was needed anymore), alongside the stepper motor drivers, the head read/write amplifiers and a microcontroller for housekeeping. So they "just needed to be connected to the AT bus". The first "controller boards" did exactly that, using address decoders and a few latches to save lines on the ISA address bus and to pass the ISA bus through to the disk as the IDE bus.

Today the same step has been taken with PCI Express disks, called NVMe. They are attached directly to the PCIe bus and behave like a PCI device, just as an IDE disk behaves like an ISA-bus hard disk controller in its oldest operating mode.
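That "just a PCIe device" claim is easy to check from sysfs; here is a small sketch (Linux, current kernels) where each NVMe controller's device link resolves to a plain PCI address:

```python
#!/usr/bin/env python3
# Show that each NVMe controller is just a device on the PCI(e) bus:
# the "device" link of nvme0, nvme1, ... in sysfs resolves to a PCI address.
from pathlib import Path

for ctrl in sorted(Path("/sys/class/nvme").iterdir()):
    pci_dev = (ctrl / "device").resolve()
    print(f"{ctrl.name}: {pci_dev.name}")   # e.g. nvme0: 0000:01:00.0
```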

Since PCIe devices share address space with the host and with other PCIe devices, they can be read and written using memory-mapped I/O. If they are bus masters, they can also write into host memory on their own (the address range is limited by configuration and hopefully enforced by the IOMMU).
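Whether an IOMMU is actually there to enforce those limits can also be seen in sysfs; a minimal sketch (assuming a Linux host, paths as in current kernels):

```python
#!/usr/bin/env python3
# List IOMMU groups and the PCI devices in each one.
# An empty /sys/kernel/iommu_groups usually means the IOMMU is disabled
# (on Intel boards intel_iommu=on may be needed on the kernel command line).
from pathlib import Path

groups = Path("/sys/kernel/iommu_groups")
for group in sorted(groups.iterdir(), key=lambda p: int(p.name)):
    devices = [dev.name for dev in (group / "devices").iterdir()]
    print(f"group {group.name}: {', '.join(devices)}")
```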

So an InfiniBand host channel adapter is also just a PCI Express device that can do DMA into the memory of the computer. So why not use InfiniBand to remotely connect the NVMe disks to another server, or to a group of servers, by "stealing" their memory transfers under the control of the remote machine?
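As far as I can tell, this is basically what "NVMe over Fabrics" with an RDMA transport does, and Linux already ships an in-kernel target for it. A rough, untested sketch of exporting one namespace that way (NQN, device path and address are placeholders; it assumes the nvmet and nvmet-rdma modules are loaded and configfs is mounted):

```python
#!/usr/bin/env python3
# Sketch: export a local NVMe namespace over RDMA with the Linux nvmet target.
# Assumes: modprobe nvmet nvmet-rdma; configfs mounted at /sys/kernel/config.
# NQN, device path and address below are placeholders.
import os
from pathlib import Path

NQN = "nqn.2022-07.example:nvme-pool"   # placeholder subsystem name
DEV = "/dev/nvme0n1"                    # placeholder local NVMe namespace
ADDR = "192.168.100.10"                 # placeholder IPoIB address of the HCA

nvmet = Path("/sys/kernel/config/nvmet")

# 1. Create the subsystem and (for a lab test) allow any host to connect.
subsys = nvmet / "subsystems" / NQN
subsys.mkdir(parents=True)
(subsys / "attr_allow_any_host").write_text("1")

# 2. Attach the block device as namespace 1 and enable it.
ns = subsys / "namespaces" / "1"
ns.mkdir(parents=True)
(ns / "device_path").write_text(DEV)
(ns / "enable").write_text("1")

# 3. Create an RDMA port on the HCA's address and link the subsystem to it.
port = nvmet / "ports" / "1"
port.mkdir(parents=True)
(port / "addr_trtype").write_text("rdma")
(port / "addr_adrfam").write_text("ipv4")
(port / "addr_traddr").write_text(ADDR)
(port / "addr_trsvcid").write_text("4420")
os.symlink(subsys, port / "subsystems" / NQN)
```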

A more advanced approach would be to somehow share a storage pool using ZFS over InfiniBand (e.g. 56 Gbit FDR) so that all machines can "see" the pool and do read/write operations on it. Some intermediate layer of the ZFS driver would have to use InfiniBand, I guess.
So access to the pool would need to be distributed using InfiniBand's RDMA approach in order to get performance similar to a ZFS pool attached to the local server.
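For the simpler remote-connect idea from above, the initiator side would just attach the exported namespace, which then shows up as a local NVMe block device; a sketch (assuming nvme-cli and the nvme-rdma module are available, with the same placeholder address and NQN as in the target sketch):

```python
#!/usr/bin/env python3
# Sketch: attach the remotely exported namespace on the initiator host.
# Assumes: modprobe nvme-rdma; nvme-cli installed; placeholders as before.
import subprocess

NQN = "nqn.2022-07.example:nvme-pool"   # placeholder subsystem name
ADDR = "192.168.100.10"                 # placeholder address of the target's HCA

# Discover what the target offers, then connect to the subsystem.
subprocess.run(["nvme", "discover", "-t", "rdma", "-a", ADDR, "-s", "4420"], check=True)
subprocess.run(["nvme", "connect", "-t", "rdma", "-a", ADDR, "-s", "4420", "-n", NQN], check=True)

# The namespace now appears as a local block device (e.g. /dev/nvme1n1)
# and could be used as a vdev like any locally attached disk.
subprocess.run(["nvme", "list"], check=True)
```

As far as I know, only one host can safely import a ZFS pool at a time, so this sketch only covers the remote-connect case, not the fully shared pool.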

Is there any research in this field? The advantage would be better utilization of NVMe drives distributed across servers, without needing to buy esoteric proprietary boxes that bring the Fibre Channel concept to NVMe.
Any help or ideas are welcome.

Have fun with your computers.
 
