EDIT: Learning more by the minute. Leaving the below for the record, and for those who follow.
Questions:
1) What exactly makes a hardware-backed RAID controller bad for Ceph?
--> Pve-docs says "Ceph is designed to handle whole disks on it’s own, without any abstraction in between. RAID controller are not designed for the Ceph use case and may complicate things and sometimes even reduce performance, as their write and caching algorithms may interfere with the ones from Ceph." However, this doesn't seem too bad for those who just need the storage clustered, but I get the impression that "may complicate things" really means "just don't do it, it will be awful". How bad is it, and why?
1.1) I read elsewhere that the risk is in recovery time. If you set up multiple 2-drive pairs as RAID 0, I think this still sucks, but I'm still not sure why. Is that assumption correct? (My rough back-of-the-envelope attempt is sketched below.)
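To check my own reasoning, here is the rough math I have in my head as a small Python sketch. All the numbers (disk size, fill level, recovery throughput) are made up; the point is just that a 2-disk RAID 0 OSD doubles the data Ceph has to backfill when a single disk dies.

```python
# Rough, assumption-heavy sketch: how much data Ceph has to re-replicate
# when an OSD dies, comparing single-disk OSDs vs. 2-disk RAID 0 OSDs.
# Disk size, fill level, and recovery throughput are invented numbers.

TB = 10**12

disk_size = 1 * TB          # assumed disk size
fill = 0.6                  # assumed cluster fill level
recovery_rate = 100e6       # assumed sustained recovery rate: ~100 MB/s

def recovery_estimate(disks_per_osd: int) -> tuple[float, float]:
    """Data to re-replicate (TB) and rough time (hours) after ONE disk fails."""
    # With RAID 0, one dead disk takes the whole OSD (all member disks) with it.
    lost_data = disks_per_osd * disk_size * fill
    hours = lost_data / recovery_rate / 3600
    return lost_data / TB, hours

for n in (1, 2):
    lost_tb, hours = recovery_estimate(n)
    print(f"{n} disk(s) per OSD: ~{lost_tb:.1f} TB to backfill, ~{hours:.1f} h")

# 1 disk(s) per OSD: ~0.6 TB to backfill, ~1.7 h
# 2 disk(s) per OSD: ~1.2 TB to backfill, ~3.3 h
```

If that model is even roughly right, the RAID 0 pairs don't break anything outright; they just make every single-disk failure cost twice as much recovery traffic and leave the cluster degraded for twice as long.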
2) GlusterFS, as I read more, seems way more viable than I initially assumed. However, when looking at:
https://www.gigenet.com/blog/increasing-storage-speed-availability-glusterfs/
I get the impression that the storage I am sitting on (heavily weighted to a single node) may not be the best fit. Is that correct? (Rough capacity math below.)
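To make the "weighted to a single node" worry concrete, here is a rough sketch of what I think the usable capacity would look like with one aggregate brick per node. The brick sizes are my own ballpark figures from the hardware list below, and the layouts are just assumptions, not recommendations.

```python
# Rough sketch of usable GlusterFS capacity with my (approximate) per-node
# brick sizes, in TB. Sizes and layout are assumptions, not a recommendation.

bricks = {            # one aggregate brick per node, very roughly
    "machine1": 2.0,  # two 1 TB SATA
    "machine2": 15.0, # 8x1 TB RAID 5 (~7 TB) + 2x4 TB RAID 0 (~8 TB)
    "machine3": 1.25, # the two small arrays
    "machine4": 1.0,  # single 1 TB SATA
}

# Pure distributed volume: capacity adds up, but a dead node takes its
# files with it, and most of the data would physically live on machine2.
distributed = sum(bricks.values())

# Replica 3 across three nodes: each file is stored on every brick in the
# replica set, so the set is limited by its smallest brick.
replica_set = ["machine1", "machine2", "machine3"]
replica3 = min(bricks[n] for n in replica_set)

print(f"distributed (4 bricks): ~{distributed:.1f} TB usable, no redundancy")
print(f"replica 3 ({'+'.join(replica_set)}): ~{replica3:.2f} TB usable")

# distributed (4 bricks): ~19.2 TB usable, no redundancy
# replica 3 (machine1+machine2+machine3): ~1.25 TB usable
```

Even if my numbers are off, it looks like I either end up with most of the data sitting unprotected on machine 2, or the replicated capacity collapses to the size of the smallest node. That is the part I would like a sanity check on.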
====Original====
Request: Storage configuration on a cluster of hardware-backed RAIDs
I have four nodes in my cluster. Most have a hardware-backed RAID controller which does not support JBOD / IT mode, etc.
While I hope to eventually migrate these machines to RAID cards that support passing disks through, in the meantime I want to get moving on the learning project I am working towards (specifically, parsing and storing large portions of the CommonCrawl dataset, which requires far more than typical home-use storage, but stored statically).
As such, I am trying to figure out the best way to make several drives/RAID arrays available to each machine, ideally as a single large storage pool, to cut down on the headache of tracking them all separately.
Machine 1 (Primary, most powerful, ~800GB of RAM, if that matters) (JBOD available)
- Proxmox Drive (280GB SAS)
- Two M.2 SSD drives - PCIe (Currently using one as an LVM cache)
- One ~280GB SAS
- Two 1TB SATA
Machine 2
- Proxmox Drive (2TB SATA)
- Eight 1TB drives (Currently in a HW RAID 5) - Open to whatever
- Two 4TB drives (Currently in a HW RAID 0) - Open to whatever
Machine 3
- Proxmox Drive (150GB SAS)
- RAID array: 700GB (a few more small SAS drives)
- RAID array: 550GB (300GB SATA drives)
Machine 4 (old gaming rig)
- Proxmox Drive (500GB SATA)
- 1TB SATA drive
NAS - 12TB
- Currently have a 4TB LUN hosting a few machines via iSCSI multipath.
I believe the hardware RAID rules out an ideal Ceph setup, and I briefly looked at GlusterFS before concluding that it's probably not what I need (maybe?).
My ideal setup is something I can mount similarly to an NFS share: a single mount point combining as much of this storage as possible into one location, specifically for loose file storage rather than block storage, as I am not actually going to host VMs or containers in this space.
Replication would be nice. I intend to back up to a cloud cold-storage service (AWS, or maybe GCloud), but ideally replication would also work locally.
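For the cold-storage leg, the kind of thing I have in mind is nothing fancier than pushing finished archives into an archive-tier bucket, roughly like this (bucket name and paths are placeholders, and it assumes boto3 with credentials already configured):

```python
# Minimal sketch of the cloud cold-storage backup idea: push a local archive
# to S3 with an archive storage class. Bucket name and paths are placeholders.
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are configured elsewhere

def push_archive(local_path: str, bucket: str, key: str) -> None:
    """Upload one backup archive straight into a cold storage tier."""
    s3.upload_file(
        local_path,
        bucket,
        key,
        ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},  # cheap, slow to retrieve
    )

if __name__ == "__main__":
    push_archive("/tank/backups/commoncrawl-2024-01.tar.zst",
                 "my-cold-backups",               # placeholder bucket
                 "commoncrawl/2024-01.tar.zst")   # placeholder key
```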
I am hoping someone can make a recommendation, or give me the name of something to look into.
Thank you!