CEPH Experimental POC - Non-Prod

arankaspar1

New Member
Apr 11, 2025
I have a 3 node cluster. It has a bunch of drives: 1TB cold spinning rust, 512GB warm SATA SSDs, and three 512GB non-PLP Gen3 NVMes.
(1 Samsung SN730 and 2 Inland TN320)
I know not to expect much - this is pre-prod -
the plan is to eventually get PLP drives next year.
The 10Gb Emulex CNA is working very well with FRR OSPF over IPv6, but it never gets over 3.6Gbit with these NVMes.

Writes are trash - like 10-20MB/s - on a Win10 guest with aio=threads, cache=none, discard=on and iothread=on. I tried a 2/1 Ceph pool; it was not much better.
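For reference, the relevant disk line in the VM config looks roughly like this (storage, disk and size values are placeholders):

Code:
# /etc/pve/qemu-server/100.conf - illustrative only
scsi0: ceph-nvme:vm-100-disk-0,aio=threads,cache=none,discard=on,iothread=1,size=64G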
I wanted to prove the concept and learn more about Ceph so I can speak to my ability to manage it (i.e. justify the cost of PLP drives).
I'd like to get more than 10MB/s writes and I'm happy to sacrifice redundancy for performance.
The plan would be to just back up weekly to local storage. Nothing is that critical - I'm mostly just learning.
What performance tips are there for this purpose?

I know very little, but of these, where do you think the time is best spent?
NUMA node pinning, disabling CPU sleep states, tuning individual OSDs, BlueStore or PG tuning, RocksDB/WAL placement, dropping 3/2 to 2/1?
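(For the CPU sleep state idea, I assume it would be something along these lines on the kernel command line - untested on my nodes:)

Code:
# /etc/default/grub - limit deep C-states, then run update-grub and reboot
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1 processor.max_cstate=1"
# or at runtime (package linux-cpupower)
cpupower idle-set -D 0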
For the sake of argument... is there an easy way to flip the CRUSH map to stripe data over hosts instead of mirroring it?
Since I'm backing up to local ZFS or LVM, I'm not concerned with the Ceph pool's persistence or integrity.
Would GlusterFS be better for this hardware?
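Before tuning anything I'd also like to baseline the raw pool write speed outside of the VM; something like this is what I had in mind (pool name is a placeholder):

Code:
# 30s of 4M writes with 16 threads, keep the objects so the read test has data
rados bench -p nvme-pool 30 write -b 4M -t 16 --no-cleanup
rados bench -p nvme-pool 30 rand -t 16
rados -p nvme-pool cleanup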
 
Try removing the spinning drives. And per other threads PLP can make a big difference.
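If you want to pull a spinner out entirely, roughly this (OSD ID is a placeholder, not tested on your cluster):

Code:
# let data drain off the OSD first
ceph osd out 3
# once the cluster is back to HEALTH_OK, stop the daemon and remove it
systemctl stop ceph-osd@3
ceph osd purge 3 --yes-i-really-mean-it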

What are the network speeds? 10 Gbps at least?
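Worth verifying with iperf3 between two of the nodes, e.g.:

Code:
# on node A
iperf3 -s
# on node B (hostname is a placeholder)
iperf3 -c nodeA -t 30 -P 4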

Re CRUSH: a given object/PG is copied to one drive on each host. You can restrict a pool by drive type, set a primary for reads, or use a failure domain of drive instead of host - the latter might place a PG's copies on three drives in the same server. But there's no mirroring in the sense I think you're asking about.
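To give a flavour of what that looks like (rule and pool names are made up, not tested on your cluster):

Code:
# replicated rule: NVMe only, one copy per host (the usual failure domain)
ceph osd crush rule create-replicated nvme_host default host nvme
# replicated rule: failure domain = osd, so copies may land in the same server
ceph osd crush rule create-replicated nvme_osd default osd nvme
# point an existing pool at one of them
ceph osd pool set mypool crush_rule nvme_osd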

2/1 replication is much less reliable, but you stated that's OK.
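If you do go that way, it's just two pool settings (pool name is a placeholder):

Code:
ceph osd pool set mypool size 2
ceph osd pool set mypool min_size 1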
 
I have a 3 node cluster.
Just to be sure, did you see this? --> https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/

Would GlusterFS be better for this hardware?
From my (possibly wrong) understanding, Gluster is deprecated; at least Red Hat states as much: https://access.redhat.com/support/policy/updates/rhs/

But even if other people do maintain/develop it, my main point would be that it is not officially supported in the PVE context: https://pve.proxmox.com/pve-docs/chapter-pvesm.html#_storage_types

Of course this is Debian and you can install whatever you want :-)
 
Yeah, I would never put them in one pool - that would be mad lol. I suppose a 3 node Ceph cluster is mad too.
What is a good way to establish a shared volume outside of Ceph if it's not Gluster?
ZFS replication isn't a "shared volume" but it's something else I considered - it seems involved to set up.
If you have all your drives in one pool, it will use the spinning drives just as much as the SSDs. You need three pools for the different tiers of storage.
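Concretely that means one CRUSH rule and one pool per device class; a rough sketch (rule/pool names and PG counts are made up):

Code:
# one rule per device class, failure domain = host
ceph osd crush rule create-replicated hdd_rule  default host hdd
ceph osd crush rule create-replicated ssd_rule  default host ssd
ceph osd crush rule create-replicated nvme_rule default host nvme
# one pool per tier, pinned to its rule
ceph osd pool create pool-hdd  64 64 replicated hdd_rule
ceph osd pool create pool-ssd  64 64 replicated ssd_rule
ceph osd pool create pool-nvme 64 64 replicated nvme_rule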
 
Just to be sure, did you see this? --> https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/
I did read that a few times - you sent it to me before haha. That's a nice write up and easy read.
I'm just trying to get shared storage. An iSCSI or NFS share would be amazing.
Could you serve a share using those and mount it in Datacenter storage with it being considered "shared"?
It needs to be something clustered, right? Unless you're doing periodic replication behind the scenes.
Have you seen any configs like this?
 
An iSCSI or NFS share would be amazing.
Could you serve a share using those and mount it in Datacenter storage with it being considered "shared"?
I am not a storage guru, sorry. But yeah, NFS is simple, old+stable and shared. A "simple" setup will create a SPOF though!
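For what it's worth, once an NFS export exists somewhere, adding it as shared storage is a single command (server and export path are placeholders):

Code:
pvesm add nfs nfs-store --server 192.168.1.50 --export /tank/pve --content images,backup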

For PVE I am utilizing ZFS with replication. It gives me the performance of local drives, does not introduce networking dependencies, and qualifies as "shared" - as long as you can live with losing up to one replication interval of data.
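The replication jobs are configured per guest, e.g. VM 100 replicated to node pve2 every 15 minutes (IDs and node name are placeholders):

Code:
pvesr create-local-job 100-0 pve2 --schedule '*/15'
pvesr status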

Other benefits: https://forum.proxmox.com/threads/f...y-a-few-disks-should-i-use-zfs-at-all.160037/ ;-)