Multiple OSDs for NVMe drives?

ozdjh

Well-Known Member
Oct 8, 2019
Hi

We'll be evaluating Proxmox & Ceph over the coming weeks and want to make sure we have a good starting point for benchmarking. We've been running a hyperconverged all-flash platform for about 7 years, but it's not based on Ceph. We've been reading heaps, trying to understand the best deployment model.

The tuning guide for all-flash deployments on the ceph.com site states that running a single OSD per physical NVMe device cannot take advantage of the performance available. We will be running 100% NVMe devices for storage (2 TB drives), so this is important to us. That article was posted over 2 years ago, so I'm wondering if it's still valid with the improvements to Ceph?

The article recommends running 4 OSDs per device. If that's the best configuration, I assume we'll have to set that up manually, as I haven't seen any way to define an OSD through the GUI that doesn't reference the entire disk. Also, it looks like Ceph uses 2 partitions per OSD (metadata and storage). If we need to create 8 partitions to support 4 OSDs, is there a defined size ratio between the metadata and storage partitions?

Any feedback on getting the most out of an all NVMe platform would be appreciated.


Thanks

David
...
 
That article was posted over 2 years ago, so I'm wondering if it's still valid with the improvements to Ceph?

It can still help quite a bit, so if you have the time it would probably be best to test it out and compare the results for your specific setup.
But as you've probably seen in the article, there are quite a few other tunings possible too. I'd start easy; some of them can be changed live anyway, so you can play around with them later. Rounding up the pg(p)_num values and the threads-per-shard (osd_op_num_threads_per_shard) config option can help too.
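A minimal sketch of what those adjustments could look like from the CLI, assuming the cluster configuration database is in use; the pool name "vm-pool" and all values here are placeholders for illustration, not tuning recommendations:

# SSD shard/thread options for the OSDs (placeholder values; shard changes may need an OSD restart)
ceph config set osd osd_op_num_shards_ssd 8
ceph config set osd osd_op_num_threads_per_shard_ssd 4
# round the pool's placement group count up to the next power of two (placeholder pool name and values)
ceph osd pool set vm-pool pg_num 512
ceph osd pool set vm-pool pgp_num 512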

Maybe @Alwin has some advice in mind; he has played around with some of the beefier Ceph setups here.

The article recommends running 4 OSDs per device. If that's the best configuration, I assume we'll have to set that up manually, as I haven't seen any way to define an OSD through the GUI that doesn't reference the entire disk. Also, it looks like Ceph uses 2 partitions per OSD (metadata and storage). If we need to create 8 partitions to support 4 OSDs, is there a defined size ratio between the metadata and storage partitions?

You can also just use the ceph-volume lvm batch --osds-per-device <numberofosd> /dev/sdX command after you've done the Ceph installation and configuration through the Proxmox VE interface; then everything should be well integrated, and it's not much extra effort to do manually.
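On an NVMe node that could look something like the following; the device path and OSD count are only examples, and --report first prints the proposed layout without touching the disk:

# dry run: show what ceph-volume would create on the device (example path)
ceph-volume lvm batch --report --osds-per-device 4 /dev/nvme0n1
# create the 4 OSDs on that device
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
# list the resulting logical volumes / OSDs
ceph-volume lvm list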
 
Thanks Thomas, I hadn't seen the '--osds-per-device' option to ceph-volume. That simplifies things a lot. We'll start running this up on Friday. Once we're happy with the configuration we'll contribute back to the ceph benchmark thread to share our results.


David
...
 
