We're looking to migrate away from a large OnApp installation, and Proxmox is looking like our solution. We have quite a large budget to get this done properly, so we were hoping someone could share some best practices.
Our biggest concerns have been around Ceph within PVE. We're going to need about 60TB of usable capacity (after redundancy) within the next year, so a fully hyper-converged setup is unlikely long term, although we may start with one on day one. Our current distributed storage platform stores 3 copies of all data: one on SSD for performance, and two on HDDs for redundancy. It has worked well, and storage performance isn't really a reason we need to move away from OnApp... but we'd like to ensure we're not taking a step backwards.
Does anyone have any experience with running a large PVE cluster with an external Ceph or PVE-Ceph installation, and do you have any recommendations for it?
All our nodes will have 2x10GbE LACP for Ceph, 2x10GbE LACP for VM networking, and 2x1GbE LACP for sync and management. We typically use Juniper QFX5100s for 10GbE and EX4300s for 1GbE.
I had a crazy idea of running the PVE internal Ceph instance with all SSDs in the hypervisors (6 OSDs per hypervisor) plus a few additional "storage bricks" with only spinning disks at the bottom of the rack. We would then serve the first copy of data from the SSDs on the hypervisors, while copies #2 and #3 would live on the slow storage servers at the bottom. How bad of an idea is this?
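For what it's worth, Ceph can express this kind of hybrid placement with a custom CRUSH rule, assuming your OSDs are tagged with the `ssd` and `hdd` device classes. A rough sketch (rule name and id are made up; the map has to be decompiled, edited, and re-imported by hand since `ceph osd crush rule create-replicated` can't build multi-step rules):

```
# ceph osd getcrushmap -o map.bin && crushtool -d map.bin -o map.txt
# then add a rule along these lines to map.txt:

rule ssd_primary_hybrid {
    id 10                                 # any unused rule id
    type replicated
    step take default class ssd
    step chooseleaf firstn 1 type host    # primary copy on an SSD host
    step emit
    step take default class hdd
    step chooseleaf firstn -1 type host   # remaining replicas on HDD hosts
    step emit
}

# crushtool -c map.txt -o map-new.bin && ceph osd setcrushmap -i map-new.bin
# ceph osd pool set <pool> crush_rule ssd_primary_hybrid
```

One caveat: reads are served by the primary OSD, so they'd hit SSD, but writes are only acknowledged once all replicas have them, so write latency would still be bounded by the HDD bricks unless you give them decent journal/DB devices.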
I do like the PVE Ceph GUI and management - is there any way to have storage-only nodes within a PVE Ceph cluster? I'd like to have host systems that won't have any VMs assigned to them via the cluster.
If there is a better way to get answers on this through Proxmox enterprise support, we'd be willing to go that route.