Proxmox small/medium Hyper-Converged cluster questions

Hello everyone,

I'm reaching out to the community for some pointers regarding an educational project I have going on.

Description: I'm building a cyber security range, which is a virtual environment where students can practice various security skills, the most interesting for our topic being attack & defense. In short, this means I need to be able to quickly spin up anywhere between 10 and 50 linked clones at a time. These clones come from a variety of base templates, either Linux or Windows. The VMs themselves are short-lived, typically lasting from a couple of hours to a couple of days.
The setup that I've imagined for this is the following:

Host configuration (3-5 hosts, depending on what price I get from my hardware vendor):
- CPU: 1 x AMD Epyc 7542 (32c/64t)
- RAM: 1 TB
- Storage:
  - 2 x M.2 in RAID1 for the OS
  - 8-10 x 1 TB SATA-3 SSDs in JBOD/HBA/IT mode for Ceph OSDs
- Network:
  - 2 x 10/25 Gbps NICs for Ceph replication
  - 2 x 10/25 Gbps NICs for VM traffic

For networking gear I'm thinking of going with 2 x 10/25 Gbps switches, probably Cisco Nexus, but I'm willing to consider other vendors as well.
And now come the questions:

1. I'm thinking of building a hyper-converged CEPH cluster out of all of these boxes. From everyone's experience, is this feasible? Do consider that these are not "production" VMs, so a *slight* penalty in performance is understood and accepted.

2. Since budget is always a constraint, I'm in favour of more nodes (with less RAM and/or fewer SSDs and/or smaller SSD capacity) rather than 3 big ones, both from Ceph's point of view and because the load will be distributed across more nodes. Is this a good idea?

3. Is there any limitation on the number of VMs I can provision at the same time? I've done some testing with a small 3-node cluster and occasionally I would get an error back: `TASK ERROR: clone failed: cfs-lock 'storage-vm-storage' error: got lock request timeout`, but using Ansible's retry capabilities I was able to work around it (a sketch of the retry idea follows this list). Nevertheless, I would like to know from everyone's experience if there are some limitations here.

4. Last but not least, on some occasions when deleting VMs, their cloud-init drives would get "left behind" on the Ceph storage; I've only noticed this when deleting 20-30 VMs at a time. For both points (3) and (4) I've been using Proxmox VE 7.4-16.
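To illustrate point (3), here is a minimal Python sketch of the retry idea, assuming the `proxmoxer` client; the host, node, template and VMID values are placeholders and the error matching is deliberately crude:

```python
# Minimal sketch: create linked clones, retrying on transient cfs-lock timeouts.
import time
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI("pve1.example.lab", user="root@pam",
                     password="secret", verify_ssl=False)

NODE = "pve1"          # node hosting the template (placeholder)
TEMPLATE_VMID = 9000   # base template to clone from (placeholder)

def clone_vm(new_vmid, name, retries=5, delay=10):
    """Issue a linked-clone request, retrying if the storage lock times out."""
    for attempt in range(1, retries + 1):
        try:
            # full=0 -> linked clone; returns a UPID task string on success
            return proxmox.nodes(NODE).qemu(TEMPLATE_VMID).clone.post(
                newid=new_vmid, name=name, full=0)
        except Exception as err:
            if "cfs-lock" in str(err) and attempt < retries:
                time.sleep(delay)  # back off, same idea as Ansible's retries/delay
                continue
            raise

# Spin up a batch of short-lived lab VMs
for vmid in range(100, 120):
    clone_vm(vmid, f"range-vm-{vmid}")
```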

Thanks in advance!
 
1. Yes.
2. More nodes is better for Ceph.
3. I think that comes down to your hardware.
4. This should not happen.
 
Do look into using SSDs as WAL/DB devices; I got about double the IOPS with mine. Do note that you will need to provision the OSDs manually, since Proxmox's UI cannot put the WAL on partitions.
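For example, something along these lines (device paths are placeholders; the sketch only prints the `ceph-volume` commands so they can be reviewed before running anything):

```python
# Sketch: build manual OSD creation commands that place the RocksDB/WAL on a
# shared fast device. Device paths are placeholders -- adjust to your hardware.
data_devices = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
db_partitions = ["/dev/nvme0n1p1", "/dev/nvme0n1p2", "/dev/nvme0n1p3", "/dev/nvme0n1p4"]

for data, db in zip(data_devices, db_partitions):
    # With only --block.db given, BlueStore keeps the WAL on the DB device too,
    # so a separate --block.wal is usually unnecessary.
    print(f"ceph-volume lvm create --data {data} --block.db {db}")
```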

Ceph scales horizontally very well, so do also consider more smaller nodes vs. fewer big nodes.
 
Hello everyone,

A quick update to my previous post: it seems that the Dell supplier has pleasantly surprised us; we can get somewhere between 7 and 9 servers within the current budget.

Now, one obvious downside of Ceph (which I didn't take into account as much as I should have) is the production recommendation of size 3 / min_size 2 (three replicas, with at least two required to keep serving I/O), which effectively leaves about 1/3 of the raw storage usable [and I haven't yet factored in Ceph's default near-full (85%) and full (95%) thresholds].
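As a back-of-the-envelope example (node and OSD counts here are illustrative, not the final quote):

```python
# Rough usable-capacity estimate for a replicated (size=3) Ceph pool.
# Node/OSD counts and sizes below are example figures only.
nodes = 8
osds_per_node = 9
osd_size_tb = 1.0
replica_size = 3        # size=3
nearfull_ratio = 0.85   # Ceph's default near-full warning threshold

raw_tb = nodes * osds_per_node * osd_size_tb
usable_tb = raw_tb / replica_size * nearfull_ratio
print(f"raw: {raw_tb:.0f} TB, usable before near-full warnings: ~{usable_tb:.1f} TB")
# raw: 72 TB, usable before near-full warnings: ~20.4 TB
```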

Business came back asking for "a tad more storage, if possible", which led me to explore "the old" centralized storage architecture. I'm considering deploying a dual-controller TrueNAS Enterprise X-Series for this, which will present one or more NFS shares to the Proxmox nodes to be used as centralized storage.

To the best of everyone's knowledge, is there a big performance difference between NFS and Ceph?

Thx!
 
Business came back asking for "a tad more storage, if possible"
Isn't that always the case? ;)

To the best of everyone's knowledge, is there a big performance difference between NFS and Ceph?
That depends on many things (spindles, networking, ...), yet if you read from your local node, nothing beats local. You cannot compare the two directly: NFS is file-based storage, while Ceph (via RBD, at least) is block-based storage. I recommend splitting "need-for-speed" data from "other" data and moving the latter to NFS. You will only know how fast each one is when you run fio as a benchmark on both.
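For example, running the same job against a test file on each storage keeps the numbers comparable (mount paths below are placeholders):

```python
# Sketch: run an identical 4k random-read fio job against a file on each
# storage so the results are comparable. Mount paths are placeholders.
import json
import subprocess

targets = {
    "ceph": "/mnt/ceph-test/fio.bin",   # e.g. a filesystem on an RBD-backed disk
    "nfs":  "/mnt/nfs-test/fio.bin",
}

for name, path in targets.items():
    result = subprocess.run(
        ["fio", f"--name={name}", f"--filename={path}",
         "--rw=randread", "--bs=4k", "--iodepth=32", "--numjobs=4",
         "--size=4G", "--runtime=60", "--time_based", "--direct=1",
         "--ioengine=libaio", "--group_reporting", "--output-format=json"],
        capture_output=True, text=True, check=True)
    iops = json.loads(result.stdout)["jobs"][0]["read"]["iops"]
    print(f"{name}: {iops:.0f} read IOPS")
```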
 
Isn't that always the case? ;)
Spot on! :))

That depends on many things (spindles, networking, ...), yet if you read from your local node, nothing beats local. You cannot compare the two directly: NFS is file-based storage, while Ceph (via RBD, at least) is block-based storage. I recommend splitting "need-for-speed" data from "other" data and moving the latter to NFS. You will only know how fast each one is when you run fio as a benchmark on both.
That's correct. One other idea we've explored is using 3 nodes for a dedicated Ceph cluster and the remaining 4 for compute-only workloads, although from what I've researched, a 3-node Ceph cluster is not that good for production workloads. Any suggestion here would be golden.

Later edit: one additional thing we might consider is using HDDs for the Ceph pools and backing them with dedicated SSDs for the WAL and DB.
 
I've built all my PVE-related setups, including clusters, with dedicated enterprise storage, because that was what was available at the time and we don't have the one requirement only Ceph can fulfill: dynamic growth by adding more nodes. That being said, yes, 3 nodes is the bare minimum and the most wasteful layout you can build. The more nodes you have, the better the storage-waste ratio, yet also the less optimal the data locality, which may lead to lower IOPS.

Are you planning on having a dedicated CEPH cluster and a dedicated PVE cluster or would it be a mixed-use cluster?
 
Initially we thought of a mixed-use cluster.
Yes, I know. You then explained that you would have a 3-node Ceph cluster and additional compute nodes. My question was about whether you plan on having pure Ceph nodes with the rest running only PVE, or all nodes running PVE.

Also, see my comment about HDD for pools and SSDs for WAL and DB. Would that be a good idea?
Sure, yet I would try to split it: a fast pool consisting of SSDs only and one for the hard disks. That way you have two speeds, not just one. Maybe also try to get NVMe for all the latency-sensitive stuff if you still have money to spend.
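A rough sketch of that split using device classes (rule and pool names plus PG counts are just examples; the script only prints the commands):

```python
# Sketch: one CRUSH rule per device class, then one pool bound to each rule.
# Ceph auto-detects hdd/ssd/nvme device classes on the OSDs.
commands = [
    # replicated rule restricted to SSD OSDs, failure domain = host
    "ceph osd crush rule create-replicated fast-rule default host ssd",
    # replicated rule restricted to HDD OSDs
    "ceph osd crush rule create-replicated slow-rule default host hdd",
    # create the pools and bind each one to its rule
    "ceph osd pool create fast-pool 128 128 replicated fast-rule",
    "ceph osd pool create slow-pool 128 128 replicated slow-rule",
]
for cmd in commands:
    print(cmd)
```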
 
