Proxmox small/medium Hyper-Converged cluster questions

Hello everyone,

I'm reaching out to the community for some pointers regarding an educational project I have going on.

Description: I'm building a cyber security range, which is a virtual environment where students can practice various security skills, the most interesting for our topic being attack & defense. In short, this means I need to be able to quickly spin up anywhere between 10 and 50 linked clones at a time. These clones come from a variety of base templates, either Linux or Windows. The VMs themselves are short-lived, typically lasting from a couple of hours to a couple of days.
The setup that I've imagined for this is the following:

Host configuration (3-5 hosts, depending on what price I get from my hardware vendor):
- CPU: 1 x AMD Epyc 7542 (32c/64t)
- RAM: 1 TB
- Storage:
  - 2 x M.2 in RAID1 for the OS
  - 8-10 x 1 TB SATA-3 SSDs in JBOD/HBA/IT mode for Ceph OSDs
- Network:
  - 2 x 10/25 Gbps NICs for Ceph replication
  - 2 x 10/25 Gbps NICs for VM traffic

For networking gear I'm thinking of going with 2 x 10/25 Gbps switches, probably Cisco Nexus, but I'm willing to consider other vendors as well.
And now come the questions:

1. I'm thinking of building a hyper-converged CEPH cluster out of all of these boxes. From everyone's experience, is this feasible? Do consider that these are not "production" VMs, so a *slight* penalty in performance is understood and accepted.

2. Since budget is always a constraint, I'm in favour of more nodes (with less RAM and/or fewer SSDs and/or smaller SSD capacity) rather than 3 big ones, both from Ceph's point of view and because the load will be distributed across more nodes. Is this a good idea?

3. Is there any limitation on the number of VMs I can provision at the same time? I've done some testing with a small 3-node cluster and occasionally I would get an error back: `TASK ERROR: clone failed: cfs-lock 'storage-vm-storage' error: got lock request timeout`, but using Ansible's retry capabilities I was able to work around it (a sketch of the retry idea follows this list). Nevertheless, I would like to know from everyone's experience if there are some limitations here.

4. Last but not least, on some occasions when deleting VMs, their cloud-init drives would get "left behind" on the Ceph storage; I've only noticed this when deleting 20-30 VMs at a time. For both points (3) and (4) I've been using Proxmox VE 7.4-16.
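To illustrate point (3), here is a minimal Python sketch of the retry idea, assuming the `proxmoxer` client; the host, node, template and VMID values are placeholders and the error matching is deliberately crude:

```python
# Minimal sketch: create linked clones, retrying on transient cfs-lock timeouts.
import time
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI("pve1.example.lab", user="root@pam",
                     password="secret", verify_ssl=False)

NODE = "pve1"          # node hosting the template (placeholder)
TEMPLATE_VMID = 9000   # base template to clone from (placeholder)

def clone_vm(new_vmid, name, retries=5, delay=10):
    """Issue a linked-clone request, retrying if the storage lock times out."""
    for attempt in range(1, retries + 1):
        try:
            # full=0 -> linked clone; returns a UPID task string on success
            return proxmox.nodes(NODE).qemu(TEMPLATE_VMID).clone.post(
                newid=new_vmid, name=name, full=0)
        except Exception as err:
            if "cfs-lock" in str(err) and attempt < retries:
                time.sleep(delay)  # back off, same idea as Ansible's retries/delay
                continue
            raise

# Spin up a batch of short-lived lab VMs
for vmid in range(100, 120):
    clone_vm(vmid, f"range-vm-{vmid}")
```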

Thanks in advance!
 
1. Yes.
2. More nodes is better for Ceph.
3. I think that comes down to your hardware.
4. This should not happen.
 
Do look into using SSDs as WAL/DB devices; I got about double the IOPS with mine. Do note that you will need to provision the OSDs manually, since Proxmox's UI cannot put the WAL on partitions.
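For example, something along these lines (device paths are placeholders; the sketch only prints the `ceph-volume` commands so they can be reviewed before running anything):

```python
# Sketch: build manual OSD creation commands that place the RocksDB/WAL on a
# shared fast device. Device paths are placeholders -- adjust to your hardware.
data_devices = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]
db_partitions = ["/dev/nvme0n1p1", "/dev/nvme0n1p2", "/dev/nvme0n1p3", "/dev/nvme0n1p4"]

for data, db in zip(data_devices, db_partitions):
    # With only --block.db given, BlueStore keeps the WAL on the DB device too,
    # so a separate --block.wal is usually unnecessary.
    print(f"ceph-volume lvm create --data {data} --block.db {db}")
```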

Ceph scales horizontally very well, so do also consider more smaller nodes vs. fewer big nodes.
 
Hello everyone,

A quick update to my previous post: it seems that the Dell supplier has pleasantly surprised us; we can get somewhere between 7 and 9 servers within the current budget.

Now, one obvious downside of Ceph (which I didn't take into account as much as I should have) is the production recommendation of size 3 / min_size 2 (three replicas, with at least two required to keep serving I/O), which effectively leaves about 1/3 of the raw storage usable [and I haven't yet factored in Ceph's default near-full (85%) and full (95%) thresholds].
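As a back-of-the-envelope example (node and OSD counts here are illustrative, not the final quote):

```python
# Rough usable-capacity estimate for a replicated (size=3) Ceph pool.
# Node/OSD counts and sizes below are example figures only.
nodes = 8
osds_per_node = 9
osd_size_tb = 1.0
replica_size = 3        # size=3
nearfull_ratio = 0.85   # Ceph's default near-full warning threshold

raw_tb = nodes * osds_per_node * osd_size_tb
usable_tb = raw_tb / replica_size * nearfull_ratio
print(f"raw: {raw_tb:.0f} TB, usable before near-full warnings: ~{usable_tb:.1f} TB")
# raw: 72 TB, usable before near-full warnings: ~20.4 TB
```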

Business came back asking for "a tad more storage, if possible", which led me to explore "the old" centralized storage architecture. I'm considering deploying a dual-controller TrueNAS Enterprise X-Series for this, which will present one or more NFS shares to the Proxmox nodes to be used as centralized storage.

To the best of everyone's knowledge, is there a big performance difference between NFS and Ceph?

Thx!
 
Business came back asking for "a tad more storage, if possible"
Isn't that always the case? ;)

To the best of everyone's knowledge, is there a big performance difference between NFS and Ceph?
That depends on many things (spindles, networking, ...), yet if you read from your local node, nothing beats local. You cannot compare the two directly: NFS is file-based storage, while Ceph (via RBD, at least) is block-based storage. I recommend splitting "need-for-speed" data from "other" data and moving the latter to NFS. You will only know how fast each one is when you run fio as a benchmark on both.
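For example, running the same job against a test file on each storage keeps the numbers comparable (mount paths below are placeholders):

```python
# Sketch: run an identical 4k random-read fio job against a file on each
# storage so the results are comparable. Mount paths are placeholders.
import json
import subprocess

targets = {
    "ceph": "/mnt/ceph-test/fio.bin",   # e.g. a filesystem on an RBD-backed disk
    "nfs":  "/mnt/nfs-test/fio.bin",
}

for name, path in targets.items():
    result = subprocess.run(
        ["fio", f"--name={name}", f"--filename={path}",
         "--rw=randread", "--bs=4k", "--iodepth=32", "--numjobs=4",
         "--size=4G", "--runtime=60", "--time_based", "--direct=1",
         "--ioengine=libaio", "--group_reporting", "--output-format=json"],
        capture_output=True, text=True, check=True)
    iops = json.loads(result.stdout)["jobs"][0]["read"]["iops"]
    print(f"{name}: {iops:.0f} read IOPS")
```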
 
Isn't that always the case? ;)
Spot on! :))

That depends on many things (spindles, networking, ...), yet if you read from your local node, nothing beats local. You cannot compare the two directly: NFS is file-based storage, while Ceph (via RBD, at least) is block-based storage. I recommend splitting "need-for-speed" data from "other" data and moving the latter to NFS. You will only know how fast each one is when you run fio as a benchmark on both.
That's correct. One other idea we've explored is using 3 nodes for a dedicated Ceph cluster and the remaining 4 for compute-only workloads, although from what I've researched, a 3-node Ceph cluster is not that good for production workloads. Any suggestion here would be golden.

Later edit: one additional thing we might consider is using HDDs for the Ceph pools and backing them with dedicated SSDs for the WAL and DB.
 
I've built all my PVE-related setups, including clusters, with dedicated enterprise storage, because that was what was available at the time and we don't have the one requirement only Ceph can fulfill: dynamic growth by adding more nodes. That being said, yes, 3 nodes is the bare minimum and the most wasteful layout you can build. The more nodes you have, the better the storage-waste ratio, yet also the less optimal the data locality, which may lead to lower IOPS.

Are you planning on having a dedicated CEPH cluster and a dedicated PVE cluster or would it be a mixed-use cluster?
 
Initially we thought of a mixed-use cluster.
Yes, I know. You then explained that you would have a 3-node Ceph cluster and additional compute nodes. My question was about whether you plan on having pure Ceph nodes with the rest running only PVE, or all nodes running PVE.

Also, see my comment about HDD for pools and SSDs for WAL and DB. Would that be a good idea?
Sure, yet I would try to split it: a fast pool consisting of SSDs only and one for the hard disks. That way you have two speeds, not just one. Maybe also try to get NVMe for all the latency-sensitive stuff if you still have money to spend.
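A rough sketch of that split using device classes (rule and pool names plus PG counts are just examples; the script only prints the commands):

```python
# Sketch: one CRUSH rule per device class, then one pool bound to each rule.
# Ceph auto-detects hdd/ssd/nvme device classes on the OSDs.
commands = [
    # replicated rule restricted to SSD OSDs, failure domain = host
    "ceph osd crush rule create-replicated fast-rule default host ssd",
    # replicated rule restricted to HDD OSDs
    "ceph osd crush rule create-replicated slow-rule default host hdd",
    # create the pools and bind each one to its rule
    "ceph osd pool create fast-pool 128 128 replicated fast-rule",
    "ceph osd pool create slow-pool 128 128 replicated slow-rule",
]
for cmd in commands:
    print(cmd)
```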
 
