Considering Proxmox for hosting - What are your thoughts?

Popupgbg

New Member
Nov 16, 2021
Hi!

I'm considering Proxmox for a new hosting environment, as we are going to leave a 12-node Virtuozzo cluster. The intention is to have the high or mid tier of the Proxmox support agreement. We are going to start with 4 new nodes and set up a cluster for HA/failover. Below you can see the specification for each server.

What is your opinion and how would you set it up?


Hardware spec for each node

2x AMD EPYC 7513, 32 cores each
16x 32GB RAM = 512GB
2x 512GB NVMe (for OS)
6x 1.92TB NVMe PCIe 4.0
1x Nvidia Quadro 2200, 1280 CUDA cores
2x 2-port RJ45 Intel X540
1x 2-port SFP+ Intel X710


In addition to having a support agreement, we want to have 2-3 people in our network who are very skilled at Proxmox and can help us out on a freelance basis with the "behind-the-scenes" work: upgrades, maintenance of the system and so on.
 
What are you doing with the graphics card specifically?

The disk layout looks worthy of a Ceph setup. Absolutely doable, but the network can become a bottleneck, since you want your cluster communication kept well away from production traffic.
 
I'd suggest going with 5 nodes rather than 4: with 4 you can only lose a single node before things get dangerous (losing 2 out of 4 also means losing quorum), while 5 nodes allow you to lose 2 nodes and still have a stable cluster.

Your NICs are only capable of 10G. Considering your powerful NVMe drives, you'd want a faster network connection to actually get at the available performance of your disks, which means 25G or even 100G NICs for the Ceph (storage) traffic.

I don't know the specs of your 12-node cluster, but replacing it with only 4 (or 5) nodes sounds like you might be planning to overcommit the new hosts.
Keep in mind that Ceph requires some resources of its own for the storage services, so you cannot "squeeze" the machines by overloading them with VMs; that will backfire very badly.
The Proxmox and Ceph docs contain rules of thumb for how many resources to plan for the Ceph services; AFAIK it's roughly 1 thread per OSD, possibly more when using NVMe.
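As a rough, illustrative sketch (assuming one OSD per data NVMe, i.e. 6 OSDs per node, and Ceph's default osd_memory_target of 4 GiB; both are assumptions to adjust for your setup):

# Per-node Ceph overhead, back-of-the-envelope:
#   6 OSDs x ~1-2 threads each           -> reserve ~6-12 CPU threads
#   6 OSDs x osd_memory_target (4 GiB)   -> reserve ~24 GiB RAM
# The memory target can be tuned in ceph.conf if needed, e.g.:
[osd]
osd_memory_target = 4294967296   # 4 GiB per OSD (the default)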
 
Thanks for your input.

We will not replace 12 nodes with 4 nodes; this is just a start, and the intention is to add 1-3 more nodes during the next year.

The former management did not make a wise decision when it came to the hardware spec. The 12 nodes have a total of 192 cores and 3073GB of RAM. Now we start with 4 nodes that have 256 cores and 2000GB of RAM.
When it comes to the NICs, we do not want to replace four 48-port 10Gbps Cisco switches that are only 2 years old. I have earlier worked a lot with NIC teaming, both on Hyper-V and on OpenStack, to gain both redundancy and performance, and I assume that NIC teaming works in Proxmox as well?

The graphics card is there because we have a lot of customers running RDS with graphics-intensive applications; the idea is to either assign the GPU to a specific VM or, if we go with a platform that supports it, use vGPU.

For now we are evaluating Proxmox, Sunlight.io and OpenNebula.
 
I assume that NIC teaming works in Proxmox as well
Yes, but be aware that you can only use the bandwidth of a single 10G port *per connection*, even if you are bonding multiple 10G ports using LACP.
Considering that you are running many VMs (hosting), parallel performance is more important, I think, and that can be improved by using the right LACP balancing algorithm (layer3+4).
So, using some dual or quad 10G NICs, you should be able to access at least some of your NVMe drives' potential while keeping your 10G switches.
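A minimal sketch of such a bond in /etc/network/interfaces on a PVE node (the interface names eno1/eno2, the bridge name and the addresses are placeholders to adapt):

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10/24
        gateway 192.0.2.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0

With layer3+4 hashing, different TCP/UDP connections can land on different physical ports, so many parallel VM connections spread across the bond even though a single connection stays capped at 10G.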

The 12 nodes have a total of 192 cores and 3073GB of RAM. Now we start with 4 nodes that have 256 cores and 2000GB of RAM.
Alright, no overcommitment in sight. :D

a platform that supports vGPU.
Not sure about the Nvidia stuff, but I saw interesting topics about the AMD FirePros in the forums, where you can split a compatible GPU into multiple virtual GPUs that are then passed through to VMs.
https://pve.proxmox.com/wiki/MxGPU_with_AMD_S7150_under_Proxmox_VE_5.x
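For plain passthrough of a single GPU to one VM (rather than vGPU), the rough idea on the Proxmox side looks like this; the PCI address and VM ID below are made-up examples, and IOMMU must be enabled in the BIOS and on the kernel command line first:

# find the GPU's PCI address
lspci -nn | grep -i nvidia
# attach it to VM 100 as a PCIe device
qm set 100 -hostpci0 0000:41:00.0,pcie=1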

this is just a start and the intention is to add 1-3 more nodes during next year.
I'd still strongly suggest starting with 5 nodes for higher reliability.
 
2x 2-port RJ45 Intel X540
1x 2-port SFP+ Intel X710

We run a similar node configuration other than the networking. We run 40Gbps Ethernet for the Ceph traffic, but I'd run 100Gbps these days (we built this cluster a couple of years ago). Moving Ceph away from 10GbE makes a big difference. We did some benchmarking and shared the results here:

https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/page-8#post-273640

And yes, NIC teaming works as expected. We run active/standby bonds for everything on all cluster nodes.
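For reference, an active/standby bond is the same idea with bond-mode active-backup instead of 802.3ad; a minimal sketch (interface names are placeholders):

auto bond1
iface bond1 inet manual
        bond-slaves eno3 eno4
        bond-primary eno3
        bond-mode active-backup
        bond-miimon 100

It does not aggregate bandwidth, but it needs no switch-side LACP configuration and fails over cleanly if a port or switch dies.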

Just a comment on Proxmox itself: we ran OnApp for many years and are so happy we moved to Proxmox. There are a couple of quirks (as there are with any platform), but running PVE on Ceph storage has been rock solid. 24x7 support is the only thing that's really missing.
 
