Suggestions for Best Proxmox Scalable High Availability Setup and Configuration

sektor

Member
Sep 5, 2015
I am planning on having 3 hosts with 96GB of RAM each and 2x 240GB SSDs: one will be dedicated to the OS and the other to the journal, and then a single 10TB HDD in each of the nodes. I was going to go with 4x 2TB drives, but was informed that due to chassis limitations I was not able to do that.

All 3 servers will be configured exactly the same, with identical hardware and a 10Gb NIC connected to a 10Gb switch. I have read and am reading through https://pve.proxmox.com/pve-docs/chapter-pveceph.html#pve_ceph_install_wizard, but decided to post to explain my setup and see if anyone has suggestions. I plan to make probably two 5TB partitions when I build the OSDs and the pools: one will be dedicated to the VMs and the other will be CephFS for end-user data, etc. I also plan to run a separate network dedicated to Ceph traffic.
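On the Proxmox side, what I have in mind would look roughly like the sketch below (the subnet, pool and filesystem names are just placeholders I picked, and the exact pveceph options may differ a bit between PVE versions):

Code:
# dedicated Ceph network (example subnet, adjust to the real one)
pveceph init --network 10.10.10.0/24

# RBD pool for VM disk images, registered as Proxmox storage
pveceph pool create vm-pool --add_storages

# CephFS for end-user data (needs at least one MDS first)
pveceph mds create
pveceph fs create --name userdata --add-storage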
 
@sektor: I am in nearly the same situation ... and have already had some learning ;-) ...

I do not recommend 1 HDD ... (see the thread moving-disk-from-nfs-to-ceph-hangs)
Ceph is not made for such small systems ... so you really need to avoid bottlenecks ... either by sizing the system properly or by loading it only moderately.

I also started with 1 HDD ... and am now switching to 3 smaller ones in each host ... I will see if this works out
... the official recommendation is at least 4 disks

Hm, what do you mean by "high availability"? Should the system stay alive if a "component" is not working (by accident or planned)?
For this, take a critical look at the single points of failure in your design ...

Here are some things I took into account:
3 hosts is great (if the failure domain is "host" ... the Proxmox default config)
... so this means that if a host dies or is under maintenance ... your cluster is fine.

A failure domain of "host" implies for me that each host only has to take care of "availability" internally until the data has reached a consistent state; the other parts of "high" availability are handled by the cluster:

What does this mean for the host?
CPU: just enough ;-) (if you use Ceph, the recommendation is 1 core per OSD (i.e. per disk) just for Ceph)
RAM: needs to be error-free (i.e. ECC), AND enough ;-) (if you use Ceph, about 1GB of RAM per 1TB of HDD, just for Ceph)
HDDs: avoid local caches (i.e. in the HDDs or the controllers)
Boot drive: 1 is enough (here I installed ZFS on a single disk ... so I can more easily switch to a software RAID1 later, if needed) (again, some more RAM)
Ceph journal (DB/WAL): 1 is enough, sizing: ~10% of the data disks
Ceph data: 1 is enough (OHOH ... here comes the "issue": Ceph expects fairly bottleneck-free hardware ... the recommendation is a minimum of 4 spindles); all disks the same is best!
Disk controller: only HBAs or disks connected directly to the mainboard (no RAID controller ... not even in JBOD mode ... some controllers have a so-called IT mode, which should work)
Network: 1 is enough ... well, I don't see it that way ... from the host perspective it may be OK, but from the switch perspective it is not
... if you want to have "high availability" you need at least two switches
... hm, something in between is the "full mesh" config Proxmox suggests for small setups without a 10G switch
... saying this, add ... 2 ports 10G for Ceph private ... 2 ports for client traffic ... 2 ports for the PVE cluster ... 2 ports for the migration network ... 2 ports for backup ... wait/wait/wait, here I am not sure ;-)
Switch: Proxmox offers a wide variety of configs here ... I mean, you can configure mesh networks ... some sort of "just a link" ... and all sorts of aggregation (i.e. LACP) ... a small bonding sketch follows below
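Just to make the aggregation part concrete: a minimal /etc/network/interfaces sketch for bonding 4x 1G ports with LACP and putting the cluster network in its own VLAN could look roughly like this (interface names, VLAN IDs and addresses are only placeholders, and the hash policy has to match the switch config):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2 eno3 eno4
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

# tagged VLAN on the bond for the PVE cluster (corosync) network
auto bond0.50
iface bond0.50 inet static
        address 10.50.0.11/24

# bridge for client/VM traffic on the untagged bond
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.11/24
        gateway 192.168.1.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0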

My current "state of misunderstanding" of all these requirements, wishes and miracles ;-) is the following config:
Hosts: 3x (1x 8-core, 64GB ECC RAM (128 possible), 10 SATA 6G ports)
Disks: 1TB NVMe (journal), 100GB SSD (boot, swap), 3x 2TB HDD (data)
Network: 2 ports 10G (meshed), 4 ports 1G (LACP, 2 switches)
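For the OSDs my plan is to put the DB/WAL of all three HDD OSDs on the shared NVMe, roughly like below (device names are only examples, check with lsblk first; at ~10% per 2TB disk, i.e. about 200GB each, the 1TB NVMe should be big enough):

Code:
# one OSD per data HDD, DB/WAL carved out of the shared NVMe
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1
pveceph osd create /dev/sdc --db_dev /dev/nvme0n1
pveceph osd create /dev/sdd --db_dev /dev/nvme0n1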

Coming to your config ... my opinion:
Host will work (what CPU?)
Local disks will work; the Ceph data HDD only if you throttle the load
Network: is it not highly available? or meshed?

When it comes to configuration: here I do not have a lot of experience ...
Boot disk and swap I put on 1 SSD, but with ZFS ... so I could switch to a RAID1 later (old thinking ;-)
Which network (Ceph private/public, PVE cluster, PVE migration, client) goes on which line???
(Perhaps someone from the PVE team can advise a good minimal/maximal HCI network config.)
I put Ceph private/public on the 10G links (MTU 9000), everything else on the 4-line LACP aggregate, but in different VLANs (migration: MTU 9000).
Currently I have just 1 Ceph pool (3/2 replicated) ... so I like your idea of having a 2nd pool for file-based things.
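To make the network assignment concrete, this is roughly how I would map it in the configs (the subnets are placeholders, and the datacenter.cfg syntax is from my reading of the docs, so double-check it):

Code:
# /etc/pve/ceph.conf, [global] section: keep Ceph on the meshed 10G links
public_network  = 10.15.15.0/24
cluster_network = 10.15.15.0/24

# /etc/pve/datacenter.cfg: pin live migration to its own VLAN
migration: secure,network=10.60.0.0/24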

What do you think, sektor? (Or others here in the forum?)

What is a good small 3-node setup/config? What solution did you find?
 
OK, here is the deal: the server chassis is a Dell C6220, 6-drive version. It was a 3-drive version, so now I have the capability of adding 3 more drives to each server.

The processors are dual Intel Xeon E5-2670 2.6GHz octa-cores for a total of 16 cores, and in regards to the NIC, each server will have a 10Gb NIC connected to a 10Gb switch, which will handle all the Ceph traffic between the servers.

The other NICs I planned on bonding so that I have 2Gb of bandwidth for the VMs; the config will be the same on all 3 hosts, and no RAID card is installed.

My thought was a baseline config and expanding from there. Also, in regards to the memory, it is ECC memory and it is upgradeable.

At the time I wasn't aware they were limiting me to a 3-drive chassis, but that has since been corrected.

What I was thinking about doing is adding 2 more SSDs: 1 to RAID1 the OS using ZFS, and 1 for the 10TB OSD that will be my dedicated data pool, plus 1 SSD for a 2TB HDD that I will dedicate to my VM disk images.
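For turning the single-disk ZFS OS install into a mirror later, my understanding is that it comes down to copying the partition layout to the new SSD and attaching the matching partition to the pool, something like the sketch below (device and partition names are placeholders and depend on the installer layout; the bootloader also has to be set up on the new disk):

Code:
# copy the partition table from the current boot SSD (/dev/sda) to the new one (/dev/sdb)
sgdisk /dev/sda -R /dev/sdb
sgdisk -G /dev/sdb

# attach the ZFS partition to turn rpool into a mirror, then wait for the resilver
zpool attach rpool /dev/sda3 /dev/sdb3
zpool status rpool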
 
This is the config I decided to go with.
 
