Hyper-Converged cluster on dissimilar hardware

baggins

New Member
Jul 3, 2021
Hi, we have the servers listed below. They cannot be changed, as we are using MaaS and they are configured this way. I realize this is not optimal, but I would rather spend the time figuring out what can work with this hardware layout and how we can leverage what we have.

I am not new to Proxmox; previously we used servers 4 and 5 as FreeNAS with ZFS and ran a standard cluster. I would now like to rebuild the lot as a hyper-converged cluster, and I'm looking for the best architecture and formula. (I do not need a migration plan; the hardware is bare and clean.) We will run between 60 and 120 VMs, a combination of web apps and time-series databases. We do not have a lot of peaks and valleys in performance, but we do tend to have a steady, medium volume of incoming data (writes) with peak reads when people access dashboards.

All of the servers have dual 10G NICs bonded, with three VLANs: public, cluster/storage, and network traffic. The OS is installed on the first disk in each machine (~100G), installed with maxvz=0, and I have converted all of the free space on disk 1 to Ceph, so I now have 14 OSDs, one OSD per disk. Below is the output of the system as it stands (which I'm sure is wrong, as it's all default). I provide this not because I think it is the right way to do things, but because it gives you an idea of how not to do it, based on the degraded state.
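For reference, the bond + VLAN layout on each node looks roughly like this in /etc/network/interfaces (interface names, VLAN tags and addresses here are placeholders, not our real values):

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4

# public VLAN, bridged, carries the host's default route
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.11/24
        gateway 192.0.2.1
        bridge-ports bond0.10
        bridge-stp off
        bridge-fd 0

# cluster/storage VLAN (corosync + Ceph traffic)
auto bond0.20
iface bond0.20 inet static
        address 10.20.0.11/24

# guest network-traffic VLAN, bridged with no host address
auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0.30
        bridge-stp off
        bridge-fd 0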

Thanks in advance for all the help. It's my first time with Ceph and hyper-converged clusters.

root@pve1:~# ceph -s
  cluster:
    id:     b4e4e110-677a-44b0-b904-4b5c25305212
    health: HEALTH_WARN
            Degraded data redundancy: 89/11523 objects degraded (0.772%), 3 pgs degraded, 3 pgs undersized

  services:
    mon: 4 daemons, quorum pve1,pve5,pve3,pve4 (age 15m)
    mgr: pve4(active, since 2h), standbys: pve5
    mds: 4 up:standby
    osd: 14 osds: 14 up (since 15m), 14 in (since 3h); 1 remapped pgs

  data:
    pools:   2 pools, 112 pgs
    objects: 3.84k objects, 15 GiB
    usage:   60 GiB used, 30 TiB / 30 TiB avail
    pgs:     89/11523 objects degraded (0.772%)
             37/11523 objects misplaced (0.321%)
             108 active+clean
             3   active+undersized+degraded
             1   active+clean+remapped

root@pve1:~# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
0 ssd 0.43619 1.00000 447 GiB 3.9 GiB 2.9 GiB 283 KiB 1024 MiB 443 GiB 0.88 4.50 20 up
1 ssd 0.33409 1.00000 342 GiB 3.5 GiB 2.5 GiB 221 KiB 1024 MiB 339 GiB 1.03 5.28 19 up
2 ssd 0.43619 1.00000 447 GiB 4.4 GiB 3.4 GiB 373 KiB 1024 MiB 442 GiB 0.98 5.02 24 up
3 ssd 0.33409 1.00000 342 GiB 3.6 GiB 2.6 GiB 321 KiB 1024 MiB 339 GiB 1.05 5.40 19 up
4 ssd 0.43619 1.00000 447 GiB 4.2 GiB 3.2 GiB 364 KiB 1024 MiB 442 GiB 0.95 4.86 22 up
5 ssd 0.33409 1.00000 342 GiB 2.5 GiB 1.5 GiB 157 KiB 1024 MiB 340 GiB 0.73 3.76 11 up
6 ssd 3.39059 1.00000 3.4 TiB 4.2 GiB 3.2 GiB 250 KiB 1024 MiB 3.4 TiB 0.12 0.62 24 up
7 ssd 3.49260 1.00000 3.5 TiB 4.7 GiB 3.7 GiB 267 KiB 1024 MiB 3.5 TiB 0.13 0.67 27 up
8 ssd 3.49260 1.00000 3.5 TiB 4.9 GiB 3.9 GiB 286 KiB 1024 MiB 3.5 TiB 0.14 0.70 28 up
9 ssd 3.49260 1.00000 3.5 TiB 5.2 GiB 4.2 GiB 513 KiB 1023 MiB 3.5 TiB 0.15 0.75 30 up
10 ssd 3.39059 1.00000 3.4 TiB 5.9 GiB 4.9 GiB 260 KiB 1024 MiB 3.4 TiB 0.17 0.87 37 up
11 ssd 3.49260 1.00000 3.5 TiB 5.1 GiB 4.1 GiB 530 KiB 1023 MiB 3.5 TiB 0.14 0.73 29 up
12 ssd 3.49260 1.00000 3.5 TiB 3.8 GiB 2.8 GiB 258 KiB 1024 MiB 3.5 TiB 0.11 0.55 20 up
13 ssd 3.49260 1.00000 3.5 TiB 4.2 GiB 3.2 GiB 264 KiB 1024 MiB 3.5 TiB 0.12 0.60 23 up
TOTAL 30 TiB 60 GiB 46 GiB 4.3 MiB 14 GiB 30 TiB 0.20
MIN/MAX VAR: 0.55/5.40 STDDEV: 0.49
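In case it helps, this is how I have been drilling into the warning itself, nothing beyond the standard status commands:

# names the degraded/undersized PGs behind the HEALTH_WARN
ceph health detail

# PGs stuck below the requested replica count, and the degraded ones
ceph pg dump_stuck undersized
ceph pg dump_stuck degraded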


Server_1:
  CPU: dual Xeon Silver
  mem: 384G
  disk1: 450G SSD
  disk2: 450G SSD

Server_2:
  CPU: dual Xeon Silver
  mem: 384G
  disk1: 450G SSD
  disk2: 450G SSD

Server_3:
  CPU: dual Xeon Silver
  mem: 384G
  disk1: 450G SSD
  disk2: 450G SSD

Server_4:
  CPU: E3-1270V6
  mem: 32G
  disk1: 3.4T SSD
  disk2: 3.4T SSD
  disk3: 3.4T SSD
  disk4: 3.4T SSD

Server_5:
  CPU: E3-1270V6
  mem: 32G
  disk1: 3.4T SSD
  disk2: 3.4T SSD
  disk3: 3.4T SSD
  disk4: 3.4T SSD
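For anyone curious how Ceph ends up weighting these dissimilar nodes, this is what I have been using to look at the CRUSH hierarchy (read-only inspection, not a tuning recommendation):

# per-OSD usage, grouped under the host buckets
ceph osd df tree

# the CRUSH hierarchy and the weight each host carries
ceph osd crush tree

# the rule the default pools use (replicates across host buckets)
ceph osd crush rule dump replicated_rule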
 
After running deep scrubs and enabling the balancer, all of the errors are now gone and the cluster is in good health. I'm still relatively sure this is not an optimal setup, as I have read there may be better ways to organize the buckets, but again, my experience with Ceph is only hours old. I'm going to start pushing load through the system to see how it handles it.
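For the record, the cleanup was nothing clever, just along these lines (OSD and PG ids as reported by ceph health detail, omitted here):

# trigger deep scrubs on the OSDs / PGs that were flagged
ceph osd deep-scrub <osd-id>
ceph pg deep-scrub <pg-id>

# enable the automatic balancer in upmap mode
ceph balancer mode upmap
ceph balancer on
ceph balancer status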
 
