CEPH cluster analysis and improvement

unsichtbarre

New Member
Oct 1, 2024
Howdy,

I have a CEPH cluster built on a former vSAN cluster. Right now, there are three nodes, although I do have three more identical servers available. CEPH is slow and I know that I am not getting any new hardware, so I would like to make it run as well as possible.

Hosts are BL460 G9 with SB and a P420i controller. Each node has 12 disks: one used for boot and 11 available as Ceph OSDs. Most of the disks are 4TB Samsung 870 EVOs, plus two 1TB Intel write-intensive SSDs per node (one of which is the boot disk). The P420i is set to "HBA mode", and ssacli won't let me change the cache settings on the disks because it considers the array "unconfigured".
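
Since the controller is in HBA mode the disks are presented straight to Linux, so the volatile write cache can usually be queried and toggled per disk from the OS rather than through ssacli; a minimal sketch, with /dev/sdb through /dev/sdl as placeholder device names:
Code:
# Query the volatile write cache flag of each non-boot disk (placeholder names)
for d in /dev/sd{b..l}; do
    echo "== $d =="
    hdparm -W "$d"
done

# Turn the volatile write cache off on one drive (0 = off, 1 = on);
# not persistent across reboots, so it would need a udev rule or systemd unit
hdparm -W 0 /dev/sdb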

The CEPH network is 10GbE (public and cluster)
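
It may also be worth confirming that the 10GbE links actually deliver line rate between the nodes before blaming the disks; a quick check with iperf3 (using pve102's address 10.0.202.1 from the config below) could look like:
Code:
# On pve102: start an iperf3 server
iperf3 -s

# On pve101 or pve103: measure throughput to pve102 over the Ceph network
iperf3 -c 10.0.202.1 -t 30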

Here's the data:
Code:
root@pve103:~# ceph osd df tree
ID  CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE  VAR   PGS  STATUS  TYPE NAME  
-1         112.43541         -  112 TiB  1.4 TiB  1.3 TiB  511 KiB   45 GiB  111 TiB  1.22  1.00    -          root default
-3          37.47847         -   37 TiB  469 GiB  454 GiB  166 KiB   15 GiB   37 TiB  1.22  1.00    -              host pve101
 0    ssd    1.09160   1.00000  1.1 TiB   15 GiB   14 GiB   10 KiB  1.3 GiB  1.1 TiB  1.36  1.12    1      up          osd.0
 1    ssd    3.63869   1.00000  3.6 TiB   44 GiB   42 GiB   21 KiB  1.4 GiB  3.6 TiB  1.18  0.96    3      up          osd.1
 2    ssd    3.63869   1.00000  3.6 TiB   44 GiB   42 GiB   11 KiB  1.6 GiB  3.6 TiB  1.18  0.97    3      up          osd.2
 3    ssd    3.63869   1.00000  3.6 TiB   44 GiB   42 GiB   15 KiB  1.1 GiB  3.6 TiB  1.17  0.96    3      up          osd.3
 4    ssd    3.63869   1.00000  3.6 TiB   72 GiB   70 GiB   11 KiB  1.4 GiB  3.6 TiB  1.93  1.58    5      up          osd.4
 5    ssd    3.63869   1.00000  3.6 TiB   59 GiB   57 GiB   18 KiB  1.4 GiB  3.6 TiB  1.58  1.29    4      up          osd.5
 6    ssd    3.63869   1.00000  3.6 TiB   44 GiB   43 GiB   16 KiB  1.8 GiB  3.6 TiB  1.19  0.97    3      up          osd.6
 7    ssd    3.63869   1.00000  3.6 TiB   15 GiB   14 GiB   10 KiB  987 MiB  3.6 TiB  0.41  0.33    1      up          osd.7
 8    ssd    3.63869   1.00000  3.6 TiB   44 GiB   43 GiB   17 KiB  1.7 GiB  3.6 TiB  1.19  0.98    3      up          osd.8
 9    ssd    3.63869   1.00000  3.6 TiB   44 GiB   43 GiB   21 KiB  1.6 GiB  3.6 TiB  1.18  0.97    4      up          osd.9
10    ssd    3.63869   1.00000  3.6 TiB   44 GiB   43 GiB   16 KiB  940 MiB  3.6 TiB  1.17  0.96    3      up          osd.10
-5          37.47847         -   37 TiB  469 GiB  454 GiB  194 KiB   15 GiB   37 TiB  1.22  1.00    -              host pve102
12    ssd    1.09160   1.00000  1.1 TiB   44 GiB   42 GiB   14 KiB  1.8 GiB  1.0 TiB  3.94  3.23    3      up          osd.12
13    ssd    3.63869   1.00000  3.6 TiB   44 GiB   43 GiB   26 KiB  1.0 GiB  3.6 TiB  1.17  0.96    4      up          osd.13
14    ssd    3.63869   1.00000  3.6 TiB   30 GiB   28 GiB   17 KiB  1.7 GiB  3.6 TiB  0.80  0.65    2      up          osd.14
15    ssd    3.63869   1.00000  3.6 TiB   44 GiB   43 GiB   19 KiB  1.1 GiB  3.6 TiB  1.18  0.96    3      up          osd.15
16    ssd    3.63869   1.00000  3.6 TiB   30 GiB   28 GiB    9 KiB  1.3 GiB  3.6 TiB  0.80  0.65    2      up          osd.16
17    ssd    3.63869   1.00000  3.6 TiB   58 GiB   57 GiB   12 KiB  1.8 GiB  3.6 TiB  1.57  1.28    4      up          osd.17
18    ssd    3.63869   1.00000  3.6 TiB   72 GiB   71 GiB   24 KiB  1.1 GiB  3.6 TiB  1.94  1.58    5      up          osd.18
19    ssd    3.63869   1.00000  3.6 TiB   16 GiB   14 GiB   14 KiB  1.6 GiB  3.6 TiB  0.42  0.35    1      up          osd.19
20    ssd    3.63869   1.00000  3.6 TiB   44 GiB   42 GiB   20 KiB  1.4 GiB  3.6 TiB  1.17  0.96    3      up          osd.20
21    ssd    3.63869   1.00000  3.6 TiB   15 GiB   14 GiB   13 KiB  946 MiB  3.6 TiB  0.40  0.33    1      up          osd.21
22    ssd    3.63869   1.00000  3.6 TiB   73 GiB   72 GiB   26 KiB  1.5 GiB  3.6 TiB  1.97  1.61    5      up          osd.22
-7          37.47847         -   37 TiB  469 GiB  454 GiB  151 KiB   15 GiB   37 TiB  1.22  1.00    -              host pve103
24    ssd    1.09160   1.00000  1.1 TiB   29 GiB   28 GiB   11 KiB  937 MiB  1.1 TiB  2.62  2.14    2      up          osd.24
25    ssd    3.63869   1.00000  3.6 TiB   30 GiB   28 GiB   13 KiB  1.5 GiB  3.6 TiB  0.80  0.66    2      up          osd.25
26    ssd    3.63869   1.00000  3.6 TiB   44 GiB   42 GiB   10 KiB  1.6 GiB  3.6 TiB  1.17  0.96    3      up          osd.26
27    ssd    3.63869   1.00000  3.6 TiB   29 GiB   28 GiB   14 KiB  1.1 GiB  3.6 TiB  0.79  0.65    2      up          osd.27
28    ssd    3.63869   1.00000  3.6 TiB   16 GiB   14 GiB   13 KiB  1.5 GiB  3.6 TiB  0.42  0.34    1      up          osd.28
29    ssd    3.63869   1.00000  3.6 TiB   15 GiB   14 GiB    6 KiB  874 MiB  3.6 TiB  0.41  0.33    1      up          osd.29
30    ssd    3.63869   1.00000  3.6 TiB   72 GiB   71 GiB   19 KiB  1.5 GiB  3.6 TiB  1.94  1.59    5      up          osd.30
31    ssd    3.63869   1.00000  3.6 TiB   73 GiB   71 GiB   22 KiB  1.6 GiB  3.6 TiB  1.96  1.60    5      up          osd.31
32    ssd    3.63869   1.00000  3.6 TiB   29 GiB   28 GiB   14 KiB  1.2 GiB  3.6 TiB  0.79  0.65    2      up          osd.32
33    ssd    3.63869   1.00000  3.6 TiB  115 GiB  113 GiB   22 KiB  1.7 GiB  3.5 TiB  3.09  2.53    8      up          osd.33
34    ssd    3.63869   1.00000  3.6 TiB   15 GiB   14 GiB    7 KiB  1.2 GiB  3.6 TiB  0.42  0.34    2      up          osd.34
                         TOTAL  112 TiB  1.4 TiB  1.3 TiB  526 KiB   45 GiB  111 TiB  1.22
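
One thing that stands out in the PGS column above is how few placement groups each OSD holds (1 to 8), which usually means pg_num for the pool is very low for 33 OSDs; that can be confirmed, and raised if needed, with the commands below (pool name and target pg_num are only examples):
Code:
# Show size, pg_num and autoscaler settings for every pool
ceph osd pool ls detail
ceph osd pool autoscale-status

# If the autoscaler is off and pg_num is tiny, raise it manually (example values)
ceph osd pool set <pool> pg_num 512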

Code:
root@pve103:~# cat /etc/pve/ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.0.201.1/16
        fsid = 1101c540-2741-48d9-b64d-189700d0b84f
        mon_allow_pool_delete = true
        mon_host = 10.0.203.1 10.0.202.1 10.0.201.1
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.0.201.1/16

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.pve101]
        public_addr = 10.0.201.1

[mon.pve102]
        public_addr = 10.0.202.1

[mon.pve103]
        public_addr = 10.0.203.1
Code:
iface bond1 inet manual
        bond-slaves ens2f0 ens2f1
        bond-miimon 100
        bond-mode active-backup
        bond-primary ens2f0
        mtu 9000
#iSCSI Bond

auto bond1.3260
iface bond1.3260 inet static
        address 10.0.203.1/16
        mtu 9000
#iSCSI 1
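
Since the bond is configured for jumbo frames, it is worth confirming that MTU 9000 actually passes end-to-end on the Ceph VLAN (a mismatch anywhere on the path tends to show up as odd latency); a quick check from pve103 toward pve101's address:
Code:
# 8972 = 9000 minus 28 bytes of IP and ICMP headers; -M do forbids fragmentation
ping -M do -s 8972 -c 3 10.0.201.1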

Code:
root@pve103:~# cat /sys/block/sd*/queue/write_cache
write through
write through
write back
write back
write back
write back
write back
write back
write back
write back
write back
write back
write through
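
To map those mixed cache states to actual disks, the device name can be printed next to its cache mode. Note that /sys/block/*/queue/write_cache only reflects (and overrides) what the kernel assumes about the drive's cache and flushing behaviour; it does not switch the drive's own cache off, so the sketch below is just for identification:
Code:
# Print each block device next to its reported cache mode
for d in /sys/block/sd*; do
    printf '%s: %s\n' "${d##*/}" "$(cat "$d/queue/write_cache")"
done
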
Code:
root@pve103:~# cat /sys/block/sd*/queue/scheduler
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
none [mq-deadline]
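
mq-deadline is the default here; for SATA SSDs some people switch the scheduler to none and let the drives do their own ordering. Whether that helps is workload-dependent, but a persistent way to try it is a udev rule along these lines (the filename is just a suggestion):
Code:
# /etc/udev/rules.d/60-ssd-scheduler.rules (suggested name)
# For non-rotational sd* devices, set the I/O scheduler to "none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"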

THX in ADV,
-John
 
Samsung EVOs are absolutely terrible for Ceph. Please see the attached benchmark. Those numbers are for 850s, but I assume the 870s aren't much better.

You are better off using something different.
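
If you want to compare your 870 EVOs and the Intel drives against the numbers in that paper, the usual single-threaded 4k sync-write fio test (destructive, so only run it against a disk that holds no data; /dev/sdX is a placeholder) looks roughly like:
Code:
# WARNING: writes directly to the raw device and destroys its contents
fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=sync-write-test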
 

Attachments

  • Proxmox-VE_Ceph-Benchmark-201802.pdf (272.2 KB)
Very useful info. But like I said, I am not getting any new disks soon, so I'm looking to get what I can out of what I have.

THX,
-JB
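
A baseline taken now and re-run after each change makes it easy to tell whether a tweak actually helped; rados bench against a scratch pool (the pool name here is only an example) is the usual tool:
Code:
# 60-second 4M write test with 16 concurrent ops; keep the objects for the read tests
rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup

# Sequential and random read tests against the objects written above
rados bench -p testpool 60 seq -t 16
rados bench -p testpool 60 rand -t 16

# Remove the benchmark objects afterwards
rados -p testpool cleanup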
 
