New to Proxmox/Ceph - performance question

Jan 28, 2021
I am new to Proxmox/Ceph and am looking into some performance issues.
5 OSD nodes and 3 monitor nodes
Cluster VLAN - 10.111.40.0/24

OSD node
CPU - AMD EPYC 2144G (64 Cores)
Memory - 256GB
Storage - Dell 3.2TB NVME x 10
Network - 40 GbE for Ceph cluster
Network - 1 GbE for Proxmox mgmt


MON node
CPU - Intel Xeon(R) E-2144G (8 cores)
Memory - 16GB
Storage - 60GB SSD x 1
Network - 20 GbE for Ceph cluster
Network - 1 GbE for Proxmox mgmt

Jumbo Frames are enabled.
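A quick way to verify that jumbo frames actually survive end-to-end (assuming a 9000-byte MTU; the target address here stands in for another node's cluster IP) is a ping with the don't-fragment flag set:

```shell
# 8972 = 9000 MTU - 20 bytes IP header - 8 bytes ICMP header.
# If this fails while a normal ping works, some hop is dropping jumbo frames.
ping -M do -s 8972 -c 3 10.111.40.112
```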

I created a Windows VM on a Ceph pool, and when copying a file inside it I only get 30-40 MB/s.

Is this normal performance?
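One way to get a throughput baseline that is independent of the Windows guest is rados bench against the pool (the pool name below is a placeholder):

```shell
# 60-second write benchmark; keep the objects so the read test
# has something to read back.
rados bench -p <pool> 60 write --no-cleanup
# Sequential read benchmark against the objects written above.
rados bench -p <pool> 60 seq
# Remove the benchmark objects afterwards.
rados -p <pool> cleanup
```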

Ceph config:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.111.40.111/24
     fsid = 1be7458c-9741-40f9-a222-64845d6c68ae
     mon_allow_pool_delete = true
     mon_host = 10.111.40.111 10.111.40.112 10.111.40.113
     osd_pool_default_min_size = 2
     osd_pool_default_size = 5
     public_network = 10.111.40.111/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

Crush map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class ssd
device 16 osd.16 class ssd
device 17 osd.17 class ssd
device 18 osd.18 class ssd
device 19 osd.19 class ssd
device 20 osd.20 class ssd
device 21 osd.21 class ssd
device 22 osd.22 class ssd
device 23 osd.23 class ssd
device 24 osd.24 class ssd
device 25 osd.25 class ssd
device 26 osd.26 class ssd
device 27 osd.27 class ssd
device 28 osd.28 class ssd
device 29 osd.29 class ssd
device 30 osd.30 class ssd
device 31 osd.31 class ssd
device 32 osd.32 class ssd
device 33 osd.33 class ssd
device 34 osd.34 class ssd
device 35 osd.35 class ssd
device 36 osd.36 class ssd
device 37 osd.37 class ssd
device 38 osd.38 class ssd
device 39 osd.39 class ssd
device 40 osd.40 class ssd
device 41 osd.41 class ssd
device 42 osd.42 class ssd
device 43 osd.43 class ssd
device 44 osd.44 class ssd
device 45 osd.45 class ssd
device 46 osd.46 class ssd
device 47 osd.47 class ssd
device 48 osd.48 class ssd
device 49 osd.49 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host OSD-NOD-01 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    # weight 29.110
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 2.911
    item osd.1 weight 2.911
    item osd.2 weight 2.911
    item osd.3 weight 2.911
    item osd.4 weight 2.911
    item osd.5 weight 2.911
    item osd.6 weight 2.911
    item osd.7 weight 2.911
    item osd.8 weight 2.911
    item osd.9 weight 2.911
}
host OSD-NOD-02 {
    id -5        # do not change unnecessarily
    id -6 class ssd        # do not change unnecessarily
    # weight 29.110
    alg straw2
    hash 0    # rjenkins1
    item osd.10 weight 2.911
    item osd.11 weight 2.911
    item osd.12 weight 2.911
    item osd.13 weight 2.911
    item osd.14 weight 2.911
    item osd.15 weight 2.911
    item osd.16 weight 2.911
    item osd.17 weight 2.911
    item osd.18 weight 2.911
    item osd.19 weight 2.911
}
host OSD-NOD-03 {
    id -7        # do not change unnecessarily
    id -8 class ssd        # do not change unnecessarily
    # weight 29.110
    alg straw2
    hash 0    # rjenkins1
    item osd.20 weight 2.911
    item osd.21 weight 2.911
    item osd.22 weight 2.911
    item osd.23 weight 2.911
    item osd.24 weight 2.911
    item osd.25 weight 2.911
    item osd.26 weight 2.911
    item osd.27 weight 2.911
    item osd.28 weight 2.911
    item osd.29 weight 2.911
}
host OSD-NOD-04 {
    id -9        # do not change unnecessarily
    id -10 class ssd        # do not change unnecessarily
    # weight 29.110
    alg straw2
    hash 0    # rjenkins1
    item osd.30 weight 2.911
    item osd.31 weight 2.911
    item osd.32 weight 2.911
    item osd.33 weight 2.911
    item osd.34 weight 2.911
    item osd.35 weight 2.911
    item osd.36 weight 2.911
    item osd.37 weight 2.911
    item osd.38 weight 2.911
    item osd.39 weight 2.911
}
host OSD-NOD-05 {
    id -11        # do not change unnecessarily
    id -12 class ssd        # do not change unnecessarily
    # weight 29.110
    alg straw2
    hash 0    # rjenkins1
    item osd.40 weight 2.911
    item osd.41 weight 2.911
    item osd.42 weight 2.911
    item osd.43 weight 2.911
    item osd.44 weight 2.911
    item osd.45 weight 2.911
    item osd.46 weight 2.911
    item osd.47 weight 2.911
    item osd.48 weight 2.911
    item osd.49 weight 2.911
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    # weight 145.550
    alg straw2
    hash 0    # rjenkins1
    item OSD-NOD-01 weight 29.110
    item OSD-NOD-02 weight 29.110
    item OSD-NOD-03 weight 29.110
    item OSD-NOD-04 weight 29.110
    item OSD-NOD-05 weight 29.110
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
 
CPU - AMD EPYC 2144G (64 Cores)
This seems wrong :) (2144G is an Intel Xeon E-series model number, not an EPYC one.)

Storage - Dell 3.2TB NVME x 10
Network - 40 GB for Ceph Cluster
This is not a good combination. For NVMe you should use 100GbE; latency is not really the problem, but throughput is.

osd_pool_default_min_size = 2
osd_pool_default_size = 5
You have deployed your pools with 5 replicas? Why? The norm is a size of 3 and a min_size of 2; that is really enough redundancy. Maybe you should take a look at erasure coding.
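If the pools already exist, the replica count can be changed on the fly (the pool name is a placeholder; Ceph will rebalance the data afterwards):

```shell
# Drop from 5 replicas to the usual 3/2 setup.
ceph osd pool set <pool> size 3
ceph osd pool set <pool> min_size 2
```

Also adjust osd_pool_default_size in ceph.conf so newly created pools pick up the same defaults.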

What about your network? Do you use native QSFP or a breakout cable? Which switches do you use? Are they configured for jumbo frames as well? Which bonding hash policy did you set?
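On the Proxmox nodes, the active bonding mode and hash policy can be read from the kernel (bond0 is an assumed interface name):

```shell
# Show the negotiated mode and the transmit hash policy for the bond.
grep -E 'Bonding Mode|Transmit Hash' /proc/net/bonding/bond0
# Confirm the bond itself carries the jumbo MTU.
cat /sys/class/net/bond0/mtu
```

With 802.3ad, a layer3+4 hash policy usually spreads Ceph's many TCP connections across the slaves better than the default layer2.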

Is this normal performance?
Hell no, that's not the expected performance for this hardware. My SSD-only storage is faster than yours :)

What about your pools? Is the PG count correct?
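A common rule of thumb for pg_num is (OSDs x 100) / replicas, rounded up to a power of two; for 50 OSDs at size 3 that works out to 2048 (the current value can be read with ceph osd pool get &lt;pool&gt; pg_num):

```shell
# Rule-of-thumb PG count: (OSDs * 100) / replicas,
# rounded up to the next power of two.
osds=50
replicas=3
target=$(( osds * 100 / replicas ))   # 1666
pg=1
while [ "$pg" -lt "$target" ]; do pg=$(( pg * 2 )); done
echo "$pg"   # prints 2048
```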
 
I was using a 4 x 10Gb SFP+ bond with 802.3ad, but I guess I need to change it to 100Gb QSFP28. Hopefully I get better results with it. I will make sure my pools are set to 3/2.
Thanks,
 
