Slow Performance on Ceph in LXC Container

Ah_lead
New Member · Jul 11, 2024
Hello everyone,

I am currently experiencing performance issues with my Ceph storage setup inside LXC containers and would greatly appreciate any insights or suggestions. Please note that I am still new to this, so I may well be making beginner mistakes.

Server Hardware:
  • 3 Servers (each with the following specifications):
    • 8 x 1TB SSDs (total 8TB SSD storage per server)
    • 72 logical CPUs: 2 x Intel(R) Xeon(R) E5-2699 v3 @ 2.30GHz (2 sockets, 18 cores / 36 threads each)
    • 10GbE Switch for networking
    • Disk controller: HP Smart HBA
Cluster Configuration:
  • Cluster: 3 Nodes (Servers)
  • OSDs: 21 OSDs (7 per Node)
  • PG Count: 1024
  • Ceph Configuration: 3x Monitor and 3x Manager
  • Networks: Separate NICs for public and cluster network through 10GbE switch

/etc/network/interfaces
iface eno49 inet manual
mtu 9000

iface eno50 inet manual
mtu 9000

auto vmbr0
iface vmbr0 inet static
address 192.168.42.60/24
gateway 192.168.42.1
bridge-ports eno49
bridge-stp off
bridge-fd 0
mtu 9000

auto vmbr1
iface vmbr1 inet static
address 192.168.123.60/24
bridge-ports eno50
bridge-stp off
bridge-fd 0
mtu 9000

source /etc/network/interfaces.d/*
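
To rule out an MTU mismatch along the path (a common cause of odd Ceph latency when using jumbo frames), it is worth checking that 9000-byte frames actually pass between the nodes unfragmented. A minimal check, assuming 192.168.123.61 is a peer node's cluster-network address:

# 8972 = 9000 MTU minus 28 bytes of IP/ICMP headers;
# -M do forbids fragmentation, so this fails if any hop drops jumbo frames
ping -M do -s 8972 -c 3 192.168.123.61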

/etc/ceph/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
fsid = 68xxxxxx-xxxe-xxxc-8xxx-xxxx0720xxxx
mon_allow_pool_delete = true
mon_host = 192.168.42.60 192.168.42.61 192.168.42.62
ms_bind_ipv4 = true
ms_bind_ipv6 = false
public_network = 192.168.42.0/24
cluster_network = 192.168.123.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mon.vrt-ffm1]
public_addr = 192.168.42.60

[mon.vrt-ffm2]
public_addr = 192.168.42.61

[mon.vrt-ffm3]
public_addr = 192.168.42.62
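
To double-check that the daemons actually picked up these networks, the runtime bind addresses can be inspected (osd.0 below is just an example id):

# monitors should be listening on the 192.168.42.0/24 public network
ceph mon dump
# each OSD should report a public front_addr and a 192.168.123.x back_addr
ceph osd metadata 0 | grep -E '"(front|back)_addr"'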

Container Configuration on Each Node:
  • LXC Container: One on each node
  • Resources:
    • 12 cores per container
    • 36 GB RAM per container
    • 800 GB storage per container
  • Ceph Storage: Provided within the LXC containers
OSD Benchmark:
ceph tell osd.* bench
Node 1:
  • OSD.0: 1GB write, blocksize 4MB, 25.68 sec, 40MB/s
  • OSD.3: 1GB write, blocksize 4MB, 73.06 sec, 14.70MB/s
  • OSD.4: 1GB write, blocksize 4MB, 49.53 sec, 21.68MB/s
  • OSD.5: 1GB write, blocksize 4MB, 37.52 sec, 28.62MB/s
  • OSD.6: 1GB write, blocksize 4MB, 3.78 sec, 283.93MB/s
  • OSD.7: 1GB write, blocksize 4MB, 53.76 sec, 19.97MB/s
  • OSD.8: 1GB write, blocksize 4MB, 2.89 sec, 372.13MB/s
Node 2:
  • OSD.1: 1GB write, blocksize 4MB, 5.08 sec, 211.50MB/s
  • OSD.9: 1GB write, blocksize 4MB, 52.37 sec, 20.50MB/s
  • OSD.10: 1GB write, blocksize 4MB, 55.95 sec, 19.19MB/s
  • OSD.11: 1GB write, blocksize 4MB, 59.25 sec, 18.12MB/s
  • OSD.12: 1GB write, blocksize 4MB, 54.70 sec, 19.63MB/s
  • OSD.13: 1GB write, blocksize 4MB, 3.50 sec, 306.57MB/s
  • OSD.14: 1GB write, blocksize 4MB, 74.12 sec, 14.49MB/s
Node 3:
  • OSD.2: 1GB write, blocksize 4MB, 3.65 sec, 294.39MB/s
  • OSD.15: 1GB write, blocksize 4MB, 2.66 sec, 404.06MB/s
  • OSD.16: 1GB write, blocksize 4MB, 59.07 sec, 18.18MB/s
  • OSD.17: 1GB write, blocksize 4MB, 53.96 sec, 19.90MB/s
  • OSD.18: 1GB write, blocksize 4MB, 45.87 sec, 23.41MB/s
  • OSD.19: 1GB write, blocksize 4MB, 3.03 sec, 354.54MB/s
  • OSD.20: 1GB write, blocksize 4MB, 2.95 sec, 363.42MB/s
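
The fast/slow split above often traces back to differences in the physical drives or their write-cache behaviour, so comparing what a slow and a fast OSD sit on can be revealing; osd.3 and /dev/sdX below are placeholders:

# which physical device backs the OSD, and is it detected as non-rotational?
ceph osd metadata 3 | grep -E '"(devices|device_ids|rotational)"'
# drive model, firmware, and SMART health for the device reported above
smartctl -a /dev/sdX
# quick per-node overview of all drives
lsblk -o NAME,MODEL,ROTA,TRAN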

Rados Benchmark:
rados bench -p ceph 10 write --no-cleanup
Write: 4194304 bytes, 274.4 MB/sec, 68 IOPS, 0.22 average latency

rados bench -p ceph 10 seq
Sequential: 4194304 bytes, 1165.1 MB/sec, 291 IOPS, 0.05 average latency

rados bench -p ceph 10 rand
Random: 4194304 bytes, 1129.42 MB/sec, 307 IOPS, 0.05 average latency
FIO Benchmark:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=1024k --iodepth=32 --size=150G --readwrite=randrw

IO Depth 32: 41.5 MiB/s write, 41.7 MiB/s read, 41 IOPS write, 41 IOPS read
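
Since a 1 MiB random read/write test at queue depth 32 mostly measures throughput, a small-block, low-depth run may better reflect a typical VM or database workload on Ceph; the filename below is a placeholder for a file on the Ceph-backed storage:

fio --name=4k-randwrite --ioengine=libaio --direct=1 --sync=1 --bs=4k --iodepth=1 --numjobs=1 --rw=randwrite --size=4G --runtime=60 --time_based --filename=/path/on/ceph/testfile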
Iperf3 Test:
Transfer: 1.15 GBytes Bitrate: 9.87 Gbits/sec Congestion Window: 1.55 MBytes
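
The figures above come from a plain node-to-node TCP test along these lines (peer address assumed):

iperf3 -s                     # on the receiving node
iperf3 -c 192.168.123.61      # on the sending node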
Issues Encountered:

The performance of my Ceph storage inside the LXC containers is inconsistent. Some OSDs deliver very high write throughput (e.g., OSD.8, OSD.13, OSD.15), while others are much slower (e.g., OSD.3, OSD.14, OSD.16).

I have already configured the Ceph cluster with a pg_num of 1024 and the bdev_enable_discard option set to true. However, I am still experiencing slow write speeds and inconsistent performance across different OSDs.
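
Assuming those settings were applied through the monitor config database rather than only in ceph.conf, they can be confirmed at runtime ("ceph" is the pool name used in the rados benchmarks above):

# effective value of the discard option as the OSDs see it
ceph config get osd bdev_enable_discard
# current PG count of the benchmark pool
ceph osd pool get ceph pg_num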

Request for Advice:
  • What are the potential causes of the inconsistent performance among the OSDs?
  • Are there additional Ceph or system-level optimizations I can apply to improve performance inside the LXC containers?
  • Could the current configuration of the cluster and containers be affecting performance? If so, what changes would you recommend?
  • Is there a specific fio or other benchmark configuration that would better represent the typical Ceph workload in my setup?

I appreciate any insights or recommendations you can provide. Thank you for your help!