We've had Proxmox with Ceph for over 5 years now and have deployed a production cluster to move off of VMware and NetApp. We've got about 60T of NVMe in a dedicated pool, 15T of SSD, and 20T of HDD fronted by SSD, all configured in Ceph.
Overview:
5 dedicated storage nodes and 4 compute nodes, with bonded 25G Ethernet on the backend, all connected to Arista switching over fiber. The compute nodes have 2x10G on the front end.
Testing at the command line yields the expected high performance, with the expected differences between storage classes. We run into problems in high-IOPS/high-throughput scenarios when the source is a VM (KVM/QEMU). We get great initial performance when an intense IO operation starts, and then it gets clearly throttled. A database load that takes 29 minutes on VMware backed by old, slow NetApp storage takes seven hours on the NVMe pool.
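To see where in the stack the drop starts, this is roughly what we plan to watch during the next load. Untested sketch; the guest device is whatever the DB sits on, and I believe rbd perf image iostat needs the rbd_support mgr module enabled:
Code:
# On a Ceph node: live per-image client IO for the NVME pool
rbd perf image iostat NVME

# Inside the guest: device-level throughput and latency every 5 seconds
iostat -xm 5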
The performance drop-off occurs at roughly the same wall-clock point every time, which suggests we're hitting some programmatic limit. We've moved from librbd to KRBD in testing, and that helps because it's faster overall, but the drop-off still happens at the same time. It's almost as if QEMU is throttling, yet we don't have any disks in a throttle group and nothing is configured to throttle. We see this behavior on Windows guests as well. We're starting to suspect QEMU, so if there's something we need to tune or configure there, maybe we've missed it.
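Roughly how we've been double-checking that, in case we're verifying it the wrong way (VM ID 101 is a placeholder):
Code:
# Proxmox disk options for the VM -- looking for any mbps*/iops* limits and
# noting the cache/aio/iothread settings on each disk
qm config 101 | grep -E 'scsi|virtio|sata|ide'

# The live QEMU command line for that guest -- any throttling.* or bps/iops
# properties on the -drive arguments would show up here
ps -ww -o args= -p $(cat /var/run/qemu-server/101.pid) | tr ',' '\n' | grep -Ei 'throttl|iops|bps'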
Code:
rados write bench, default 16 threads
Total time run: 120.044
Total writes made: 38196
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1272.73
Stddev Bandwidth: 139.248
Max bandwidth (MB/sec): 1600
Min bandwidth (MB/sec): 868
Average IOPS: 318
Stddev IOPS: 34.8119
Max IOPS: 400
Min IOPS: 217
Average Latency(s): 0.0502733
Stddev Latency(s): 0.0332458
Max latency(s): 0.992938
Min latency(s): 0.0174327
rados seq bench default 16 threads
Total time run: 76.7042
Total reads made: 38196
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1991.86
Average IOPS: 497
Stddev IOPS: 15.5239
Max IOPS: 531
Min IOPS: 461
Average Latency(s): 0.031651
Max latency(s): 0.246385
Min latency(s): 0.00564461
rados rand bench default 16 threads
Total time run: 120.029
Total reads made: 60349
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 2011.15
Average IOPS: 502
Stddev IOPS: 12.747
Max IOPS: 531
Min IOPS: 474
Average Latency(s): 0.031409
Max latency(s): 0.217155
Min latency(s): 0.00256268
rbd bench-write localimage --pool=NVME
rbd: bench-write is deprecated, use rbd bench --io-type write ...
bench type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
SEC OPS OPS/SEC BYTES/SEC
1 118096 118227 462 MiB/s
2 231504 115815 452 MiB/s
elapsed: 2 ops: 262144 ops/sec: 114972 bytes/sec: 449 MiB/s
All of those tests are on the NVMe pool. Any thoughts or suggestions on what to try are appreciated. We have great performance in most cases, but these are showstoppers.
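In case it helps frame suggestions, the next comparison we're planning is the same sustained write from inside a guest and directly against the pool, to see whether the drop-off only shows up with QEMU in the path. Untested sketch; the device and image names are placeholders, and the second run needs fio built with rbd support:
Code:
# Inside the VM, against a spare virtual disk, bypassing the guest page cache
fio --name=guest-write --filename=/dev/sdb --ioengine=libaio --direct=1 \
    --rw=write --bs=64k --iodepth=16 --runtime=1800 --time_based

# On a Proxmox/Ceph node, straight at an RBD image via librbd
fio --name=host-write --ioengine=rbd --clientname=admin --pool=NVME --rbdname=testimage \
    --direct=1 --rw=write --bs=64k --iodepth=16 --runtime=1800 --time_based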