Ceph: Unusable random read/write performance

WildcardTom
Jul 3, 2024
Currently running a 3 node cluster with Ceph, suffering from unusably bad random read/write performance.

Each node is running 2x 22-core Xeon CPUs with 256 GB RAM.
The Ceph network is 10G, MTU 9000 - I have verified this is actually in use (see the note just below).
Currently running 2 SATA SSD OSDs per node; increasing to 7 OSDs per node made no difference.
The SSDs are consumer grade; however, using Kingston DC600M enterprise disks made no difference either.
Disk-level caching is ON; disabling it just made things worse.
Increasing threads per shard made no difference.
The Ceph pool is configured as 3-way replication.
The guest is configured with 8 cores, 16 GB RAM, VirtIO SCSI single, CPU Type=Host, NUMA=ON.
Write-through and write-back caching increase write performance to acceptable levels but don't impact read performance.
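
(For anyone checking the same thing: jumbo frames can be confirmed with a don't-fragment ping between the Ceph IPs, e.g. ping -M do -s 8972 <other-node-ceph-ip> - the address being a placeholder. 8972 bytes of payload plus 28 bytes of headers exactly fills a 9000-byte frame.)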

I've attached a CrystalDiskMark screenshot from a Windows 10 VM and a KDiskMark screenshot from Ubuntu.

Is this the performance I should expect from 3 nodes? I'm looking to scale up to 4 or 5 with 6-8 OSDs per node however if I can't get anything close to acceptable figures with 3 nodes I won't be able to justify the cost.
 

Attachments

  • CrystalDiskMark.png (52.2 KB)
  • kdiskmark.png (71.8 KB)

I've run rados bench and I'm seeing around 1200 MB/s read and 550 MB/s write. The VM sees around 750 MB/s read and 250 MB/s write without any caching. Switching to my enterprise disks with PLP increases that to around 1450 MB/s read and 600 MB/s write. VM writes are a little higher, maybe 20% on sequential and double on random writes, but read speeds just aren't moving.

I've run the rados commands as shown (roughly the sequence sketched below); these are just sequential reads/writes if I'm not mistaken and don't seem to show the problem that the VM does.
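
For reference, the sequential runs were roughly along these lines - TestPool1 being my pool, with duration and thread count picked fairly arbitrarily:

rados bench -p TestPool1 60 write -b 4M -t 16 --no-cleanup
rados bench -p TestPool1 60 seq -t 16
rados -p TestPool1 cleanup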

The network config used in that document is vastly superior to mine, but so are their storage disks. With my performance target sitting below even their worst benchmark, I'm not sure what parallels I can draw from it. Random performance is my issue here, so I don't believe the network is the problem, although if I were to scale this up with many VMs across many OSDs I can see 10 Gbps becoming a bottleneck.
 
Hello, did you try enabling the writeback cache on the VM? What about enabling KRBD? As per our documentation [1]:

KRBD: Enforce access to rados block devices through the krbd kernel module. Optional.
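
KRBD can be enabled per storage, either in the web UI (Datacenter -> Storage -> edit the RBD storage) or on the CLI, for example - <your-rbd-storage> being the storage ID from your storage.cfg:

pvesm set <your-rbd-storage> --krbd 1

Running guests typically need a full stop and start to pick up the change.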

I would suggest not skipping any test when benchmarking Ceph (rough command sketches after the list):

- Benchmark the network; if the NICs support 10G you should be able to reach 10G speeds
- Benchmark disks individually
- Benchmark IO to the Ceph pool with rados bench
- Finally, benchmark IO inside a VM
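
Something along these lines, adjusting device names, IPs, and the pool name to your setup (sdX, <other-node-ceph-ip>, and <pool> are placeholders):

iperf3 -s                               # on one node
iperf3 -c <other-node-ceph-ip> -t 30    # on another node
fio --name=raw-randread --filename=/dev/sdX --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting
rados bench -p <pool> 60 write -b 4M -t 16 --no-cleanup
rados bench -p <pool> 60 seq -t 16
rados -p <pool> cleanup

Inside the VM, run the same fio command against a test file (e.g. --filename=/root/fio.test --size=4G) instead of a raw device.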

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#storage_rbd_config
 

Enabling KRBD caused my Windows VMs to fail to start with QEMU exit code 1.
Ubuntu worked just fine but saw no improvement. Writeback cache improves writes.

The network benchmarks at a stable 10 Gbps with 0.2 ms or lower latency.

Benchmarking the disks individually with fio works great.

Benchmarking the Ceph pool with rados bench returns good sequential results, higher than shown in the VM. I'm unsure how to correctly run a random read/write test, so guidance is appreciated.
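
(If I understand the bench modes correctly, the idea would be a small-block write pass kept with --no-cleanup, then a rand pass over those objects, e.g.:

rados bench -p TestPool1 60 write -b 4K -t 16 --no-cleanup
rados bench -p TestPool1 60 rand -t 16
rados -p TestPool1 cleanup

but please correct me if that's not the right way to do it.)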

Benchmarking IO inside the VM returns acceptable sequential performance but unusably slow random performance. Writeback cache fixes writes, but not reads.
 
Could you please share with us the exact error you get on the Windows VM? In theory, KRBD should be transparent to the guest. You can find the logs for the Start task in the bottom panel of the web UI; alternatively, you can look for the error in the system logs.
 
Here is the start task log from the Windows VM:

task started by HA resource agent
/dev/rbd1
kvm: -drive file=/dev/rbd-pve/a3b46bae-59d9-4a2a-bc08-e5818add321f/TestPool1/vm-100-disk-0,if=none,id=drive-scsi0,cache=writeback,aio=native,discard=on,format=raw,detect-zeroes=unmap: aio=native was specified, but it requires cache.direct=on, which was not specified.
TASK ERROR: start failed: QEMU exited with code 1

Looks to be my Async IO configuration?
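(The aio setting shows up in the scsi0 line of qm config 100 - 100 being the VMID from the log above.)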
 
I think the issue is that KRBD cannot be used together with aio=native and cache=writeback. Please try with aio=io_uring. See [1].

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5537
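
On the CLI that would be something like the following, assuming VMID 100 and that the Proxmox storage is also named TestPool1 - carry over any other options already on your disk line:

qm set 100 --scsi0 TestPool1:vm-100-disk-0,aio=io_uring,cache=writeback,discard=on

The same setting is also available in the web UI under the disk's Advanced options (Async IO) in the VM's Hardware tab.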

Okay I've adjusted that and enabled KRBD, rebooted my hosts and performance has taken off.

Windows is seeing 12 GB/s read and 4.5 GB/s write, with random performance around 20 MB/s. Ubuntu falls short of this but still sees massive improvements. I'll be restricting each VM in production anyway, so this is fine. Write-through cache is also showing a big performance uplift.

I can see the network behavior has changed; it seems to be mainly reading and writing to the RAM cache, as these disks cannot perform like this on their own. Network load is massively reduced and I'm only seeing occasional spikes of high usage (I'm assuming that's Ceph periodically copying data when writes are flushed).

Hopefully this has solved it. Appreciate the assistance!
 
