Ceph VM Performance

roli8200

Member
Feb 7, 2020
Hello

I noticed an annoying difference between the performance of Ceph/RBD itself and the performance inside the VM.
While RBD is fast as expected:
- 40GbE each for the storage frontend and backend network
- All enterprise SAS SSDs
- replica 2
- RBD cache
- Various OSD optimizations
- KRBD activated on the pool
- Ceph auth (cephx) disabled

Code:
rados bench -p ceph-virtualmachines 30 write --no-cleanup
Result:
Code:
Total time run: 30.0404
Total writes made: 19918
Write size: 4194304
object size: 4194304
Bandwidth (MB/sec): 2652.17
Stddev Bandwidth: 85.7042
Max bandwidth (MB/sec): 2800
Min bandwidth (MB/sec): 2460
Average IOPS: 663
Stddev IOPS: 21.4261
Max IOPS: 700
Min IOPS: 615
Average Latency(s): 0.0241245
Stddev Latency(s): 0.01518
Max latency(s): 0.256767
Min latency(s): 0.00936511

The performance in the VM (CentOS 7 x64, VirtIO-SCSI / Debian 10 x64, VirtIO-SCSI), on the other hand, is significantly worse:
fio test over 16GB (with 2GB VM memory):
Code:
 write: IOPS=87.3k, BW=341MiB/s (358MB/s)(16.0GiB/48050msec)
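
The exact fio command isn't listed above; a 4k direct-write job of this shape matches that IOPS/bandwidth ratio (87.3k x 4KiB is roughly 341 MiB/s). The ioengine and iodepth here are assumptions:
Bash:
# 4k direct writes over a 16 GiB test file inside the VM (ioengine/iodepth assumed)
fio --name=vm-write --filename=/root/fio-test.img --rw=write --bs=4k \
    --size=16G --direct=1 --ioengine=libaio --iodepth=32 --group_reporting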

dd test over 25GB (with 2GB VM memory):
Result:
Code:
dd if=/dev/zero of=test.img bs=1M count=25000
26214400000 bytes (26 GB) copied, 16.0823 s, 1.6 GB/s

dd if=/dev/zero of=test.img bs=1M count=25000 oflag=direct
26214400000 bytes (26 GB) copied, 47.8142 s, 548 MB/s

I keep asking myself where this difference in performance comes from. What could be tuned here?
 
Yes, but the results are so far apart that it can't be due to the different benchmarks alone.
 
For fio, your results are with a small block size (-> high IOPS, low bandwidth) vs. rados bench with a big block size (-> low IOPS, high bandwidth).
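
For a closer comparison you could rerun rados bench with 4k objects instead of the default 4M (the thread count here is just an example):
Bash:
# 4k object size -- expect high IOPS / low bandwidth, like the fio run
rados bench -p ceph-virtualmachines 30 write -b 4096 -t 16 --no-cleanup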


You can't really benchmark with dd from /dev/zero, because zeroes are handled as fast zero writes in the QEMU backend.
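
If you want a quick sequential test anyway, fio is the better tool since it writes non-zero buffers by default (file path and parameters here are just an example):
Bash:
# sequential 1M writes with direct I/O, bypassing the page cache
fio --name=seqwrite --filename=/root/fio-seq.img --rw=write --bs=1M \
    --size=25G --direct=1 --ioengine=libaio --iodepth=16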
 
By the way: write-back caching is enabled on the VM disk.

To have comparable numbers, I created an RBD image, mapped it, and ran the same fio test as in the VM (on the same host system).
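
Roughly these steps (the image name is made up, and the fio ioengine/iodepth are guesses reconstructed from the numbers below):
Bash:
# create a test image and map it via krbd; 'rbd map' prints the block device (e.g. /dev/rbd0)
rbd create ceph-virtualmachines/fio-test --size 16G
RBD_DEV=$(rbd map ceph-virtualmachines/fio-test)
# same 4k direct-write pattern as inside the VM
fio --name=rbd-direct --filename="$RBD_DEV" --rw=write --bs=4k \
    --size=8G --direct=1 --ioengine=libaio --iodepth=32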

fio with rbd direct write:
Bash:
write: IOPS=367k, BW=1433MiB/s (1503MB/s)(8192MiB/5715msec)

fio with rbd direct rw:
Bash:
read: IOPS=79.7k, BW=311MiB/s (327MB/s)(4098MiB/13159msec)
write: IOPS=79.6k, BW=311MiB/s (326MB/s)(4094MiB/13159msec)

fio in the virtual machine write:
Bash:
write: IOPS=89.3k, BW=349MiB/s (366MB/s)(8192MiB/23471msec)

fio in the virtual machine rw:
Bash:
read: IOPS=47.0k, BW=187MiB/s (196MB/s)(4098MiB/21872msec)
write: IOPS=47.9k, BW=187MiB/s (196MB/s)(4094MiB/21872msec)

So now I have the same test on both sides, and already there are significant differences.

I can also turn KRBD off, but various posts here and elsewhere have clearly described that KRBD is the fastest connection.
 
I can also turn KRBD off, but various posts here and elsewhere have clearly described that KRBD is the fastest connection.
With Octopus, there is a new scheduler in the librbd client (and I think it's not yet available in krbd).

https://docs.ceph.com/en/latest/releases/octopus/

"
  • librbd now uses a write-around cache policy by default, replacing the previous write-back cache policy default. This cache policy allows librbd to immediately complete write IOs while they are still in-flight to the OSDs. Subsequent flush requests will ensure all in-flight write IOs are completed prior to completing. The librbd cache policy can be controlled via a new “rbd_cache_policy” configuration option.
  • librbd now includes a simple IO scheduler which attempts to batch together multiple IOs against the same backing RBD data block object. The librbd IO scheduler policy can be controlled via a new “rbd_io_scheduler” configuration option.

"

Here are my results (with a 3GHz CPU):

Code:
               nautilus-cache=none   nautilus-cache=writeback   octopus-cache=none   octopus-cache=writeback

randread 4k          62.1k                   25.2k                    61.1k                  60.8k
randwrite 4k         27.7k                   19.5k                    34.5k                  53.0k
seqwrite 4k           7850                   37.5k                    24.9k                  82.6k
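
The fio parameters behind these numbers aren't listed; 4k runs of that kind typically look like the following (size, iodepth and target device are placeholders):
Bash:
# replace /dev/rbd0 with the device or file under test
fio --name=randread  --rw=randread  --bs=4k --size=8G --direct=1 --ioengine=libaio --iodepth=64 --filename=/dev/rbd0
fio --name=randwrite --rw=randwrite --bs=4k --size=8G --direct=1 --ioengine=libaio --iodepth=64 --filename=/dev/rbd0
fio --name=seqwrite  --rw=write     --bs=4k --size=8G --direct=1 --ioengine=libaio --iodepth=64 --filename=/dev/rbd0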


Generally, the IOPS limit of a single disk in a VM is your CPU frequency. (By default, QEMU uses only one thread/core for all disks. It's possible to have one thread/core per disk using the virtio-scsi-single controller + the iothread option.)
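
On Proxmox VE that corresponds to something like this (VM ID and volume name are placeholders):
Bash:
# one SCSI controller per disk, plus a dedicated iothread for the disk
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 ceph-virtualmachines:vm-100-disk-0,iothread=1,cache=writeback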
 
