Ceph RBD cache does not apply to VMs

altayaltan

New Member
Aug 4, 2020
Hi,

I use an external Ceph cluster as Proxmox storage. When I tweak the rbd cache settings on the Proxmox node, the rados bench results change accordingly, so the rbd cache configuration is clearly being applied there. However, when I tested this on a VM inside the cluster, it gave almost the same results whether the disk cache was set to "No Cache" or "Writeback", or with the rbd cache options commented out of ceph.conf. I tried to find some information about this but couldn't find any. The rbd section of my ceph.conf looks like this:

Code:
rbd_cache = true
rbd_cache_size = 268435456
rbd_cache_max_dirty = 134217728
rbd_cache_max_dirty_age = 5
rbd_cache_writethrough_until_flush = true
admin_socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
log_file = /var/log/qemu/qemu-guest-$pid.log
rbd_concurrent_management_ops = 20
 
When I tweak the rbd cache settings on the Proxmox node, the rados bench results change accordingly, so the rbd cache configuration is clearly being applied there.
Interesting; to my knowledge rados bench doesn't use librbd, so the cache settings shouldn't have any effect there.
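To illustrate the difference (the pool and image names below are only placeholders): rados bench writes objects directly through librados, so the rbd cache options never come into play, while rbd bench goes through librbd and therefore through its cache.
Code:
# librados only; the rbd_cache settings have no effect here
rados bench -p testpool 30 write --no-cleanup
rados -p testpool cleanup

# librbd path; honours the rbd_cache settings from ceph.conf
rbd bench --io-type write testpool/testimage --io-size 4K --io-total 1G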

However, when I tested this on a VM inside the cluster, it gave almost the same results whether the disk cache was set to "No Cache" or "Writeback", or with the rbd cache options commented out of ceph.conf.
Did you migrate or stop-start the VM for the settings to take effect?

Depending on the VM's resources, it might not be able to reach speeds at which the cache has a visible effect.
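One way to verify whether a running VM actually picked up the librbd settings is the admin_socket configured in the ceph.conf above; the exact socket file name depends on the running QEMU process, so this is only a sketch:
Code:
# on the Proxmox node: the guests socket directory must exist and be writable by the qemu user
ls /var/run/ceph/guests/
# query the socket belonging to the VM in question
ceph --admin-daemon /var/run/ceph/guests/<cluster>-<type>.<id>.<pid>.<cctid>.asok config show | grep rbd_cache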
 
Interesting; to my knowledge rados bench doesn't use librbd, so the cache settings shouldn't have any effect there.


Did you migrate or stop-start the VM for the settings to take effect?

Depending on the VM's resources, it might not be able to reach speeds at which the cache has a visible effect.

Yes, I ran the tests, then restarted and tested again; it showed no difference. I also tried some fio tests today under the same conditions and their results were almost the same as well. What am I missing? Other posts in this forum and some docs indicate that the "Writeback" disk cache option enables rbd cache usage on the machine. Or did I misunderstand those parts?
 
Yes, I ran the tests, then restarted and tested again; it showed no difference.
By restart, do you mean a stop-start or a reboot inside the VM? The latter doesn't create a new KVM process.
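In other words, only a full stop/start from the Proxmox side replaces the KVM process and with it the cache settings; a reboot issued inside the guest does not. For example (VMID 100 is a placeholder):
Code:
qm stop 100
qm start 100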

I also tried some fio tests today under the same conditions and their results were almost the same as well. What am I missing?
Please share the benchmarks and their results.

Other posts in this forum and some docs indicate that the "Writeback" disk cache option enables rbd cache usage on the machine. Or did I misunderstand those parts?
Yes, disk cache = writeback activates librbd's cache.
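For reference, the cache mode is a property of the virtual disk and can be set in the GUI or on the CLI; a sketch with placeholder VMID, storage and disk names:
Code:
# set writeback cache on an existing disk, then verify
qm set 100 --scsi0 ceph-storage:vm-100-disk-0,cache=writeback
qm config 100 | grep ^scsi0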
 
By restart, do you mean a stop-start or a reboot inside the VM? The latter doesn't create a new KVM process.
I stop-started it from Proxmox.

Please share the benchmarks and their results.
Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test.fio --bs=4K --iodepth=64 --size=1G --readwrite=randwrite
This gave about the same results before and after the restart:
Code:
WRITE: bw=20.7MiB/s (21.7MB/s), 20.7MiB/s-20.7MiB/s (21.7MB/s-21.7MB/s), io=1024MiB (1074MB), run=49525-49525msec

And the rbd bench tests:
Code:
rbd bench --io-type write testimage --pool=testpool --io-threads 64 --io-total 4G --io-pattern rand --io-size 4K

elapsed:   364  ops:  1048576  ops/sec:  2878.55  bytes/sec: 11790537.13

Again almost the same results before and after restarts.
 
This gave about the same results before and after the restart:
Well, that's roughly 5300 IO/s (20.7 MiB/s at 4 KiB blocks). Depending on the hardware beneath it, that might be a good value.

rbd bench --io-type write testimage --pool=testpool --io-threads 64 --io-total 4G --io-pattern rand --io-size 4K
64 threads is more than a VM usually has. KVM only uses one thread per VM disk.
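If more parallelism per disk is wanted, the iothread flag together with the virtio-scsi-single controller gives each disk its own IO thread; a sketch with placeholder names (the VM needs a stop/start afterwards):
Code:
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 ceph-storage:vm-100-disk-0,cache=writeback,iothread=1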
 
Well, that's roughly 5300 IO/s (20.7 MiB/s at 4 KiB blocks). Depending on the hardware beneath it, that might be a good value.
The problem isn't the average IO, though. The problem is that changing the cache configuration didn't change anything. Something is wrong between ceph.conf and the VMs, I suppose, but what?
 
Yes, I'm aware of the benchmark paper, but my setup is not hyperconverged. I have Proxmox and Ceph as separate clusters.
That is only of secondary importance. The graphs show the cache modes and their outcomes.

For debugging, the first tests should start at the bottom layer, the hardware. What can the raw disks achieve, then the network, and next a rados bench itself. With this approach you find the maximum possible performance and can derive whether caching can have any noticeable benefit.
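A rough outline of that bottom-up approach (device, host and pool names are placeholders):
Code:
# 1) raw disk on an OSD node (non-destructive read test)
fio --name=raw --filename=/dev/sdX --direct=1 --rw=randread --bs=4K --iodepth=32 --runtime=60 --time_based --readonly

# 2) network between the Proxmox node and the Ceph nodes
iperf3 -s                      # on a Ceph node
iperf3 -c <ceph-node> -t 30    # on the Proxmox node

# 3) rados bench from the Proxmox node
rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 rand
rados -p testpool cleanup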
 
That is only of secondary importance. The graphs show the cache modes and their outcomes.

For debugging, the first tests should start at the bottom layer, the hardware. What can the raw disks achieve, then the network, and next a rados bench itself. With this approach you find the maximum possible performance and can derive whether caching can have any noticeable benefit.
The hardware and bandwidth tests were done a while ago, and I can definitely say there is no problem there at the moment. We are getting the maximum possible bandwidth between the Proxmox node and the Ceph cluster. The appropriate OSD configuration has been done as well.
 
The hardware and bandwidth tests were done a while ago, and I can definitely say there is no problem there at the moment. We are getting the maximum possible bandwidth between the Proxmox node and the Ceph cluster. The appropriate OSD configuration has been done as well.
OK, but you leave me in the dark, since I don't know the setup. The only thing I can tell you now is that the cache has an effect, but maybe not to the extent that you'd expect. See the benchmark paper.
 
OK, but you leave me in the dark, since I don't know the setup. The only thing I can tell you now is that the cache has an effect, but maybe not to the extent that you'd expect. See the benchmark paper.
Will do. Thank you for your time and input; I'll post here again if I come up with something in the morning.
 
I checked the Proxmox/Ceph benchmark paper again and noticed that they see around a 2% performance difference between no cache and writeback on librbd. I tried the same tests again and saw similar results. While using librbd, the VMs used the Ceph node's memory for caching. Then I switched to krbd in the Proxmox storage options and ran the same tests (all tests were done with 1 thread), and noticed that krbd is about 2 times faster than librbd; but when testing with krbd, the VM this time used the Proxmox node's RAM for caching. I suppose this is the expected behaviour?
Also, when I tried to increase the number of threads for the tests on krbd, I reached a speed I'd never seen before, but after the test peaked in performance the writes suddenly dropped to 0. After a while I got a timeout error from the Proxmox VM console, while I couldn't do anything in my SSH session. I didn't encounter such problems while testing librbd.
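For completeness, switching an existing RBD storage between librbd and the kernel client is a single storage option in Proxmox (the storage ID ceph-storage is a placeholder); running VMs need a stop/start to pick the change up:
Code:
pvesm set ceph-storage --krbd 1   # map images via the kernel client
pvesm set ceph-storage --krbd 0   # back to librbd / qemu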
 
On librbd vs krbd:
librbd is a client-side library in user space. It has no connection to the kernel and therefore needs to bring its own cache, by default 32 MiB. QEMU is able to connect directly (no mapped device needed) to the cluster. Since librbd is shipped with Ceph, its version is always more up-to-date on features than the kernel client. Additionally, if iothread is not specified, only one thread is used for all disks.
krbd is the kernel client and uses the page cache. There is no limit besides free memory. This can be seen especially in the read results graphs on page 15: the 9 GB LV for the fio benchmark fits easily into memory and needs to be read only once. The kernel client needs to map each RBD image to a device, which QEMU then consumes.
With Ceph Octopus, a new cache policy was introduced: write-around. Because reads can be done in parallel from the OSDs, the writeback cache has the disadvantage of a starved write cache (ironically). The new policy caches only writes and reads directly from the OSDs.
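For anyone who wants to try it, the policy is controlled by the rbd_cache_policy option (Octopus or newer librbd), either globally in the [client] section of ceph.conf or per image; pool and image names below are placeholders:
Code:
# ceph.conf, [client] section
rbd_cache_policy = writearound

# or per image, without touching ceph.conf
rbd config image set testpool/testimage rbd_cache_policy writearound
rbd config image get testpool/testimage rbd_cache_policy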

I tried the same tests again and saw similar results. While using librbd, the VMs used the Ceph node's memory for caching. Then I switched to krbd in the Proxmox storage options and ran the same tests (all tests were done with 1 thread), and noticed that krbd is about 2 times faster than librbd; but when testing with krbd, the VM this time used the Proxmox node's RAM for caching. I suppose this is the expected behaviour?
I presume by "the same tests" you mean inside the VM. Was it Windows or Linux? And could you share the tests, including the results?

Also, when I tried to increase the number of threads for the tests on krbd, I reached a speed I'd never seen before, but after the test peaked in performance the writes suddenly dropped to 0. After a while I got a timeout error from the Proxmox VM console, while I couldn't do anything in my SSH session. I didn't encounter such problems while testing librbd.
The first thing that comes to my mind is memory exhaustion. Probably the node started to swap and became unresponsive because of that.
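One way to confirm or rule that out is to watch memory, swap and dirty page-cache data on the Proxmox node while the krbd test runs, for example:
Code:
watch -n 1 'free -m; grep -E "^(Dirty|Writeback):" /proc/meminfo'
# after the hang, the kernel log may show OOM or hung-task messages
dmesg -T | tail -n 50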
 