Ceph RBD cache does not apply to VMs

altayaltan

New Member
Aug 4, 2020
Hi,

I use an external Ceph cluster as Proxmox storage. When I tweak the rbd cache settings on the Proxmox node, the rados bench results change accordingly, so the rbd cache configuration is clearly being applied there. However, when I tested this on a VM inside the cluster, it gave almost the same results whether the disk cache was set to "No Cache" or "Writeback", or with the rbd cache options commented out of ceph.conf. I tried to find some information about this but couldn't find any. The rbd section of my ceph.conf looks like this:

Code:
rbd_cache = true
rbd_cache_size = 268435456
rbd_cache_max_dirty = 134217728
rbd_cache_max_dirty_age = 5
rbd_cache_writethrough_until_flush = true
admin_socket = /var/run/ceph/guests/$cluster-$type.$id.$pid.$cctid.asok
log_file = /var/log/qemu/qemu-guest-$pid.log
rbd_concurrent_management_ops = 20
 
When I tweak the rbd cache settings on the Proxmox node, the rados bench results change accordingly, so the rbd cache configuration is clearly being applied there.
Interesting; to my knowledge rados bench doesn't use librbd, so the cache settings shouldn't have any effect there.
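To illustrate the difference (the pool and image names below are only placeholders): rados bench writes objects directly through librados, so the rbd cache options never come into play, while rbd bench goes through librbd and therefore through its cache.
Code:
# librados only; the rbd_cache settings have no effect here
rados bench -p testpool 30 write --no-cleanup
rados -p testpool cleanup

# librbd path; honours the rbd_cache settings from ceph.conf
rbd bench --io-type write testpool/testimage --io-size 4K --io-total 1G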

However, when I tested this on a VM inside the cluster, it gave almost the same results whether the disk cache was set to "No Cache" or "Writeback", or with the rbd cache options commented out of ceph.conf.
Did you migrate or stop-start the VM for the settings to take effect?

Depending on the VM's resources, it might not be able to reach speeds at which the cache has a visible effect.
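One way to verify whether a running VM actually picked up the librbd settings is the admin_socket configured in the ceph.conf above; the exact socket file name depends on the running QEMU process, so this is only a sketch:
Code:
# on the Proxmox node: the guests socket directory must exist and be writable by the qemu user
ls /var/run/ceph/guests/
# query the socket belonging to the VM in question
ceph --admin-daemon /var/run/ceph/guests/<cluster>-<type>.<id>.<pid>.<cctid>.asok config show | grep rbd_cache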
 
Interesting; to my knowledge rados bench doesn't use librbd, so the cache settings shouldn't have any effect there.


Did you migrate or stop-start the VM for the settings to take effect?

Depending on the VM's resources, it might not be able to reach speeds at which the cache has a visible effect.

Yes, I ran the tests, then restarted and tested again; it showed no difference. I also tried some fio tests today under the same conditions and their results were almost the same as well. What am I missing? Other posts in this forum and some docs indicate that the "Writeback" disk cache option enables rbd cache usage on the machine. Or did I misunderstand those parts?
 
Yes, I ran the tests, then restarted and tested again; it showed no difference.
By restart, do you mean a stop-start or a reboot inside the VM? The latter doesn't create a new KVM process.
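In other words, only a full stop/start from the Proxmox side replaces the KVM process and with it the cache settings; a reboot issued inside the guest does not. For example (VMID 100 is a placeholder):
Code:
qm stop 100
qm start 100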

I also tried some fio tests today under the same conditions and their results were almost the same as well. What am I missing?
Please share the benchmarks and their results.

Other posts in this forum and some docs indicate that the "Writeback" disk cache option enables rbd cache usage on the machine. Or did I misunderstand those parts?
Yes, disk cache = writeback activates librbd's cache.
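For reference, the cache mode is a property of the virtual disk and can be set in the GUI or on the CLI; a sketch with placeholder VMID, storage and disk names:
Code:
# set writeback cache on an existing disk, then verify
qm set 100 --scsi0 ceph-storage:vm-100-disk-0,cache=writeback
qm config 100 | grep ^scsi0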
 
By restart, do you mean a stop-start or a reboot inside the VM? The latter doesn't create a new KVM process.
I stop-started it from Proxmox.

Please share the benchmarks and their results.
Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test.fio --bs=4K --iodepth=64 --size=1G --readwrite=randwrite
This gave about the same results before and after the restart:
Code:
WRITE: bw=20.7MiB/s (21.7MB/s), 20.7MiB/s-20.7MiB/s (21.7MB/s-21.7MB/s), io=1024MiB (1074MB), run=49525-49525msec

And the rbd bench tests:
Code:
rbd bench --io-type write testimage --pool=testpool --io-threads 64 --io-total 4G --io-pattern rand --io-size 4K

elapsed:   364  ops:  1048576  ops/sec:  2878.55  bytes/sec: 11790537.13

Again almost the same results before and after restarts.
 
This gave about the same results before and after the restart:
Well, that's roughly 5300 IO/s (20.7 MiB/s at 4 KiB blocks). Depending on the hardware beneath it, that might be a good value.

rbd bench --io-type write testimage --pool=testpool --io-threads 64 --io-total 4G --io-pattern rand --io-size 4K
64 threads is more than a VM usually has. KVM only uses one thread per VM disk.
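If more parallelism per disk is wanted, the iothread flag together with the virtio-scsi-single controller gives each disk its own IO thread; a sketch with placeholder names (the VM needs a stop/start afterwards):
Code:
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 ceph-storage:vm-100-disk-0,cache=writeback,iothread=1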
 
Well, that's roughly 5300 IO/s (20.7 MiB/s at 4 KiB blocks). Depending on the hardware beneath it, that might be a good value.
The problem isn't the average IO, though. The problem is that changing the cache configuration didn't change anything. Something is wrong between ceph.conf and the VMs, I suppose, but what?
 
Yes, I'm aware of the benchmark paper, but my setup is not hyperconverged. I have Proxmox and Ceph as separate clusters.
That is only of secondary importance. The graphs show the cache modes and their outcomes.

For debugging, the first tests should start at the bottom layer, the hardware. What can the raw disks achieve, then the network, and next a rados bench itself. With this approach you find the maximum possible performance and can derive whether caching can have any noticeable benefit.
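A rough outline of that bottom-up approach (device, host and pool names are placeholders):
Code:
# 1) raw disk on an OSD node (non-destructive read test)
fio --name=raw --filename=/dev/sdX --direct=1 --rw=randread --bs=4K --iodepth=32 --runtime=60 --time_based --readonly

# 2) network between the Proxmox node and the Ceph nodes
iperf3 -s                      # on a Ceph node
iperf3 -c <ceph-node> -t 30    # on the Proxmox node

# 3) rados bench from the Proxmox node
rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 rand
rados -p testpool cleanup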
 
That is only of secondary importance. The graphs show the cache modes and their outcomes.

For debugging, the first tests should start at the bottom layer, the hardware. What can the raw disks achieve, then the network, and next a rados bench itself. With this approach you find the maximum possible performance and can derive whether caching can have any noticeable benefit.
The hardware and bandwidth tests were done a while ago, and I can definitely say there is no problem there at the moment. We are getting the maximum possible bandwidth between the Proxmox node and the Ceph cluster. The appropriate OSD configuration has been done as well.
 
The hardware and bandwidth tests were done a while ago, and I can definitely say there is no problem there at the moment. We are getting the maximum possible bandwidth between the Proxmox node and the Ceph cluster. The appropriate OSD configuration has been done as well.
OK, but you leave me in the dark, since I don't know the setup. The only thing I can tell you now is that the cache has an effect, but maybe not to the extent that you'd expect. See the benchmark paper.
 
OK, but you leave me in the dark, since I don't know the setup. The only thing I can tell you now is that the cache has an effect, but maybe not to the extent that you'd expect. See the benchmark paper.
Will do. Thank you for your time and input; I'll post here again if I come up with something in the morning.
 
I checked the Proxmox/Ceph benchmark paper again and noticed that they see around a 2% performance difference between no cache and writeback on librbd. I tried the same tests again and saw similar results. While using librbd, the VMs used the Ceph node's memory for caching. Then I switched to krbd in the Proxmox storage options and ran the same tests (all tests were done with 1 thread), and noticed that krbd is about 2 times faster than librbd; but when testing with krbd, the VM this time used the Proxmox node's RAM for caching. I suppose this is the expected behaviour?
Also, when I tried to increase the number of threads for the tests on krbd, I reached a speed I'd never seen before, but after the test peaked in performance the writes suddenly dropped to 0. After a while I got a timeout error from the Proxmox VM console, while I couldn't do anything in my SSH session. I didn't encounter such problems while testing librbd.
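For completeness, switching an existing RBD storage between librbd and the kernel client is a single storage option in Proxmox (the storage ID ceph-storage is a placeholder); running VMs need a stop/start to pick the change up:
Code:
pvesm set ceph-storage --krbd 1   # map images via the kernel client
pvesm set ceph-storage --krbd 0   # back to librbd / qemu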
 
On librbd vs krbd:
librbd is a client-side library in user space. It has no connection to the kernel and therefore needs to bring its own cache, by default 32 MiB. QEMU is able to connect directly (no mapped device needed) to the cluster. Since librbd is shipped with Ceph, its version is always more up-to-date on features than the kernel client. Additionally, if iothread is not specified, only one thread is used for all disks.
krbd is the kernel client and uses the page cache. There is no limit besides free memory. This can be seen especially in the read results graphs on page 15: the 9 GB LV for the fio benchmark fits easily into memory and needs to be read only once. The kernel client needs to map each RBD image to a device, which QEMU then consumes.
With Ceph Octopus, a new cache policy was introduced: write-around. Because reads can be done in parallel from the OSDs, the writeback cache has the disadvantage of a starved write cache (ironically). The new policy caches only writes and reads directly from the OSDs.
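For anyone who wants to try it, the policy is controlled by the rbd_cache_policy option (Octopus or newer librbd), either globally in the [client] section of ceph.conf or per image; pool and image names below are placeholders:
Code:
# ceph.conf, [client] section
rbd_cache_policy = writearound

# or per image, without touching ceph.conf
rbd config image set testpool/testimage rbd_cache_policy writearound
rbd config image get testpool/testimage rbd_cache_policy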

I tried the same tests again and saw similar results. While using librbd, the VMs used the Ceph node's memory for caching. Then I switched to krbd in the Proxmox storage options and ran the same tests (all tests were done with 1 thread), and noticed that krbd is about 2 times faster than librbd; but when testing with krbd, the VM this time used the Proxmox node's RAM for caching. I suppose this is the expected behaviour?
I presume by "the same tests" you mean inside the VM. Was it Windows or Linux? And could you share the tests, including the results?

Also, when I tried to increase the number of threads for the tests on krbd, I reached a speed I'd never seen before, but after the test peaked in performance the writes suddenly dropped to 0. After a while I got a timeout error from the Proxmox VM console, while I couldn't do anything in my SSH session. I didn't encounter such problems while testing librbd.
The first thing that comes to my mind is memory exhaustion. Probably the node started to swap and became unresponsive because of that.
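One way to confirm or rule that out is to watch memory, swap and dirty page-cache data on the Proxmox node while the krbd test runs, for example:
Code:
watch -n 1 'free -m; grep -E "^(Dirty|Writeback):" /proc/meminfo'
# after the hang, the kernel log may show OOM or hung-task messages
dmesg -T | tail -n 50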
 