[SOLVED] need clarification on cache settings for rbd-based storage

Waschbüsch

Hi there,

I use VMs with ceph / rbd backend for storage and am confused about the cache settings:

On the wiki (https://pve.proxmox.com/wiki/Performance_Tweaks) the different caching options are explained.
From the description there, I would have thought that writethrough is the mode to use if I want to ensure writes are fsynced promptly without disabling caching altogether.

Now, in the ceph documentation (http://docs.ceph.com/docs/luminous/rbd/qemu-rbd/) there is a warning:
Important - If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted.

This leaves me confused, as it suggests that cache mode writeback is, in this use case, actually the more secure option?
 
cache=none can be faster on NVMe or enterprise-class SSD drives. The caching behaviour changed in QEMU 2.4.0 as well, as detailed here:
http://docs.ceph.com/docs/luminous/rbd/qemu-rbd/

i.e. QEMU cache settings now override the Ceph settings.

I believe the warnings pertaining to usermode RBD caching causing data loss do not affect kernel RBD, as it has access to the page cache.

We use the kernel RBD module for optimal performance (this requires changing the image features to disable object-map, fast-diff and deep-flatten), cache=writeback for HDD-backed VMs and cache=none for high-performance VMs.
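
For reference, disabling those features and switching the storage to the kernel client looks roughly like this (the pool, image and storage IDs below are just placeholders, not our actual names):

Code:
# disable image features the kernel RBD client cannot handle
rbd feature disable rbd/vm-100-disk-1 object-map fast-diff deep-flatten

# /etc/pve/storage.cfg - tell Proxmox to map images via krbd
rbd: ceph-vm
        pool rbd
        content images
        krbd 1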
 
If you enable cache=writeback on a VM, it enables rbd_cache=true.
Ceph has a safeguard enabled by default:
rbd cache writethrough until flush = true

That means librbd waits until it receives a first flush from the guest before it really enables writeback, so it is safe to enable writeback.
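
For completeness, these are the client-side options involved; the values below are the defaults as far as I know, so normally you don't need to set them yourself:

Code:
# /etc/ceph/ceph.conf on the hypervisor (client side)
[client]
    rbd cache = true
    rbd cache writethrough until flush = true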


Writeback helps with one thing: sequential writes of small blocks. It merges them into one big transaction before sending it to the Ceph storage.

But currently it increases read latency (maybe that will be solved in the next Ceph release; the devs are working on it).

So, unless you need sequential writes, just keep cache=none.
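
In Proxmox the cache mode is a per-disk option, so you can set it with qm; the VM ID, bus and volume name below are placeholders for your own disk:

Code:
# re-apply the disk definition with the desired cache mode
qm set 100 --scsi0 ceph-vm:vm-100-disk-1,cache=none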
 
Herewith quantitative results for a Windows 2012 R2 VirtIO SCSI virtual machine. We used Microsoft's Diskspd (https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223) to obtain IOPS on 8K random read/write and throughput on 256K random read/write patterns.

Commands:
Code:
diskspd.exe -b8K   -d120 -Suw -L -o2 -t4 -r -w30 -c250M c:\io.dat
diskspd.exe -b256K -d120 -Suw -L -o2 -t4 -r -w30 -c250M c:\io.dat


Essentially:
  • (-d120) Run for a period of 2 minutes
  • (-Su) Disable software caching (within VM itself)
  • (-Sw) Set hardware to writethrough caching
  • (-L) Measure latency statistics
  • (-o2) Number of outstanding I/O requests per target per thread
  • (-t4) Number of threads per target
  • (-r) Random I/O alignment
  • (-w30) 30 percent writes (equates to 70/30 read/write distribution)
  • (-c250M) Create file of 250MB

cache=writeback:
Code:
erasure coded:
              read          write
8K (IOPS)     945           406
256K          167.26 MB/s   72.13 MB/s

erasure coded with lz4 compression:
              read          write
8K (IOPS)     1007          432
256K          169.98 MB/s   73.36 MB/s



cache=none:
Code:
erasure coded:
              read          write
8K (IOPS)     2103          899
256K          330.33 MB/s   142.19 MB/s

erasure coded with lz4 compression:
              read          write
8K (IOPS)     2088          892
256K          356.98 MB/s   154.01 MB/s
 
If you enable cache=writeback on a VM, it enables rbd_cache=true.
Ceph has a safeguard enabled by default:
rbd cache writethrough until flush = true

That means librbd waits until it receives a first flush from the guest before it really enables writeback, so it is safe to enable writeback.

If you mean safe as in 'knowing that the VM is actually issuing flush commands', then I guess that is true.
But I was more concerned with minimizing the risk of data loss.

So, unless you need sequential writes, just keep cache=none.

That is what I'll go with, then.
 
@David: have you tried with a bigger file size? (As it's random writes, with a small file there is a higher chance that two blocks end up near each other, so writeback is useful in that case.)
 
If you mean safe as in 'knowing that the VM is actually issuing flush commands', then I guess that is true.
But I was more concerned with minimizing the risk of data loss.

That is what I'll go with, then.

If you are concerned about data loss, use cache=none.
The rbd_cache is 32 MB (it can be tuned), so even with fsync you can lose up to 32 MB (but you won't get filesystem corruption).
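
If you ever want to change that, the cache size is also a plain client-side option (values in bytes; the ones shown below are the defaults as far as I know):

Code:
# /etc/ceph/ceph.conf on the hypervisor, [client] section
rbd cache size = 33554432       # 32 MB client-side cache (default)
rbd cache max dirty = 25165824  # dirty bytes allowed before forcing writeback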
 
Just to make sure: this rbd_cache memory is located in the Ceph cluster (not on the PVE host)?
 
Just to make sure: this rbd_cache memory is located in the Ceph cluster (not on the PVE host)?

The configured rbd_cache_size is located on the machine where the client runs (e.g. the hypervisor that runs the VM that uses the image).
 
