[SOLVED] need clarification on cache settings for rbd-based storage

Waschbüsch

Hi there,

I use VMs with ceph / rbd backend for storage and am confused about the cache settings:

On the wiki (https://pve.proxmox.com/wiki/Performance_Tweaks) the different caching options are explained.
From the description there, I would have thought that writethrough is the mode to use if I want to ensure writes are fsynced promptly without disabling caching altogether.

Now, in the ceph documentation (http://docs.ceph.com/docs/luminous/rbd/qemu-rbd/) there is a warning:
Important - If you set rbd_cache=true, you must set cache=writeback or risk data loss. Without cache=writeback, QEMU will not send flush requests to librbd. If QEMU exits uncleanly in this configuration, filesystems on top of rbd can be corrupted.

This leaves me confused, as it suggests that cache mode writeback is, in this use case, actually the more secure option?
 
cache=none can be faster on NVMe or enterprise-class SSD drives. The caching behaviour changed in QEMU 2.4.0 as well, as detailed here:
http://docs.ceph.com/docs/luminous/rbd/qemu-rbd/

i.e. QEMU cache settings now override the Ceph settings.

I believe the warnings pertaining to usermode RBD caching causing data loss do not affect kernel RBD, as it has access to the page cache.

We use the kernel RBD module for optimal performance (this requires changing the image features to disable object-map, fast-diff and deep-flatten), cache=writeback for HDD-backed VMs and cache=none for high-performance VMs.
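
For reference, disabling those features and switching the storage to the kernel client looks roughly like this (the pool, image and storage IDs below are just placeholders, not our actual names):

Code:
# disable image features the kernel RBD client cannot handle
rbd feature disable rbd/vm-100-disk-1 object-map fast-diff deep-flatten

# /etc/pve/storage.cfg - tell Proxmox to map images via krbd
rbd: ceph-vm
        pool rbd
        content images
        krbd 1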
 
If you enable cache=writeback on a VM, it enables rbd_cache=true.
Ceph has a safeguard enabled by default:
rbd cache writethrough until flush = true

That means librbd waits until it receives a first flush from the guest before it really enables writeback, so it is safe to enable writeback.
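
For completeness, these are the client-side options involved; the values below are the defaults as far as I know, so normally you don't need to set them yourself:

Code:
# /etc/ceph/ceph.conf on the hypervisor (client side)
[client]
    rbd cache = true
    rbd cache writethrough until flush = true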


Writeback helps with one thing: sequential writes of small blocks. It merges them into one big transaction before sending it to the Ceph storage.

But currently it increases read latency (maybe that will be solved in the next Ceph release; the devs are working on it).

So, unless you need sequential writes, just keep cache=none.
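
In Proxmox the cache mode is a per-disk option, so you can set it with qm; the VM ID, bus and volume name below are placeholders for your own disk:

Code:
# re-apply the disk definition with the desired cache mode
qm set 100 --scsi0 ceph-vm:vm-100-disk-1,cache=none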
 
Herewith quantitative results for a Windows 2012 R2 VirtIO SCSI virtual machine. We used Microsoft's Diskspd (https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223) to obtain IOPS on 8K random read/write and throughput on 256K random read/write patterns.

Commands:
Code:
diskspd.exe -b8K   -d120 -Suw -L -o2 -t4 -r -w30 -c250M c:\io.dat
diskspd.exe -b256K -d120 -Suw -L -o2 -t4 -r -w30 -c250M c:\io.dat


Essentially:
  • (-d120) Run for a period of 2 minutes
  • (-Su) Disable software caching (within VM itself)
  • (-Sw) Set hardware to writethrough caching
  • (-L) Measure latency statistics
  • (-o2) Number of outstanding I/O requests per target per thread
  • (-t4) Number of threads per target
  • (-r) Random I/O alignment
  • (-w30) 30 percent writes (equates to 70/30 read/write distribution)
  • (-c250M) Create file of 250MB

cache=writeback:
Code:
erasure coded:
              read          write
8K (IOPS)     945           406
256K          167.26 MB/s   72.13 MB/s

erasure coded with lz4 compression:
              read          write
8K (IOPS)     1007          432
256K          169.98 MB/s   73.36 MB/s



cache=none:
Code:
erasure coded:
              read          write
8K (IOPS)     2103          899
256K          330.33 MB/s   142.19 MB/s

erasure coded with lz4 compression:
              read          write
8K (IOPS)     2088          892
256K          356.98 MB/s   154.01 MB/s
 
If you enable cache=writeback on a VM, it enables rbd_cache=true.
Ceph has a safeguard enabled by default:
rbd cache writethrough until flush = true

That means librbd waits until it receives a first flush from the guest before it really enables writeback, so it is safe to enable writeback.

If you mean safe as in 'knowing that the VM is actually issuing flush commands', then I guess that is true.
But I was more concerned with minimizing the risk of data loss.

So, unless you need sequential writes, just keep cache=none.

That is what I'll go with, then.
 
@David: have you tried with a bigger file size? (As it's random writes, with a small file there is a higher chance that two blocks end up near each other, so writeback is useful in that case.)
 
If you mean safe as in 'knowing that the VM is actually issuing flush commands', then I guess that is true.
But I was more concerned with minimizing the risk of data loss.

That is what I'll go with, then.

If you are concerned about data loss, use cache=none.
The rbd_cache is 32 MB (it can be tuned), so even with fsync you can lose up to 32 MB (but you won't get filesystem corruption).
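
If you ever want to change that, the cache size is also a plain client-side option (values in bytes; the ones shown below are the defaults as far as I know):

Code:
# /etc/ceph/ceph.conf on the hypervisor, [client] section
rbd cache size = 33554432       # 32 MB client-side cache (default)
rbd cache max dirty = 25165824  # dirty bytes allowed before forcing writeback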
 
Just to make sure: this rbd_cache memory is located in the Ceph cluster (not on the PVE host)?
 
Just to make sure: this rbd_cache memory is located in the Ceph cluster (not on the PVE host)?

The configured rbd_cache_size is located on the machine where the client runs (e.g. the hypervisor that runs the VM that uses the image).
 
