ceph can't disable tiering cache

Alibek

I switched the cache-mode from writeback to proxy:
Code:
# ceph osd tier cache-mode rbd_nvme proxy

But cache-flush-evict-all finishes with errors, and afterwards the cache fills up again:
Code:
# rados -p rbd_nvme cache-flush-evict-all | grep failed
failed to flush /rbd_header.5a06de6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.3d4e9a6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.5a09276b8b4567: (16) Device or resource busy
failed to flush /rbd_directory: (16) Device or resource busy
failed to flush /rbd_header.aac2196b8b4567: (16) Device or resource busy
failed to flush /rbd_header.55f35f6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.6143126b8b4567: (16) Device or resource busy
failed to flush /rbd_header.58ef4e6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.6143726b8b4567: (16) Device or resource busy
failed to flush /rbd_header.5d65c66b8b4567: (16) Device or resource busy
failed to flush /rbd_header.61352a6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.dca34e6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.43c3336b8b4567: (16) Device or resource busy
failed to flush /rbd_header.42855e6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.5a09f36b8b4567: (16) Device or resource busy
failed to flush /rbd_header.4627e96b8b4567: (16) Device or resource busy
failed to flush /rbd_header.39b7d56b8b4567: (16) Device or resource busy
failed to flush /rbd_header.4cd13b6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.55eed16b8b4567: (16) Device or resource busy
failed to flush /rbd_header.36a4c46b8b4567: (16) Device or resource busy
failed to flush /rbd_header.5922346b8b4567: (16) Device or resource busy
failed to flush /rbd_header.2ef2546b8b4567: (16) Device or resource busy
failed to flush /rbd_header.611dd76b8b4567: (16) Device or resource busy
failed to flush /rbd_header.f99eee6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.5a02036b8b4567: (16) Device or resource busy
failed to flush /rbd_header.5fef2b6b8b4567: (16) Device or resource busy
failed to evict /rbd_data.55f35f6b8b4567.0000000000000602: (16) Device or resource busy
failed to flush /rbd_header.f042356b8b4567: (16) Device or resource busy
failed to flush /rbd_header.96a0ae6b8b4567: (16) Device or resource busy
failed to flush /rbd_header.44289d6b8b4567: (16) Device or resource busy
error from cache-flush-evict-all: (1) Operation not permitted

# ceph -s -d; ceph df
  cluster:
    id:     e20e909d-6303-47e4-b00b-b53f7c6551d1
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum lpr11a,lpr11b,lpr11c,lpr11d
    mgr: lpr11a(active), standbys: lpr11c, lpr11b, lpr11d
    osd: 24 osds: 24 up, 24 in

  data:
    pools:   2 pools, 640 pgs
    objects: 544.84k objects, 2.05TiB
    usage:   4.13TiB used, 55.8TiB / 60.0TiB avail
    pgs:     640 active+clean

  io:
    client:   0B/s rd, 1.78MiB/s wr, 0op/s rd, 91op/s wr
    cache:    0op/s promote

GLOBAL:
    SIZE        AVAIL       RAW USED     %RAW USED 
    60.0TiB     55.8TiB      4.13TiB          6.88 
POOLS:
    NAME         ID     USED        %USED     MAX AVAIL     OBJECTS 
    ec_ssd       39     2.04TiB      7.73       24.4TiB      540465 
    rbd_nvme     40     10.6GiB      0.80       1.28TiB        4376

Code:
# ceph osd dump
epoch 4897
fsid e20e909d-6303-47e4-b00b-b53f7c6551d1
created 2018-09-18 01:38:51.019392
modified 2019-05-04 00:01:16.663757
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 121
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 39 'ec_ssd' erasure size 4 min_size 3 crush_rule 5 object_hash rjenkins pg_num 512 pgp_num 512 last_change 4897 lfor 3048/3048 flags hashpspool,ec_overwrites tiers 40 read_tier 40 write_tier 40 stripe_width 8192 compression_algorithm none compression_mode force application rbd
    removed_snaps [1~6d]
pool 40 'rbd_nvme' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 128 pgp_num 128 last_change 4897 lfor 3048/3048 flags hashpspool,incomplete_clones tier_of 39 cache_mode proxy target_bytes 107374182400 hit_set bloom{false_positive_probability: 0.05, target_size: 0, seed: 0} 600s x12 decay_rate 0 search_last_n 0 stripe_width 0 compression_algorithm lz4 compression_mode force
    removed_snaps [1~6d]
max_osd 24
osd.0 up   in  weight 1 up_from 3335 up_thru 4886 down_at 3333 last_clean_interval [3186,3332) 172.20.0.11:6816/7293 172.20.0.11:6817/7293 172.20.0.11:6818/7293 172.20.0.11:6819/7293 exists,up d8f151c7-d83c-4c1f-ab19-47101a6049ad
osd.1 up   in  weight 1 up_from 3335 up_thru 4886 down_at 3333 last_clean_interval [3186,3333) 172.20.0.11:6812/7116 172.20.0.11:6813/7116 172.20.0.11:6814/7116 172.20.0.11:6815/7116 exists,up cf70903c-d21b-4b38-8a0f-c8ee5cfc4085
osd.2 up   in  weight 1 up_from 3335 up_thru 4880 down_at 3333 last_clean_interval [3186,3333) 172.20.0.11:6820/7417 172.20.0.11:6821/7417 172.20.0.11:6822/7417 172.20.0.11:6823/7417 exists,up f641e090-7e4e-4e01-a448-c110e2128370
osd.3 up   in  weight 1 up_from 3335 up_thru 4886 down_at 3333 last_clean_interval [3186,3332) 172.20.0.11:6804/6807 172.20.0.11:6805/6807 172.20.0.11:6806/6807 172.20.0.11:6807/6807 exists,up 8f989bd7-825f-483a-be43-89622a6b40d1
osd.4 up   in  weight 1 up_from 3335 up_thru 4877 down_at 3333 last_clean_interval [3186,3332) 172.20.0.11:6800/6661 172.20.0.11:6801/6661 172.20.0.11:6802/6661 172.20.0.11:6803/6661 exists,up 20e4ff9c-5ee5-46eb-9112-ef2382c9bc37
osd.5 up   in  weight 1 up_from 3338 up_thru 4877 down_at 3337 last_clean_interval [3335,3337) 172.20.0.11:6808/6925 172.20.0.11:6824/1006925 172.20.0.11:6825/1006925 172.20.0.11:6826/1006925 exists,up 38bf12a3-b227-4d50-b0ce-e206b764d1ec
osd.6 up   in  weight 1 up_from 4807 up_thru 4885 down_at 4804 last_clean_interval [3361,4806) 172.20.0.12:6812/7326 172.20.0.12:6805/1007326 172.20.0.12:6806/1007326 172.20.0.12:6817/1007326 exists,up d4da4457-aab7-4940-95e5-7f2266b304c8
osd.7 up   in  weight 1 up_from 4807 up_thru 4886 down_at 4804 last_clean_interval [3362,4806) 172.20.0.12:6816/7664 172.20.0.12:6807/1007664 172.20.0.12:6824/1007664 172.20.0.12:6825/1007664 exists,up 29829d57-6b1d-485c-a29c-0e267978f775
osd.8 up   in  weight 1 up_from 4807 up_thru 4886 down_at 4804 last_clean_interval [3362,4806) 172.20.0.12:6804/7043 172.20.0.12:6813/1007043 172.20.0.12:6814/1007043 172.20.0.12:6815/1007043 exists,up c06a405e-e9d8-4ad4-9d66-a73d95efb139
osd.9 up   in  weight 1 up_from 4807 up_thru 4886 down_at 4804 last_clean_interval [3361,4806) 172.20.0.12:6808/7197 172.20.0.12:6818/1007197 172.20.0.12:6819/1007197 172.20.0.12:6826/1007197 exists,up 5883c543-daa2-4470-bf80-c73ac51ad0e2
osd.10 up   in  weight 1 up_from 4807 up_thru 4877 down_at 4806 last_clean_interval [3362,4806) 172.20.0.12:6800/6856 172.20.0.12:6809/1006856 172.20.0.12:6810/1006856 172.20.0.12:6811/1006856 exists,up 0a272507-ccc1-45fb-8879-ed15bccf1c9b
osd.11 up   in  weight 1 up_from 4808 up_thru 4877 down_at 4806 last_clean_interval [3362,4807) 172.20.0.12:6820/7863 172.20.0.12:6801/1007863 172.20.0.12:6802/1007863 172.20.0.12:6803/1007863 exists,up 7c05fcd3-6736-4e25-bfc0-fda5d780ecd5
osd.12 up   in  weight 1 up_from 3353 up_thru 4880 down_at 3350 last_clean_interval [3325,3349) 172.20.0.13:6808/6691 172.20.0.13:6809/6691 172.20.0.13:6810/6691 172.20.0.13:6811/6691 exists,up b2d9b125-850b-4f88-bebc-dd7eb43c1652
osd.13 up   in  weight 1 up_from 3353 up_thru 4880 down_at 3350 last_clean_interval [3325,3349) 172.20.0.13:6804/6569 172.20.0.13:6805/6569 172.20.0.13:6806/6569 172.20.0.13:6807/6569 exists,up 1f635d20-b410-44a6-908d-4b934a447a0d
osd.14 up   in  weight 1 up_from 3352 up_thru 4880 down_at 3350 last_clean_interval [3325,3349) 172.20.0.13:6820/7167 172.20.0.13:6821/7167 172.20.0.13:6822/7167 172.20.0.13:6823/7167 exists,up 872fbb31-9c76-4c8c-bca1-ed3fc7b1e4c6
osd.15 up   in  weight 1 up_from 4886 up_thru 4886 down_at 4885 last_clean_interval [3352,4885) 172.20.0.13:6816/7003 172.20.0.13:6814/1007003 172.20.0.13:6815/1007003 172.20.0.13:6827/1007003 exists,up 62acea9f-7bff-4b4b-ab41-2045aca677b2
osd.16 up   in  weight 1 up_from 4815 up_thru 4877 down_at 4814 last_clean_interval [3353,4814) 172.20.0.13:6812/6816 172.20.0.13:6824/1006816 172.20.0.13:6825/1006816 172.20.0.13:6826/1006816 exists,up a579b8cd-4a73-46fe-8ed1-2f938347a061
osd.17 up   in  weight 1 up_from 3352 up_thru 4877 down_at 3350 last_clean_interval [3325,3349) 172.20.0.13:6800/6411 172.20.0.13:6801/6411 172.20.0.13:6802/6411 172.20.0.13:6803/6411 exists,up 30a43e51-37f1-4f08-9923-00f631f3710b
osd.18 up   in  weight 1 up_from 4043 up_thru 4885 down_at 4042 last_clean_interval [4041,4041) 172.20.0.14:6820/7801 172.20.0.14:6821/7801 172.20.0.14:6822/7801 172.20.0.14:6823/7801 exists,up 7cdf1ac3-926d-4333-92d4-3f937848f07a
osd.19 up   in  weight 1 up_from 4044 up_thru 4886 down_at 4042 last_clean_interval [4040,4041) 172.20.0.14:6808/6930 172.20.0.14:6809/6930 172.20.0.14:6810/6930 172.20.0.14:6811/6930 exists,up b8c9a21a-64e3-4b82-8f4d-4bccde887df8
osd.20 up   in  weight 1 up_from 4044 up_thru 4886 down_at 4042 last_clean_interval [4040,4041) 172.20.0.14:6804/6801 172.20.0.14:6805/6801 172.20.0.14:6806/6801 172.20.0.14:6807/6801 exists,up 8d24ef13-e863-4725-b2da-1f0ecd767973
osd.21 up   in  weight 1 up_from 4044 up_thru 4880 down_at 4042 last_clean_interval [4040,4041) 172.20.0.14:6816/7421 172.20.0.14:6817/7421 172.20.0.14:6818/7421 172.20.0.14:6819/7421 exists,up f6367d1e-5e59-46af-b742-ec4f05f30cd0
osd.22 up   in  weight 1 up_from 4043 up_thru 4877 down_at 4042 last_clean_interval [4040,4041) 172.20.0.14:6812/7226 172.20.0.14:6813/7226 172.20.0.14:6814/7226 172.20.0.14:6815/7226 exists,up d015fb9a-01b0-40e3-bb82-723a7a4b313d
osd.23 up   in  weight 1 up_from 4044 up_thru 4877 down_at 4042 last_clean_interval [4041,4041) 172.20.0.14:6800/6638 172.20.0.14:6801/6638 172.20.0.14:6802/6638 172.20.0.14:6803/6638 exists,up dac4e915-3505-4cb5-b6c0-99a4200c5ded

Code:
pveversion --verbose
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.18-4-pve: 4.15.18-23
pve-kernel-4.15.18-2-pve: 4.15.18-21
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
ceph: 12.2.8-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
RBD clients still access the cache tier, hence the failed flush messages. While the link talks about removing the cache tier, it also shows the steps to evict all objects from it.
http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#removing-a-cache-tier
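
To see which clients are still holding those header objects open, the watchers can be listed directly. A minimal sketch, using one of the object names from the output above; rados listwatchers and rbd status are standard tools, but the image name in the second part is only a placeholder to replace with your own (rbd info shows the id that appears in the rbd_header object name):
Code:
# list the clients watching one of the headers that failed to flush
rados -p rbd_nvme listwatchers rbd_header.55f35f6b8b4567

# or check a specific image on the base pool (image name is only an example)
rbd info ec_ssd/vm-100-disk-1
rbd status ec_ssd/vm-100-disk-1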

And just for reference.
http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution
KNOWN BAD WORKLOADS
The following configurations are known to work poorly with cache tiering.

  • RBD with replicated cache and erasure-coded base: This is a common request, but usually does not perform well. Even reasonably skewed workloads still send some small writes to cold objects, and because small writes are not yet supported by the erasure-coded pool, entire (usually 4 MB) objects must be migrated into the cache in order to satisfy a small (often 4 KB) write. Only a handful of users have successfully deployed this configuration, and it only works for them because their data is extremely cold (backups) and they are not in any way sensitive to performance.
  • RBD with replicated cache and base: RBD with a replicated base tier does better than when the base is erasure coded, but it is still highly dependent on the amount of skew in the workload, and very difficult to validate. The user will need to have a good understanding of their workload and will need to tune the cache tiering parameters carefully.
 
RBD clients still access the cache tier, hence the failed flush messages. While the link talks about removing the cache tier, it also shows the steps to evict all objects from it.
http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#removing-a-cache-tier
That is the recommendation for a read-only cache. It does not work for a writeback cache, because forward mode cannot be used:
Code:
# ceph osd tier cache-mode rbd_nvme forward
Error EPERM: 'forward' is not a well-supported cache mode and may corrupt your data.  pass --yes-i-really-mean-it to force

I am trying to remove a writeback cache, and that has to be done via proxy mode. The correct documentation is on this page: http://docs.ceph.com/docs/master/rados/operations/cache-tiering/#removing-a-writeback-cache
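
Condensed, the sequence from that page comes down to the following (pool names taken from this thread). Step 2 has to be repeated until the pool is really empty, which in turn only succeeds once no clients keep the objects busy:
Code:
# 1. switch the cache to proxy mode so new and modified objects are flushed to the base pool
ceph osd tier cache-mode rbd_nvme proxy

# 2. flush and evict everything; repeat until 'rados ls' shows no objects left
rados -p rbd_nvme cache-flush-evict-all
rados -p rbd_nvme ls

# 3. only then detach the overlay and remove the tier
ceph osd tier remove-overlay ec_ssd
ceph osd tier remove ec_ssd rbd_nvme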
 
I found out that I cannot give up tiering when using an erasure-coded pool to store images, because:
"RBD can store image data in EC pools, but the image header and metadata still needs to go in a replicated pool. Assuming you have the usual pool named “rbd” for this purpose"
For example:
Code:
rbd create rbd/myimage --size 1T --data-pool ec42
But PVE can't use the --data-pool option for RBD storage. ((

Note: I want to use an erasure-coded pool because it provides more usable storage space than a replicated pool.
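
For completeness, the tiering-free setup described in the quote above looks roughly like this. The pool names are the ones from the example, and allow_ec_overwrites (Luminous and later, BlueStore OSDs only) is what lets RBD place its data objects on the EC pool without a cache tier, while headers and metadata stay in the replicated pool:
Code:
# allow partial overwrites on the EC pool (requires BlueStore OSDs)
ceph osd pool set ec42 allow_ec_overwrites true

# header/metadata go to the replicated 'rbd' pool, data objects to 'ec42'
rbd create rbd/myimage --size 1T --data-pool ec42
rbd info rbd/myimage   # shows the data_pool when one is set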
 
