1 MDSs behind on trimming

mattock

New Member
Dec 26, 2024
For a few days now I have been getting the Ceph HEALTH_WARN status with the following message:

Code:
mds.pve1(mds.0): Behind on trimming (1109/128) max_segments: 128, num_segments: 1109

The num_segments value keeps increasing.
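For anyone seeing the same thing, the full warning text and the current MDS state can be checked with the standard commands (nothing cluster-specific assumed here):

Code:
# show the complete warning, including num_segments
ceph health detail
# show which MDS is active and basic per-rank stats
ceph fs status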

I use CephFS and recently upgraded to Ceph 19.2.0 (from 18).
The problem appeared after the upgrade; of course, that does not necessarily mean the upgrade is the cause.



I already tried:
Code:
ceph config set mds mds_cache_trim_threshold 384Ki
ceph config set mds mds_cache_trim_decay_rate 0.5
ceph config set mds mds_recall_max_caps 45000B
ceph config set mds mds_recall_max_decay_rate 0.75
Source: https://www.suse.com/de-de/support/kb/doc/?id=000019740
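In case it is useful to anyone else, a sketch of how such overrides can be inspected and removed again (same option names as above; removing an override lets the MDS fall back to its built-in default):

Code:
# show the currently effective value
ceph config get mds mds_cache_trim_threshold
ceph config get mds mds_cache_trim_decay_rate
# drop the override again
ceph config rm mds mds_cache_trim_threshold
ceph config rm mds mds_cache_trim_decay_rate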

And:
Code:
ceph config set mds mds_dir_max_commit_size 80
ceph fs fail fs_name
ceph fs set fs_name joinable true
Source: https://github.com/rook/rook/issues/14220#issuecomment-2270086717

But neither of these helped.
I reverted these settings.
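To double-check that no MDS overrides are left behind after reverting, a simple grep over the config dump should be enough:

Code:
ceph config dump | grep -i mds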


How can I resolve this problem?
If any additional info is needed, please let me know.

Thanks in advance.

Code:
proxmox-ve: 8.3.0 (running kernel: 6.8.12-5-pve)
pve-manager: 8.3.2 (running version: 8.3.2/3e76eec21c4a14a7)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.8: 6.8.12-5
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
pve-kernel-5.4: 6.4-5
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.15.5-1-pve: 5.15.5-1
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.106-1-pve: 5.4.106-1
amd64-microcode: 3.20240820.1~deb12u1
ceph: 19.2.0-pve2
ceph-fuse: 19.2.0-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: not correctly installed
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20240910.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.3
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-2
pve-ha-manager: 4.0.6
pve-i18n: 3.3.2
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.3
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
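For completeness, the Ceph version actually running on each daemon (as opposed to the installed packages above) can be confirmed with:

Code:
ceph versions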
 
So I'm having the same issue in Kubernetes using rook-ceph, and I believe it is caused by this bug: https://tracker.ceph.com/issues/66948

The bug report implies that there are no ill effects and that, as the MDS writes more log segments, trimming will eventually resume.
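If that is the case, the segment count should start dropping on its own at some point; a crude way to keep an eye on it (just a shell loop around the health output):

Code:
watch -n 60 "ceph health detail | grep -i trimming"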

There is a fix waiting to be backported to Squid, but it's probably a while away: https://github.com/ceph/ceph/pull/60838
 
Thanks for the links. I also have this problem. It seems it will be fixed in 19.2.3. Has anyone tried that version and confirmed the fix? Mine is on 19.2.2.

Edit:
It seems it isn't in the repository at the moment:

Code:
apt list -a ceph-common
Listing... Done
ceph-common/stable,now 19.2.2-pve1~bpo12+1 amd64 [installed]
ceph-common/stable 19.2.1-pve3 amd64
ceph-common/stable 19.2.1-pve2 amd64
ceph-common/stable 19.2.1-pve1 amd64
ceph-common/stable 19.2.0-pve2 amd64
ceph-common/stable 19.2.0-pve1 amd64
ceph-common/stable,stable-security 16.2.15+ds-0+deb12u1 amd64
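Until the package shows up, the repository entry and candidate versions can be re-checked like this (the repo file path is the usual one for the Proxmox Ceph Squid repository; adjust if yours differs):

Code:
cat /etc/apt/sources.list.d/ceph.list
apt update
apt policy ceph-common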
 
I use Ceph in Kubernetes, not in Proxmox, but I just updated to the v19.2.3 Docker images and the problem has gone away.
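For other Rook users, bumping the Ceph image usually looks something like this (a sketch; namespace and cluster name assume the default rook-ceph install, adjust as needed):

Code:
# point the CephCluster at the newer Ceph release and let Rook roll the daemons
kubectl -n rook-ceph patch CephCluster rook-ceph --type=merge \
  -p '{"spec":{"cephVersion":{"image":"quay.io/ceph/ceph:v19.2.3"}}}'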

My metadata pool is also one tenth of the size it used to be, which is nice.
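To compare pool usage before and after, ceph df shows per-pool stats; filtering on "metadata" is just a convenience and assumes the metadata pool has it in its name:

Code:
ceph df
ceph df | grep -i metadata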
 