Ceph: getting "clients failing to respond to cache pressure" after upgrading to Proxmox 9

cola16

Only one of my filesystems is having this problem.

This filesystem has a very large capacity and a huge number of files; its bulk data sits on an HDD-backed erasure-coded pool, which is very slow.
(The metadata pool and the default data pool are SSD-backed and replicated, of course.)

Files on it are only rarely read or written.

https://docs.ceph.com/en/reef/cephf...failing-to-respond-to-cache-pressure-messages

After reading the document above, I checked the following:

1. Using top, I checked the memory usage of ceph-mds; it is only using about 400 MiB (the first sketch after this list is how I compare that against the configured limits).

2. There is not much difference between setting mds_max_caps_per_client to 500K and setting it to 10M.

3. The most suspicious point to me is that this filesystem is barely used, so the clients may simply be sitting idle on caps they never release (the second sketch below lists per-client caps).
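
For comparison, this is roughly how I check the configured cache limits against what the MDS itself reports (just a quick sketch that shells out to the ceph CLI; the daemon name mds.pve1 is a placeholder for one of my MDS daemons):

```python
#!/usr/bin/env python3
# Rough sketch: compare the configured MDS cache limits with what the
# daemon itself reports. The daemon name below is a placeholder.
import json
import subprocess

MDS_NAME = "mds.pve1"  # placeholder: replace with your active MDS daemon


def ceph(*args):
    """Run a ceph CLI command and return its stdout as text."""
    return subprocess.run(
        ["ceph", *args], check=True, capture_output=True, text=True
    ).stdout.strip()


# Cluster-wide defaults for the mds section.
for opt in ("mds_cache_memory_limit", "mds_max_caps_per_client"):
    print(f"{opt} = {ceph('config', 'get', 'mds', opt)}")

# What the MDS itself reports about its cache right now.
cache = json.loads(ceph("tell", MDS_NAME, "cache", "status"))
print(f"cache bytes reported by {MDS_NAME}: {cache['pool']['bytes']}")
```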
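
And to see which clients are actually sitting on caps (my suspicion in point 3), something like the sketch below lists every session on the active MDS sorted by num_caps; the filesystem name bigfs is a placeholder:

```python
#!/usr/bin/env python3
# Rough sketch: list all client sessions on the active MDS of one
# filesystem, sorted by how many caps each client holds.
# "bigfs" is a placeholder for the affected filesystem's name.
import json
import subprocess

FS_NAME = "bigfs"  # placeholder

out = subprocess.run(
    ["ceph", "tell", f"mds.{FS_NAME}:0", "session", "ls", "--format", "json"],
    check=True, capture_output=True, text=True,
).stdout
sessions = json.loads(out)

for s in sorted(sessions, key=lambda s: s.get("num_caps", 0), reverse=True):
    host = s.get("client_metadata", {}).get("hostname", "?")
    print(f"client.{s['id']:<10} caps={s.get('num_caps', 0):<9} host={host}")
```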

The clients are using ceph-fuse, not the kernel client.
I don't know why, but with the kernel client the machines keep locking up, so I have to stick with fuse.
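
Since the clients are ceph-fuse, I can also query each client through its admin socket to see what it reports about its MDS sessions and counters (another rough sketch; the socket path pattern under /var/run/ceph is an assumption and may differ on your mounts):

```python
#!/usr/bin/env python3
# Rough sketch: ask a ceph-fuse client (via its admin socket) what it
# thinks about its MDS sessions and counters. The socket path pattern is
# an assumption; look under /var/run/ceph on the client for the real
# ceph-client.*.asok file.
import glob
import json
import subprocess

socks = glob.glob("/var/run/ceph/ceph-client.*.asok")
if not socks:
    raise SystemExit("no ceph-fuse admin socket found under /var/run/ceph")

for sock in socks:
    for cmd in (["mds_sessions"], ["perf", "dump"]):
        out = subprocess.run(
            ["ceph", "--admin-daemon", sock, *cmd],
            check=True, capture_output=True, text=True,
        ).stdout
        print(f"=== {sock}: {' '.join(cmd)} ===")
        print(json.dumps(json.loads(out), indent=2)[:2000])  # trim long output
```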