"Client failing to respond to cache pressure" error when copying files on to new CephFS share

MGSteve

Nov 1, 2023
Hi All,

Background
We have a CRM written in PHP, running on the LAMP stack. We were using Hyper-V, but I managed to talk the boss into a new platform: 3 PVE nodes and a PBS backup server :) - all physical servers, not cloud.

As part of the CRM we store millions of files (emails, documents etc.), currently handled by a single VM on the Hyper-V setup acting as a fileserver via NFS. It worked fine until now, but the disk is running out of space and the Hyper-V host itself has no more disk space available - one of the reasons for the re-platform (the Hyper-V servers are old too).

So I looked at fixing this with a GlusterFS volume set up across the 3 nodes and the backup server, with the backup server just backing up its local mount of the Gluster share. In testing (without the PBS, as we didn't have it at that point) it worked fine, and the new webserver VM on Proxmox mounted the share and worked great. However, I've just noticed that GlusterFS appears to be being dropped by Red Hat, so I'm not sure I want to roll out a solution that's about to become abandonware.

The Issue
So I looked at Ceph instead. We've already set that up for the 3 PVE nodes and it's working well. I noticed you can also use it as a filesystem (CephFS), so I created one, mounted the share on one of our existing servers and started to copy over the document store - note, this has around 21m files in it and runs to around 1.2TB.

It copied 28GB of it via rsync and then stalled with a "Client failing to respond to cache pressure" error.

Is this just down to a config error? I'm more of a developer than a sysadmin - I know my way around what I need to on Linux (Ubuntu is usually my distro of choice) and I'm pretty much self-taught on this.

Any help would be appreciated and if I need to post any config files etc... please just ask :)

Thanks
 
What you need to understand here is that your 21m files are not just 1.2TB of storage space. A big factor is the sheer number of inodes, and you shouldn't forget the metadata that CephFS has to manage for them.

CephFS effectively limits a single directory to around 100k entries by default. If you need more than that per directory, see directory fragmentation: https://docs.ceph.com/en/latest/cephfs/dirfrags/
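For reference, the relevant knob here is the per-fragment entry cap on the MDS. Something like the following should show and raise it (the 500k value is purely illustrative, not a recommendation):

Code:
# Show the current per-fragment entry cap (default 100000 entries)
ceph config get mds mds_bal_fragment_size_max
# Raise it, e.g. to 500k entries per fragment (illustrative value only)
ceph config set mds mds_bal_fragment_size_max 500000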

The error message you describe points to a lack of cache for your MDS. It should also show up as a "WRN" rather than an "ERR". You would have to assign more cache to your MDS, see here: https://docs.ceph.com/en/latest/cephfs/cache-configuration/
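Roughly, that looks like the following - the 8 GiB value is only an example, size it to the RAM you can spare on the MDS nodes:

Code:
# See which MDS/client the warning refers to
ceph health detail
# Current MDS cache memory limit (the default is 4 GiB)
ceph config get mds mds_cache_memory_limit
# Raise it to e.g. 8 GiB
ceph config set mds mds_cache_memory_limit 8589934592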

Even though you can tune these settings, your plan to put 21m files there is likely doomed to failure. Ceph itself also notes in its docs that you might be better off using object storage (https://docs.ceph.com/en/latest/cephfs/app-best-practices/). Ceph with RBD is block storage, CephFS is a file system (you have similar problems with inodes etc. with XFS or ext4), and S3 or Swift via the RADOS Gateway would be your object storage.
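Just to sketch what that could look like for a document store: this assumes a RADOS Gateway (RGW) and an S3 user already exist, which is not the case in your cluster yet, and the bucket/file names are made up:

Code:
# Create an S3 user on the RADOS Gateway (hypothetical uid/display name)
radosgw-admin user create --uid=crm --display-name="CRM document store"
# Upload documents with any S3 client, e.g. s3cmd
s3cmd mb s3://crm-documents
s3cmd put invoice-2023-001.pdf s3://crm-documents/2023/invoice-2023-001.pdf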

Basically, with many file systems you will run into limits when it comes to the number of files rather than the amount of data. In my opinion, you need to think about a different type of storage so that your application works and scales in the long term. Otherwise you could end up with a 100 TB volume that, despite only 1.2 TB being occupied, can no longer accept new files.

For example, you could consider creating several RBD volumes in Ceph. You could split them by content, i.e. RBD A for emails and RBD B for documents. You could also break the data down into smaller sets: instead of keeping the entire 10 years in one volume, split it by year or quarter. Then you won't have one 100 TB RBD image but a large number of smaller (e.g. 1 TB) images. A rough sketch of what that could look like is below.
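Purely as an illustration - the pool and image names are made up and the sizes are placeholders:

Code:
# One RBD image per content type and year (placeholder names/sizes)
rbd create crm-pool/emails-2023 --size 1T
rbd create crm-pool/documents-2023 --size 1T
# Map and format one of them on the fileserver VM or a gateway host
rbd map crm-pool/emails-2023
mkfs.xfs /dev/rbd/crm-pool/emails-2023
mkdir -p /srv/emails/2023
mount /dev/rbd/crm-pool/emails-2023 /srv/emails/2023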
 
Thanks for that, it makes sense. Yeah, I was thinking inodes, but on the one hand you read some things saying that Ceph doesn't have inodes so you don't need to worry about them, and then others saying you need to increase the inodes_max value, which doesn't even appear to exist in the current version (well, Quincy at least, which is what PVE installed by default). It's all very confusing for a Ceph noob.

For example, you could consider creating several RBD volumes in Ceph. You could split them by content, i.e. RBD A for emails and RBD B for documents. You could also break the data down into smaller sets: instead of keeping the entire 10 years in one volume, split it by year or quarter.

Yup, but there's no easy way to back them up, as it doesn't appear you can back up Ceph pools directly (from what was posted in the other question I asked).

I'll just carry on running a fileshare VM, but with a larger drive! At least that can easily be backed up by PBS. Come to think of it, the simplest option may be the best in this case, at least until I can re-engineer the storage backend.

The VM storage would be backed by Ceph anyway, so it would effectively have 3 replicas across the three nodes, and I wouldn't have to worry about Gluster, which would be the only alternative.
 
Yeah, I was thinking inodes, but on the one hand you read some things saying that Ceph doesn't have inodes so you don't need to worry about them, and then others saying you need to increase the inodes_max value, which doesn't even appear to exist in the current version (well, Quincy at least, which is what PVE installed by default).
I must have expressed myself a bit unclearly. What I meant was that the number of inodes can matter more than the amount of storage space required. I didn't mean to say that this is your problem with CephFS. CephFS does of course track inodes, but there is effectively no fixed limit.

For Example:
Code:
root@prox1 ~ # df -i /mnt/pve/cephfs
Filesystem                                                                               Inodes IUsed IFree IUse% Mounted on
172.16.11.2,172.16.11.2:6789,172.16.11.3,172.16.11.3:6789,172.16.11.4,172.16.11.4:6789:/  25705     -     -     - /mnt/pve/cephfs

The limits of CephFS lie more in the MDS cache and the default cap of around 100,000 entries per directory. You can raise this limit with various settings, but you shouldn't expect performance to stay the same.
Alternatively, you can also create and use several CephFS filesystems. I just don't know whether that actually gains you anything, as it doesn't automatically give you additional MDS services.
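For completeness, a second filesystem would be created roughly like this (pool and filesystem names are placeholders, and the new filesystem still needs its own MDS assigned, e.g. via pveceph):

Code:
# On some releases multiple filesystems have to be enabled first
ceph fs flag set enable_multiple true
# The pools must exist before "fs new" (placeholder names)
ceph osd pool create cephfs_docs_metadata
ceph osd pool create cephfs_docs_data
ceph fs new cephfs_docs cephfs_docs_metadata cephfs_docs_data
# Check whether an MDS picked it up and how many inodes/dentries it holds
ceph fs status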

Yup, but there's no easy way to back them up, as it doesn't appear you can back up Ceph pools directly (from what was posted in the other question I asked).
Is that actually necessary? Does data from previous years actually change, or is it more of a "read-only" thing?
Ultimately, that's not a big challenge. You just need something like a "backup gateway" that mounts the image and then backs it up. You can also pull the image directly from Ceph and copy it to external storage, or replicate it to a second Ceph cluster. There are a number of ways to do this. It's not a Ceph problem, it's the nature of block storage.
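As a rough sketch of the "pull it directly from Ceph" route, reusing the made-up pool/image names from above:

Code:
# Snapshot the image so the copy is consistent, then export the snapshot
rbd snap create crm-pool/emails-2023@backup
rbd export crm-pool/emails-2023@backup /mnt/external/emails-2023-backup.img
rbd snap rm crm-pool/emails-2023@backup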

I'll just carry on running a fileshare VM, but with a larger drive! At least that can easily be backed up by PBS. Come to think of it, the simplest option may be the best in this case, at least until I can re-engineer the storage backend.
I would recommend that you address the issue promptly. Every file system has its limits somewhere, and perhaps also weaknesses or bugs that make it lose data or crash constantly.
 
