Ceph: sudden slow ops, freezes, and slow-downs

Hey guys, I am also getting this issue. However, I am not using CephFS, and this is a new cluster set up with NVMe drives. I posted a question on the forums as well: https://forum.proxmox.com/threads/ceph-slow-ops.121033/

Just to summarize: new setup, added OSDs, created pools, and I get slow ops all over the place and no healthy PGs at all.

Tried downgrading the kernel as suggested, but to no avail.

Ceph is unusable on my 3-node cluster, so I switched back to ZFS for now.

Not sure if any of you guys have a solution for this issue?

Thanks!
My issue was related to CephFS, probably to the upgrade to a newer Ceph release that I did back then, and to a huge number of small files on CephFS. The solution for me was to throw away CephFS, and since then everything works like a charm. So, if you are not using CephFS, there must be some other issue in your setup.
 
Hi,

I'm using a 4-node cluster with Ceph (PVE 7.3.4 and Ceph 17.2.5) and 12 HDD OSDs (3 OSDs per node).

The Ceph network is a dedicated 10 GbE network for this 4-node cluster.

About a year ago, with previous versions, Ceph RBD and CephFS were working properly: fast and usable.

Now Ceph is so slow that it is unusable: slow ops, freezes, and slow-downs...

For example:
  • when I add a new HDD OSD, the recovery/rebalance speed is between 200 and 250 MiB/s, which is, in my opinion, a good result.
  • when I back up a VM (<10 GB) from a node to CephFS, it takes many hours...
  • when I create a 1 GiB file on CephFS:
dd if=/dev/zero of=test.img bs=1M count=1024
I get warnings/errors such as: MDS report slow metadata IOs, osd.(different numbers) slow ops...
The time to create a 1 GiB file is between 2 seconds and a few minutes...

Any suggestions or tips are welcome

Regards
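
A side note on the dd test in the post above: writing from /dev/zero without a sync option can largely measure the page cache rather than the storage, which may contribute to the wide spread between 2 seconds and a few minutes. Below is a minimal sketch of a flushed write test, assuming GNU dd (as shipped with Debian-based systems). The function name write_test and the mount path in the usage line are hypothetical and not from the thread.

```shell
# Hypothetical helper: time a sequential write that is flushed to storage.
# Usage: write_test DIR [COUNT]   (COUNT = number of 1 MiB blocks)
write_test() {
    dir=$1
    count=${2:-1024}                 # 1024 x 1 MiB = 1 GiB, as in the post
    target="$dir/ddtest.$$.img"      # temporary file, removed afterwards

    # conv=fdatasync (GNU dd) makes dd flush the file data before exiting,
    # so the reported rate includes the flush, not only the page cache fill.
    dd if=/dev/zero of="$target" bs=1M count="$count" conv=fdatasync 2>&1
    status=$?

    rm -f "$target"
    return "$status"
}
```

For example, `write_test /mnt/pve/cephfs 1024` (path assumed) repeats the 1 GiB test from the post with a flush at the end. Running it a few times while watching `ceph -s` in another terminal shows whether the slow runs line up with the slow ops warnings.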
 
Yep, I read that many of the issues are caused by CephFS, so my issue was kind of isolated. I solved it by fixing my botched 10 GbE Cisco DAC module.

It worked like a charm after that.
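
Since the fix here turned out to be a faulty DAC module, a quick look at the interface error counters on the Ceph network may be worth doing before digging into Ceph itself. This is a minimal sketch that reads the standard Linux counters from sysfs. The function name nic_errors and the interface name in the usage line are hypothetical, and which counters exist depends on the driver.

```shell
# Hypothetical helper: print error/drop counters for one network interface.
# Usage: nic_errors IFACE [ROOT]   (ROOT defaults to /sys/class/net)
nic_errors() {
    iface=$1
    root=${2:-/sys/class/net}
    stats="$root/$iface/statistics"

    if [ ! -d "$stats" ]; then
        echo "no statistics directory for $iface under $root" >&2
        return 1
    fi

    for counter in rx_errors tx_errors rx_dropped tx_dropped rx_crc_errors; do
        # Skip counters that this interface does not expose.
        [ -r "$stats/$counter" ] || continue
        printf '%s %s\n' "$counter" "$(cat "$stats/$counter")"
    done
    return 0
}
```

For example, `nic_errors ens1f0` (interface name assumed) prints one counter per line. Values that keep growing while Ceph traffic is flowing would point towards the cable, the module, or the switch port rather than the OSDs.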
 
