[SOLVED] Ceph snaptrim causing performance impact on whole cluster since update


Active Member
Jun 22, 2019

I upgraded a cluster all the way from Proxmox 6.2/Ceph 14.x to Proxmox 8.0/Ceph 17.x (latest). Hardware is EPYC servers, all-flash/NVMe. I can rule out hardware issues, and I can reproduce the problem.

Everything runs fine so far, except that the whole system gets slowed down when I delete snapshots. All OSD processes shoot to 100% CPU utilisation. I read here and there that deleting all snapshots and then restarting all OSDs fixes it.
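For reference, the restart part of that workaround, assuming systemd-managed OSDs as deployed by Proxmox, would be just:

```shell
# restart all OSD daemons on the local node (run per node, one node at a time)
systemctl restart ceph-osd.target
# or restart a single suspect OSD (id 3 is a placeholder)
systemctl restart ceph-osd@3.service
```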

Can someone confirm this, or give more fine-grained advice on how to solve it?

An interesting fact:

- This issue does not occur on clusters that were "born" as 7.x / Pacific or Quincy
- The I/O impact is so hard that workloads can barely run; I/O is very laggy.

I will look into the mClock tuning thing.
Hmm, even with these params a snapshot delete brings client I/O to nearly zero:

ceph tell osd.* injectargs "--osd_mclock_profile=custom"
ceph tell osd.* injectargs "--osd_mclock_scheduler_client_wgt=4"
ceph tell osd.* injectargs "--osd_mclock_scheduler_background_recovery_lim=100"
ceph tell osd.* injectargs "--osd_mclock_scheduler_background_recovery_res=100"
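Worth noting: `injectargs` changes are runtime-only and are lost when an OSD restarts. To persist the overrides (and verify they took effect), the monitors' central config store can be used instead; a sketch, requiring a live cluster:

```shell
# persist the mClock overrides in the central config store
ceph config set osd osd_mclock_profile custom
ceph config set osd osd_mclock_scheduler_client_wgt 4
# verify the value actually in effect on one OSD (osd.0 as an example)
ceph config show osd.0 osd_mclock_profile
```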

I guess I don't need to restart all OSDs?
Okay, a report of my progress:

After trying to fine-tune the mClock weights and limits, it just does not work for me. Going back to the old scheduler solved the thing for me, and everything works like a charm again. I know this will be deprecated in the future. I will keep trying to figure out the right values, but it would be nice if things worked out of the box.
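For the record, switching back to the old scheduler is a config change plus an OSD restart, since `osd_op_queue` is only read at daemon startup; a sketch, to be rolled out node by node:

```shell
# select the legacy weighted-priority-queue scheduler instead of mClock
ceph config set osd osd_op_queue wpq
# osd_op_queue is read at OSD start, so restart the OSDs (per node)
systemctl restart ceph-osd.target
```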

Still, I am curious why this effect only seems to appear on clusters that were born on 6.x/Nautilus.
I think I am narrowing the problem down.

On a production cluster, even with the old WPQ scheduler, I see a huge performance impact (not as bad, but still not funny) on a few OSDs in the cluster:

osd  commit_latency(ms)  apply_latency(ms)
 13                   0                 0
 14                   0                 0
 12                   1                 1
 15                   0                 0
 19                   0                 0
 18                   0                 0
 17                   0                 0
  0                   1                 1
  1                   0                 0
  2                   0                 0
  3                  37                37
 16                   0                 0
  4                   0                 0
  5                   0                 0
  6                   0                 0
  7                   0                 0
  8                   0                 0
  9                   0                 0
 10                   8                 8
 11                   0                 0

This is under normal load (No snaptrim or something iops heavy running). It's just osd.3 and osd.10 that seem suspect. These are also the osd's that spike CPU.
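A quick way to spot such outliers is to sort the `ceph osd perf` output by commit latency. A sketch, fed here from a hypothetical sample function instead of a live `ceph osd perf` call:

```shell
# stand-in for `ceph osd perf` output on a live cluster (sample data)
ceph_osd_perf_sample() {
  printf 'osd commit_latency(ms) apply_latency(ms)\n'
  printf '13 0 0\n3 37 37\n10 8 8\n11 0 0\n'
}

# drop the header, sort by commit latency (column 2) descending,
# and show the worst offenders first
ceph_osd_perf_sample | tail -n +2 | sort -k2 -rn | head -n 3
```

On a real cluster, replace the sample function with `ceph osd perf` itself.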

I dug into the telemetry and it confirms this:


The two blue lines on top are osd.3 and osd.10 (the scale is logarithmic).

This happened while snapshotting + exporting diff + removing snapshot.

I have the feeling this might be related to other comparable reports:


I also noticed an overall degraded read performance even if no snaptrim is running.

What you could try is to destroy and recreate these OSDs, one at a time, and check how they behave afterward. If they are in the same node, you could recreate them at the same time, since they won't be sharing any replicas.
I suspected this might be required and indeed:

Recreating osd.3 and osd.10 solved my whole issue. As it turns out, here and there OSDs seem to get their internal structures corrupted during the on-disk format conversion of the upgrade to Pacific. Still, I couldn't find anything wrong in the log files; only the suspicious read latency and CPU usage pointed to it.
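For anyone following along, recreating one OSD at a time on Proxmox looks roughly like this; the OSD id and device path are placeholders for your own values:

```shell
# take the OSD out and let Ceph migrate its data away; wait for HEALTH_OK
ceph osd out 3
ceph -s
# stop the daemon and destroy the OSD (--cleanup also wipes the disk)
systemctl stop ceph-osd@3.service
pveceph osd destroy 3 --cleanup
# recreate it on the same device (placeholder path)
pveceph osd create /dev/nvme2n1
```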

I would advise everyone to make use of the Ceph Telemetry module. It was very helpful tracking this down.
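Assuming this refers to Ceph's telemetry mgr module, enabling it is quick; a sketch (the `perf` channel, available from Quincy on, carries the per-OSD performance counters):

```shell
# enable the telemetry manager module and opt in to reporting
ceph mgr module enable telemetry
ceph telemetry on --license sharing-1-0
# from Quincy on, the perf channel adds performance metrics
ceph telemetry enable channel perf
# preview what would be reported
ceph telemetry show
```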

Thanks for your help.
