I've been searching around for weeks, maybe months, with no positive results so far. I upgraded my 3-node cluster from Proxmox 6 to 7, and Ceph from Nautilus to Octopus to Pacific. I also converted all OSDs to BlueStore. My containers and VMs are all very slow on reads and writes, which led me to check:
root@pmox1:~# ceph osd perf
osd  commit_latency(ms)  apply_latency(ms)
  8                 703                703
  6                   7                  7
  0                   3                  3
  1                   3                  3
 11                   9                  9
 10                 550                550
  9                 229                229
  7                   5                  5
  5                   6                  6
  4                   4                  4
  3                   3                  3
  2                   3                  3
OSDs 8, 9, and 10 show commit and apply latencies in the hundreds of milliseconds, which seems to be the culprit. All OSDs are SSDs. The three with the high latency are each on a different node. I have tried deleting and recreating these OSDs, and it didn't help. I've found various posts about snaptrim running wild, and although that didn't seem to be the case on my cluster, I set nosnaptrim anyway, which didn't change anything either.
All three drives with this issue are consumer-grade Samsung 870 1TB. All the other drives are Crucial SSDs (also consumer grade).
Any assistance anyone can offer would be greatly appreciated. I'm of course willing to replace the drives, but it seems strange the issue has only surfaced after the update, and I'd really like to understand the problem.
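In case it helps anyone reproduce the comparison, here is a rough Python sketch of how the outliers in the `ceph osd perf` output above can be picked out. The function names `parse_osd_perf` and `slow_osds` and the "10x the median" threshold are my own assumptions for illustration, not anything Ceph provides, and the sample text is just the output pasted above.

```python
from statistics import median

# Output copied from the `ceph osd perf` run above.
SAMPLE = """\
osd commit_latency(ms) apply_latency(ms)
8 703 703
6 7 7
0 3 3
1 3 3
11 9 9
10 550 550
9 229 229
7 5 5
5 6 6
4 4 4
3 3 3
2 3 3
"""


def parse_osd_perf(text):
    """Parse plain-text `ceph osd perf` output into {osd_id: (commit_ms, apply_ms)}."""
    perf = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and any line that is not exactly three integers.
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue
        osd, commit_ms, apply_ms = map(int, parts)
        perf[osd] = (commit_ms, apply_ms)
    return perf


def slow_osds(perf, factor=10):
    """Return OSD ids whose commit latency exceeds `factor` times the median.

    The factor of 10 is an arbitrary cut-off chosen for this example.
    """
    if not perf:
        return []
    med = median(commit for commit, _ in perf.values())
    return sorted(osd for osd, (commit, _) in perf.items() if commit > factor * med)


print(slow_osds(parse_osd_perf(SAMPLE)))  # prints [8, 9, 10]
```

With the numbers above the median commit latency is 5.5 ms, so only the three Samsung-backed OSDs (8, 9, 10) cross the threshold.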