Another Ceph Performance Problem

30 IOPS is what I got with the Ceph bench...
30 IOPS times 7 (OSDs in each server) would be amazing...
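For reference, the numbers come from something along these lines (pool name is just a placeholder, adjust to your setup):
Code:
# 4K random writes for 60 s with 16 concurrent ops; keep the objects for the read test
rados bench -p testpool 60 write -b 4096 -t 16 --no-cleanup
# random reads against the objects written above
rados bench -p testpool 60 rand
# remove the benchmark objects afterwards
rados -p testpool cleanup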

Removed one OSD, wiped it, and re-added it. It turned out that another OSD had a latency of about 2000 ms. Pulled that one out of the cluster; now waiting for recovery, then I'll retest.
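In case it helps someone else: the 2000 ms OSD stood out immediately in the cluster-wide latency stats, roughly like this:
Code:
# commit/apply latency in ms for every OSD in the cluster
ceph osd perf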
 
IOPS are the same, but the defective SSD was stalling all VMs; now the VMs are much faster than before.

I think I have to check each SSD.
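To check each one without pulling disks, the built-in OSD bench should do (it writes test data to every OSD, so I'd run it during a quiet period; osd.7 is just an example ID):
Code:
# default: writes 1 GiB in 4 MiB blocks to each OSD and reports throughput
ceph tell osd.* bench
# a single OSD with 4K blocks (total size capped by osd_bench_small_size_max_iops)
ceph tell osd.7 bench 12288000 4096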



Thank you.
Definitely something to keep in mind. Thx @Craig St George for bringing this up.

If you consider the problem solved, please mark the thread as solved by editing the first post and selecting the prefix in the dropdown menu next to the title :)
 
Did you replace the SSDs to get the performance back up?

I'm still wondering why performance in the VMs is so poor. I tried to update the VirtIO drivers in the kernel but didn't find any newer version.
 
Funny: I removed some of the SSDs and added them right back into the Ceph cluster, and suddenly we got performance inside our VMs.

Read IOPS were 8-10 before; now they are 400-500... and backfilling is still underway with 36 PGs.
What is that???

It feels like the system puts the OSDs into some kind of hibernation, and accessing them slows the whole system down...
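I'm watching the backfill with:
Code:
# overall cluster state incl. recovery/backfill progress
ceph -s
# per-OSD utilisation and PG distribution
ceph osd df tree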
 
Check the Ceph monitor logs to see if any OSD is slow or blacklisted due to high latency, and make sure all nodes are NTP-synced.
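Roughly like this (the log path is the Debian/Proxmox default):
Code:
# look for slow/blacklist messages in the cluster log on a monitor node
grep -iE 'slow|blacklist' /var/log/ceph/ceph.log
# current warnings incl. the affected OSDs
ceph health detail
# confirm time sync on every node
timedatectl status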
 
In the end, it was a faulty SSD with terrible IOPS and read/write performance.

After removing it, speed is back to normal.
Is there a log file for slow OSDs?
 
You should see much higher latency stats in the Proxmox GUI for that specific OSD.
I'm not sure there is a log for a merely "slow" OSD unless it's really hanging (you do get log entries with "slow ops", but the OSD really needs to hang for those).
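If you want to dig into a specific OSD, the admin socket keeps the slowest recent operations; something like this, run on the node hosting that OSD (osd.7 again just an example):
Code:
# slowest recent ops with their per-stage latency breakdown
ceph daemon osd.7 dump_historic_ops
# grep the OSD log for slow-op complaints
grep -i 'slow ops' /var/log/ceph/ceph-osd.7.log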
 
I'm still on it. I updated the drivers inside the VMs and rechecked.

VirtIO 0.1.165 drivers under Windows Server 2012 R2
Code:
winsat disk -ran -write -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-ran -write -drive c'
> Run Time 00:00:44.75
> Dshow Video Encode Time                      0.00000 s
> Dshow Video Decode Time                      0.00000 s
> Media Foundation Decode Time                 0.00000 s
> Disk  Random 16.0 Write                      0.64 MB/s
> Total Run Time 00:00:49.42

winsat disk -ran -read -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-ran -read -drive c'
> Run Time 00:00:01.25
> Dshow Video Encode Time                      0.00000 s
> Dshow Video Decode Time                      0.00000 s
> Media Foundation Decode Time                 0.00000 s
> Disk  Random 16.0 Read                       62.95 MB/s          6.7
> Total Run Time 00:00:05.64

VirtIO 0.1.190 drivers under Windows Server 2019
Code:
winsat disk -ran -write -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-ran -write -drive c'
> Run Time 00:00:40.25
> Dshow Video Encode Time                      0.00000 s
> Dshow Video Decode Time                      0.00000 s
> Media Foundation Decode Time                 0.00000 s
> Disk  Random 16.0 Write                      1.45 MB/s
> Total Run Time 00:00:49.83

winsat disk -ran -read -drive c
Windows System Assessment Tool
> Running: Feature Enumeration ''
> Run Time 00:00:00.00
> Running: Storage Assessment '-ran -read -drive c'
> Run Time 00:00:06.36
> Dshow Video Encode Time                      0.00000 s
> Dshow Video Decode Time                      0.00000 s
> Media Foundation Decode Time                 0.00000 s
> Disk  Random 16.0 Read                       67.40 MB/s          6.8
> Total Run Time 00:01:14.39

Slightly better, but still not what I would expect from an all-flash storage backend with 59 OSDs.
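To rule out the Windows guest/driver stack, I also want to benchmark the RBD layer directly from a host; fio's rbd engine should work for that (pool and image names are placeholders, and the test image has to exist first):
Code:
# create a throwaway test image
rbd create -p rbd --size 10G fio-test
# 4K random writes straight against RBD, bypassing the VM entirely
fio --name=rbd-4k-randwrite --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=fio-test --rw=randwrite --bs=4k \
    --iodepth=32 --numjobs=1 --direct=1 --runtime=60 --time_based
# remove the test image afterwards
rbd rm -p rbd fio-test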


Found a hint that it may help to disable the cache on the SSDs used for Ceph. Is this a thing?
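What I found seems to be about the drives' volatile write cache; on Linux it can be queried and toggled like this (sdX is a placeholder, and I'd test it on a single OSD first):
Code:
# show the drive's current write-cache setting
hdparm -W /dev/sdX
# disable the volatile write cache (some SSDs handle sync writes better this way)
hdparm -W 0 /dev/sdX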
 