While investigating OSD performance issues on a new Ceph cluster, I ran the same analysis on my "good" cluster. I discovered something interesting there, and fixing it may turn out to be the solution to the new cluster's issue as well.
For the "good" cluster, I have three nearly identical servers. Each server has four OSDs, for 12 total. When I do a "ceph tell osd.x bench -f plain", I get the following:
Note that the first four OSDs in one server all are very slow. All nodes are connected via a 10GbE network and use the same HDD drives with a single SSD for the Bluestore DB in each server. There are 13 Proxmox nodes using this ceph cluster and each Proxmox node shows the same numbers above.
I am looking for help to figure out why the OSDs in the one server are so slow. I tried deleting osd.0 and recreating it, but it still shows the slow speed after recreation. I've been reading through documentation, but I'm hoping that someone may provide a pointer to get me to a solution faster.
Thanks in advance.
For the "good" cluster, I have three nearly identical servers. Each server has four OSDs, for 12 total. When I do a "ceph tell osd.x bench -f plain", I get the following:
Code:
osd.0:bench: wrote 1024 MB in blocks of 4096 kB in 12.109905 sec at 86588 kB/sec
osd.1:bench: wrote 1024 MB in blocks of 4096 kB in 8.501180 sec at 120 MB/sec
osd.2:bench: wrote 1024 MB in blocks of 4096 kB in 11.384842 sec at 92102 kB/sec
osd.3:bench: wrote 1024 MB in blocks of 4096 kB in 8.695865 sec at 117 MB/sec
osd.4:bench: wrote 1024 MB in blocks of 4096 kB in 0.753332 sec at 1359 MB/sec
osd.5:bench: wrote 1024 MB in blocks of 4096 kB in 1.712017 sec at 598 MB/sec
osd.6:bench: wrote 1024 MB in blocks of 4096 kB in 2.815910 sec at 363 MB/sec
osd.7:bench: wrote 1024 MB in blocks of 4096 kB in 1.698323 sec at 602 MB/sec
osd.8:bench: wrote 1024 MB in blocks of 4096 kB in 0.283092 sec at 3617 MB/sec
osd.9:bench: wrote 1024 MB in blocks of 4096 kB in 2.606005 sec at 392 MB/sec
osd.10:bench: wrote 1024 MB in blocks of 4096 kB in 2.652026 sec at 386 MB/sec
osd.11:bench: wrote 1024 MB in blocks of 4096 kB in 2.468191 sec at 414 MB/sec
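(For anyone who wants to reproduce this: a simple loop like the one below, assuming sequential OSD IDs 0 through 11, produces that output.)
Code:
# Run the 1 GiB write bench on every OSD in turn (IDs 0-11 assumed)
for i in $(seq 0 11); do
    ceph tell osd.$i bench -f plain
done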
Note that the four OSDs in one server (osd.0 through osd.3) are all very slow. All nodes are connected via a 10GbE network, and each server uses the same model of HDDs with a single SSD holding the Bluestore DBs. There are 13 Proxmox nodes using this Ceph cluster, and running the benchmark from any of them shows the same numbers as above.
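Since the hardware is supposed to be identical, my next step is to rule out the disks themselves and to confirm the DB placement; something along these lines should show both (osd.0 and /dev/sdb are just examples, substitute the real IDs and devices):
Code:
# Show which data and DB devices osd.0 is actually using
ceph osd metadata 0 | grep -E 'devices|bluefs'

# Raw sequential read from the underlying HDD (read-only, so safe on a live OSD;
# /dev/sdb is a placeholder for the actual data device)
dd if=/dev/sdb of=/dev/null bs=4M count=256 iflag=direct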
I am looking for help figuring out why the OSDs in that one server are so slow. I tried destroying osd.0 and recreating it, but it shows the same slow speed after recreation. I've been reading through the documentation, but I'm hoping someone can provide a pointer to get me to a solution faster.
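For reference, the destroy/recreate was the usual pveceph sequence, roughly like this (device paths are placeholders, and I may be misremembering the exact flags):
Code:
# Take the OSD out, wait for rebalancing, then destroy it and wipe the disk
ceph osd out 0
pveceph osd destroy 0 --cleanup

# Recreate it on the same HDD, with the DB back on the shared SSD
# (/dev/sdb and /dev/sdf are placeholder device paths)
pveceph osd create /dev/sdb --db_dev /dev/sdf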
Thanks in advance.