Hi all,
We see the following output of ceph bench:
At regular intervals, the "cur MB/s" column drops to zero. If we run iperf at the same time, we can tell that the network is functioning perfectly: while ceph bench drops to zero, iperf continues at full speed over the 10G Ethernet (see the iperf sketch below the bench output).
Code:
root@ceph1:~# rados bench -p scbench 600 write --no-cleanup
 Maintaining 16 concurrent writes of 4194304 bytes for up to 600 seconds or 0 objects
 Object prefix: benchmark_data_pm1_36584
  sec  Cur ops  started  finished  avg MB/s  cur MB/s   last lat    avg lat
    0        0        0         0         0         0          -          0
    1       16      124       108   431.899       432   0.138315   0.139077
    2       16      237       221   441.928       452   0.169759   0.140138
    3       16      351       335   446.598       456   0.105837   0.139844
    4       16      466       450   449.938       460   0.140141   0.139716
    5       16      569       553   442.337       412   0.025245   0.139328
    6       16      634       618   411.943       260  0.0302609   0.147129
    7       16      692       676   386.233       232    1.01843    0.15158
    8       16      721       705   352.455       116  0.0224958   0.159924
    9       16      721       705   313.293         0          -   0.159924
   10       16      764       748   299.163        86  0.0629263    0.20961
   11       16      869       853   310.144       420  0.0805086   0.204707
   12       16      986       970   323.295       468   0.175718   0.196822
   13       16     1100      1084     333.5       456   0.171172    0.19105
   14       16     1153      1137   324.819       212  0.0468416   0.188643
   15       16     1225      1209   322.363       288  0.0421159   0.195791
   16       16     1236      1220   304.964        44    1.28629   0.195499
   17       16     1236      1220   287.025         0          -   0.195499
   18       16     1236      1220   271.079         0          -   0.195499
   19       16     1324      1308   275.336   117.333   0.148679   0.231708
   20       16     1436      1420   283.967       448   0.120878   0.224367
   21       16     1552      1536   292.538       464   0.173587   0.218141
   22       16     1662      1646   299.238       440   0.141544   0.212946
   23       16     1720      1704   296.314       232  0.0273257   0.211416
   24       16     1729      1713   285.467        36  0.0215821   0.211308
   25       16     1729      1713   274.048         0          -   0.211308
   26       16     1729      1713   263.508         0          -   0.211308
   27       16     1787      1771    262.34   77.3333  0.0338129   0.241103
   28       16     1836      1820    259.97       196   0.183042   0.245665
   29       16     1949      1933    266.59       452   0.129397   0.239445
   30       16     2058      2042   272.235       436   0.165108   0.234447
   31       16     2159      2143   276.484       404  0.0466259   0.229704
   32       16     2189      2173   271.594       120  0.0206958   0.231772
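For reference, this is roughly how we ran the iperf check in parallel with the bench; the hostname ceph2 is just a placeholder for a second node, and we assume the classic iperf 2.x client/server here:
Code:
# on a second node, start the iperf server
root@ceph2:~# iperf -s

# on the benching node, run the client for the same 600 seconds,
# reporting throughput every second while rados bench is running
root@ceph1:~# iperf -c ceph2 -t 600 -i 1
The per-second throughput from iperf stays at line rate even during the seconds where rados bench reports 0 MB/s.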
So something is slowing ceph down at regular intervals.
Does anyone have a clue what to look at?
This is on a three-node Proxmox cluster, 65 GB RAM per server, journals on SSD (default Proxmox config: 5 GB per journal), connected through 10G Ethernet. Each node has four 4 TB disks installed, for a total of 12 OSDs.
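In case it matters, this is a sketch of how the journal size of a running OSD can be confirmed via the admin socket (osd.0 is just an example id; run it on the node hosting that OSD):
Code:
# ask the running OSD daemon what journal size (in MB) it is using;
# 5120 MB would correspond to the 5 GB Proxmox default
root@ceph1:~# ceph daemon osd.0 config get osd_journal_size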
During the 0 MB/s intervals there is NO increase in CPU usage: it stays around 15-20% for the four ceph-osd processes.
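For what it's worth, this is roughly how we watched the per-process CPU figures during a stall (a sketch assuming the sysstat package is installed for pidstat):
Code:
# sample CPU usage of all ceph-osd processes once per second
root@ceph1:~# pidstat -u -p $(pgrep -d, ceph-osd) 1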
Any suggestions on where to look?
MJ