well, I "solved" my cephx problem (https://forum.proxmox.com/threads/new-install-cannot-create-ceph-osds-bc-of-keyring-error.119375/) by disabling cephx
now, however, the Ceph dashboard constantly complains about clock skew (e.g. mon.ganges clock skew 0.131502s > max 0.05s (latency 0.00491829s), mon.orinoco clock skew 0.272502s > max 0.05s (latency 0.00142334s)). that's with chrony on all 4 hosts syncing to a VM on one host running htpdate (outgoing NTP is blocked on my network). as far as I know there isn't anything "magic" about running the time source in a VM that should break anything.
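for what it's worth, the chrony side is roughly like this, and these are the checks I can run (the hostname and poll intervals below are placeholders, not my exact config):
Code:
# /etc/chrony/chrony.conf (sketch) -- "timevm.lan" stands in for the htpdate VM
server timevm.lan iburst minpoll 4 maxpoll 6
makestep 1.0 3

# checks from any host / mon
chronyc tracking
chronyc sources -v
ceph time-sync-status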
I also have 5 PGs that seem to be permanently stuck in the "active+clean+remapped" state; `ceph pg repair` does nothing for them.
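in case it's useful, these are the sorts of commands I can run against the stuck PGs (the pg id below is just a placeholder):
Code:
# list PGs that aren't fully clean and the ones reported as remapped
ceph pg dump_stuck unclean
ceph pg ls remapped
# detailed state of a single PG ("1.2f" is a placeholder id)
ceph pg 1.2f query
# OSD utilization/weights and balancer state, since (as I understand it)
# "remapped" means the acting set differs from the CRUSH-computed up set
ceph osd df tree
ceph balancer status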
perhaps as a result, I get terrible benchmark results:
Code:
root@ganges:/var/log# rados bench -p fastwrx 10 write
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_ganges_41218
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       135       119   475.976       476   0.0429534    0.128449
    2      16       266       250   499.965       524   0.0186067    0.119404
    3      16       380       364   485.296       456   0.0263719    0.126233
    4      16       503       487   486.961       492    0.284686    0.128938
    5      16       625       609   487.161       488    0.119083    0.129185
    6      16       732       716   477.295       428    0.261727    0.131228
    7      16       862       846   483.391       520   0.0398449     0.12845
    8      16       980       964   481.962       472   0.0338546    0.130552
    9      16      1109      1093   485.739       516   0.0250924    0.129385
   10      16      1256      1240   495.961       588   0.0500059    0.128721
Total time run: 10.1216
Total writes made: 1256
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 496.364
Stddev Bandwidth: 44.1009
Max bandwidth (MB/sec): 588
Min bandwidth (MB/sec): 428
Average IOPS: 124
Stddev IOPS: 11.0252
Max IOPS: 147
Min IOPS: 107
Average Latency(s): 0.128113
Stddev Latency(s): 0.101359
Max latency(s): 0.556049
Min latency(s): 0.0133952
Cleaning up (deleting benchmark objects)
Removed 1256 objects
Clean up completed and total clean up time :0.542294
root@ganges:/var/log#
before reimaging, this cluster had 3 nodes instead of 4, and the 4th node contributed an additional NVMe device, so the pool now spans 10 NVMe OSDs in total. everything is NVMe over 10G networking. my sense is that I should be getting much better numbers than this; I could swear it was previously doing IOPS well into the thousands and over 1 Gbps of sequential write.
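for comparison, I can re-run the benchmark with more client parallelism and sanity-check the raw 10G links, roughly like this (the parameters are just a starting point, not tuned values):
Code:
# longer write benchmark with more concurrent ops, keeping objects for a read pass
rados bench -p fastwrx 30 write -t 64 --no-cleanup
# sequential read benchmark against the objects left behind
rados bench -p fastwrx 30 seq -t 64
# remove the leftover benchmark objects
rados -p fastwrx cleanup

# raw network check between two nodes (iperf3 installed on both)
iperf3 -s          # on one node
iperf3 -c ganges   # from another node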
any pointers on where to start? TIA.