Proxmox CEPH performance

MoreDakka · Nov 16, 2021

As if this subject hasn't been brought around enough I thought I would open a new one cause I'm a bit confused.
We have two clusters

==| Cluster 1, dev |==
Proxmox - Virtual Environment 5.4-3
4x Dell r710
Between 72-128 Gb ram each
H700 - sda - 2x500gb spinning, 7200rpm RAID1
H700 - sdb - 1x2460 Kingston v300 SSD RAID0
2x1Gb bond0 - Management
2x1Gb bond1 - VM Networks
1x10g NIC for CEPH network

==| Custer 2, eventual production |==
Proxmox - Virtual Environment 7.0-14+1
Dell c6220 with 4 nodes, each node has:
2xE5-2650v0
128Gb RAM
sda - adata su800 250Gb
sdb - Samsung 870 QVO
2x1Gb bond0 - Management and VM Network
2x10Gb bond1 - HA, SAN and Backup Network
2x40Gb infinniband bond2 (active-backup) - CEPH Network

When I built Cluster 1 it was quite easy, everything flowed nicely together and performed very well for having low end consumer SSDs. Cluster 2 has been more of a pain thanks to Infinniband and me rebuilding the cluster a couple times leaving old keys and OSDs laying around. However it's all cleaned up and OSD are created, started running some benchmarks. I've been reading the QVOs aren't great for CEPH, right now it's all I can afford. Comparing QVO to v300 I would hope that the newer generation of SSDs would perform at least a bit better than the old gen. However my benchmarks are giving a much more terrible story. The reads are wicked awesome at 1.6Gb but the writes are 50MBish.

Cluster 1:

Code:

root@prox01:~# ceph tell osd.0 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.773273,
    "bytes_per_sec": 387174920.724119,
    "iops": 92.309694
}
root@prox01:~# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.171328,
    "bytes_per_sec": 494509353.029995,
    "iops": 117.900217
}
root@prox01:~# ceph tell osd.2 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.548962,
    "bytes_per_sec": 421246729.674416,
    "iops": 100.433047
}
root@prox01:~# ceph tell osd.3 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.790667,
    "bytes_per_sec": 384761758.709077,
    "iops": 91.734352
}
root@prox01:~#


root@prox01:~# ceph -s
  cluster:
    id:     d12f1ef0-3a28-4fe0-9df1-b1913c03a374
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum prox01,prox02,prox03,prox04
    mgr: prox01(active), standbys: prox03, prox02, prox04
    mds: cephfs-1/1/1 up  {0=prox02=up:active}, 3 up:standby
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   2 pools, 160 pgs
    objects: 60.97k objects, 233GiB
    usage:   644GiB used, 248GiB / 892GiB avail
    pgs:     160 active+clean

  io:
    client:   12.5KiB/s wr, 0op/s rd, 2op/s wr


root@prox01:~# rados bench -p cephfs_data 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_prox01_2303881
Total time run:         60.135353
Total writes made:      6612
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     439.808
Stddev Bandwidth:       108.835
Max bandwidth (MB/sec): 592
Min bandwidth (MB/sec): 204
Average IOPS:           109
Stddev IOPS:            27
Max IOPS:               148
Min IOPS:               51
Average Latency(s):     0.145508
Stddev Latency(s):      0.124458
Max latency(s):         1.16326
Min latency(s):         0.0258368

root@prox01:~# rados -p cephfs_data bench 60 rand
Total time run:       60.066568
Total reads made:     20313
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1352.7
Average IOPS:         338
Stddev IOPS:          13
Max IOPS:             363
Min IOPS:             305
Average Latency(s):   0.046073
Max latency(s):       0.953591
Min latency(s):       0.00309316

root@prox01:~# hwinfo --disk
  SysFS ID: /class/block/sdb
  Model: "DELL PERC H700" (Only one drive on here, Kingston v300 240Gb in RAID0)

root@prox01:~# lsblk
NAME                 MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                    8:0    0 418.6G  0 disk
├─sda1                 8:1    0  1007K  0 part
├─sda2                 8:2    0   512M  0 part
└─sda3                 8:3    0 418.1G  0 part
  ├─pve-swap         253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root         253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta   253:2    0     3G  0 lvm
  │ └─pve-data-tpool 253:4    0 292.2G  0 lvm
  │   └─pve-data     253:5    0 292.2G  0 lvm
  └─pve-data_tdata   253:3    0 292.2G  0 lvm
    └─pve-data-tpool 253:4    0 292.2G  0 lvm
      └─pve-data     253:5    0 292.2G  0 lvm
sdb                    8:16   0   223G  0 disk
├─sdb1                 8:17   0   100M  0 part /var/lib/ceph/osd/ceph-0
└─sdb2                 8:18   0 222.9G  0 part
sr0                   11:0    1  1024M  0 rom
sr1                   11:1    1  1024M  0 rom

root@prox01:~# pveversion -v
proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.4-3 (running version: 5.4-3/0a6eaa62)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
ceph: 12.2.13-pve1~bpo9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-50
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-25
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-50
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

root@prox01:~# iperf3 -c 10.10.1.163
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  10.7 GBytes  9.21 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  10.7 GBytes  9.21 Gbits/sec                  receiver

Here is Cluster 2:

Code:

root@pve1-cpu4:~# ceph tell osd.0 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.123248142,
    "bytes_per_sec": 505707177.01821965,
    "iops": 120.56998658614627
}
root@pve1-cpu4:~# ceph tell osd.1 cache drop
root@pve1-cpu4:~# ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.1475737160000001,
    "bytes_per_sec": 499979030.28908181,
    "iops": 119.20428998209997
}
root@pve1-cpu4:~# ceph tell osd.2 cache drop
root@pve1-cpu4:~# ceph tell osd.2 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.1261140599999999,
    "bytes_per_sec": 505025503.66465288,
    "iops": 120.40746299377749
}
root@pve1-cpu4:~# ceph tell osd.3 cache drop
root@pve1-cpu4:~# ceph tell osd.3 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 2.1270281309999999,
    "bytes_per_sec": 504808473.54622978,
    "iops": 120.35571898132081
}
root@pve1-cpu4:~#


root@pve1-cpu4:~# ceph -s
  cluster:
    id:     84681487-a5e1-431f-8741-95694c39d8ac
    health: HEALTH_OK

  services:
    mon: 4 daemons, quorum pve1-cpu1,pve1-cpu2,pve1-cpu3,pve1-cpu4 (age 2h)
    mgr: pve1-cpu1(active, since 21h), standbys: pve1-cpu3, pve1-cpu4, pve1-cpu2
    mds: 1/1 daemons up, 3 standby
    osd: 4 osds: 4 up (since 112m), 4 in (since 112m)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 192 pgs
    objects: 2.23k objects, 8.6 GiB
    usage:   26 GiB used, 7.3 TiB / 7.3 TiB avail
    pgs:     192 active+clean

root@pve1-cpu4:~# rados bench -p cephfs_data 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_pve1-cpu4_182670
Total time run:         61.5063
Total writes made:      663
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     43.1175
Stddev Bandwidth:       13.4592
Max bandwidth (MB/sec): 80
Min bandwidth (MB/sec): 12
Average IOPS:           10
Stddev IOPS:            3.3648
Max IOPS:               20
Min IOPS:               3
Average Latency(s):     1.46936
Stddev Latency(s):      0.867992
Max latency(s):         4.5161
Min latency(s):         0.0314636

root@pve1-cpu4:~# rados -p cephfs_data bench 60 rand
hints = 1
Total time run:       64.3072
Total reads made:     957
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   59.5268
Average IOPS:         14
Stddev IOPS:          8.94126
Max IOPS:             47
Min IOPS:             3
Average Latency(s):   1.03755
Max latency(s):       4.97272
Min latency(s):       0.00367236

root@pve1-cpu4:~# hwinfo --disk
  SysFS ID: /class/block/sdb
  Model: "Samsung SSD 870"

root@pve1-cpu4:~# lsblk
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                                                                                                     8:0    0 238.5G  0 disk
├─sda1                                                                                                  8:1    0  1007K  0 part
├─sda2                                                                                                  8:2    0   512M  0 part
└─sda3                                                                                                  8:3    0   238G  0 part
  ├─pve-swap                                                                                          253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                                                                                          253:1    0  59.3G  0 lvm  /
  ├─pve-data_tmeta                                                                                    253:2    0   1.6G  0 lvm
  │ └─pve-data                                                                                        253:4    0 151.6G  0 lvm
  └─pve-data_tdata                                                                                    253:3    0 151.6G  0 lvm
    └─pve-data                                                                                        253:4    0 151.6G  0 lvm
sdb                                                                                                     8:16   0   1.8T  0 disk
└─ceph--c40d9a18--8547--4e12--84ee--f9e05c18cb46-osd--block--4e3ef8be--e516--44da--acf1--be0fab77b1da 253:5    0   1.8T  0 lvm

root@pve1-cpu4:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-7-pve)
pve-manager: 7.0-14+1 (running version: 7.0-14+1/08975a4c)
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-12
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-13
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.13-1
proxmox-backup-file-restore: 2.0.13-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.1.0-1
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-18
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3


root@pve1-cpu1:~# iperf3 -c 192.168.1.84 -T s1
s1:  - - - - - - - - - - - - - - - - - - - - - - - - -
s1:  [ ID] Interval           Transfer     Bitrate         Retr
s1:  [  5]   0.00-10.00  sec  25.6 GBytes  22.0 Gbits/sec    0             sender
s1:  [  5]   0.00-10.00  sec  25.6 GBytes  22.0 Gbits/sec                  receiver

Any ideas on why the write is so terrible on the QVO compared to the v300s?

Thanks!

Felix. · Nov 16, 2021

MoreDakka said:
Any ideas on why the write is so terrible on the QVO compared to the v300s?

The v300 is MLC NAND while the QVO are QLC.
MLC is way more durable and provides better write performance than QLC NAND in general.

MoreDakka · Nov 16, 2021

Felix,
Thanks for the fast response.

I'm surprised that it's that much of a difference between the two with the same amount of drives. about 1/6th of the performance, is that right or is there something I have configured wrong?

If I increase the amount of OSDs, even if they are QVOs, the speed should increase as well. I'm assuming that's correct?
Or would it be better to replace the QVO with SU800s as they are TLC, kinda the compromise between QLC and MLC?

Felix. · Nov 17, 2021

MoreDakka said:
I'm surprised that it's that much of a difference between the two with the same amount of drives. about 1/6th of the performance, is that right or is there something I have configured wrong?

Other users and proxmox staff could provide a more educated answer on this, I think.
But you can read this in many threads here and I also made the personal experience that there is a very noticeable difference between QLC and TLC/MLC drives.
For usecases like ZFS and Ceph there is also that general discussion about Consumer vs Enterprise/Datacenter drives, which also makes a huge difference in terms of durability and performance.

MoreDakka said:
If I increase the amount of OSDs, even if they are QVOs, the speed should increase as well. I'm assuming that's correct?
Or would it be better to replace the QVO with SU800s as they are TLC, kinda the compromise between QLC and MLC?

If you don't plan to deploy lots of OSDs (like 10-20 or more per host) to compensate the performance issue, you'll most probably get best results buying some enterprise-grade drives.

What is your budget?
Even used enterprise-grade drives are still in great condition most of the time, as most companys just don't have that super write intensive loads to really burnout modern enterprise ssds.

MoreDakka · Nov 17, 2021

Budget. Ha. I have to rub two pennies together to get a dime around here. I was lucky to get the QVOs. But looking at it now I should have gotten really any TLC drive to work with the writes we are going to need in the future...
The plan was, since we have 4 nodes get 4 drives at a time (up to 20 total) of low cost drives to build a decent cluster. If a consumer drive fails not a big deal. However we need to have some decent write speeds

I found some older SSDs that are MLC drives. Samsung 845DC 960Gb. They were in a cluster that we got from an old client. Put these in the CEPH cluster:

--Write--

Code:

root@pve1-cpu4:~# rados bench -p cephfs_data 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_pve1-cpu4_9337
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        78        62   247.965       248    0.451433    0.255291
    2      16       122       106   211.971       176    0.434949    0.287831
    3      16       170       154   205.303       192    0.474022    0.296527
    4      16       213       197   196.972       172    0.213007    0.309513
    5      16       258       242   193.571       180     0.23517    0.312751
    6      16       310       294   195.972       208   0.0903562    0.312546
    7      16       346       330   188.544       144    0.242184    0.319971
    8      16       393       377   188.472       188    0.087689    0.323024
    9      16       453       437   194.194       240    0.178694    0.327717
   10      16       498       482   192.772       180    0.235404     0.33123
   11      16       540       524   190.517       168    0.462258    0.330741
   12      16       586       570   189.972       184     0.24902    0.331716
   13      16       631       615   189.203       180    0.243185    0.333147
   14      16       673       657   187.686       168    0.231272    0.335648
   15      16       714       698   186.105       164   0.0599265    0.337422
   16      16       739       723   180.722       100   0.0882526    0.338128
   17      16       755       739   173.855        64    0.811421    0.358055
   18      16       774       758   168.418        76     1.09968    0.372287
   19      16       808       792   166.711       136    0.779239    0.381968
2021-11-17T14:17:15.166464-0700 min lat: 0.0255776 max lat: 1.38393 avg lat: 0.382384
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       821       805   160.975        52    0.149098    0.382384
   21      16       841       825   157.118        80      0.5007    0.394908
   22      16       860       844   153.431        76    0.387724    0.403118
   23      16       880       864   150.238        80    0.437605    0.419872
   24      16       896       880   146.644        64    0.924659    0.434907
   25      16       905       889   142.218        36   0.0323117     0.43292
   26      16       928       912   140.286        92    0.049305    0.445693
   27      16       949       933   138.201        84    0.480636    0.456318
   28      16       966       950   135.693        68     1.35134    0.468287
   29      16       988       972   134.048        88    0.361723     0.47389
   30      16       996       980   130.646        32    0.197151      0.4738
   31      16      1017      1001   129.141        84    0.515753    0.488358
   32      16      1033      1017   127.105        64    0.771147    0.498338
   33      16      1052      1036   125.556        76    0.866866    0.507729
   34      16      1056      1040   122.334        16    0.583102    0.508014
   35      16      1074      1058   120.895        72    0.446162    0.516517
   36      16      1090      1074   119.315        64       1.654    0.532593
   37      16      1095      1079    116.63        20    0.447553    0.532195
   38      16      1119      1103   116.087        96    0.476802    0.545049
   39      16      1136      1120   114.854        68      1.4115    0.553531
2021-11-17T14:17:35.169683-0700 min lat: 0.0255776 max lat: 1.65703 avg lat: 0.559705
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      1157      1141   114.082        84     1.16915    0.559705
   41      16      1160      1144   111.592        12    0.041678    0.558873
   42      16      1176      1160   110.459        64     1.55913    0.571997
   43      16      1197      1181   109.843        84     1.20605    0.578131
   44      16      1200      1184   107.619        12   0.0440399    0.577439
   45      16      1219      1203   106.916        76    0.516058    0.589911
   46      16      1237      1221   106.157        72    0.457447    0.594328
   47      16      1253      1237    105.26        64      1.0274     0.60511
   48      16      1253      1237   103.067         0           -     0.60511
   49      16      1274      1258   102.678        42    0.113966     0.61704
   50      16      1274      1258   100.624         0           -     0.61704
   51      16      1293      1277   100.141        38    0.318497    0.630156
   52      16      1310      1294   99.5225        68     0.32546    0.638228
   53      16      1310      1294   97.6447         0           -    0.638228
   54      16      1328      1312   97.1697        36   0.0395871    0.647939
   55      16      1351      1335   97.0754        92     0.14881    0.651666
   56      16      1368      1352    96.556        68    0.376653    0.656945
   57      16      1384      1368   95.9847        64     1.08257    0.661615
   58      16      1386      1370   94.4676         8    0.491302    0.661367
   59      16      1405      1389   94.1544        76    0.365021    0.672164
2021-11-17T14:17:55.173011-0700 min lat: 0.0255776 max lat: 1.82977 avg lat: 0.679736
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16      1421      1405   93.6516        64     1.40858    0.679736

--Read--

Code:

Bandwidth (MB/sec):   921.582
Average IOPS:         230
Stddev IOPS:          14.9803
Max IOPS:             255
Min IOPS:             195
Average Latency(s):   0.0686646
Max latency(s):       0.440302
Min latency(s):       0.00424398

--fio--
iops : min= 8756, max=16070, avg=15419.48, stdev=903.25, samples=119

Run status group 0 (all jobs):
WRITE: bw=60.2MiB/s (63.1MB/s), 60.2MiB/s-60.2MiB/s (63.1MB/s-63.1MB/s), io=3613MiB (3788MB), run=60001-60001msec

Is that to be expected from an MLC drive? This one has about 10% wear according to Proxmox SMART sensors. Should we be looking then for TLC drives? I found it really interesting, it looks like the cache was all eaten up in the ceph bench write test, then it slowed to almost the same as the QLC drives. I'm going to mix the drives into one ceph test and see what speeds show up.

MoreDakka · Nov 17, 2021

Code:

root@pve1-cpu4:~# rados bench -p cephfs_data 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_pve1-cpu4_20932
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       161       145   579.924       580   0.0318865    0.101586
    2      16       297       281   561.929       544   0.0717412     0.10997
    3      16       460       444   591.929       652    0.145967     0.10662
    4      16       620       604    603.93       640   0.0696718    0.105175
    5      16       767       751   600.729       588    0.104872    0.105577
    6      16       932       916   610.594       660   0.0522979    0.104389
    7      16      1073      1057   603.927       564   0.0588275    0.104568
    8      16      1225      1209   604.424       608   0.0890162    0.105219
    9      16      1348      1332   591.923       492   0.0236679      0.1059
   10      16      1492      1476   590.322       576     0.14115    0.107347
   11      16      1545      1529   555.926       212   0.0738511    0.114277
   12      16      1594      1578   525.929       196    0.889112    0.121634
   13      16      1629      1613   496.239       140   0.0457992    0.120535
   14      16      1679      1663   475.076       200   0.0213725    0.127972
   15      16      1715      1699   453.002       144   0.0224597    0.134818
   16      16      1765      1749   437.189       200   0.0394326    0.144323
   17      16      1794      1778   418.294       116   0.0394416    0.152248
   18      16      1860      1844    409.72       264   0.0669887    0.154702
   19      16      1903      1887   397.207       172    0.123935    0.160252
2021-11-17T14:57:40.980455-0700 min lat: 0.019712 max lat: 1.3878 avg lat: 0.165038
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16      1947      1931   386.146       176    0.087218    0.165038
   21      16      1981      1965   374.232       136   0.0189958    0.163454
   22      16      2028      2012   365.765       188   0.0240041     0.17048
   23      16      2058      2042   355.079       120   0.0192232    0.176382
   24      16      2111      2095   349.115       212   0.0280052     0.17965
   25      16      2178      2162    345.87       268   0.0308782    0.183198
   26      16      2208      2192   337.181       120   0.0280145    0.186849
   27      16      2253      2237   331.359       180   0.0623303    0.192138
   28      16      2256      2240   319.953        12   0.0322703     0.19195
   29      16      2321      2305   317.884       260   0.0239512    0.198332
   30      16      2348      2332   310.887       108   0.0383736    0.205402
   31      16      2394      2378   306.793       184    0.932655    0.208322
   32      16      2458      2442   305.205       256     0.02089    0.208523
   33      16      2498      2482   300.804       160    0.285432    0.211689
   34      16      2559      2543   299.133       244   0.0732398    0.213233
   35      16      2600      2584   295.271       164    0.207141    0.216253
   36      16      2638      2622   291.291       152    0.748943    0.219319
   37      16      2676      2660   287.526       152   0.0299627    0.217184
   38      16      2733      2717   285.958       228   0.0224088    0.219346
   39      16      2787      2771   284.164       216   0.0370704    0.223649
2021-11-17T14:58:00.983415-0700 min lat: 0.0178192 max lat: 1.58033 avg lat: 0.226373
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      2827      2811   281.059       160    0.027243    0.226373
   41      16      2860      2844   277.424       132    0.061374    0.230095
   42      16      2868      2852    271.58        32   0.0352846    0.229584
   43      16      2908      2892   268.984       160   0.0706435    0.236521
   44      16      2951      2935    266.78       172   0.0291031    0.239537
   45      16      2967      2951   262.273        64    0.018766    0.238622
   46      16      3034      3018   262.397       268   0.0789816    0.243514
   47      16      3088      3072   261.409       216   0.0342701    0.243912
   48      16      3095      3079   256.546        28   0.0263619    0.243452
   49      16      3166      3150   257.106       284   0.0235772    0.245072
   50      16      3232      3216   257.243       264   0.0206852    0.246304
   51      16      3281      3265   256.041       196   0.0631038    0.248913
   52      16      3331      3315   254.963       200    0.102478    0.250433
   53      16      3397      3381   255.133       264   0.0830442    0.250273
   54      16      3434      3418   253.149       148    0.133276    0.252419
   55      16      3453      3437   249.928        76   0.0198236    0.251304
   56      16      3512      3496   249.678       236   0.0300526    0.253173
   57      16      3560      3544   248.666       192    0.132158    0.256132
   58      16      3626      3610    248.93       264   0.0270589     0.25546
   59      16      3683      3667   248.574       228   0.0358595    0.255605
2021-11-17T14:58:20.986376-0700 min lat: 0.0178192 max lat: 1.7879 avg lat: 0.259127
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16      3706      3690   245.964        92   0.0433483    0.259127
Total time run:         60.7734
Total writes made:      3706
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     243.922
Stddev Bandwidth:       168.007
Max bandwidth (MB/sec): 660
Min bandwidth (MB/sec): 12
Average IOPS:           60
Stddev IOPS:            42.0018
Max IOPS:               165
Min IOPS:               3
Average Latency(s):     0.26218
Stddev Latency(s):      0.368674
Max latency(s):         1.7879
Min latency(s):         0.0178192

So that's 4x 870QVO2T and 4x 845DCEVO960Gb
It's really interesting watching the numbers drop off. Also the latency for some of the 845DC drives increased from about 40ms -> 900+ms One drive in particular. Is that a sign of a drive getting bad or that it needs a garbage cleanup or whatever I can do to SSDs to get them faster?

Put that drive in down/out mode, much more stable writes, not as fast at the beginning but better average:

Code:

root@pve1-cpu4:~# rados bench -p cephfs_data 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_pve1-cpu4_23392
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        89        73   291.953       292   0.0444918    0.164023
    2      16       164       148   295.952       300   0.0658814     0.20252
    3      16       223       207   275.959       236    0.483888    0.229438
    4      16       311       295   294.954       352   0.0384802    0.203206
    5      16       389       373   298.356       312    0.109076    0.211592
    6      16       493       477   317.954       416    0.511331    0.199604
    7      16       573       557   318.239       320    0.025522    0.193761
    8      16       665       649   324.452       368   0.0211037    0.193435
    9      16       770       754   335.061       420   0.0256059    0.185377
   10      16       859       843   337.151       356   0.0196613    0.186276
   11      16       926       910   330.862       268    0.463413    0.193092
   12      16      1030      1014   337.952       416   0.0531977    0.187079
   13      16      1094      1078   331.646       256    0.262194    0.192796
   14      16      1158      1142   326.239       256   0.0214019    0.192801
   15      16      1222      1206   321.554       256    0.461251    0.198232
   16      16      1302      1286   321.455       320   0.0478179    0.196461
   17      16      1380      1364   320.896       312   0.0875434    0.198132
   18      16      1469      1453   322.844       356    0.448708    0.198092
   19      16      1541      1525   321.007       288   0.0562509    0.196974
2021-11-17T15:12:16.078364-0700 min lat: 0.019013 max lat: 0.691056 avg lat: 0.195315
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16      1642      1626   325.154       404    0.117868    0.195315
   21      16      1705      1689   321.668       252   0.0240699    0.194852
   22      16      1794      1778   323.226       356   0.0459042    0.196862
   23      16      1858      1842   320.301       256   0.0219746    0.196204
   24      16      1931      1915    319.12       292    0.131663    0.199722
   25      16      2010      1994   318.993       316   0.0186529    0.198506
   26      16      2090      2074    319.03       320    0.126974    0.199997
   27      16      2178      2162   320.249       352   0.0293687    0.196834
   28      16      2240      2224   317.667       248   0.0688802    0.200686
   29      16      2304      2288   315.539       256   0.0221984    0.200413
   30      16      2355      2339   311.821       204    0.424412    0.205138
   31      16      2427      2411    311.05       288   0.0386804    0.203838
   32      16      2502      2486   310.704       300   0.0201365    0.204228
   33      16      2580      2564   310.742       312    0.426443     0.20573
   34      16      2649      2633   309.719       276   0.0305598    0.205321
   35      16      2688      2672   305.326       156   0.0223855    0.206025
   36      16      2757      2741    304.51       276    0.107794    0.209648
   37      16      2841      2825    305.36       336   0.0356009    0.208294
   38      16      2911      2895   304.692       280   0.0457671    0.207301
   39      16      3016      3000   307.647       420    0.101721    0.207842
2021-11-17T15:12:36.081421-0700 min lat: 0.0176422 max lat: 0.865423 avg lat: 0.208207
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      3083      3067   306.655       268    0.110049    0.208207
   41      16      3154      3138   306.101       284   0.0205416    0.208101
   42      16      3194      3178   302.622       160   0.0297869    0.208491
   43      16      3287      3271   304.234       372     0.20916    0.210057
   44      16      3376      3360    305.41       356      0.1396    0.209084
   45      16      3454      3438   305.555       312   0.0488427    0.208714
   46      16      3512      3496   303.955       232   0.0265811    0.208408
   47      16      3601      3585   305.062       356    0.310278    0.209666
   48      16      3653      3637   303.039       208   0.0411237    0.210539
   49      16      3749      3733    304.69       384   0.0311829    0.208632
   50      16      3822      3806   304.435       292    0.446102    0.209807
   51      16      3895      3879   304.191       292    0.138437    0.209628
   52      16      3925      3909   300.648       120      0.2437    0.210365
   53      16      4007      3991   301.164       328     0.33282    0.212442
   54      16      4078      4062   300.845       284   0.0400915    0.212174
   55      16      4129      4113   299.083       204   0.0322874    0.212612
   56      16      4178      4162   297.242       196    0.027161    0.212844
   57      16      4256      4240     297.5       312   0.0893997    0.214286
   58      16      4309      4293   296.025       212   0.0650014    0.214587
   59      16      4390      4374   296.499       324     0.14421    0.215447
2021-11-17T15:12:56.084382-0700 min lat: 0.0176422 max lat: 0.865423 avg lat: 0.215415
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16      4460      4444   296.223       280   0.0673159    0.215415
Total time run:         60.2074
Total writes made:      4460
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     296.309
Stddev Bandwidth:       65.7592
Max bandwidth (MB/sec): 420
Min bandwidth (MB/sec): 120
Average IOPS:           74
Stddev IOPS:            16.4398
Max IOPS:               105
Min IOPS:               30
Average Latency(s):     0.21583
Stddev Latency(s):      0.217148
Max latency(s):         0.865423
Min latency(s):         0.0176422

aaron · Nov 18, 2021

Not going into any actual results as I do not know all the SSDs involved. But I want to give some advice when looking out for which SSDs to buy, as this has become a bit of a minefield.

Unfortunately, one cannot blindly buy a new SSD and expect it to behave well. By now the market has fragmented quite a bit and there are some bad practices happening.

This means that you should carefully research the SSDs you plan to buy, especially if they are not in the enterprise/datacenter range but consumer grade!

QLC is most likely terrible, some datacenter QLC SSDs still deliver decent performance. The 870 QVO though is really bad once it's SLC cache is full, down to 80Mbyte/s from what I remember. That is slower than most spinning HDDs...

The other thing manufacturers started to do in recent years, is to quietly change the SSDs without changing the model number. So you could have a good SSD that got great reviews 3 years ago, but if you would buy a new one, you will have a vastly different SSD, different controller, maybe even QLC instead of TLC chips!
The current market situation where manufacturers change different components with sometimes drastic changes in performance, without making it very clear to the end user, is infuriating.

So please do search for those models you consider buying, check the benchmarks, and also check the news as not all review sites will update their year-old reviews mentioning that currently sold models are not the ones anymore that they tested.

One example, where they were nice enough to update their review afterwards, is the Crucial P2: https://www.tomshardware.com/reviews/crucial-p2-m-2-nvme-ssd

Felix. · Nov 18, 2021

MoreDakka said:
Budget. Ha. I have to rub two pennies together to get a dime around here.

I know that feeling.

Checkout this PDF: https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2020-09-hyper-converged-with-nvme.76516/
There is a comparison of a handful of SSDs, including a Samsung EVO.
It's ridiculous, the Intel 3510 are not the newest drives by any means and they still vastly outperform the EVO, like ~30 times according to the numbers.

The Intel DC S3700 has a very good reputation, you can find the 800gb versions of those on ebay for around 120€ (excl. VAT).
Usually it is safe to buy those enterprise ssds used, especially when buying from an commercial seller.
And most of the time they have very few wear, so you could use them for quite some time, and even if a drive fails, Ceph got you covered.

Looking for some affordable, used enterprise grade drives will most probably get you better results than looking for older / other consumer ssds.

Search

Search

Proxmox CEPH performance

MoreDakka

Active Member

Felix.

Renowned Member

MoreDakka

Active Member

Felix.

Renowned Member

MoreDakka

Active Member

MoreDakka

Active Member

aaron

Proxmox Staff Member

Felix.

Renowned Member

We value your privacy