As if this subject hasn't been brought up enough already, I thought I would open a new thread because I'm a bit confused.
We have two clusters
==| Cluster 1, dev |==
Proxmox - Virtual Environment 5.4-3
4x Dell r710
Between 72-128 GB RAM each
H700 - sda - 2x500GB spinning, 7200 RPM, RAID1
H700 - sdb - 1x240GB Kingston V300 SSD, RAID0
2x1Gb bond0 - Management
2x1Gb bond1 - VM Networks
1x10Gb NIC for Ceph network
==| Cluster 2, eventual production |==
Proxmox - Virtual Environment 7.0-14+1
Dell c6220 with 4 nodes, each node has:
2xE5-2650v0
128 GB RAM
sda - ADATA SU800 250GB
sdb - Samsung 870 QVO
2x1Gb bond0 - Management and VM Network
2x10Gb bond1 - HA, SAN and Backup Network
2x40Gb InfiniBand bond2 (active-backup) - Ceph Network
When I built Cluster 1 it was quite easy: everything flowed together nicely and performed very well considering the low-end consumer SSDs. Cluster 2 has been more of a pain, thanks to InfiniBand and to me rebuilding the cluster a couple of times and leaving old keys and OSDs lying around. However, it's all cleaned up now, the OSDs are created, and I've started running some benchmarks. I've been reading that the QVOs aren't great for Ceph, but right now they're all I can afford. Comparing the QVO to the V300, I would hope that the newer generation of SSD would perform at least a bit better than the old one. My benchmarks, however, tell a much grimmer story: reads are wicked awesome at around 1.6 GB/s, but writes are around 50 MB/s.
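(Side note on the cleanup: nothing fancy was involved in verifying it; something like the following should show any stale OSD entries or auth keys left over from the earlier rebuilds.)
Code:
# list the CRUSH tree - a stale OSD from an old build would still show up here
ceph osd tree
# list auth entities - leftover osd.* keys from the old builds would still be listed
ceph auth ls | grep '^osd\.'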
Cluster 1:
Code:
root@prox01:~# ceph tell osd.0 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.773273,
"bytes_per_sec": 387174920.724119,
"iops": 92.309694
}
root@prox01:~# ceph tell osd.1 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.171328,
"bytes_per_sec": 494509353.029995,
"iops": 117.900217
}
root@prox01:~# ceph tell osd.2 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.548962,
"bytes_per_sec": 421246729.674416,
"iops": 100.433047
}
root@prox01:~# ceph tell osd.3 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.790667,
"bytes_per_sec": 384761758.709077,
"iops": 91.734352
}
root@prox01:~#
root@prox01:~# ceph -s
cluster:
id: d12f1ef0-3a28-4fe0-9df1-b1913c03a374
health: HEALTH_OK
services:
mon: 4 daemons, quorum prox01,prox02,prox03,prox04
mgr: prox01(active), standbys: prox03, prox02, prox04
mds: cephfs-1/1/1 up {0=prox02=up:active}, 3 up:standby
osd: 4 osds: 4 up, 4 in
data:
pools: 2 pools, 160 pgs
objects: 60.97k objects, 233GiB
usage: 644GiB used, 248GiB / 892GiB avail
pgs: 160 active+clean
io:
client: 12.5KiB/s wr, 0op/s rd, 2op/s wr
root@prox01:~# rados bench -p cephfs_data 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_prox01_2303881
Total time run: 60.135353
Total writes made: 6612
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 439.808
Stddev Bandwidth: 108.835
Max bandwidth (MB/sec): 592
Min bandwidth (MB/sec): 204
Average IOPS: 109
Stddev IOPS: 27
Max IOPS: 148
Min IOPS: 51
Average Latency(s): 0.145508
Stddev Latency(s): 0.124458
Max latency(s): 1.16326
Min latency(s): 0.0258368
root@prox01:~# rados -p cephfs_data bench 60 rand
Total time run: 60.066568
Total reads made: 20313
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1352.7
Average IOPS: 338
Stddev IOPS: 13
Max IOPS: 363
Min IOPS: 305
Average Latency(s): 0.046073
Max latency(s): 0.953591
Min latency(s): 0.00309316
root@prox01:~# hwinfo --disk
SysFS ID: /class/block/sdb
Model: "DELL PERC H700" (Only one drive on here, Kingston v300 240Gb in RAID0)
root@prox01:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 418.6G 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 418.1G 0 part
├─pve-swap 253:0 0 8G 0 lvm [SWAP]
├─pve-root 253:1 0 96G 0 lvm /
├─pve-data_tmeta 253:2 0 3G 0 lvm
│ └─pve-data-tpool 253:4 0 292.2G 0 lvm
│ └─pve-data 253:5 0 292.2G 0 lvm
└─pve-data_tdata 253:3 0 292.2G 0 lvm
└─pve-data-tpool 253:4 0 292.2G 0 lvm
└─pve-data 253:5 0 292.2G 0 lvm
sdb 8:16 0 223G 0 disk
├─sdb1 8:17 0 100M 0 part /var/lib/ceph/osd/ceph-0
└─sdb2 8:18 0 222.9G 0 part
sr0 11:0 1 1024M 0 rom
sr1 11:1 1 1024M 0 rom
root@prox01:~# pveversion -v
proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
pve-manager: 5.4-3 (running version: 5.4-3/0a6eaa62)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-35
ceph: 12.2.13-pve1~bpo9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-50
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-25
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-50
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
root@prox01:~# iperf3 -c 10.10.1.163
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 10.7 GBytes 9.21 Gbits/sec 0 sender
[ 4] 0.00-10.00 sec 10.7 GBytes 9.21 Gbits/sec receiver
Here is Cluster 2:
Code:
root@pve1-cpu4:~# ceph tell osd.0 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.123248142,
"bytes_per_sec": 505707177.01821965,
"iops": 120.56998658614627
}
root@pve1-cpu4:~# ceph tell osd.1 cache drop
root@pve1-cpu4:~# ceph tell osd.1 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.1475737160000001,
"bytes_per_sec": 499979030.28908181,
"iops": 119.20428998209997
}
root@pve1-cpu4:~# ceph tell osd.2 cache drop
root@pve1-cpu4:~# ceph tell osd.2 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.1261140599999999,
"bytes_per_sec": 505025503.66465288,
"iops": 120.40746299377749
}
root@pve1-cpu4:~# ceph tell osd.3 cache drop
root@pve1-cpu4:~# ceph tell osd.3 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 2.1270281309999999,
"bytes_per_sec": 504808473.54622978,
"iops": 120.35571898132081
}
root@pve1-cpu4:~#
root@pve1-cpu4:~# ceph -s
cluster:
id: 84681487-a5e1-431f-8741-95694c39d8ac
health: HEALTH_OK
services:
mon: 4 daemons, quorum pve1-cpu1,pve1-cpu2,pve1-cpu3,pve1-cpu4 (age 2h)
mgr: pve1-cpu1(active, since 21h), standbys: pve1-cpu3, pve1-cpu4, pve1-cpu2
mds: 1/1 daemons up, 3 standby
osd: 4 osds: 4 up (since 112m), 4 in (since 112m)
data:
volumes: 1/1 healthy
pools: 3 pools, 192 pgs
objects: 2.23k objects, 8.6 GiB
usage: 26 GiB used, 7.3 TiB / 7.3 TiB avail
pgs: 192 active+clean
root@pve1-cpu4:~# rados bench -p cephfs_data 60 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_pve1-cpu4_182670
Total time run: 61.5063
Total writes made: 663
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 43.1175
Stddev Bandwidth: 13.4592
Max bandwidth (MB/sec): 80
Min bandwidth (MB/sec): 12
Average IOPS: 10
Stddev IOPS: 3.3648
Max IOPS: 20
Min IOPS: 3
Average Latency(s): 1.46936
Stddev Latency(s): 0.867992
Max latency(s): 4.5161
Min latency(s): 0.0314636
root@pve1-cpu4:~# rados -p cephfs_data bench 60 rand
hints = 1
Total time run: 64.3072
Total reads made: 957
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 59.5268
Average IOPS: 14
Stddev IOPS: 8.94126
Max IOPS: 47
Min IOPS: 3
Average Latency(s): 1.03755
Max latency(s): 4.97272
Min latency(s): 0.00367236
root@pve1-cpu4:~# hwinfo --disk
SysFS ID: /class/block/sdb
Model: "Samsung SSD 870"
root@pve1-cpu4:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 238.5G 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 238G 0 part
├─pve-swap 253:0 0 8G 0 lvm [SWAP]
├─pve-root 253:1 0 59.3G 0 lvm /
├─pve-data_tmeta 253:2 0 1.6G 0 lvm
│ └─pve-data 253:4 0 151.6G 0 lvm
└─pve-data_tdata 253:3 0 151.6G 0 lvm
└─pve-data 253:4 0 151.6G 0 lvm
sdb 8:16 0 1.8T 0 disk
└─ceph--c40d9a18--8547--4e12--84ee--f9e05c18cb46-osd--block--4e3ef8be--e516--44da--acf1--be0fab77b1da 253:5 0 1.8T 0 lvm
root@pve1-cpu4:~# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-7-pve)
pve-manager: 7.0-14+1 (running version: 7.0-14+1/08975a4c)
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-12
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-13
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.13-1
proxmox-backup-file-restore: 2.0.13-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.1.0-1
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-18
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
root@pve1-cpu1:~# iperf3 -c 192.168.1.84 -T s1
s1: - - - - - - - - - - - - - - - - - - - - - - - - -
s1: [ ID] Interval Transfer Bitrate Retr
s1: [ 5] 0.00-10.00 sec 25.6 GBytes 22.0 Gbits/sec 0 sender
s1: [ 5] 0.00-10.00 sec 25.6 GBytes 22.0 Gbits/sec receiver
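My next step is probably to take Ceph out of the picture and test one QVO directly with sync writes, since the OSD WAL/journal path is essentially low-queue-depth sync I/O and QLC drives are reported to fall apart there once the SLC cache is exhausted. A rough sketch of what I have in mind (fio assumed installed; /dev/sdb is the QVO on these nodes, and this writes to the raw device, so the OSD on it would have to be stopped and recreated afterwards):
Code:
# 4k sync writes at queue depth 1 - roughly what the OSD journal/WAL does
# WARNING: writes directly to the raw device and destroys the OSD on it
fio --name=qvo-sync-test --filename=/dev/sdb --direct=1 --sync=1 \
    --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
If that comes back in the tens of MB/s or worse, it would point at the drives themselves rather than the Ceph or network config.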
Any ideas on why write performance is so terrible on the QVOs compared to the V300s?
Thanks!