Hi
After days of searching and trying different things, I am looking for some advice on how to solve this problem.
The performance issue persists inside the VMs on Proxmox. They run some old software that requires good random 4K read/write performance, so the poor IOPS have a big impact. With or without the Optane 900P for the BlueStore DBs, there is not much of a difference.
Newly created 3-node Proxmox VE cluster with Ceph.
Server specs (per node):
Dell R630
2x Intel E5-2660 v3
128 GB RAM
2x 250 GB SSD (ProxmoxVE)
6x 960GB Intel D3-S4510 SSD (OSD)
1x Intel Optane 900P (Bluestore DB)
SSD direct benchmark
- fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/sdd --bs=4k --iodepth=1 --size=4G --readwrite=randread
-> read: IOPS=34.5k, BW=135MiB/s (141MB/s)(4096MiB/30388msec)
- fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/sdd --bs=4k --iodepth=1 --size=4G --readwrite=randwrite
-> write: IOPS=28.0k, BW=113MiB/s (119MB/s)(4096MiB/36198msec)
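Note that the raw-device write test above is not a sync-write test; a variant with --fsync=1 would be closer to the kind of I/O Ceph itself issues. Sketch only, I have not included numbers for that run here:
- fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --fsync=1 --name=test --filename=/dev/sdd --bs=4k --iodepth=1 --size=4G --readwrite=randwrite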
CEPH Pool benchmark
rbd_iodepth32: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
-> [r=6288KiB/s][r=1572 IOPS]
rbd_iodepth32: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
-> [w=3359KiB/s][w=839 IOPS]
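For reference, the pool numbers above come from fio's rbd engine. The job file looked roughly like this (pool and image names are placeholders, not the exact ones used):
; rough reconstruction of the rbd job file; pool/rbdname are placeholders
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
bs=4k
rw=randread
[rbd_iodepth32]
iodepth=1
The randwrite result was produced the same way with rw=randwrite.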
CEPH Config
Tried both the default config and a modified config (below), but there is no noticeable difference.
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.19.40.15/24
fsid = 884a21ce-7386-4dcd-a930-2318d472fb15
mon_allow_pool_delete = true
mon_host = 10.19.40.15 10.19.40.16 10.19.40.17
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.19.40.15/24
debug bluestore = 0/0
debug bluefs = 0/0
debug bdev = 0/0
debug rocksdb = 0/0
[osd]
bluestore_block_wal_create = false
bluestore_block_db_create = true
bluestore_fsck_on_mkfs = false
bdev_aio_max_queue_depth = 1024
bluefs_min_flush_size = 65536
bluestore_min_alloc_size = 4096
bluestore_max_blob_size = 65536
bluestore_max_contexts_per_kv_batch = 64
bluestore_rocksdb_options = "compression=kNoCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=2,recycle_log_file_num=16,compaction_threads=32,flusher_threads=8,max_background_compactions=32,max_background_flushes=8,max_bytes_for_level_base=5368709120,write_buffer_size=83886080,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=400,level0_stop_writes_trigger=800,disableWAL=false,compaction_readahead_size=2097152"
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
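To double-check that the modified [osd] values are actually in effect, the admin socket on one of the OSD nodes can be queried (osd.0 is just an example ID; this shows the configured values, and keep in mind bluestore_min_alloc_size is only applied when an OSD is created, so OSDs built before the change keep their original allocation size):
- ceph daemon osd.0 config show | grep -E 'bluestore_min_alloc_size|bluefs_min_flush_size|bdev_aio_max_queue_depth'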
Network latency
The network is 10 Gbit SFP+ over Force10 S4810 switches.
21 packets transmitted, 21 received, 0% packet loss, time 507ms
rtt min/avg/max/mdev = 0.044/0.086/0.113/0.021 ms
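For reference, the figures above are from a plain ICMP ping between two of the cluster node IPs, roughly like this (the short interval is an approximation to match the 507 ms total, not the exact flags used):
- ping -c 21 -i 0.025 10.19.40.16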
Any ideas?