Hi
After days of searching and trying different things, I am looking for some advice on how to solve this problem.
The performance issue shows up inside the VMs on Proxmox. They run some old software that depends on good random 4K read/write performance, so this has a big impact. Using the Optane 900P SSDs for the BlueStore DBs or leaving them out does not make much of a difference.
Newly created 3-node Proxmox VE cluster with Ceph.
Server specs:
Dell R630
2x Intel E5-2660 v3
128 GB RAM
2x 250 GB SSD (Proxmox VE)
6x 960GB Intel D3-S4510 SSD (OSD)
1x Intel Optane 900P (Bluestore DB)
SSD direct benchmark
- fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/sdd --bs=4k --iodepth=1 --size=4G --readwrite=randread
-> read: IOPS=34.5k, BW=135MiB/s (141MB/s)(4096MiB/30388msec)
- fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/dev/sdd --bs=4k --iodepth=1 --size=4G --readwrite=randwrite
-> write: IOPS=28.0k, BW=113MiB/s (119MB/s)(4096MiB/36198msec)
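These are plain direct-I/O results. Since Ceph issues a flush on every write commit, a sync-write variant of the same test is often a better predictor of what an OSD can sustain from these drives; it is the same randwrite command with --fsync=1 added (no results for it are included here):
- fio --randrepeat=1 --ioengine=libaio --direct=1 --fsync=1 --gtod_reduce=1 --name=test --filename=/dev/sdd --bs=4k --iodepth=1 --size=4G --readwrite=randwrite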
CEPH Pool benchmark
rbd_iodepth32: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
-> [r=6288KiB/s][r=1572 IOPS]
rbd_iodepth32: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=1
-> [w=3359KiB/s][w=839 IOPS]
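Only the fio output headers and results are quoted above; a job along these lines produces the parameters shown (ioengine=rbd, 4K blocks, iodepth 1), with pool and image names as placeholders:
- fio --ioengine=rbd --clientname=admin --pool=<poolname> --rbdname=<imagename> --name=rbd_iodepth32 --readwrite=randread --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based
- fio --ioengine=rbd --clientname=admin --pool=<poolname> --rbdname=<imagename> --name=rbd_iodepth32 --readwrite=randwrite --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based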
CEPH Config
Tried both the default config and a modified config (below), but there was no noticeable difference.
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.19.40.15/24
fsid = 884a21ce-7386-4dcd-a930-2318d472fb15
mon_allow_pool_delete = true
mon_host = 10.19.40.15 10.19.40.16 10.19.40.17
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.19.40.15/24
debug bluestore = 0/0
debug bluefs = 0/0
debug bdev = 0/0
debug rocksdb = 0/0
[osd]
bluestore_block_wal_create = false
bluestore_block_db_create = true
bluestore_fsck_on_mkfs = false
bdev_aio_max_queue_depth = 1024
bluefs_min_flush_size = 65536
bluestore_min_alloc_size = 4096
bluestore_max_blob_size = 65536
bluestore_max_contexts_per_kv_batch = 64
bluestore_rocksdb_options = "compression=kNoCompression,max_write_buffer_number=16,min_write_buffer_number_to_merge=2,recycle_log_file_num=16,compaction_threads=32,flusher_threads=8,max_background_compactions=32,max_background_flushes=8,max_bytes_for_level_base=5368709120,write_buffer_size=83886080,level0_file_num_compaction_trigger=4,level0_slowdown_writes_trigger=400,level0_stop_writes_trigger=800,disableWAL=false,compaction_readahead_size=2097152"
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
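Whether any of these tunables actually move the needle can also be checked from the Ceph side, independent of the VM layer; a sketch (pool name is a placeholder):
- ceph tell osd.0 bench (sequential 4 MB writes on a single OSD, bypassing the client and network path)
- rados bench -p <poolname> 30 write -b 4096 -t 1 --no-cleanup (4K pool writes at queue depth 1)
- rados bench -p <poolname> 30 rand -t 1 (random reads of the objects written above)
- ceph osd perf (per-OSD commit/apply latency)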
Network latency
Network is 10Gbit SFP+ over Force 10 s4810 switches.
21 packets transmitted, 21 received, 0% packet loss, time 507ms
rtt min/avg/max/mdev = 0.044/0.086/0.113/0.021 ms
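The latency figures above come from a plain ping between two of the nodes over the storage network. The exact invocation is not shown, but something along these lines matches the 21-packet, roughly half-second run (target IP and interval are assumptions):
- ping -c 21 -i 0.025 10.19.40.16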
Any ideas?