Ceph Slow Performance On All Flash NVME

Teapot

Apr 9, 2019
Hello,
We have 6 servers and 36 x 3.84 TB NVMe OSDs (all enterprise Gen4 PCIe NVMe drives).
Sometimes when I import a VM disk it imports at 60 MB/s, but other times it imports at 1 GB/s.

VM Speedtest: https://prnt.sc/miiSD_s_F7xC
I checked CPU and RAM status on all nodes; everything is fine.
The NVMe disks are in writeback mode.
The CPU power policy on all nodes is set to performance.
MTU 9216, 2 x 100G LACP with layer 3+4 hashing.
Total PGs: 1024
Ceph version 17.2.


What could cause this?

Thanks.
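One way to narrow down whether the variation comes from the cluster itself or from the VM/import path is to bench the pool directly with rados bench, bypassing QEMU/librbd entirely. A sketch; the pool name `testpool` is a placeholder for a throwaway pool, do not point this at production data:

```shell
# Sketch: bench the Ceph cluster directly, bypassing the VM layer.
# "testpool" is a placeholder for a scratch pool with no real data.
rados_bench() {
  pool="${1:-testpool}"
  rados bench -p "$pool" 60 write --no-cleanup  # 60s write benchmark
  rados bench -p "$pool" 60 seq                 # sequential read of those objects
  rados -p "$pool" cleanup                      # remove the benchmark objects
}
# usage: rados_bench testpool
```

If rados bench is consistently fast while imports swing between 60 MB/s and 1 GB/s, the problem is more likely in the client path than in the OSDs.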
 
Ceph.conf

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.254.254.10/24
debug_asok = 0/0
debug_auth = 0/0
debug_bdev = 0/0
debug_bluefs = 0/0
debug_bluestore = 0/0
debug_buffer = 0/0
debug_civetweb = 0/0
debug_client = 0/0
debug_compressor = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_crypto = 0/0
debug_dpdk = 0/0
debug_eventtrace = 0/0
debug_filer = 0/0
debug_filestore = 0/0
debug_finisher = 0/0
debug_fuse = 0/0
debug_heartbeatmap = 0/0
debug_javaclient = 0/0
debug_journal = 0/0
debug_journaler = 0/0
debug_kinetic = 0/0
debug_kstore = 0/0
debug_leveldb = 0/0
debug_lockdep = 0/0
debug_mds = 0/0
debug_mds_balancer = 0/0
debug_mds_locker = 0/0
debug_mds_log = 0/0
debug_mds_log_expire = 0/0
debug_mds_migrator = 0/0
debug_memdb = 0/0
debug_mgr = 0/0
debug_mgrc = 0/0
debug_mon = 0/0
debug_monc = 0/0
debug_ms = 0/0
debug_none = 0/0
debug_objclass = 0/0
debug_objectcacher = 0/0
debug_objecter = 0/0
debug_optracker = 0/0
debug_osd = 0/0
debug_paxos = 0/0
debug_perfcounter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_rbd_mirror = 0/0
debug_rbd_replay = 0/0
debug_refs = 0/0
debug_reserver = 0/0
debug_rgw = 0/0
debug_rocksdb = 0/0
debug_striper = 0/0
debug_throttle = 0/0
debug_timer = 0/0
debug_tp = 0/0
debug_xio = 0/0
fsid = d53c769a-dfa6-42ba-9176-9da308cad967
mon_allow_pool_delete = true
mon_host = 10.254.254.11 10.254.254.13 10.254.254.10 10.254.254.14 10.254.254.12
mon_max_pg_per_osd = 800
ms_bind_ipv4 = true
ms_bind_ipv6 = false
ms_type = async
osd_pool_default_min_size = 2
osd_pool_default_size = 3
perf = True
public_network = 10.254.254.10/24
rocksdb_perf = True

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_cache = True
rbd_cache_max_dirty = 134217728
rbd_cache_max_dirty_age = 30
rbd_cache_max_dirty_object = 2
rbd_cache_size = 335544320
rbd_cache_target_dirty = 235544320
rbd_cache_writethrough_until_flush = False

[osd]
bluestore_extent_map_shard_max_size = 200
bluestore_extent_map_shard_min_size = 50
bluestore_extent_map_shard_target_size = 100
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=64,min_write_buffer_number_to_merge=32,recycle_log_file_num=64,compaction_style=kCompactionStyleLevel,write_buffer_size=4MB,target_file_size_base=4MB,max_background_compactions=64,level0_file_num_compaction_trigger=64,level0_slowdown_writes_trigger=128,level0_stop_writes_trigger=256,max_bytes_for_level_base=6GB,compaction_threads=32,flusher_threads=8,compaction_readahead_size=2MB
osd_disk_threads = 8
osd_enable_op_tracker = false
osd_max_pg_log_entries = 10
osd_memory_target = 14150664191
osd_min_pg_log_entries = 10
osd_op_threads = 16
osd_pg_log_dups_tracked = 10
osd_pg_log_trim_min = 10

[mon.node0]
public_addr = 10.254.254.10

[mon.node1]
public_addr = 10.254.254.11

[mon.node2]
public_addr = 10.254.254.12

[mon.node3]
public_addr = 10.254.254.13

[mon.node4]
public_addr = 10.254.254.14
 
Hi @Teapot,
the configuration looks good.
What kind of switch configuration are you using for the LACP?
I have seen problems several times with MLAG on e.g. Dell, Cisco, and Arista switches.
Have you run iperf across the LACP bond yet?
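For reference, a quick bond test might look like this (a sketch; assumes iperf3 is installed on two nodes, and 10.254.254.11 is just an example target from this thread). With layer 3+4 hashing a single TCP stream only ever uses one physical link, so parallel streams are needed to exercise the whole 2x100G bond:

```shell
# Sketch: assumes iperf3 on both ends; 10.254.254.11 is an example
# target taken from the cluster network in this thread.
bond_check() {
  target="${1:-10.254.254.11}"
  iperf3 -c "$target" -t 30        # single stream: at most one 100G link
  iperf3 -c "$target" -t 30 -P 8   # 8 parallel streams: can use both links
}
# on the target node, run the server side first: iperf3 -s
```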
 
Hi @Teapot,
the configuration looks good.
What kind of switch configuration are you using for the LACP?
I have seen problems several times with MLAG on e.g. Dell, Cisco, and Arista switches.
Have you run iperf across the LACP bond yet?
Yes, I tested with iperf. With multiple threads I can get 100-110 Gbps.
Also, when I restart an OSD from the Proxmox Ceph GUI, the OSD won't start again after stopping. I have to completely delete the disk and add it back.
 
Small improvements:
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx

You can change these to "none", but I don't know if it can be changed on a running system; maybe somebody can confirm it.

I will adapt your settings from your initial post. Can I apply these entries on a running system?
 
Small improvements:
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx

You can change these to "none", but I don't know if it can be changed on a running system; maybe somebody can confirm it.

I will adapt your settings from your initial post. Can I apply these entries on a running system?
You need to restart all OSDs/MONs and VMs.
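For reference, the change being discussed would look like this in ceph.conf. Note that this disables authentication entirely, so it is only reasonable on a fully isolated, trusted cluster network:

```
[global]
auth_client_required = none
auth_cluster_required = none
auth_service_required = none
```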
 
If you want a big boost, you can try this QEMU version, compiled with tcmalloc:

https://mutulin1.odiso.net/pve-qemu-kvm_7.2.0-8_amd64.deb

(I'm currently looking at adding an option to Proxmox to choose it dynamically, but this build enables tcmalloc statically for now.)

I'm going from 60k IOPS to 90k IOPS with 4k randread on 1 virtio-scsi disk.
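For anyone wanting to reproduce that kind of number, a typical fio invocation for the 4k randread case might look like the following (a sketch; `/dev/sdb` is a placeholder for the virtio-scsi disk inside a test VM):

```shell
# Sketch: run inside the test VM; /dev/sdb is a placeholder for the
# virtio-scsi disk. randread does not write, but use a scratch disk anyway.
randread_4k() {
  dev="${1:-/dev/sdb}"
  fio --name=randread --filename="$dev" --rw=randread --bs=4k \
      --ioengine=libaio --iodepth=32 --numjobs=1 --direct=1 \
      --runtime=60 --time_based --group_reporting
}
# usage: randread_4k /dev/sdb
```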
 
pve-qemu-kvm 7.2.0-8 is also automatically shipped with PVE 7.4 from the Enterprise Repository.
 
pve-qemu-kvm 7.2.0-8 is also automatically shipped with PVE 7.4 from the Enterprise Repository.
As I said, this is a custom build with a different compilation option (--tcmalloc).
(Just for testing of course, but you should see a big difference in performance.)

I'm trying to get it pushed officially via the pve-devel mailing list, with a new option on the VM to choose this version.
 
Solved.
All NVMe disks in the servers are the same model, but some of them were slower. (It wasn't like that on the first day; they slowed down afterwards.)
I replaced those disks and the problem is solved.

But another problem:
when I try to restart an OSD in the Proxmox Ceph GUI, the OSD does not start.
What could this be due to?
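For reference, checks like the following can surface both issues (a sketch; the OSD id and device path are placeholders). `ceph osd perf` shows per-OSD commit/apply latency, so on a uniform all-flash cluster a persistent outlier points at a degraded drive, and the journal usually says why an OSD refuses to start:

```shell
# Sketch: the OSD id and device path below are placeholders.
# Per-OSD latency: on identical NVMe drives, persistent outliers are
# suspects for the "same model but slower" drives described above.
osd_latency() {
  ceph osd perf
}
# Drive health/wear for a suspect device:
nvme_health() {
  smartctl -a "$1" | grep -Ei 'percentage used|wear|media.*errors'
}
# Why an OSD will not start after a stop:
osd_start_log() {
  systemctl status "ceph-osd@$1"
  journalctl -u "ceph-osd@$1" --since "1 hour ago" --no-pager
}
# usage: osd_start_log 12 ; nvme_health /dev/nvme0
```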
 
If you want a big boost, you can try this QEMU version, compiled with tcmalloc:

https://mutulin1.odiso.net/pve-qemu-kvm_7.2.0-8_amd64.deb

(I'm currently looking at adding an option to Proxmox to choose it dynamically, but this build enables tcmalloc statically for now.)

I'm going from 60k IOPS to 90k IOPS with 4k randread on 1 virtio-scsi disk.
This is great! I will try it on my test cluster.
Thanks.
 
As I said, this is a custom build with a different compilation option (--tcmalloc).
(Just for testing of course, but you should see a big difference in performance.)

I'm trying to get it pushed officially via the pve-devel mailing list, with a new option on the VM to choose this version.
Has this ever found its way into the official PVE repositories?
 
