Extremely SLOW Ceph storage above 60% usage?

May 18, 2021
We have a "Lab" Ceph Object Storage consisting of a 4x Multinode Server and the following Node components:

Per node:
  1. PVE Manager version pve-manager/7.1-7/df5740ad
  2. Kernel version Linux 5.13.19-2-pve #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100)
  3. 24 x Intel(R) Xeon(R) CPU X5675 @ 3.07GHz (2 sockets)
  4. RAM usage 48.34% (11.37 GiB of 23.51 GiB)
  5. Swap usage 0.00% (0 B of 1024.00 MiB) - ZRAM
  6. Root (/) filesystem usage 0.07% (2.58 GiB of 3.51 TiB) - ZFS
  7. SATA - 2x 4 TB Ceph OSDs
  8. Ceph cluster network - 1 Gbit - 9000 MTU (see the jumbo-frame check below)
  9. Proxmox cluster / Ceph monitors - 1 Gbit - 1500 MTU
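
Since the Ceph cluster network relies on an MTU of 9000, it is worth verifying that jumbo frames actually pass end-to-end between the nodes; an MTU mismatch anywhere on the path silently degrades Ceph throughput. A minimal sketch of such a check (the peer address is a placeholder, use a node IP on the 9000-MTU cluster network):

# show the configured MTU of every interface
ip -br link show
# send a 9000-byte frame (8972 bytes payload + 28 bytes ICMP/IP header)
# with "don't fragment" set; this fails if any hop only supports MTU 1500
ping -M do -s 8972 -c 3 <peer-ip-on-cluster-network>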

ceph_slow_1.jpg

The Ceph cluster thus consists of (2 x 4 TB per node) x 4 nodes = 8 OSDs in total, with 32 TB of raw capacity.


ceph_slow_2.jpg

ceph_slow_3.jpg
ceph_slow_4.jpg

The cephfs_data pool uses LZ4 compression in aggressive mode.
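
For reference, compression is configured per pool; a minimal sketch with the pool name from above:

ceph osd pool set cephfs_data compression_algorithm lz4
ceph osd pool set cephfs_data compression_mode aggressive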

ceph_slow_5.jpg

The Ceph storage is accessed exclusively through a Debian 11 "Samba-to-Ceph" VM on a remote Proxmox node. The VM mounts the cephfs pool under /ceph-data and exports it via Samba (as an Active Directory domain member) as an archive file store.
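
The share definition on that VM looks roughly like the following sketch (the share name [archive] and the options are assumptions, not taken from the actual setup; acl_xattr stores the NT ACLs that AD clients expect as extended attributes):

[archive]
    path = /ceph-data
    read only = no
    vfs objects = acl_xattr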

root@samba-to-ceph:~# tail -n 6 /etc/fstab
### ### ###

#// Mini-Rack Ceph Object Storage
172.16.100.11:6789,172.16.100.12:6789,172.16.100.13:6789,172.16.100.14:6789:/ /ceph-data ceph name=admin,secret="SECRET",noatime,acl,_netdev 0 2

### ### ###
# EOF
root@samba-to-ceph:~#
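
For comparison, the same mount can be performed by hand; using secretfile= instead of an inline secret= also keeps the key out of /etc/fstab (the keyfile path is an assumption):

mount -t ceph 172.16.100.11:6789,172.16.100.12:6789,172.16.100.13:6789,172.16.100.14:6789:/ /ceph-data \
    -o name=admin,secretfile=/etc/ceph/admin.secret,noatime,acl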

Up to about 60% pool occupancy, SMB access was fast: copies from a Windows VM client (through the Samba-to-Ceph VM) averaged 120-130 MB/s.

However, overall Ceph performance seems to have plummeted after the last scrub!

!!! Write performance is now around 200-300 kbit/s !!!

ceph_performance_001.jpg
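
To rule Samba and the client path in or out, raw cluster write throughput can be measured directly on one of the nodes; a sketch using a 30-second rados bench run against the cephfs_data pool (the --no-cleanup objects must be removed afterwards):

rados bench -p cephfs_data 30 write --no-cleanup
rados -p cephfs_data cleanup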

The RAM is completely in use (roughly 50% of it is page cache), and swap usage keeps climbing.

... after much debugging ...

!!! Dropping the page cache and clearing the swap restored normal performance !!!

root@ceph1-minirack:~#
root@ceph1-minirack:~# sync; echo 1 > /proc/sys/vm/drop_caches
root@ceph1-minirack:~#
root@ceph1-minirack:~# swapoff -a
root@ceph1-minirack:~#
root@ceph1-minirack:~# swapon /dev/zram0 -p 10
root@ceph1-minirack:~#
root@ceph1-minirack:~# free
              total        used        free      shared  buff/cache   available
Mem:       24656724    11893748    12400936       71968      362040    12346804
Swap:       1048572           0     1048572
root@ceph1-minirack:~#

What is a permanent solution to this phenomenon?
Which parameters need to be adjusted: in the Linux kernel, in Proxmox, or in Ceph itself?
 
And again the RAM has filled up to the point of swapping:

ceph_osd_swapping_1.jpg

root@ceph1-minirack:~# cat /proc/meminfo
MemTotal: 24656724 kB
MemFree: 208708 kB
MemAvailable: 14458664 kB
Buffers: 14197188 kB
Cached: 229064 kB
SwapCached: 4968 kB
Active: 4332072 kB
Inactive: 18277808 kB
Active(anon): 47384 kB
Inactive(anon): 8225416 kB
Active(file): 4284688 kB
Inactive(file): 10052392 kB
Unevictable: 158424 kB
Mlocked: 155352 kB
SwapTotal: 1048572 kB
SwapFree: 1004028 kB
Dirty: 12 kB
Writeback: 0 kB
AnonPages: 8337664 kB
Mapped: 168712 kB
Shmem: 71920 kB
KReclaimable: 332728 kB
Slab: 587740 kB
SReclaimable: 332728 kB
SUnreclaim: 255012 kB
KernelStack: 11920 kB
PageTables: 26592 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 13376932 kB
Committed_AS: 14978128 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 624020 kB
VmallocChunk: 0 kB
Percpu: 27264 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 349568 kB
DirectMap2M: 9078784 kB
DirectMap1G: 17825792 kB
root@ceph1-minirack:~#
 
It is probably due to swap being enabled at all:

ceph_osd_free_1.jpg

root@ceph1-minirack:~# for file in /proc/*/status ; do awk '/VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | egrep -v "0 kB"
pvedaemon worke8536 kB
pmxcfs 1192 kB
ceph-osd 3188 kB
ceph-osd 3184 kB
pve-firewall 5544 kB
pvestatd 796 kB
pvescheduler 3648 kB
pvedaemon 9816 kB
pve-ha-lrm 7256 kB
pvedaemon worke8452 kB
root@ceph1-minirack:~#
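
(In the output above, the numbers run into the process names because the kernel truncates names to 15 characters and the awk printf has no separator between records.) A cleaner variant of the same per-process swap report, as a sketch; note that a process may exit between the two reads of its status file:

for f in /proc/[0-9]*/status; do
    name=$(awk -F'\t' '/^Name:/{print $2}' "$f")
    swap=$(awk '/^VmSwap:/{print $2}' "$f")
    [ -n "$swap" ] && [ "$swap" -gt 0 ] && echo "$name $swap kB"
done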
 
You can't avoid swap being used as long as swap is present. You can set the swappiness parameter, remove swap entirely, add more RAM, or debug the problem further.
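
A low swappiness keeps the kernel from pushing anonymous memory out to swap while still letting the page cache grow. A sketch of a persistent setting (the value 1 is a common choice for storage nodes, an assumption here rather than a tested recommendation):

# apply immediately
sysctl vm.swappiness=1
# persist across reboots
echo 'vm.swappiness = 1' > /etc/sysctl.d/90-swappiness.conf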