Extremely SLOW Ceph storage once pool usage exceeds 60%?

May 18, 2021
We have a "Lab" Ceph Object Storage consisting of a 4x Multinode Server and the following Node components:

Per Node:
  1. PVE Manager Version pve-manager/7.1-7/df5740ad
  2. Kernel Version Linux 5.13.19-2-pve #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100)
  3. 24 x Intel(R) Xeon(R) CPU X5675 @ 3.07GHz (2 Sockets)
  4. RAM usage 48.34% (11.37 GiB of 23.51 GiB)
  5. SWAP usage 0.00% (0 B of 1024.00 MiB) - ZRAM
  6. / HD space 0.07% (2.58 GiB of 3.51 TiB) - ZFS (Root)
  7. SATA - 2x 4 TB Ceph OSDs
  8. Ceph Cluster Network - 1 Gbit - 9000 MTU (see the jumbo-frame check below)
  9. Proxmox Cluster / Ceph Mon. - 1 Gbit - 1500 MTU
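
Since the cluster network is configured for jumbo frames while the Proxmox/monitor network stays at 1500 MTU, it is worth verifying that 9000-byte frames really pass end-to-end between all four nodes; a silent MTU mismatch on a switch port is a classic cause of erratic OSD traffic. A minimal check, assuming a peer address on the 9000-MTU cluster network:

# 8972 bytes payload = 9000 MTU minus 28 bytes IP+ICMP header; -M do forbids fragmentation
# replace PEER with a node address on the 9000-MTU cluster network
ping -M do -s 8972 -c 3 PEER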

ceph_slow_1.jpg

The Ceph cluster thus consists of (2 x 4 TB per node) x 4 nodes = 8 OSDs in total, with 32 TB of raw capacity.
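
A quick way to confirm that layout and to see how evenly the 8 OSDs are filled is the standard Ceph tooling (sketch; run on any node with an admin keyring):

# per-host / per-OSD sizes, raw usage and %USE - uneven fill often hits the
# near-full/backfill-full thresholds long before the cluster average suggests it
ceph osd df tree
# pool-level usage behind the dashboard percentages
ceph df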


ceph_slow_2.jpg

ceph_slow_3.jpg
ceph_slow_4.jpg

The cephfs_data pool uses LZ4 compression in aggressive mode.
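
For reference, such a pool compression setup is typically checked and applied with the commands below (pool name taken from above; verify first, then set):

# show the current compression settings of the pool
ceph osd pool get cephfs_data compression_algorithm
ceph osd pool get cephfs_data compression_mode
# this is how LZ4 / aggressive would be configured on the pool
ceph osd pool set cephfs_data compression_algorithm lz4
ceph osd pool set cephfs_data compression_mode aggressive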

ceph_slow_5.jpg

The Ceph storage is accessed exclusively through a Debian 11 "Samba-to-Ceph" VM on a remote Proxmox node. This VM mounts the CephFS pool under /ceph-data and exports it via Samba (as an Active Directory domain member) as an archive file share.

root@samba-to-ceph:~# tail -n 6 /etc/fstab
### ### ###

#// Mini-Rack Ceph Object Storage
172.16.100.11:6789,172.16.100.12:6789,172.16.100.13:6789,172.16.100.14:6789:/ /ceph-data ceph name=admin,secret="SECRET",noatime,acl,_netdev 0 2

### ### ###
# EOF
root@samba-to-ceph:~#
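
The Samba side of the VM is not shown in the thread; a minimal share definition for such an archive share could look like the following sketch (share name and options are assumptions, not the actual configuration):

# /etc/samba/smb.conf (excerpt) - hypothetical share exporting the CephFS mount
[archive]
    path = /ceph-data
    read only = no
    # store Windows ACLs as xattrs on the CephFS mount (the fstab entry enables acl)
    vfs objects = acl_xattr
    inherit acls = yes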

Up to about 60% pool occupancy, SMB access was performant, averaging 120-130 MB/s when copying from a Windows client VM (through the Samba-to-Ceph VM).

However, overall Ceph performance seems to have plummeted after the last scrub!

!!! Write performance is now only around 200-300 kbit/s !!!
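
To narrow down whether Ceph itself or the SMB path is the bottleneck, it helps to benchmark the pool directly with RADOS, independent of Samba and the Windows client (sketch; the benchmark writes real objects, so only use it on a lab pool and clean up afterwards):

# 30-second write benchmark against the data pool, keeping the objects for a read test
rados bench -p cephfs_data 30 write --no-cleanup
# sequential read benchmark on the objects written above
rados bench -p cephfs_data 30 seq
# remove the benchmark objects again
rados -p cephfs_data cleanup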

ceph_performance_001.jpg

The RAM is completely in use (roughly 50% of it is page cache).
Swap usage is also growing steadily.

... after much debugging ...

!!! To regain normal performance, dropping the page cache and clearing swap helped !!!

root@ceph1-minirack:~#
root@ceph1-minirack:~# sync; echo 1 > /proc/sys/vm/drop_caches
root@ceph1-minirack:~#
root@ceph1-minirack:~# swapoff -a
root@ceph1-minirack:~#
root@ceph1-minirack:~# swapon /dev/zram0 -p 10
root@ceph1-minirack:~#
root@ceph1-minirack:~# free
               total        used        free      shared  buff/cache   available
Mem:        24656724    11893748    12400936       71968      362040    12346804
Swap:        1048572           0     1048572
root@ceph1-minirack:~#

What is a permanent solution to this phenomenon?
Which parameters on the Linux kernel, in Proxmox, or in Ceph itself need to be adjusted?
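
On the kernel side, a starting point would be to persist a less swap-happy reclaim policy and a larger free-memory reserve via sysctl. This is only a sketch with assumed values, to be tuned per node, not a recommendation from the thread:

# /etc/sysctl.d/90-ceph-memory.conf - example values, adjust for your hardware
vm.swappiness = 10            # prefer dropping page cache over swapping out anonymous memory
vm.min_free_kbytes = 262144   # keep ~256 MiB free to avoid allocation stalls under load
vm.vfs_cache_pressure = 200   # reclaim dentry/inode caches more aggressively
# apply without reboot: sysctl --system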
 
And the RAM is filling up to the point of swapping again.

ceph_osd_swapping_1.jpg

root@ceph1-minirack:~# cat /proc/meminfo
MemTotal: 24656724 kB
MemFree: 208708 kB
MemAvailable: 14458664 kB
Buffers: 14197188 kB
Cached: 229064 kB
SwapCached: 4968 kB
Active: 4332072 kB
Inactive: 18277808 kB
Active(anon): 47384 kB
Inactive(anon): 8225416 kB
Active(file): 4284688 kB
Inactive(file): 10052392 kB
Unevictable: 158424 kB
Mlocked: 155352 kB
SwapTotal: 1048572 kB
SwapFree: 1004028 kB
Dirty: 12 kB
Writeback: 0 kB
AnonPages: 8337664 kB
Mapped: 168712 kB
Shmem: 71920 kB
KReclaimable: 332728 kB
Slab: 587740 kB
SReclaimable: 332728 kB
SUnreclaim: 255012 kB
KernelStack: 11920 kB
PageTables: 26592 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 13376932 kB
Committed_AS: 14978128 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 624020 kB
VmallocChunk: 0 kB
Percpu: 27264 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 349568 kB
DirectMap2M: 9078784 kB
DirectMap1G: 17825792 kB
root@ceph1-minirack:~#
 
It is probably due to swap being enabled.

ceph_osd_free_1.jpg

root@ceph1-minirack:~# for file in /proc/*/status ; do awk '/VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | egrep -v "0 kB"
pvedaemon worke8536 kB
pmxcfs 1192 kB
ceph-osd 3188 kB
ceph-osd 3184 kB
pve-firewall 5544 kB
pvestatd 796 kB
pvescheduler 3648 kB
pvedaemon 9816 kB
pve-ha-lrm 7256 kB
pvedaemon worke8452 kB
root@ceph1-minirack:~#
 
You can't avoid swap being used while swap is present. You can tune the swappiness parameter, remove swap entirely, add RAM, or debug the problem further.
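
On the Ceph side, the relevant knob is the per-OSD memory target that BlueStore autotunes its caches against; with 2 OSDs plus the PVE services on a 24 GiB node, the default of 4 GiB per OSD leaves little headroom. A hedged example of lowering it (the 3 GiB value is an assumption, size it to your nodes):

# lower the per-OSD memory target from the 4 GiB default to 3 GiB (value in bytes)
ceph config set osd osd_memory_target 3221225472
# check what a running OSD actually uses
ceph config show osd.0 osd_memory_target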
 
