Poor Ceph performance

frantek

Hi,

I have a Ceph setup which I upgraded to the latest version, and I moved all disks to BlueStore. Now performance is pretty bad: I see an IO delay of about 10 in the worst case.

I use 10 GbE mesh networking for Ceph. The DBs are on SSDs and the OSDs are spinning disks.
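
Whether a DB really ended up on the SSD can be checked via the OSD metadata (assuming a separate DB device was configured), e.g. for osd.0:

ceph osd metadata 0 | grep bluefs_db
# "bluefs_db_rotational": "0" means the DB device is non-rotational, i.e. the SSD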

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.15.15.0/24
filestore xattr use omap = true
fsid = e9a07274-cba6-4c72-9788-a7b65c93e477
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 1
public network = 10.15.15.0/24
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mds.pve02]
host = pve02
mds standby for name = pve
[mds.pve03]
host = pve03
mds standby for name = pve
[mds.pve01]
host = pve01
mds standby for name = pve
[mon.2]
host = pve03
mon addr = 10.15.15.7:6789
[mon.1]
host = pve02
mon addr = 10.15.15.6:6789
[mon.0]
host = pve01
mon addr = 10.15.15.5:6789

cluster:
id: e9a07274-cba6-4c72-9788-a7b65c93e477
health: HEALTH_OK

services:
mon: 3 daemons, quorum 0,1,2
mgr: pve01(active), standbys: pve03, pve02
mds: cephfs-1/1/1 up {0=pve02=up:active}, 2 up:standby
osd: 18 osds: 18 up, 18 in

data:
pools: 4 pools, 1248 pgs
objects: 320.83k objects, 1.19TiB
usage: 3.64TiB used, 4.55TiB / 8.19TiB avail
pgs: 1248 active+clean

io:
client: 115KiB/s rd, 53.2MiB/s wr, 79op/s rd, 182op/s wr

proxmox-ve: 5.4-2 (running kernel: 4.15.18-15-pve)
pve-manager: 5.4-8 (running version: 5.4-8/51d494ca)
pve-kernel-4.15: 5.4-5
pve-kernel-4.15.18-17-pve: 4.15.18-43
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-15-pve: 4.15.18-40
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
ceph: 12.2.12-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-11
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-53
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-37
pve-container: 2.0-39
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

Situation while doing a W10 setup, started at about 23:35:

[attachment: cpu.png — CPU / IO delay graph]

In "normal" operation (before 23:35) IO delay never drops below 2. In my other, non Ceph setups, it normally is zero. How to fix this?

TIA
 
The IO delay is the outstanding IO of the system; with 6 OSDs (HDDs) in each node, there is considerably more IO going on. Besides the IO delay, how do you determine that Ceph's performance is poor?
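
By the way, the IO delay shown in the PVE graphs corresponds to CPU iowait; you can watch it live on a node with standard tools, for example:

vmstat 5
# the "wa" column is the percentage of CPU time spent waiting on IO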

Please describe your hardware in more detail and run a rados bench.
The benchmark commands, along with reference results, can be found in the Ceph benchmark paper (PDF):
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
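
For reference, the write benchmark is along these lines (pool name "pve" as used later in this thread; --no-cleanup keeps the benchmark objects so a read test can follow):

rados bench -p pve 60 write -t 16 --no-cleanup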
 
Poor means a W10 setup takes about 30 minutes instead of less than 10, due to the slow disks. VMs are slow. With my old PVE 4 setup with Ceph, without BlueStore, on the same hardware, the problem did not exist. The old system was also slower than single nodes with a RAID controller, but not that drastically.

Total time run: 60.602509
Total writes made: 2096
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 138.344
Stddev Bandwidth: 28.1865
Max bandwidth (MB/sec): 204
Min bandwidth (MB/sec): 68
Average IOPS: 34
Stddev IOPS: 7
Max IOPS: 51
Min IOPS: 17
Average Latency(s): 0.462364
Stddev Latency(s): 0.222208
Max latency(s): 2.16543
Min latency(s): 0.110608

"rados bench 60 read -t 16 -p pve" did not work for some reason.

Hardware: 3 nodes, each with two 10 GbE Intel X540-AT2 NICs for the Ceph mesh:

Family: System x
Manufacturer: IBM
Product: System x3650 M3
Total usable RAM: 70.74 GB
Total number of cores: 8
Cores per CPU: 4
Total number of CPUs: 2
Maximum speed: 4.40 GHz
 
What disks are you using?
Different brands and models of 500 GB SATA disks.

And did the rados read test give any errors?
None, it just printed the usage text.

From a quick search, it seems that this server has a RAID card by default. Is Ceph running on it? If so, please read the following link: https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_precondition
Of course not. It runs in JBOD mode.

Again: the problem popped up after the upgrade from PVE 4 to 5 and got even worse after switching to BlueStore.
 

JBOD mode still isn't an HBA and can cause issues. I get that you had OK performance beforehand (although it sounds like it still wasn't where it should be), but it's still not the proper configuration, from my understanding and experience. A true HBA really makes a difference.
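
A rough way to check how transparently the controller passes the disks through (an indicator, not a definitive test) is to see whether the OS gets the plain disk models and SMART data without controller-specific options:

lsblk -o NAME,MODEL,ROTA   # should show the actual disk models
smartctl -i /dev/sda       # if this only works with -d megaraid,N, the controller is still in the IO path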

I noticed from your ceph status that you're seeing about 50 MB/s in writes; not amazing, but not horrible. Do you recall what you were getting before the upgrade? Is that 50 MB/s from the Win10 install, or is it IO created by another process?
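
To narrow down whether the spinners themselves are the bottleneck, it may help to watch per-disk utilization and latency on each node while such a load is running:

iostat -x 5
# sustained high %util and await on the OSD HDDs points at the disks (or controller) rather than the network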
 
I noticed from your ceph status that you're seeing about 50 MB/s in writes; not amazing, but not horrible. Do you recall what you were getting before the upgrade? Is that 50 MB/s from the Win10 install, or is it IO created by another process?
Sadly not. And yes, the 50 MB/s is from the Win10 install.

I had a look at my Nagios graphs, and they prove me wrong:

[attachment: nagios-graph.png — Nagios IO graphs]

Perhaps it's just me, but compared to single nodes with RAID 5 my Ceph cluster is slow.
 
Did you change any other settings? Did you add a pool or increase the PG count?

Any change you made after the upgrade to PVE 5 could cause this problem.
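
Both can be reviewed with the standard Ceph commands:

ceph osd pool ls detail   # per-pool size, min_size and pg_num/pgp_num
ceph osd df tree          # PG count and fill level per OSD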
 
No, I've just followed the instructions in the PVE wiki for the upgrade.
 
