Hi, we've been running a Proxmox Ceph cluster for a year in one datacenter and recently decided to build a new one in another datacenter, but we are unable to match the first cluster's performance with the new one. In fact, we see roughly 10x the latency on the new cluster with only a few VMs running, versus around one hundred VMs on the old one.
These are one-week metrics from the old cluster:
And these are from the new one:
The disks are the same in both clusters, 24 x Samsung PM863. The network is also the same: Ceph on dual ConnectX-3 40 Gb/s in balance-rr, corosync on a dedicated network interface, and the same Arista switches for the 40 Gb links and Nexus/FEX for the 1 Gb/10 Gb interfaces.
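In case it helps rule out the network, the bond mode and raw inter-node latency can be compared on both clusters with something like this (bond0 and the peer address are just examples from our setup):
cat /proc/net/bonding/bond0 | grep "Bonding Mode"
ping -c 100 -i 0.2 -q 10.10.40.102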
The HBAs in the old cluster are SAS2308 flashed to IT mode with Supermicro P20 firmware.
The HBAs in the new cluster are Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02), flashed to IT mode with Supermicro P16 firmware. We previously tested the new cluster with a SAS2008 flashed to IT mode with P19 and saw the same bad latencies.
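For reference, controller firmware and IT mode can be confirmed with the Broadcom/LSI flash utilities, assuming they are installed on the hosts:
sas2flash -listall   # old cluster, SAS2308
sas3flash -listall   # new cluster, SAS3008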
I've checked OSD latency (one OSD per disk) to see if any single OSD is running slow and pushing up the overall latency, but I can't see any pattern. Old cluster:
And the new one:
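For anyone who wants to compare, the per-OSD commit/apply latencies can also be pulled ad hoc on each cluster with:
ceph osd perf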
PVE version is the same on both clusters:
pveversion
pve-manager/5.4-4/97a96833 (running kernel: 4.15.18-13-pve)
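The Ceph packages themselves can be compared the same way, for example:
ceph -v
ceph versions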
The Ceph config is also the same in both clusters:
[global]
auth client required = none
auth service required = none
auth_cluster_required = none
cephx require signatures = false
cephx sign messages = false
cluster network = 10.10.40.0/24
debug_asok = 0/0
debug_auth = 0/0
debug_buffer = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_filestore = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_journal = 0/0
debug_journaler = 0/0
debug_lockdep = 0/0
debug_monc = 0/0
debug_ms = 0/0
debug_objclass = 0/0
debug_optracker = 0/0
debug_osd = 0/0
debug_perfcounter = 0/0
debug_throttle = 0/0
debug_timer = 0/0
debug_tp = 0/0
fsid = b70b6772-1c34-407d-a701-462c14fde916
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.10.40.0/24
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mon.int102]
host = int102
mon addr = 10.10.40.102:6789
mgr initial modules = prometheus
[mon.int103]
host = int103
mon addr = 10.10.40.103:6789
mgr initial modules = prometheus
[mon.int101]
host = int101
mon addr = 10.10.40.101:6789
mgr initial modules = prometheus
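To rule out config drift between the file above and what the daemons are actually running, the live values can be checked on an OSD from each cluster, for example (osd.0 is just an example ID, run on the node hosting it):
ceph daemon osd.0 config show | grep -E 'debug_osd|osd_journal_size|osd_pool_default_size'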
The first cluster is Supermicro-based and the new one is built on Dell R620s. I'm out of ideas, so if anyone can suggest where to look or what to test, I'd appreciate it.
Thanks