Hi, I am building a Ceph cluster with Proxmox 6.3, and I am seeing much lower performance than the Proxmox Ceph benchmark paper (https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark). I hope you can help me identify where my bottleneck is.
At the moment I am using 3 nodes, with 4 OSDs per node (all SSD).
Specs per node:
DELL R730XD with 2x Xeon E5-2680 v4 2.40GHz
320 GB DDR4
4x Samsung s883 960GB for Ceph
1x Intel S3700 for Proxmox
2x 1Gb NIC (only one in use, for VM traffic and corosync)
2x 10Gb NIC (for Ceph, in LACP) - MTU 9000
No separate journal/DB device
Switch for the Ceph network: Cisco Catalyst WS-C6509-E (just for testing)
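If the raw SSD speed is useful for comparison, I can also run the 4k sync-write fio test from the benchmark paper on one of the Samsung drives (destructive, so only on a blank/unused disk; /dev/sdX below is just a placeholder):
Code:
# WARNING: writes directly to the raw device - run only on a disk without data
fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 --rw=write \
    --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=ssd-4k-sync-write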
# rados bench -p ssd_pool 10 write
Code:
Object prefix: benchmark_data_pve02_620804
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       118       102   407.971       408   0.0841476    0.140718
    2      16       224       208   415.957       424    0.133007    0.146998
    3      16       333       317   422.611       436   0.0739292    0.147036
    4      16       437       421   420.941       416    0.157942    0.147613
    5      16       549       533   426.338       448    0.306489    0.147389
    6      16       659       643   428.602       440    0.150727    0.147659
    7      16       770       754   430.793       444    0.115754    0.146412
    8      16       878       862   430.936       432    0.158047    0.146082
    9      16       991       975   433.267       452    0.110943      0.1465
   10      16      1097      1081   432.335       424    0.208018     0.14674
Total time run: 10.1027
Total writes made: 1097
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 434.338
Stddev Bandwidth: 14.2922
Max bandwidth (MB/sec): 452
Min bandwidth (MB/sec): 408
Average IOPS: 108
Stddev IOPS: 3.57305
Max IOPS: 113
Min IOPS: 102
Average Latency(s): 0.146708
Stddev Latency(s): 0.0684433
Max latency(s): 0.543467
Min latency(s): 0.0493101
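If read numbers help, I can rerun the benchmark keeping the objects and read them back afterwards (same pool as above):
Code:
# write without deleting the benchmark objects, then read them back, then clean up
rados bench -p ssd_pool 10 write --no-cleanup
rados bench -p ssd_pool 10 seq
rados -p ssd_pool cleanup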
# pveversion -v
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.9-pve1
ceph-fuse: 15.2.9-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
# cat /etc/pve/ceph.conf
Code:
[global]
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.10.10.151/24
fsid = 3c727e0a-14f4-40d6-9346-6426a3c7d5fa
mon_allow_pool_delete = true
mon_host = 10.10.10.151 10.10.10.152 10.10.10.153
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.10.10.151/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.pve01]
public_addr = 10.10.10.151
[mon.pve02]
public_addr = 10.10.10.152
[mon.pve03]
public_addr = 10.10.10.153
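If the settings the cluster actually applies at runtime are relevant (rather than just ceph.conf), I can also post, for example:
Code:
# configuration stored in the monitors, including values changed at runtime
ceph config dump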
# cat /etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface eno4 inet manual

iface eno3 inet manual

auto eno1
iface eno1 inet manual
        mtu 9000

auto eno2
iface eno2 inet manual

auto bond0
iface bond0 inet static
        address 10.10.10.152/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        mtu 9000

auto vmbr0
iface vmbr0 inet static
        address 192.***.**.212/24
        gateway 192.***.**.3
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0
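To confirm that jumbo frames really pass end to end over the bond (I assume the switch ports are also set to MTU 9000), I can run a don't-fragment ping between the nodes:
Code:
# 8972 = 9000 byte MTU - 20 byte IP header - 8 byte ICMP header; -M do forbids fragmentation
ping -M do -s 8972 -c 4 10.10.10.151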
# iperf -c 10.10.10.151
Code:
------------------------------------------------------------
Client connecting to 10.10.10.151, TCP port 5001
TCP window size: 715 KByte (default)
------------------------------------------------------------
[ 3] local 10.10.10.152 port 50168 connected with 10.10.10.151 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.4 GBytes 9.82 Gbits/sec
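Since a single TCP stream only ever uses one member of an LACP bond, I can also rerun iperf with parallel streams if that is useful (the stream count is just an example):
Code:
# -P opens several parallel streams so the LACP hash can spread them across both links
iperf -c 10.10.10.151 -P 4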
# ceph status
Code:
  cluster:
    id:     3c727e0a-14f4-40d6-9346-6426a3c7d5fa
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pve01,pve02,pve03 (age 17h)
    mgr: pve01(active, since 17h), standbys: pve02, pve03
    osd: 12 osds: 12 up (since 17h), 12 in (since 2d)

  data:
    pools:   2 pools, 33 pgs
    objects: 12 objects, 0 B
    usage:   12 GiB used, 10 TiB / 10 TiB avail
    pgs:     33 active+clean
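If per-OSD latencies would help narrow this down, I can also attach:
Code:
# commit/apply latency per OSD as reported by the cluster
ceph osd perf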
Thank you very much.