Dear Proxmox experts,
For some days now, the performance of every VM and container in my cluster has been extremely slow.
Here is some general info about my setup.
I am running a 3-node Proxmox cluster with up-to-date packages (output of pveversion -v):
proxmox-ve: 8.0.2 (running kernel: 6.2.16-12-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-3
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2: 6.2.16-12
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
All three cluster nodes are almost identical in their hardware specs:
CPU
Intel® Core™ i7-12650H processor, 10 cores / 16 threads
(24 MB cache, up to 4.70 GHz)
GPU
Intel® UHD Graphics for 12th Gen Intel® processors (frequency 1.40 GHz)
RAM
DDR4 2 × 16 GB dual-channel SODIMM
Storage
M.2 2280 512 GB PCIe 4.0 SSD
In each node, I have added a Samsung QVO 4 TB 2.5-inch SATA SSD and put them into a Ceph cluster. This is my ceph.conf:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.0.0.1/24
fsid = fe55267f-6e22-4e16-b49e-3ff82fa193a4
mon_allow_pool_delete = true
mon_host = 10.0.0.1 10.0.0.2 10.0.0.3
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.0.0.1/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.prx-host01]
host = prx-host01
mds_standby_for_name = pve
[mds.prx-host02]
host = prx-host02
mds_standby_for_name = pve
[mds.prx-host03]
host = prx-host03
mds_standby_for_name = pve
[mon.prx-host01]
public_addr = 10.0.0.1
[mon.prx-host02]
public_addr = 10.0.0.2
[mon.prx-host03]
public_addr = 10.0.0.3
Since there are two 2.5 GBit network ports on each node, I have separated the Ceph network from the normal data-access network to the machines. All nodes are connected through a Ubiquiti Enterprise switch, which supports 2.5 GBit connections. The throughput on the switch is not very high in general:
The blue graph is download traffic and the purple graph is upload traffic.
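For reference, the split on each host looks roughly like this in /etc/network/interfaces (interface names and the data-network addresses are just placeholders, only the 10.0.0.x Ceph address matches the ceph.conf above):
Code:
# first 2.5 GBit port: bridge for the normal data-access network
auto eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.178.11/24
        gateway 192.168.178.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0

# second 2.5 GBit port: dedicated Ceph network (public_network and cluster_network)
auto eno2
iface eno2 inet static
        address 10.0.0.1/24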
Current situation
All three systems have 4-10 VMs and containers running, and the CPU load is very low. RAM usage is between 40 and 55% on all hosts, and the system storage is also not running full. Here are screenshots of the hosts:
What I noticed is the high IO delay - some forum posts say it shouldn't be over 10%. I assume this is the reason for the really bad performance.
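To narrow down where the IO delay comes from, something like the following can be run on each host (iostat comes from the sysstat package; which /dev/sdX is the QVO OSD disk differs per host):
Code:
# per-device latency and utilization, refreshed every 5 seconds
# -> watch the await and %util columns of the QVO OSD disk (e.g. sda)
iostat -x 5

# kernel pressure-stall information: how long tasks are stalled on IO
cat /proc/pressure/io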
For instance, I tried to issue "docker ps" on one of the Ubuntu VMs, and it took several minutes for the system to display the output - this is definitely not normal. Here are the specs for this particular machine:
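Inside the guest, the delay can at least be timed and correlated with iowait (a quick sketch, not a proper benchmark):
Code:
# measure how long the command really takes
time docker ps

# check whether the guest is mostly waiting on IO (wa column)
vmstat 5

# look for hung-task messages from the kernel
dmesg | grep -i "blocked for more than"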
I read through several posts about low Ceph performance and issued the following commands to test my Ceph cluster:
ceph -s
Code:
cluster:
id: fe55267f-6e22-4e16-b49e-3ff82fa193a4
health: HEALTH_WARN
Module 'restful' has failed dependency: PyO3 modules may only be initialized once per interpreter process
1 subtrees have overcommitted pool target_size_bytes
services:
mon: 3 daemons, quorum prx-host01,prx-host02,prx-host03 (age 11h)
mgr: prx-host01(active, since 11h), standbys: prx-host02, prx-host03
mds: 1/1 daemons up, 2 standby
osd: 3 osds: 3 up (since 11h), 3 in (since 4d)
data:
volumes: 1/1 healthy
pools: 4 pools, 97 pgs
objects: 105.13k objects, 403 GiB
usage: 1.2 TiB used, 9.7 TiB / 11 TiB avail
pgs: 96 active+clean
1 active+clean+scrubbing+deep
io:
client: 6.6 MiB/s rd, 666 KiB/s wr, 84 op/s rd, 78 op/s wr
ceph tell osd.x bench
Code:
ceph tell osd.0 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 11.266977012,
"bytes_per_sec": 95299903.679256752,
"iops": 22.721267623724163
}
ceph tell osd.1 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 12.500486085,
"bytes_per_sec": 85896005.699205577,
"iops": 20.479203629304308
}
ceph tell osd.2 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 11.687114086999999,
"bytes_per_sec": 91873991.817566141,
"iops": 21.904466585532699
}
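The OSD bench only tests each disk locally. To test the whole pool over the network, something like rados bench could also be used (the pool name is a placeholder; --no-cleanup keeps the objects so a read test can follow):
Code:
# 60-second 4M write test against the pool
rados bench -p <poolname> 60 write -b 4M -t 16 --no-cleanup

# sequential read test against the objects written above
rados bench -p <poolname> 60 seq -t 16

# remove the benchmark objects afterwards
rados -p <poolname> cleanup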
I also moved some volumes off the Ceph storage, and it takes a VERY long time to copy. 10 GB took about 30 minutes to copy from the Ceph pool (on the SSDs) to the local NVMe storage, which works out to roughly 5-6 MB/s.
The problems started, I think, when I exchanged the 3 SSDs. I originally had a 1 TB Samsung EVO connected to each node. I exchanged them one by one by doing the following (rough CLI equivalents are sketched after the list):
- disable the backfill flag
- set the OSD to down
- set the OSD to out
- remove the OSD entry in the cluster manager
- shut down the node and replace the drive
- add the new OSD in the cluster manager
- set the OSD to in and up
- re-enable the backfill flag
- wait until backfilling finished, then move on to the next node
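Roughly, the CLI equivalents of these steps look like this (OSD id 0 and /dev/sdX are placeholders; I did most of it through the GUI, so this is a sketch rather than a transcript):
Code:
# pause data movement while the OSD is swapped
ceph osd set nobackfill
ceph osd set norebalance

# take the old OSD out of service and remove it
ceph osd out 0
systemctl stop ceph-osd@0
pveceph osd destroy 0

# (shut down the node, swap the SATA SSD, boot again)

# create the new OSD on the replacement disk
pveceph osd create /dev/sdX

# allow backfilling again and wait for HEALTH_OK
ceph osd unset nobackfill
ceph osd unset norebalance
ceph -s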