Virtual machines and containers extremely slow

weirdwiesel

Member
Jul 20, 2021
Dear Proxmox experts,

For a few days now, the performance of every machine and container in my cluster has been extremely slow.

Here is some general info about my setup:
I am running a 3-node Proxmox cluster with up-to-date packages.
proxmox-ve: 8.0.2 (running kernel: 6.2.16-12-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-3
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2: 6.2.16-12
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

All three cluster nodes are almost identical in their hardware specs:
CPU
Intel® Core™ i7-12650H processor, 10 cores/16 threads
(24 MB cache, up to 4.70 GHz)
GPU
Intel® UHD Graphics for 12th Gen Intel® processors (frequency 1.40 GHz)

RAM
DDR4 16GB×2 Dual Channel SODIMM

Storage
M.2 2280 512 GB PCIe4.0 SSD

In each node, I have added a Samsung QVO 4 TB 2.5-inch SATA SSD and put them into a Ceph cluster. This is the Ceph config:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.0.0.1/24
fsid = fe55267f-6e22-4e16-b49e-3ff82fa193a4
mon_allow_pool_delete = true
mon_host = 10.0.0.1 10.0.0.2 10.0.0.3
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.0.0.1/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.prx-host01]
host = prx-host01
mds_standby_for_name = pve
[mds.prx-host02]
host = prx-host02
mds_standby_for_name = pve
[mds.prx-host03]
host = prx-host03
mds_standby_for_name = pve
[mon.prx-host01]
public_addr = 10.0.0.1
[mon.prx-host02]
public_addr = 10.0.0.2
[mon.prx-host03]
public_addr = 10.0.0.3
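Since the QVO drives are consumer QLC SSDs without power-loss protection, I also want to check their raw sync-write performance, because from what I have read that is what Ceph stresses the most. This is only a rough sketch of the fio command I would run; /dev/sdX is just a placeholder for the QVO drive and it must not be an active OSD at that point:
Code:
# WARNING: writes directly to the raw device and destroys its data -
# only run against a drive that is currently NOT an OSD.
# /dev/sdX is a placeholder for the QVO SATA SSD.
fio --name=qvo-sync-write --filename=/dev/sdX --ioengine=libaio \
    --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based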

Since there are two 2.5 GBit network ports on each node, I have separated the Ceph network from the normal data-access network to the machines. All nodes are connected through a Ubiquiti Enterprise switch, which supports 2.5 GBit connections. The throughput on the switch is generally not very high:
[screenshot: switch throughput graph - the blue graph is download and the purple graph is upload traffic]
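To rule out the dedicated Ceph network itself, I could run an iperf3 test between two nodes over the 10.0.0.x addresses (sketch only, IPs as listed in the config above):
Code:
# on prx-host01 (10.0.0.1): start the iperf3 server
iperf3 -s

# on prx-host02: measure throughput towards prx-host01 over the Ceph network
iperf3 -c 10.0.0.1 -t 30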


Current situation
All three systems have 4-10 VMs and containers running and the CPU load is very low. RAM usage is between 40 and 55% on all hosts and the system storage is also not running full. Here are screenshots of the hosts:
[screenshots: summary pages of the three hosts]

What I noticed is the high IO delay - some forum posts say it shouldn't be over 10%. I assume this is the reason for the really bad performance.
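To narrow down where the IO delay comes from, my plan is to watch per-device latency on the hosts while a guest feels slow; a simple sketch using iostat from the sysstat package:
Code:
# install sysstat if it is not there yet
apt install sysstat

# extended per-device stats every 2 seconds; compare r_await/w_await (latency in ms)
# and %util of the QVO SATA SSD (OSD) against the local NVMe system disk
iostat -x 2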

For instance, I tried to issue "docker ps" on one of the Ubuntu machines, and it took several minutes for the output to appear - this is definitely not normal. Here are the specs for this particular machine:
[screenshots: hardware and options of the affected VM]

I read through several posts about low Ceph performance and issued the following commands to test my Ceph cluster:
ceph -s
Code:
  cluster:
    id:     fe55267f-6e22-4e16-b49e-3ff82fa193a4
    health: HEALTH_WARN
            Module 'restful' has failed dependency: PyO3 modules may only be initialized once per interpreter process
            1 subtrees have overcommitted pool target_size_bytes
 
  services:
    mon: 3 daemons, quorum prx-host01,prx-host02,prx-host03 (age 11h)
    mgr: prx-host01(active, since 11h), standbys: prx-host02, prx-host03
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 11h), 3 in (since 4d)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 105.13k objects, 403 GiB
    usage:   1.2 TiB used, 9.7 TiB / 11 TiB avail
    pgs:     96 active+clean
             1  active+clean+scrubbing+deep
 
  io:
    client:   6.6 MiB/s rd, 666 KiB/s wr, 84 op/s rd, 78 op/s wr

ceph tell osd.x bench
Code:
ceph tell osd.0 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 11.266977012,
    "bytes_per_sec": 95299903.679256752,
    "iops": 22.721267623724163
}

ceph tell osd.1 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 12.500486085,
    "bytes_per_sec": 85896005.699205577,
    "iops": 20.479203629304308
}

 ceph tell osd.2 bench
{
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 11.687114086999999,
    "bytes_per_sec": 91873991.817566141,
    "iops": 21.904466585532699
}
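As far as I understand it, the osd bench above only measures the raw OSD write path with 4 MiB blocks, so I could also run rados bench to see what a pool actually delivers to clients. A sketch; <pool> is just a placeholder for one of my pool names, and the test objects have to be cleaned up afterwards:
Code:
# 30 second write test against a pool, keeping the objects for the read tests
rados bench -p <pool> 30 write --no-cleanup

# sequential and random read tests against the objects written above
rados bench -p <pool> 30 seq
rados bench -p <pool> 30 rand

# remove the benchmark objects again
rados -p <pool> cleanup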

I also moved some volumes off of the Ceph storage and it takes a VERY long time to copy: 10 GB took about 30 minutes to copy from the Ceph pool (on the SATA SSDs) to the local NVMe storage.

The problems started, I think, when I exchanged the 3 SSDs. At first I had a 1 TB Samsung EVO connected to each node. I exchanged them one by one by doing the following (the rough CLI equivalent is sketched after the list):
  1. disable backfilling-flag
  2. set OSD to down
  3. set OSD to out
  4. removed the OSD-entry from the cluster-manager
  5. shutdown the node and replace the drives
  6. add in the new OSD in the cluster-manager
  7. set OSD to in and up
  8. enabled backfilling-flag
  9. wait until backfilling finished, then move on to the next node
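Roughly, the CLI equivalent of those steps (written down from memory, so OSD id 0 and /dev/sdX are only placeholders and the exact commands may differ from what I clicked in the GUI) would be:
Code:
# 1. pause backfilling while a drive is being swapped
ceph osd set nobackfill

# 2./3. mark the OSD down and out
ceph osd down osd.0
ceph osd out osd.0

# 4. remove the OSD from the cluster (I did this via the GUI)
pveceph osd destroy 0

# 5. shut down the node and swap the drive, then:
# 6./7. create the new OSD on the new disk (it should come back up and in on its own)
pveceph osd create /dev/sdX

# 8./9. allow backfilling again and wait for the cluster to become healthy
ceph osd unset nobackfill
ceph -s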
 

Now the apply/commit latency is very high (in my opinion):
[screenshot: OSD apply/commit latency]

The write speed is also very low in my opinion, at well under 1 MiB/s:
[screenshot: Ceph write throughput graph]

Today I also adjusted the only CephFS pool I added by setting a specific target size, so it looks like this right now:
[screenshot: CephFS pool settings with target size]

And this is the overview of all pools:
[screenshot: overview of all pools]
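To double-check whether the target size I set actually makes sense, I could also look at what the PG autoscaler reports for each pool, something like:
Code:
# shows SIZE, TARGET SIZE, RATE, RATIO and the suggested PG count per pool
ceph osd pool autoscale-status

# current PG counts and pool settings for comparison
ceph osd pool ls detail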
The Ceph setup was created using (almost always) the default values. I created the Ceph instance BEFORE I upgraded from Proxmox 7 to 8, and it is running Ceph Quincy now. The slow performance started well after upgrading the software packages, so I don't think this can be the reason.

I assume it's me not configuring Ceph the right way, so the error sits in front of the device, because I'm a newbie in the whole Ceph thing.

Right now I'm moving everything off of the Ceph pool, in case I need to recreate it, and to test whether the performance is better on the local LVM storage.

Feel free to ask for further info and tests I should run.

Thanks so much in advance for helping me out!
 