Virtual Machines and Containers extremely slow


Jul 20, 2021
Dear Proxmox experts,

For some days now, the performance of every virtual machine and container in my cluster has been extremely slow.

Here is some general info about my setup.
I am running a 3-node Proxmox cluster with up-to-date packages.
proxmox-ve: 8.0.2 (running kernel: 6.2.16-12-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-3
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
proxmox-kernel-6.2: 6.2.16-12
proxmox-kernel-6.2.16-10-pve: 6.2.16-10
proxmox-kernel-6.2.16-8-pve: 6.2.16-8
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.8
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.2-1
proxmox-backup-file-restore: 3.0.2-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-5
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

All three cluster nodes are almost identical in their hardware specs:
Intel® Core™ i7-12650H processor, 10 cores/16 threads
(24 MB cache, up to 4.70 GHz)
Intel® UHD Graphics for Intel® Processors 12th gen (frequency 1.40 GHz)

DDR4 16GB×2 Dual Channel SODIMM

M.2 2280 512 GB PCIe4.0 SSD

In each node, I have added a Samsung QVO 4 TB 2.5-inch SATA SSD and put them into a Ceph cluster. These are the Ceph settings:
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network =
fsid = fe55267f-6e22-4e16-b49e-3ff82fa193a4
mon_allow_pool_delete = true
mon_host =
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network =
keyring = /etc/pve/priv/$cluster.$name.keyring
keyring = /var/lib/ceph/mds/ceph-$id/keyring
host = prx-host01
mds_standby_for_name = pve
host = prx-host02
mds_standby_for_name = pve
host = prx-host03
mds standby for name = pve
public_addr =
public_addr =
public_addr =

Since there are two 2.5 Gbit network ports on each node, I have separated the Ceph network from the normal data-access network to the machines. All nodes are connected through a Ubiquiti Enterprise switch, which supports 2.5 Gbit connections. The throughput on the switch is not very high in general:
The blue graph is download and the purple graph is upload traffic.

Current situation
All three systems have 4-10 VMs and containers running, and the CPU load is very low. RAM usage is between 40 and 55% on all hosts, and the system storage is not running full either. Here are screenshots of the hosts:

What I noticed is the high IO delay. Some forum posts say it shouldn't be over 10%. I assume this is the reason for the really bad performance.

For instance, I tried to run "docker ps" on one of the Ubuntu machines, and it took several minutes for the system to display the output, which is certainly not normal. Here are the specs for this particular machine:

I read through several posts about low Ceph performance and issued the following commands to test my ceph-cluster:
ceph -s
    id:     fe55267f-6e22-4e16-b49e-3ff82fa193a4
    health: HEALTH_WARN
            Module 'restful' has failed dependency: PyO3 modules may only be initialized once per interpreter process
            1 subtrees have overcommitted pool target_size_bytes
    mon: 3 daemons, quorum prx-host01,prx-host02,prx-host03 (age 11h)
    mgr: prx-host01(active, since 11h), standbys: prx-host02, prx-host03
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 11h), 3 in (since 4d)
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 105.13k objects, 403 GiB
    usage:   1.2 TiB used, 9.7 TiB / 11 TiB avail
    pgs:     96 active+clean
             1  active+clean+scrubbing+deep
    client:   6.6 MiB/s rd, 666 KiB/s wr, 84 op/s rd, 78 op/s wr
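
As a rough cross-check of the `ceph -s` figures above, here is a minimal Python sketch. It assumes that every pool uses the default of 3 replicas (osd_pool_default_size = 3), which I have not verified per pool; the helper names are made up for illustration and are not part of any Ceph tooling.

```python
# Figures copied from the `ceph -s` output above.
STORED_GIB = 403        # "objects: 105.13k objects, 403 GiB"
REPLICAS = 3            # assumption: all pools use osd_pool_default_size = 3
WRITE_KIB_PER_S = 666   # "666 KiB/s wr"
WRITE_OPS_PER_S = 78    # "78 op/s wr"
READ_KIB_PER_S = 6.6 * 1024  # "6.6 MiB/s rd", converted to KiB/s
READ_OPS_PER_S = 84     # "84 op/s rd"


def raw_usage_tib(stored_gib: float, replicas: int) -> float:
    """Raw space consumed when each object is stored `replicas` times."""
    return stored_gib * replicas / 1024


def avg_io_kib(kib_per_s: float, ops_per_s: float) -> float:
    """Average size of one client operation in KiB."""
    return kib_per_s / ops_per_s


print(f"expected raw usage: {raw_usage_tib(STORED_GIB, REPLICAS):.2f} TiB")
print(f"average write size: {avg_io_kib(WRITE_KIB_PER_S, WRITE_OPS_PER_S):.1f} KiB")
print(f"average read size:  {avg_io_kib(READ_KIB_PER_S, READ_OPS_PER_S):.1f} KiB")
```

With these inputs the raw usage comes out at about 1.18 TiB, which is in line with the reported 1.2 TiB used. The average client write is only about 8.5 KiB per operation, so the load at that moment looks like small writes rather than large sequential ones.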

ceph tell osd.x bench
ceph tell osd.0 bench
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 11.266977012,
    "bytes_per_sec": 95299903.679256752,
    "iops": 22.721267623724163

ceph tell osd.1 bench
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 12.500486085,
    "bytes_per_sec": 85896005.699205577,
    "iops": 20.479203629304308

ceph tell osd.2 bench
    "bytes_written": 1073741824,
    "blocksize": 4194304,
    "elapsed_sec": 11.687114086999999,
    "bytes_per_sec": 91873991.817566141,
    "iops": 21.904466585532699

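To put the three bench results side by side, here is a small Python sketch. The values are copied from the output above; `summarize` is a hypothetical helper. It converts bytes_per_sec into MB/s and recomputes the IOPS figure as bytes_per_sec divided by blocksize.

```python
# Results copied from the three `ceph tell osd.N bench` runs above.
BENCH = {
    "osd.0": {"bytes_per_sec": 95299903.679256752, "blocksize": 4194304},
    "osd.1": {"bytes_per_sec": 85896005.699205577, "blocksize": 4194304},
    "osd.2": {"bytes_per_sec": 91873991.817566141, "blocksize": 4194304},
}


def summarize(result: dict) -> tuple[float, float]:
    """Return (throughput in MB/s, write operations per second)."""
    mb_per_s = result["bytes_per_sec"] / 1e6
    iops = result["bytes_per_sec"] / result["blocksize"]
    return mb_per_s, iops


for name, result in BENCH.items():
    mb_per_s, iops = summarize(result)
    block_mib = result["blocksize"] // (1024 * 1024)
    print(f"{name}: {mb_per_s:.1f} MB/s, {iops:.2f} writes/s of {block_mib} MiB")
```

The recomputed values match the reported "iops" fields, so in this output the IOPS number counts 4 MiB writes. As far as I understand, that makes the bench a large-block throughput test, and it says little about the latency of small writes.
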
I also moved some volumes off the Ceph storage, and it takes a VERY long time to copy. 10 GB took about 30 minutes to copy from the Ceph pool (on SSDs) to the local NVMe storage.
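
For comparison, the effective rate of that copy can be estimated with a few lines of Python. This assumes decimal gigabytes and exactly 30 minutes, both of which are only approximations of what I observed; the function name is just for illustration.

```python
def copy_rate_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s for copying size_gb (decimal GB) in `minutes`."""
    return size_gb * 1000 / (minutes * 60)


rate = copy_rate_mb_per_s(10, 30)
# Slowest result from the `ceph tell osd.N bench` runs in this post (osd.1).
slowest_bench_mb_per_s = 85896005.699205577 / 1e6

print(f"copy rate: {rate:.2f} MB/s")
print(f"slowest OSD bench is about {slowest_bench_mb_per_s / rate:.0f}x faster")
```

That works out to roughly 5.6 MB/s, about 15 times slower than the slowest OSD bench result.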

The problems started, I think, when I exchanged the 3 SSDs. At first I had 3 Samsung EVO SSDs with 1 TB each, one per node. I exchanged them one by one as follows:
  1. disabled the backfilling flag
  2. set the OSD to down
  3. set the OSD to out
  4. removed the OSD entry from the cluster manager
  5. shut down the node and replaced the drive
  6. added the new OSD in the cluster manager
  7. set the OSD to in and up
  8. enabled the backfilling flag
  9. waited until backfilling finished and moved on to the next node


Now the Apply/Commit latency is very high (in my opinion):

The write speed is also very low in my opinion, at well under 1 MiB/s:

I also adjusted (today) the value of the only CephFS pool I added by entering a specific target size, so it looks like this right now:

And this is the overview of all pools:
The Ceph setup was created using (almost always) the default values. I created the Ceph instance BEFORE I upgraded from Proxmox 7 to 8, and it is running Ceph Quincy now. The slow performance started well after upgrading the software packages, so I don't think that can be the reason.

I assume it's me not configuring Ceph the right way, so the error sits in front of the device, because I'm a newbie in the whole Ceph thing.

Right now I'm moving everything off the Ceph pool in case I need to recreate it, and to test whether the performance is better on the local LVM storage.

Feel free to ask for further info or tests I should run.

Thanks so much in advance for helping me out!