High latency in VMs, No latency on backend?


Renowned Member
Dec 22, 2015
3-node cluster with Ceph RBD backend. Note: this should not be caused by Proxmox or Ceph updates, since neither changed around the time this issue started.

Virtual Environment 8.0.3
Node 'VMHost2'
Day (maximum)
 CPU usage 2.20% of 24 CPU(s)
 IO delay 0.13%
 Load average 0.33,0.52,0.53
 RAM usage 20.21% (25.44 GiB of 125.87 GiB)
KSM sharing 0 B
 / HD space 25.89% (24.32 GiB of 93.93 GiB)
 SWAP usage 0.00% (0 B of 8.00 GiB)
CPU(s) 24 x Intel(R) Xeon(R) CPU X5679 @ 3.20GHz (2 Sockets)
Kernel Version Linux 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z)
PVE Manager Version pve-manager/8.0.3/bbf3993334bfa916
Repository Status Proxmox VE updates Non production-ready repository enabled!
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-5.15: 7.4-4
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-5.15.108-1-pve: 5.15.108-1
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: not correctly installed
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.2
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.1-1
proxmox-backup-file-restore: 3.0.1-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.1
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

While doing our scheduled monthly updates to all devices and software, read/write times increased considerably (to over 20 ms) on all of the VMs, and I cannot pinpoint the cause.

The only updates I was able to complete were some packages for our pfSense routers (which shouldn't be related to this issue), our UniFi switches, and some of the VMs, most of which I have since shut down to try to find the culprit.

Since most of the services are write heavy, I will focus on those.

What I cannot understand is why w_await (the average time, in milliseconds, for write requests issued to the device to be served, including both the time spent in the queue and the time spent servicing them) on every VM differs completely from what Ceph shows as latency for the cluster.

Example average w_await from selected VMs (using iostat -x 5):

  1. 25 - 100 ms
  2. 15 - 40 ms
  3. 10 - 350 ms
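For reference, iostat derives w_await from two per-device counters in /proc/diskstats: writes completed and milliseconds spent writing, sampled at the start and end of each interval. A minimal sketch of that calculation (field positions follow the kernel's iostats documentation; the snapshot values here are made up for illustration):

```python
# Sketch: how iostat computes w_await from /proc/diskstats deltas.
# Per-device fields after (major, minor, name):
#   field 5 = writes completed, field 8 = milliseconds spent writing.
# w_await = delta(write_ms) / delta(writes_completed) over the interval.

def w_await(prev, curr):
    """prev/curr: (writes_completed, write_ms) snapshots one interval apart."""
    d_writes = curr[0] - prev[0]
    d_ms = curr[1] - prev[1]
    return d_ms / d_writes if d_writes else 0.0

# Hypothetical snapshots taken 5 s apart:
print(w_await((1000, 20000), (1200, 26000)))  # 6000 ms over 200 writes -> 30.0
```

Because the counter includes queue time inside the guest, w_await can be high even when each individual I/O is serviced quickly by the backend.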

However, when I check the latency on the Ceph cluster, here is what the OSDs are showing (using ceph osd perf and also the Zabbix Ceph plugin):

0 - 2 ms.

How can I go about diagnosing high latency on the VMs, when the backend itself apparently has no latency issues at all?
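One way to narrow this down is to collect both numbers over the same window and compare them. A hedged sketch for the backend side, parsing the JSON form of ceph osd perf (ceph osd perf -f json): the key names used here ("osdstats", "osd_perf_infos", "perf_stats") are assumptions based on Quincy-era output and may differ between Ceph releases, and the sample input is fabricated:

```python
import json

# Sketch: pull the worst per-OSD commit latency out of `ceph osd perf -f json`.
# JSON key names are assumptions (Quincy-era layout) and may vary by release.

def max_osd_commit_latency_ms(perf_json):
    infos = json.loads(perf_json)["osdstats"]["osd_perf_infos"]
    return max(i["perf_stats"]["commit_latency_ms"] for i in infos)

# Fabricated sample output for two OSDs:
sample = json.dumps({"osdstats": {"osd_perf_infos": [
    {"id": 0, "perf_stats": {"commit_latency_ms": 1, "apply_latency_ms": 1}},
    {"id": 1, "perf_stats": {"commit_latency_ms": 2, "apply_latency_ms": 2}},
]}})
print(max_osd_commit_latency_ms(sample))  # -> 2
```

If the backend stays in the low single digits while guest w_await climbs, the queueing is happening between the guest block layer and the RBD client (virtio queue depth, KRBD vs librbd, network), not on the OSDs themselves.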

Please provide more details about your entire setup so that we can get an overview of what you have going on and better assess it.

