[SOLVED] Ceph gradual growth in VM IOWaits

chrispage1

Member
Sep 1, 2021
Hi,

We've installed a three-node Proxmox cluster running Ceph. On the cluster we have two VMs. When we reboot all PVE nodes, the IOWait of the VMs drops to pretty much nothing.

However, over time the IOWait creeps up, even though there is no load on these VMs. As you can see it's a consistent increase and it happens across both VMs. The circle I've highlighted is after all Ceph nodes were rebooted.

Any ideas why this might be, or how I can diagnose it?

[Attached graph: VM IOWait climbing steadily over time]
 
Some more information about the setup would be nice. How many OSDs do you have? What disks are they?
What is the network setup?
What other hardware do the nodes have? CPU, memory,...

Anything else in the monitoring that correlates with that graph and could give an idea or point in a direction?
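
One thing that can help narrow it down is watching the per-OSD latencies while the IOWait climbs. These are standard Ceph commands you can run on any node:

Code:
# overall cluster health and any slow ops
ceph -s

# per-OSD commit/apply latency in ms
ceph osd perf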
 
Hi Aaron, I've had a breakthrough on this today.

I have 3 nodes that were originally spec'd by our provider with Samsung 870 EVOs (useless for Ceph), so we're in the process of swapping them out for Samsung PM893s.

We added one PM893 to each node just to test, and this is where the benchmarks are coming from (we will be adding a further 3 OSDs to each machine). The network is 10G, the CPU is a Xeon E5-2697 v4 and each node has 128GB of DDR4 RAM.
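
In case it's useful for anyone comparing drives, the sort of benchmark I mean is a simple rados bench against a test pool (the pool name here is just an example, use a throwaway pool):

Code:
# 60 second 4M sequential write test against a test pool, keeping the objects
rados bench -p testpool 60 write -b 4M -t 16 --no-cleanup

# read the objects back, then clean up
rados bench -p testpool 60 seq
rados -p testpool cleanup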

Anyway, after testing today it seems that with write caching enabled on the drives and the cache type set to write back, we were getting the erratic behaviour shown in my original post. I've since disabled write caching on each drive and set the cache type to write through, and everything is a lot more stable, from VM IOWait to Ceph apply/commit latency.
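
For anyone wanting to check what their drives are currently doing before changing anything, something along these lines shows both settings (the device and SCSI address are just examples, adjust for your own hardware):

Code:
# drive-level write cache ("write-caching = 1 (on)" means enabled)
hdparm -W /dev/sda

# kernel-level cache mode for the SCSI disk (write back / write through)
cat /sys/class/scsi_disk/0:0:2:0/cache_type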

For reference, here is an updated graph of the IOWait from the same VM -

[Attached graph: VM IOWait from the same VM after the change]

The fix was put into place shortly after 10am this morning. To ensure the cache settings persist across reboots, I've created a systemd service called ceph-disk-cache.service. I've added the code below in case it's of use to anyone.

Code:
# /etc/systemd/system/ceph-disk-cache.service
[Unit]
Description=Set Ceph disk cache setup on boot
After=local-fs.target
StartLimitIntervalSec=0

[Service]
Type=simple
ExecStart=/etc/init.d/ceph-disk-cache.sh

[Install]
WantedBy=multi-user.target
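
After saving the unit file, it still needs to be enabled so it runs on every boot, e.g.:

Code:
systemctl daemon-reload
systemctl enable --now ceph-disk-cache.service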

Code:
#!/bin/bash
# /etc/init.d/ceph-disk-cache.sh
# Disables the on-disk write cache and switches the kernel cache mode to
# write through for each Ceph OSD disk.

# SCSI addresses of the OSD disks (host:channel:target:lun)
DISKS=("0:0:2:0" "0:0:3:0" "0:0:4:0" "0:0:5:0")

for DISK in "${DISKS[@]}"; do
  echo "Setting write through for SCSI ${DISK}"
  # turn off the drive's volatile write cache
  hdparm -W0 "/dev/disk/by-path/pci-0000:02:00.0-scsi-${DISK}"
  # tell the kernel the disk is now write through
  echo "write through" > "/sys/class/scsi_disk/${DISK}/cache_type"
done

echo "Done"
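
The script has to be executable for systemd to start it; you can also run it once by hand and re-check cache_type to confirm it took effect:

Code:
chmod +x /etc/init.d/ceph-disk-cache.sh
/etc/init.d/ceph-disk-cache.sh

# should now report "write through" for each disk
cat /sys/class/scsi_disk/0:0:2:0/cache_type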
 
Cool, thanks for the update! I went ahead and marked the thread as solved.
 
