Why do I see such high IO delays?

proxwolfe

Hi,

I am running a little cluster in my homelab:

2x PVE on Xeon E3-1220 with 64GB RAM each
1x PVE on virtual Xeon E3-1220 with 48GB RAM (running in a VM on a real Xeon E3-1220 with 64GB; the physical machine also houses a PBS)

Each node has a 512GB NVMe drive as part of a Ceph pool for VM storage and a 3GB HDD as part of a Ceph pool for data storage. The PVE cluster and the Ceph cluster each have their own 10GbE network.

On each physical node I have approx. 10 VMs running and I see CPU usage of around 10% (peaking at 20% once in a while). My IO delay oscillates around 5% (peaking at 10% once in a while).

Why is my IO delay so high? How can I improve (reduce) IO delay? (Adding more disks would not be my preferred option.)

Thanks!
 
1x PVE on virtual Xeon E3-1220 with 48GB RAM (running in a VM on a real Xeon E3-1220 with 64GB; the physical machine also houses a PBS)
Why? PVE in a VM is a bad idea, because of nested virtualization. It's totally fine to run PVE+PBS both bare metal on the same server. How to install PBS on a PVE node is described here: https://pbs.proxmox.com/docs/installation.html#install-proxmox-backup-server-on-proxmox-ve

My IO delay oscillates around 5% (peaking at 10% once in a while).

Why is my IO delay so high?
That isn't that high. I'm seeing 0-5% with local SSDs and 30+% with local HDDs. So with a mix of SSDs and HDDs, all accessed over the network because of Ceph, 5% doesn't sound unreasonable.
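If you want to check the number outside the GUI, here is a minimal Python sketch. It assumes that the "IO delay" shown by PVE corresponds to the iowait share of CPU time taken from the aggregate `cpu` line of `/proc/stat`; that mapping is my assumption, not something stated in this thread. The function names (`parse_cpu_line`, `iowait_percent`, `measure`) are made up for illustration.

```python
import time


def parse_cpu_line(line):
    """Return the numeric counters from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    deltas = [b - a for a, b in zip(old, new)]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted in user and nice, so sum only the first 8.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (the aggregate over all CPUs)."""
    with open(path) as handle:
        return handle.readline()


def measure(interval=1.0):
    """Take two samples `interval` seconds apart (Linux only)."""
    before = read_cpu_line()
    time.sleep(interval)
    after = read_cpu_line()
    return iowait_percent(before, after)
```

Calling `measure(5.0)` on a node gives the average over a five second window, which makes it easier to compare nodes than watching the graph.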
 
Well, a 3 node cluster is already overkill for my purposes. So I figured I would only run two physical nodes and a quorum device instead of the third. But then I thought, I might as well virtualize the third node on the machine I use for PBS. And I think one needs a minimum of three nodes for Ceph. And I wanted to keep the machine I use for PBS outside the cluster so that if anything happens to the cluster, I can still access my backups.
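To illustrate why I wanted a third vote at all, here is a small sketch of the majority rule as I understand it: a cluster stays quorate only while more than half of the expected votes are present. The helper names (`votes_needed`, `has_quorum`) are hypothetical, and the sketch assumes one vote per node.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(total_votes, votes_online):
    """True if the votes currently online form a strict majority."""
    return votes_online >= votes_needed(total_votes)


# Two nodes only: losing one node loses quorum.
print(has_quorum(2, 1))
# Three votes (two nodes plus a third node or quorum device): one may fail.
print(has_quorum(3, 2))
```

With two votes the majority is two, so a single failure stops the cluster; with three votes one can be lost.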

PVE in a VM is a bad idea, because of nested virtualization.
What is the issue with nested virtualization? I would expect there to be a performance penalty but otherwise... And when it comes to performance, the virtual node actually has the lowest IO delay of the three (oscillating around 4%). So far, I am happy with my decision.

It's totally fine to run PVE+PBS both bare metal on the same server. How to install PBS on a PVE node is described here: https://pbs.proxmox.com/docs/installation.html#install-proxmox-backup-server-on-proxmox-ve
That's how I am running it. It's just that I run another PVE (node 3) inside PVE.

That isn't that high. I'm seeing 0-5% with local SSDs and 30+% with local HDDs. So with a mix of SSDs and HDDs, all accessed over the network because of Ceph, 5% doesn't sound unreasonable.
Oh, I thought 5% was on the high end of what is acceptable. It is good to know your numbers to put mine into perspective! Thanks for that.