Hi,
We have a Proxmox cluster with 5 hypervisors. We are using Ceph:
- 3 Ceph monitors
- 47 OSDs
- 2 pools (rbd & ceph hdd)
- 3 replicas
Each HV has 4x 10Gbit NICs:
- 2x 10Gbit bond for network
- 2x 10Gbit bond for storage
For some reason our hypervisors are under heavy IO Delay:
HV01
CPU usage 11% of 72 CPU(s)
Load average 10.60, 10.95, 10.85
IO delay 1.35%
RAM usage 72% of 377.76 GiB
SWAP usage 81.75% of 8GiB
HV02
CPU usage 21% of 72 CPU(s)
Load average 10.65, 10.80, 16.85
IO delay 2.10%
RAM usage 67% of 377.76 GiB
SWAP usage 56.90% of 8GiB
HV03
CPU usage 24% of 72 CPU(s)
Load average 20.33, 19.00, 18.25
IO delay 1.70%
RAM usage 67% of 377.76 GiB
SWAP usage 78.86% of 8GiB
HV04
CPU usage 29% of 40 CPU(s)
Load average 14.03, 13.33, 13.54
IO delay 0.49%
RAM usage 35.77% of 566.82 GiB
SWAP usage 0.00% of 8GiB
HV05
CPU usage 39% of 40 CPU(s)
Load average 14.03, 14.33, 14.08
IO delay 0.09%
RAM usage 45.77% of 377.82 GiB
SWAP usage N/A
CPU usage is low, but for some reason we have a high IO delay. When we run a backup or clone, it goes up by 5-10%. What is causing this high load? Any suggestions?
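To put the load averages in proportion to core count, here is a minimal sketch that divides each host's 1-minute load average by its number of CPUs. The figures are copied from the stats above; the names `hosts`, `load_per_cpu` and `summarize` are just made up for this illustration. On Linux the load average also counts tasks blocked in uninterruptible I/O wait, so a ratio well under 1.0 alongside low CPU usage may point at I/O rather than CPU, but that is an assumption to verify, not a diagnosis.

```python
# Figures copied from the per-host stats in this post:
# host name -> (number of CPUs, 1-minute load average).
hosts = {
    "HV01": (72, 10.60),
    "HV02": (72, 10.65),
    "HV03": (72, 20.33),
    "HV04": (40, 14.03),
    "HV05": (40, 14.03),
}


def load_per_cpu(cpus, load):
    """Return the load average divided by the number of CPUs."""
    return load / cpus


def summarize(hosts):
    """Map each host name to its load-per-CPU ratio, rounded to 2 decimals."""
    return {
        name: round(load_per_cpu(cpus, load), 2)
        for name, (cpus, load) in hosts.items()
    }


if __name__ == "__main__":
    for name, ratio in summarize(hosts).items():
        print(f"{name}: {ratio:.2f} load per CPU")
```

The same comparison can be done by hand; the script only makes it easier to repeat when the numbers change.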