Hi,
3-node hyperconverged cluster, PVE 8.x, Ceph Reef. Each node has:
- 2x NVMe (DB/WAL devices)
- 6x 4TB SAS HDD (OSDs, BlueStore, DB on NVMe partition)
- separate 10GbE cluster network
I enabled the Ceph balancer in upmap mode a few days ago because PG distribution
was uneven (some OSDs at 65% used, others at 38%):
ceph balancer mode upmap
ceph balancer on
Since then I see periodic "slow ops" warnings during balancer activity, mostly on
the HDD OSDs:
HEALTH_WARN 1 slow ops, oldest one blocked for 34 sec, osd.7 has slow ops
Cluster is otherwise healthy, no scrub errors, network looks clean (no
retransmits, MTU 9000 end-to-end and verified).
Is this expected on HDD-backed OSDs during rebalancing, or am I missing a tuning
knob? VMs on the pool are not screaming yet but I'd rather fix it before they do.