Hi,
I have a PVE cluster with 7 hosts, and each host has two 16 TB HDDs.
All of the HDDs use NVMe drives as their DB devices.
There are no VMs running on the HDDs; they are only used as cold storage.
A few days ago I had to swap out two of these HDDs on PVE1, and since I already had the server open, I added two more 16 TB disks.
Since then the recovery speed has been terrible.
I tried the following without success and then reset everything to the default values (rough commands below):
- disabled the delay between recovery operations on HDDs
- set osd_max_backfills to various values
- set osd_recovery_max_active to various values
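The changes were applied cluster-wide with ceph config set, roughly along these lines (the values are only examples of what I tried, and osd_recovery_sleep_hdd is assumed to be the option behind the "recovery delay on HDDs"):
Code:
# raise the number of parallel backfills / recovery ops per OSD (example values)
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 5

# remove the sleep between recovery operations on HDD OSDs (assumed option)
ceph config set osd osd_recovery_sleep_hdd 0

# afterwards, reset everything back to the defaults
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active
ceph config rm osd osd_recovery_sleep_hdd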
Code:
root@pve1:~# ceph status
  cluster:
    id:     f9ffe0dc-5126-4e05-92d9-18018cdae35a
    health: HEALTH_WARN
            nodeep-scrub flag(s) set
            Degraded data redundancy: 27263656/224226776 objects degraded (12.159%), 52 pgs degraded, 59 pgs undersized
            1 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum pve3,pve5,pve1 (age 45h)
    mgr: pve1(active, since 45h)
    mds: 2/2 daemons up, 1 standby
    osd: 44 osds: 44 up (since 45h), 38 in (since 75m); 127 remapped pgs
         flags nodeep-scrub

  data:
    volumes: 2/2 healthy
    pools:   12 pools, 369 pgs
    objects: 39.66M objects, 58 TiB
    usage:   93 TiB used, 209 TiB / 302 TiB avail
    pgs:     27263656/224226776 objects degraded (12.159%)
             48876253/224226776 objects misplaced (21.798%)
             235 active+clean
             50  active+undersized+degraded+remapped+backfill_wait
             45  active+remapped+backfill_wait
             25  active+clean+remapped
             7   active+undersized
             5   active+remapped+backfilling
             2   active+undersized+degraded+remapped+backfilling

  io:
    client:   1.1 MiB/s rd, 29 MiB/s wr, 72 op/s rd, 360 op/s wr
    recovery: 18 MiB/s, 10 objects/s