Hi!
I've got a 3-node PVE cluster with Ceph.
1 pool, 4 OSDs per node.
Yesterday Ceph started rebalancing and it is still going.
I think this is because I've added about 6 TB of data and the autoscaler changed the PG count from 32 to 128.
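To double-check that, these are the standard commands for inspecting a pool's PG settings (`mypool` below is just a stand-in for my pool name; as far as I understand, pg_num and pgp_num are raised in steps during a split, so they can differ while this runs):

# show current/target PG counts and the autoscaler's recommendation per pool
ceph osd pool autoscale-status
# the pool's actual PG count right now
ceph osd pool get mypool pg_num
# the placement count, which lags behind pg_num during a gradual split
ceph osd pool get mypool pgp_num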
It's OK, but I'm a bit confused because recovery runs fine until it reaches 95% and then drops back to 94% again.
In the log I can see this (as you can see, 4.99% misplaced and then 5.77% misplaced again):
2023-05-03T21:29:08.670288+0300 mgr.pve1 (mgr.13426191) 551 : cluster [DBG] pgmap v474: 129 pgs: 18 active+remapped+backfilling, 111 active+clean; 15 TiB data, 48 TiB used, 170 TiB / 218 TiB avail; 2.7 MiB/s rd, 2.5 MiB/s wr, 167 op/s; 602677/12051759 objects misplaced (5.001%); 164 MiB/s, 41 objects/s recovering
2023-05-03T21:29:10.670835+0300 mgr.pve1 (mgr.13426191) 552 : cluster [DBG] pgmap v475: 129 pgs: 18 active+remapped+backfilling, 111 active+clean; 15 TiB data, 48 TiB used, 170 TiB / 218 TiB avail; 2.5 MiB/s rd, 2.5 MiB/s wr, 159 op/s; 602571/12051759 objects misplaced (5.000%); 146 MiB/s, 37 objects/s recovering
2023-05-03T21:29:10.867345+0300 mon.pve1 (mon.0) 1002 : cluster [DBG] osdmap e3661: 12 total, 12 up, 12 in
2023-05-03T21:29:11.878821+0300 mon.pve1 (mon.0) 1003 : cluster [DBG] osdmap e3662: 12 total, 12 up, 12 in
2023-05-03T21:29:12.671381+0300 mgr.pve1 (mgr.13426191) 553 : cluster [DBG] pgmap v478: 129 pgs: 1 remapped+peering, 18 active+remapped+backfilling, 110 active+clean; 15 TiB data, 48 TiB used, 170 TiB / 218 TiB avail; 3.4 MiB/s rd, 3.6 MiB/s wr, 262 op/s; 602225/12051759 objects misplaced (4.997%); 194 MiB/s, 49 objects/s recovering
2023-05-03T21:29:12.881694+0300 mon.pve1 (mon.0) 1008 : cluster [DBG] osdmap e3663: 12 total, 12 up, 12 in
2023-05-03T21:29:12.899231+0300 osd.1 (osd.1) 2611 : cluster [DBG] 2.51 starting backfill to osd.3 from (0'0,0'0] MAX to 3660'22103919
2023-05-03T21:29:12.925535+0300 osd.1 (osd.1) 2612 : cluster [DBG] 2.51 starting backfill to osd.5 from (0'0,0'0] MAX to 3660'22103919
2023-05-03T21:29:12.948445+0300 osd.1 (osd.1) 2613 : cluster [DBG] 2.51 starting backfill to osd.10 from (0'0,0'0] MAX to 3660'22103919
2023-05-03T21:29:14.671807+0300 mgr.pve1 (mgr.13426191) 554 : cluster [DBG] pgmap v480: 129 pgs: 1 remapped+peering, 18 active+remapped+backfilling, 110 active+clean; 15 TiB data, 48 TiB used, 170 TiB / 218 TiB avail; 2.3 MiB/s rd, 2.5 MiB/s wr, 164 op/s; 602225/12051759 objects misplaced (4.997%); 131 MiB/s, 32 objects/s recovering
2023-05-03T21:29:16.672609+0300 mgr.pve1 (mgr.13426191) 555 : cluster [DBG] pgmap v481: 129 pgs: 1 remapped+peering, 18 active+remapped+backfilling, 110 active+clean; 15 TiB data, 48 TiB used, 170 TiB / 218 TiB avail; 3.2 MiB/s rd, 3.0 MiB/s wr, 228 op/s; 601957/12051759 objects misplaced (4.995%); 181 MiB/s, 45 objects/s recovering
2023-05-03T21:29:18.673223+0300 mgr.pve1 (mgr.13426191) 556 : cluster [DBG] pgmap v482: 129 pgs: 19 active+remapped+backfilling, 110 active+clean; 15 TiB data, 48 TiB used, 170 TiB / 218 TiB avail; 3.4 MiB/s rd, 2.7 MiB/s wr, 209 op/s; 695930/12051759 objects misplaced (5.775%); 196 MiB/s, 49 objects/s recovering
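For reference, the misplaced percentages above come straight from the standard status commands, so anyone can watch the same thing live:

# one-shot cluster summary: PG states, misplaced %, recovery throughput
ceph -s
# refresh the same summary every 2 seconds
watch -n 2 ceph -s
# compact per-PG state counts (active+clean, backfilling, ...)
ceph pg stat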
Is there any way to fix this?
Best regards, Alex