Hi everyone,
I have a problem with my Ceph storage: it is showing a "pgs stuck unclean" warning. I tried to repair the PGs and to restart the monitors and OSDs, but nothing worked. The problems started after an OSD became 95% full and everything in my cluster got stuck. I then ran the command (ceph pg set_full_ratio 0.98) and deleted unused machines, and everything worked again, but I still have these warnings.
More details about my case are shown below:
# ceph osd tree
ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 7.23996 root default
-2 2.71999     host node01
 0 2.71999         osd.0        up  1.00000          1.00000
-3 2.71999     host node02
 1 2.71999         osd.1        up  1.00000          1.00000
-4 1.79999     host node03
 2 1.79999         osd.2        up  0.55003          1.00000
# ceph health
HEALTH_WARN 77 pgs stuck unclean; recovery 46/949785 objects degraded (0.005%); recovery 152987/949785 objects misplaced (16.108%)
# ceph -s
cluster f877d510-6946-4a66-bfbb-06b0ee12ae28
health HEALTH_WARN
77 pgs stuck unclean
recovery 46/949785 objects degraded (0.005%)
recovery 152987/949785 objects misplaced (16.108%)
monmap e3: 3 mons at {0=10.1.1.1:6789/0,1=10.1.1.2:6789/0,2=10.1.1.3:6789/0}
election epoch 70, quorum 0,1,2 0,1,2
osdmap e304: 3 osds: 3 up, 3 in; 77 remapped pgs
pgmap v2222751: 160 pgs, 2 pools, 1201 GB data, 309 kobjects
3676 GB used, 3756 GB / 7433 GB avail
46/949785 objects degraded (0.005%)
152987/949785 objects misplaced (16.108%)
83 active+clean
77 active+remapped
client io 66399 kB/s rd, 851 kB/s wr, 1221 op/s
# ceph health detail
HEALTH_WARN 77 pgs stuck unclean; recovery 46/949785 objects degraded (0.005%); recovery 152987/949785 objects misplaced (16.108%)
pg 4.25 is stuck unclean for 121582.717747, current state active+remapped, last acting [0,1,2]
pg 5.1a is stuck unclean for 118635.513579, current state active+remapped, last acting [0,1,2]
pg 4.1b is stuck unclean for 121589.276017, current state active+remapped, last acting [1,0,2]
pg 4.1a is stuck unclean for 121587.037792, current state active+remapped, last acting [1,0,2]
pg 5.1b is stuck unclean for 118676.177113, current state active+remapped, last acting [0,1,2]
pg 4.7a is stuck unclean for 116027.140499, current state active+remapped, last acting [1,0,2]
pg 4.79 is stuck unclean for 115386.851628, current state active+remapped, last acting [1,0,2]
pg 5.1e is stuck unclean for 116462.007267, current state active+remapped, last acting [1,0,2]
pg 4.78 is stuck unclean for 121555.036604, current state active+remapped, last acting [0,1,2]
pg 4.1e is stuck unclean for 116520.298145, current state active+remapped, last acting [1,0,2]
pg 4.1d is stuck unclean for 121587.158490, current state active+remapped, last acting [0,1,2]
pg 4.7e is stuck unclean for 121586.939474, current state active+remapped, last acting [1,0,2]
pg 4.1c is stuck unclean for 121586.202691, current state active+remapped, last acting [1,0,2]
pg 4.13 is stuck unclean for 115386.853358, current state active+remapped, last acting [1,0,2]
pg 5.12 is stuck unclean for 116462.007466, current state active+remapped, last acting [1,0,2]
pg 4.7c is stuck unclean for 121581.825483, current state active+remapped, last acting [0,1,2]
pg 5.10 is stuck unclean for 121596.099742, current state active+remapped, last acting [1,0,2]
pg 4.10 is stuck unclean for 116027.202342, current state active+remapped, last acting [1,0,2]
pg 4.71 is stuck unclean for 121586.364382, current state active+remapped, last acting [1,0,2]
pg 5.16 is stuck unclean for 121591.441230, current state active+remapped, last acting [1,0,2]
pg 4.77 is stuck unclean for 121584.143843, current state active+remapped, last acting [0,1,2]
pg 5.14 is stuck unclean for 119195.905471, current state active+remapped, last acting [0,1,2]
pg 4.75 is stuck unclean for 121584.384698, current state active+remapped, last acting [0,1,2]
pg 4.b is stuck unclean for 120632.338610, current state active+remapped, last acting [0,1,2]
pg 5.b is stuck unclean for 118672.980616, current state active+remapped, last acting [0,1,2]
pg 4.a is stuck unclean for 121590.361216, current state active+remapped, last acting [1,0,2]
pg 4.6a is stuck unclean for 116520.297389, current state active+remapped, last acting [1,0,2]
pg 4.9 is stuck unclean for 121581.842716, current state active+remapped, last acting [0,1,2]
pg 5.9 is stuck unclean for 119866.168159, current state active+remapped, last acting [0,1,2]
pg 5.e is stuck unclean for 118641.998274, current state active+remapped, last acting [0,1,2]
pg 5.f is stuck unclean for 115816.478902, current state active+remapped, last acting [1,0,2]
pg 5.c is stuck unclean for 116035.945866, current state active+remapped, last acting [1,0,2]
pg 4.d is stuck unclean for 121583.616507, current state active+remapped, last acting [0,1,2]
pg 4.6d is stuck unclean for 120850.772815, current state active+remapped, last acting [0,1,2]
pg 4.c is stuck unclean for 116520.297148, current state active+remapped, last acting [1,0,2]
pg 4.6c is stuck unclean for 121590.714610, current state active+remapped, last acting [0,1,2]
pg 4.3 is stuck unclean for 121556.453100, current state active+remapped, last acting [0,1,2]
pg 4.63 is stuck unclean for 121582.568779, current state active+remapped, last acting [0,1,2]
pg 5.3 is stuck unclean for 116035.902051, current state active+remapped, last acting [1,0,2]
pg 4.2 is stuck unclean for 121581.835128, current state active+remapped, last acting [0,1,2]
pg 4.62 is stuck unclean for 116027.098725, current state active+remapped, last acting [1,0,2]
pg 5.0 is stuck unclean for 118685.737689, current state active+remapped, last acting [0,1,2]
pg 4.1 is stuck unclean for 121585.405808, current state active+remapped, last acting [1,0,2]
pg 4.61 is stuck unclean for 121581.947941, current state active+remapped, last acting [0,1,2]
pg 4.0 is stuck unclean for 121582.869185, current state active+remapped, last acting [0,1,2]
pg 4.60 is stuck unclean for 121603.161066, current state active+remapped, last acting [0,1,2]
pg 5.6 is stuck unclean for 116462.006376, current state active+remapped, last acting [1,0,2]
pg 4.7 is stuck unclean for 116027.087510, current state active+remapped, last acting [1,0,2]
pg 4.6 is stuck unclean for 120751.693971, current state active+remapped, last acting [0,1,2]
pg 4.65 is stuck unclean for 116027.086255, current state active+remapped, last acting [1,0,2]
pg 4.5a is stuck unclean for 121584.771439, current state active+remapped, last acting [0,1,2]
pg 4.5c is stuck unclean for 121584.108782, current state active+remapped, last acting [0,1,2]
pg 4.53 is stuck unclean for 121582.627265, current state active+remapped, last acting [1,0,2]
pg 4.52 is stuck unclean for 115290.593727, current state active+remapped, last acting [1,0,2]
pg 4.51 is stuck unclean for 121555.698662, current state active+remapped, last acting [0,1,2]
pg 4.57 is stuck unclean for 121582.464896, current state active+remapped, last acting [0,1,2]
pg 4.4b is stuck unclean for 121582.762554, current state active+remapped, last acting [1,0,2]
pg 4.4a is stuck unclean for 121595.675892, current state active+remapped, last acting [0,1,2]
pg 4.49 is stuck unclean for 121581.922555, current state active+remapped, last acting [0,1,2]
pg 4.48 is stuck unclean for 119258.014499, current state active+remapped, last acting [0,1,2]
pg 4.4f is stuck unclean for 121594.400713, current state active+remapped, last acting [1,0,2]
pg 4.4c is stuck unclean for 116520.297840, current state active+remapped, last acting [1,0,2]
pg 4.43 is stuck unclean for 116520.297863, current state active+remapped, last acting [1,0,2]
pg 4.41 is stuck unclean for 116027.068146, current state active+remapped, last acting [1,0,2]
pg 4.40 is stuck unclean for 116520.297938, current state active+remapped, last acting [1,0,2]
pg 4.38 is stuck unclean for 120226.454185, current state active+remapped, last acting [0,1,2]
pg 4.3e is stuck unclean for 121581.861168, current state active+remapped, last acting [1,0,2]
pg 4.31 is stuck unclean for 121583.502541, current state active+remapped, last acting [1,0,2]
pg 4.36 is stuck unclean for 121582.880836, current state active+remapped, last acting [1,0,2]
pg 4.2b is stuck unclean for 121582.990050, current state active+remapped, last acting [1,0,2]
pg 4.29 is stuck unclean for 121582.880635, current state active+remapped, last acting [1,0,2]
pg 4.28 is stuck unclean for 121587.158553, current state active+remapped, last acting [0,1,2]
pg 4.2e is stuck unclean for 121582.880683, current state active+remapped, last acting [1,0,2]
pg 4.2d is stuck unclean for 121553.777639, current state active+remapped, last acting [0,1,2]
pg 4.23 is stuck unclean for 116520.298495, current state active+remapped, last acting [1,0,2]
pg 4.21 is stuck unclean for 116520.298558, current state active+remapped, last acting [1,0,2]
pg 4.27 is stuck unclean for 121582.065714, current state active+remapped, last acting [0,1,2]
recovery 46/949785 objects degraded (0.005%)
recovery 152987/949785 objects misplaced (16.108%)
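To make the long list above easier to read, here is a minimal Python sketch (not part of the Ceph tooling) that tallies the stuck PGs per pool and per primary OSD. The function name `summarize` and the regular expression are my own, and they assume the exact line format shown in the `ceph health detail` output above; other Ceph versions may print it differently.

```python
import re
from collections import Counter

# Assumed line format (copied from the output above):
# pg 4.25 is stuck unclean for 121582.717747, current state active+remapped, last acting [0,1,2]
LINE_RE = re.compile(
    r"pg (\d+)\.([0-9a-f]+) is stuck unclean for [\d.]+, "
    r"current state (\S+), last acting \[([\d,]+)\]"
)

def summarize(text):
    """Count stuck PGs per pool ID and per primary (first acting) OSD."""
    by_pool = Counter()
    by_primary = Counter()
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # skip the summary lines and anything else
        pool, _pg, _state, acting = m.groups()
        by_pool[pool] += 1
        by_primary[acting.split(",")[0]] += 1
    return by_pool, by_primary

# Three lines taken from the output above, used as a small sample.
SAMPLE = """\
pg 4.25 is stuck unclean for 121582.717747, current state active+remapped, last acting [0,1,2]
pg 5.1a is stuck unclean for 118635.513579, current state active+remapped, last acting [0,1,2]
pg 4.1b is stuck unclean for 121589.276017, current state active+remapped, last acting [1,0,2]
"""

pools, primaries = summarize(SAMPLE)
print("stuck PGs per pool:", dict(pools))
print("stuck PGs per primary OSD:", dict(primaries))
```

To run it on the full output, the text of `ceph health detail` could be saved to a file and passed to `summarize` in place of `SAMPLE`.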
The monitor log shows:
# tail -f /var/log/ceph/ceph-mon.0.log
2017-05-18 03:53:50.539006 7f826e394700 0 log_channel(cluster) log [INF] : pgmap v2222822: 160 pgs: 77 active+remapped, 83 active+clean; 1201 GB data, 3676 GB used, 3756 GB / 7433 GB avail; 14667 kB/s rd, 284 kB/s wr, 300 op/s; 46/949785 objects degraded (0.005%); 152987/949785 objects misplaced (16.108%)
2017-05-18 03:53:51.545615 7f826e394700 0 log_channel(cluster) log [INF] : pgmap v2222823: 160 pgs: 77 active+remapped, 83 active+clean; 1201 GB data, 3676 GB used, 3756 GB / 7433 GB avail; 31352 kB/s rd, 824 kB/s wr, 675 op/s; 46/949785 objects degraded (0.005%); 152987/949785 objects misplaced (16.108%)
2017-05-18 03:53:52.552068 7f826e394700 0 log_channel(cluster) log [INF] : pgmap v2222824: 160 pgs: 77 active+remapped, 83 active+clean; 1201 GB data, 3676 GB used, 3756 GB / 7433 GB avail; 89268 kB/s rd, 2659 kB/s wr, 1947 op/s; 46/949785 objects degraded (0.005%); 152987/949785 objects misplaced (16.108%)
2017-05-18 03:53:55.627772 7f826e394700 0 log_channel(cluster) log [INF] : pgmap v2222825: 160 pgs: 77 active+remapped, 83 active+clean; 1201 GB data, 3676 GB used, 3756 GB / 7433 GB avail; 26179 kB/s rd, 1133 kB/s wr, 624 op/s; 46/949785 objects degraded (0.005%); 152987/949785 objects misplaced (16.108%)
2017-05-18 03:53:56.633148 7f826e394700 0 log_channel(cluster) log [INF] : pgmap v2222826: 160 pgs: 77 active+remapped, 83 active+clean; 1201 GB data, 3676 GB used, 3756 GB / 7433 GB avail; 22057 kB/s rd, 1042 kB/s wr, 542 op/s; 46/949785 objects degraded (0.005%); 152987/949785 objects misplaced (16.108%)
2017-05-18 03:53:57.636356 7f826e394700 0 log_channel(cluster) log [INF] : pgmap v2222827: 160 pgs: 77 active+remapped, 83 active+clean; 1201 GB data, 3676 GB used, 3756 GB / 7433 GB avail; 69140 kB/s rd, 2991 kB/s wr, 1597 op/s; 46/949785 objects degraded (0.005%); 152987/949785 objects misplaced (16.108%)
How can I fix this warning?
Thanks in advance for your replies.