Hello all,
A few days ago I was warned about "1 pgs inconsistent; 1 scrub errors" on my Proxmox VE 4.4 cluster:
root@pve2:~# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 8.2f is active+clean+inconsistent, acting [0,1,2]
1 scrub errors
After investigating, it turned out that PG 8.2f had a read error on OSD.1:
Code:
root@pve2:~# rados list-inconsistent-obj 8.2f --format=json-pretty
{
    "epoch": 461,
    "inconsistents": [
        {
            "object": {
                "name": "rbd_data.4f23c2ae8944a.0000000000000263",
                "nspace": "",
                "locator": "",
                "snap": "head"
            },
            "errors": [
                "read_error"
            ],
            "shards": [
                {
                    "osd": 0,
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x56d22b99",
                    "errors": []
                },
                {
                    "osd": 1,
                    "size": 4194304,
                    "errors": [
                        "read_error"
                    ]
                },
                {
                    "osd": 2,
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x56d22b99",
                    "errors": []
                }
            ]
        }
    ]
}
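I also wanted to see whether the read error shows up at the disk level, so I plan to check the kernel log on the node hosting OSD.1 (the grep pattern below is just an example of what I would look for):
Code:
# on the node that hosts osd.1
dmesg | grep -iE 'medium error|i/o error'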
I tried to repair it (twice) using "ceph pg repair 8.2f", but it did not work at that time. I ordered a new disk to replace OSD.1, but just before starting the replacement I tried the repair command one last time, and this time it worked:
root@pve2:~# ceph pg repair 8.2f
[...]
root@pve2:~# ceph health detail
HEALTH_OK
root@pve2:~# rados list-inconsistent-obj 8.2f --format=json-pretty
{
    "epoch": 461,
    "inconsistents": []
}
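To double-check that the PG is really clean again, I am also thinking of triggering a new deep scrub on it (assuming I have the command right):
Code:
root@pve2:~# ceph pg deep-scrub 8.2f
# then watch the cluster log for the scrub result
root@pve2:~# ceph -w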
What do you think? Should I still replace OSD.1 as planned, given that the PG was faulty? S.M.A.R.T. did not show any issue on the disk... so I'm a little puzzled.
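For reference, this is the kind of S.M.A.R.T. check I ran on the disk behind OSD.1 (the device name /dev/sdb is only an example from my setup):
Code:
# overall health self-assessment
smartctl -H /dev/sdb
# attributes that would hint at a dying disk
smartctl -A /dev/sdb | grep -iE 'reallocated|pending|uncorrect'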
Bonus question: what is the correct procedure for replacing a failing but still UP/IN OSD with Ceph Jewel?
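For what it's worth, here is roughly the procedure I had in mind, pieced together from the Jewel docs; please correct me if any step is wrong or missing (osd.1 and /dev/sdX are placeholders for my case):
Code:
# 1. Take the OSD out and wait for the cluster to rebalance back to active+clean
ceph osd out 1
# 2. On the node hosting it, stop the daemon
systemctl stop ceph-osd@1
# 3. Remove it from the CRUSH map, delete its auth key, remove it from the OSD map
ceph osd crush remove osd.1
ceph auth del osd.1
ceph osd rm 1
# 4. Swap the physical disk, then create the replacement OSD (Proxmox wrapper)
pveceph createosd /dev/sdX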
Thanks a lot in advance!
Olivier