Proxmox ceph pg inconsistent

senyapsudah

Hi Guys,

I need some help. We have a Proxmox cluster of 10 servers, and over the last few days our Ceph cluster has been having problems. A few disks appear to be corrupt, which has caused some of the PGs to go into an inconsistent state. Below is the output from the rados command.

It seems the errors we are getting are due to oi_attr_missing. Are there any suggestions on how we can correct this, recover the data, or force it to be stored in another PG?

rados list-inconsistent-obj 2.2c0 --format=json-pretty
{
    "epoch": 57580,
    "inconsistents": [
        {
            "object": {
                "name": "rbd_data.10815ea2ae8944a.0000000000000385",
                "nspace": "",
                "locator": "",
                "snap": 55,
                "version": 0
            },
            "errors": [],
            "union_shard_errors": [
                "missing",
                "oi_attr_missing"
            ],
            "shards": [
                {
                    "osd": 10,
                    "errors": [
                        "oi_attr_missing"
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x32133b39"
                },
                {
                    "osd": 28,
                    "errors": [
                        "missing"
                    ]
                },
                {
                    "osd": 37,
                    "errors": [
                        "missing"
                    ]
                }
            ]
        },
        {
            "object": {
                "name": "rbd_data.10815ea2ae8944a.0000000000000730",
                "nspace": "",
                "locator": "",
                "snap": 55,
                "version": 0
            },
            "errors": [],
            "union_shard_errors": [
                "missing",
                "oi_attr_missing"
            ],
            "shards": [
                {
                    "osd": 10,
                    "errors": [
                        "oi_attr_missing"
                    ],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x0f843f64"
                },
                {
                    "osd": 28,
                    "errors": [
                        "missing"
                    ]
                },
                {
                    "osd": 37,
                    "errors": [
                        "missing"
                    ]
                }
            ]
        },
 
"osd": 10, "errors": ["oi_attr_missing"],
"osd": 28,"errors": ["missing"]
"osd": 37,"errors": ["missing"]
Did you already run a 'ceph pg deep-scrub <pgid>' and 'ceph pg repair <pgid>'?
Are there any entries in the syslog/journal that indicate if it is a hardware error?
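For the record, using the PG id 2.2c0 from the rados output above, those checks would look something like this (the journalctl filter is just an example pattern for spotting disk errors):

```shell
# Trigger a deep scrub of the inconsistent PG and check its state
ceph pg deep-scrub 2.2c0
ceph pg 2.2c0 query | grep state

# If the deep scrub still reports inconsistencies, ask Ceph to repair the PG
ceph pg repair 2.2c0

# On the OSD hosts, check the kernel log for signs of a failing disk
journalctl -k | grep -iE 'ata|sd[a-z]|i/o error'
```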
 
Hi Alwin,

Sorry for the late update. Yes, I ran the repair command, but it ended with some errors. After a few hours of checking on the issue, I noticed that two of the OSDs participating in the PG had a very low weight, which I believe prevented Ceph from writing to those disks. After increasing their weight to match the disk size, issuing the pg repair solved the issue.
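For anyone hitting the same thing: the weight check and fix described above can be sketched like this. The OSD ids 28 and 37 come from the rados output earlier in the thread; the weight value 1.82 is only an example for a ~2 TB disk (CRUSH weight is conventionally the disk size in TiB):

```shell
# List OSDs with their CRUSH weights and utilisation to spot under-weighted ones
ceph osd df tree

# Raise the CRUSH weight of the under-weighted OSDs to match their capacity
ceph osd crush reweight osd.28 1.82
ceph osd crush reweight osd.37 1.82

# Once backfill/recovery settles, retry the repair
ceph pg repair 2.2c0
```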