Ceph: active+clean+inconsistent

gdi2k

I have an issue where I have 1 pg in my ceph cluster marked as:
Code:
pg 2.3d is active+clean+inconsistent, acting [1,5,3]

I have tried doing ceph pg repair 2.3d but no success. I am following this guide to fix it:
https://ceph.com/geen-categorie/ceph-manually-repair-object/

I have identified the object in the logs successfully, but when I try to find it in /var/lib/ceph/osd/ceph-1/current/ I see that the directory "current" doesn't exist, so I'm stuck. I can only think that the guide was written for an older version of Ceph? Where can I find the Ceph objects now?

Versions:
PVE 5.2-10
ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable)
 
Hello,

I guess you are using BlueStore; that is why you do not see any files (BlueStore uses raw devices instead of plain files, so there is no per-object file to look at - see the sketch below for how to inspect the objects anyway).

When did you start the repair command?

What replication size are you using on your pools, 2 or 3?
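
On a BlueStore OSD the objects can still be listed and extracted with ceph-objectstore-tool, but only while that OSD is stopped. A rough, untested sketch (OSD id 1 is just the example from your path; the object spec would be copied from the list output):

Code:
# stop the OSD so ceph-objectstore-tool can open its store exclusively
systemctl stop ceph-osd@1

# list all objects that belong to the inconsistent PG
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --pgid 2.3d --op list

# dump one object's data to a file for inspection (object spec taken from the list output)
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 '<object-spec-from-list>' get-bytes /tmp/object.bin

# start the OSD again when done
systemctl start ceph-osd@1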
 
It has been a couple of days since I tried the repair. At the time the issue arose I only had replication set to 2; I have since changed it to 3.

I can see from the logs that the primary copy is unable to be read, but the other copies contain good data.

I am using bluestore, yes. Is there a way to deal with this using bluestore?
 
I've stopped and restarted all affected OSDs (waiting a while in between), but no luck.

I saw a hard disk read error in the logs, but was able to successfully read the mentioned block using hdparm, so I would think it was only a temporary issue.
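
For reference, the direct sector read was something along these lines (the LBA and device are placeholders, not the actual values from the log):

Code:
# read one sector straight from the drive, bypassing the filesystem and page cache
hdparm --read-sector 123456789 /dev/sdb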
 
Hi,
during a deep scrub Ceph checks the content of all replicas (one by one).
If there is an error (different content), Ceph flags the PG as inconsistent...

With a repair, Ceph normally reads all replicas and rewrites the replica that differs from the other two.
With a replica count of two, Ceph has a problem: two different contents of the same PG, but Ceph doesn't know which is the right one... (although AFAIK there is, or will be, checksumming).

You could try to find out which block on the VM is in this PG and write the data again?!

Udo
 
Thanks Udo. This Ceph pool is only used for CCTV footage storage, so it isn't going to be fatal if I lose a clip somewhere. The CCTV system will eventually overwrite the affected block I suppose, but I don't know if that will solve the issue - anyway, it's a good exercise for me to learn how to deal with such issues.

Repair likely didn't work because there was only one remaining copy, not two, when the error occurred, which is my fault. I have changed replication to 3 now to avoid such issues in the future.
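
In case anyone needs it, the change was simply (pool name is a placeholder):

Code:
ceph osd pool set <poolname> size 3
ceph osd pool set <poolname> min_size 2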

I forced a scrub on the pg, which was upgraded to a deep scrub. Logs show:

Code:
2018-11-19 09:00:00.000195 mon.sb1 mon.0 10.32.113.1:6789/0 148218 : cluster [ERR] overall HEALTH_ERR noout flag(s) set; 4 scrub errors; Possible data damage: 1 pg inconsistent

2018-11-19 09:03:14.508605 osd.1 osd.1 10.32.113.1:6804/1962722 3 : cluster [INF] osd.1 pg 2.3d Deep scrub errors, upgrading scrub to deep-scrub

2018-11-19 09:05:14.259033 osd.1 osd.1 10.32.113.1:6804/1962722 5 : cluster [ERR] 2.3d shard 1: soid 2:bc5082d5:::rbd_data.27a274b0dc51.0000000000022200:head data_digest 0x9040bfa6 != data_digest 0x4b6f5b62 from shard 3, size 1114112 != size 4194304 from auth oi 2:bc5082d5:::rbd_data.27a274b0dc51.0000000000022200:head(771'668523 client.24858882.0:3655296 dirty s 4194304 uv 668523 alloc_hint [4194304 4194304 0]), size 1114112 != size 4194304 from shard 3

2018-11-19 09:05:14.259036 osd.1 osd.1 10.32.113.1:6804/1962722 6 : cluster [ERR] 2.3d shard 5: soid 2:bc5082d5:::rbd_data.27a274b0dc51.0000000000022200:head data_digest 0x9040bfa6 != data_digest 0x4b6f5b62 from shard 3, size 1114112 != size 4194304 from auth oi 2:bc5082d5:::rbd_data.27a274b0dc51.0000000000022200:head(771'668523 client.24858882.0:3655296 dirty s 4194304 uv 668523 alloc_hint [4194304 4194304 0]), size 1114112 != size 4194304 from shard 3
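
For completeness, the scrub was forced with the usual per-PG command, presumably something like:

Code:
ceph pg scrub 2.3d
# or go straight to a deep scrub
ceph pg deep-scrub 2.3d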

I also ran:

Code:
root@sb2:~# rados list-inconsistent-obj 2.3d --format=json-pretty
{
    "epoch": 770,
    "inconsistents": [
        {
            "object": {
                "name": "rbd_data.27a274b0dc51.0000000000022200",
                "nspace": "",
                "locator": "",
                "snap": "head",
                "version": 668523
            },
            "errors": [
                "data_digest_mismatch",
                "size_mismatch"
            ],
            "union_shard_errors": [
                "size_mismatch_info",
                "obj_size_info_mismatch"
            ],
            "selected_object_info": {
                "oid": {
                    "oid": "rbd_data.27a274b0dc51.0000000000022200",
                    "key": "",
                    "snapid": -2,
                    "hash": 2873166397,
                    "max": 0,
                    "pool": 2,
                    "namespace": ""
                },
                "version": "771'668523",
                "prior_version": "771'668514",
                "last_reqid": "client.24858882.0:3655296",
                "user_version": 668523,
                "size": 4194304,
                "mtime": "2018-11-19 08:22:25.948104",
                "local_mtime": "2018-11-19 08:22:25.950483",
                "lost": 0,
                "flags": [
                    "dirty"
                ],
                "legacy_snaps": [],
                "truncate_seq": 0,
                "truncate_size": 0,
                "data_digest": "0xffffffff",
                "omap_digest": "0xffffffff",
                "expected_object_size": 4194304,
                "expected_write_size": 4194304,
                "alloc_hint_flags": 0,
                "manifest": {
                    "type": 0,
                    "redirect_target": {
                        "oid": "",
                        "key": "",
                        "snapid": 0,
                        "hash": 0,
                        "max": 0,
                        "pool": -9223372036854775808,
                        "namespace": ""
                    }
                },
                "watchers": {}
            },
            "shards": [
                {
                    "osd": 1,
                    "primary": true,
                    "errors": [
                        "size_mismatch_info",
                        "obj_size_info_mismatch"
                    ],
                    "size": 1114112,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x9040bfa6",
                    "object_info": {
                        "oid": {
                            "oid": "rbd_data.27a274b0dc51.0000000000022200",
                            "key": "",
                            "snapid": -2,
                            "hash": 2873166397,
                            "max": 0,
                            "pool": 2,
                            "namespace": ""
                        },
                        "version": "771'668523",
                        "prior_version": "771'668514",
                        "last_reqid": "client.24858882.0:3655296",
                        "user_version": 668523,
                        "size": 4194304,
                        "mtime": "2018-11-19 08:22:25.948104",
                        "local_mtime": "2018-11-19 08:22:25.950483",
                        "lost": 0,
                        "flags": [
                            "dirty"
                        ],
                        "legacy_snaps": [],
                        "truncate_seq": 0,
                        "truncate_size": 0,
                        "data_digest": "0xffffffff",
                        "omap_digest": "0xffffffff",
                        "expected_object_size": 4194304,
                        "expected_write_size": 4194304,
                        "alloc_hint_flags": 0,
                        "manifest": {
                            "type": 0,
                            "redirect_target": {
                                "oid": "",
                                "key": "",
                                "snapid": 0,
                                "hash": 0,
                                "max": 0,
                                "pool": -9223372036854775808,
                                "namespace": ""
                            }
                        },
                        "watchers": {}
                    }
                },
                {
                    "osd": 3,
                    "primary": false,
                    "errors": [],
                    "size": 4194304,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x4b6f5b62"
                },
                {
                    "osd": 5,
                    "primary": false,
                    "errors": [
                        "size_mismatch_info",
                        "obj_size_info_mismatch"
                    ],
                    "size": 1114112,
                    "omap_digest": "0xffffffff",
                    "data_digest": "0x9040bfa6",
                    "object_info": {
                        "oid": {
                            "oid": "rbd_data.27a274b0dc51.0000000000022200",
                            "key": "",
                            "snapid": -2,
                            "hash": 2873166397,
                            "max": 0,
                            "pool": 2,
                            "namespace": ""
                        },
                        "version": "771'668523",
                        "prior_version": "771'668514",
                        "last_reqid": "client.24858882.0:3655296",
                        "user_version": 668523,
                        "size": 4194304,
                        "mtime": "2018-11-19 08:22:25.948104",
                        "local_mtime": "2018-11-19 08:22:25.950483",
                        "lost": 0,
                        "flags": [
                            "dirty"
                        ],
                        "legacy_snaps": [],
                        "truncate_seq": 0,
                        "truncate_size": 0,
                        "data_digest": "0xffffffff",
                        "omap_digest": "0xffffffff",
                        "expected_object_size": 4194304,
                        "expected_write_size": 4194304,
                        "alloc_hint_flags": 0,
                        "manifest": {
                            "type": 0,
                            "redirect_target": {
                                "oid": "",
                                "key": "",
                                "snapid": 0,
                                "hash": 0,
                                "max": 0,
                                "pool": -9223372036854775808,
                                "namespace": ""
                            }
                        },
                        "watchers": {}
                    }
                }
            ]
        }
    ]
}

How can I identify the affected block?
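
If I understand RBD object naming correctly, the object name itself points at the block: rbd_data.<prefix>.<object index in hex>, where the index times the object size gives the offset inside the image. A sketch (the pool name "cctv" is a placeholder):

Code:
# find the image whose block_name_prefix matches rbd_data.27a274b0dc51
for img in $(rbd -p cctv ls); do
    rbd info cctv/"$img" | grep -q 'block_name_prefix: rbd_data.27a274b0dc51' && echo "$img"
done

# object index 0x22200 = 139776; with the default 4 MiB object size the
# affected range starts at 139776 * 4 MiB into that image
printf '%d\n' 0x22200
echo $((139776 * 4 / 1024))   # offset in GiB -> 546

Rewriting that region from inside the VM should then rewrite the object, as Udo suggested.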
 
Ceph managed to fix this itself after I issued another repair, which is nice. It works in mysterious ways!

Logs show the following:

Code:
2018-11-19 11:44:26.513656 mon.sb1 mon.0 10.32.113.1:6789/0 153119 : cluster [ERR] Health check update: 3 scrub errors (OSD_SCRUB_ERRORS)
2018-11-19 11:44:26.513677 mon.sb1 mon.0 10.32.113.1:6789/0 153120 : cluster [ERR] Health check update: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2018-11-19 11:44:23.333449 osd.1 osd.1 10.32.113.1:6804/1962722 25 : cluster [ERR] 2.3d repair 0 missing, 1 inconsistent objects
2018-11-19 11:44:23.333657 osd.1 osd.1 10.32.113.1:6804/1962722 26 : cluster [ERR] 2.3d repair 3 errors, 2 fixed
2018-11-19 11:58:06.444089 mon.sb1 mon.0 10.32.113.1:6789/0 153454 : cluster [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 3 scrub errors)
2018-11-19 11:58:06.444110 mon.sb1 mon.0 10.32.113.1:6789/0 153455 : cluster [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2018-11-19 12:00:00.000112 mon.sb1 mon.0 10.32.113.1:6789/0 153499 : cluster [WRN] overall HEALTH_WARN noout flag(s) set
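
To double-check, something like the following should now come back clean (the inconsistency list may be empty or report no errors once a fresh scrub has run):

Code:
# overall health and any remaining scrub errors
ceph health detail
# the pg itself should be back to active+clean
ceph pg 2.3d query | grep '"state"'
# the inconsistency list for the pg should be empty again
rados list-inconsistent-obj 2.3d --format=json-pretty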
 
