How to recover from losing all but 1 Ceph monitor

rekahsoft

Hi all,

Today, out of the blue, all clients disconnected from my Ceph cluster. The Ceph dashboard still shows it as healthy (lies), but Proxmox shows both my VM storage (RBD-based) and CephFS in an "unknown state". When I started digging in further, I found that ceph health hangs on every node. The OSDs appear healthy in the Ceph dashboard as well as in their logs on each node.

I have roughly a 100 TB cluster with 5 nodes, each running 4 or 8 OSDs.

Looks like I've lost quorum, as I stupidly left only 2 monitors running after maintenance a while ago. I can connect to the remaining monitor over its admin socket (see the mon_status and quorum_status output below). My pve-2 host's root RAID has failed, so that node will take a while to recover. Does anyone know how to recover from a single monitor? Any help is greatly appreciated!

On one node (pve-2), journalctl -ru ceph-crash has the following output:

Code:
Sep 17 21:38:30 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:38:30 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro
Sep 17 21:28:30 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.admin failed:
Sep 17 21:28:00 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:27:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro
Sep 17 21:17:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.admin failed:
Sep 17 21:17:29 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:17:29 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro
Sep 17 21:07:29 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.admin failed:
Sep 17 21:06:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:06:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro

On pve-2, the ceph-mon crashed:
/var/log/ceph/ceph-mon.pve-2.log
Code:
2021-09-17 12:48:57.470 7f87fc228700  0 mon.pve-2@0(leader) e36 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2021-09-17 12:48:57.470 7f87fc228700  0 log_channel(audit) log [DBG] : from='client.? 172.16.0.21:0/2977611991' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2021-09-17 12:53:48.822 7f87faa25700 -1 rocksdb: submit_common error: IO error: While fdatasync: /var/lib/ceph/mon/ceph-pve-2/store.db/446832.log: Input/output error code = 5 Rocksdb transaction:
Put( Prefix = p key = 'xos'0x006c6173't_committed' Value size = 8)
Put( Prefix = m key = 'nitor_store'0x006c6173't_metadata' Value size = 1414)
2021-09-17 12:53:48.822 7f87faa25700 -1 /build/ceph/ceph-14.2.19/src/mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7f87faa25700 time 2021-09-17 12:53:48.823296
/build/ceph/ceph-14.2.19/src/mon/MonitorDBStore.h: 354: ceph_abort_msg("failed to write to db")

 ceph version 14.2.19 (79ae1114a99aa887d945e367d046a9a8a1233a75) nautilus (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x7f880628b3a8]
 2: (MonitorDBStore::apply_transaction(std::shared_ptr<MonitorDBStore::Transaction>)+0xe31) [0x55da118f9821]
 3: (MonitorDBStore::C_DoTransaction::finish(int)+0x77) [0x55da11a19d27]
 4: (Context::complete(int)+0x9) [0x55da11939b39]
 5: (Finisher::finisher_thread_entry()+0x17f) [0x7f88063184bf]
 6: (()+0x7fa3) [0x7f880514dfa3]
 7: (clone()+0x3f) [0x7f8804cfd4cf]

2021-09-17 12:53:48.826 7f87faa25700 -1 *** Caught signal (Aborted) **
 in thread 7f87faa25700 thread_name:fn_monstore

 ceph version 14.2.19 (79ae1114a99aa887d945e367d046a9a8a1233a75) nautilus (stable)
 1: (()+0x12730) [0x7f8805158730]
 2: (gsignal()+0x10b) [0x7f8804c3b7bb]
 3: (abort()+0x121) [0x7f8804c26535]
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b0) [0x7f880628b479]
 5: (MonitorDBStore::apply_transaction(std::shared_ptr<MonitorDBStore::Transaction>)+0xe31) [0x55da118f9821]
 6: (MonitorDBStore::C_DoTransaction::finish(int)+0x77) [0x55da11a19d27]
 7: (Context::complete(int)+0x9) [0x55da11939b39]
 8: (Finisher::finisher_thread_entry()+0x17f) [0x7f88063184bf]
 9: (()+0x7fa3) [0x7f880514dfa3]
 10: (clone()+0x3f) [0x7f8804cfd4cf]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

I also noticed messages like this in the OSD logs on every node:
Code:
Sep 17 21:49:59 pve-0 ceph-osd[7709]: 2021-09-17 21:49:59.442 7fec64008700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-09-17 20:49

--

Looks like pve-2's root device is currently read-only; I suspect this is what caused the original issue. Now the question is how to recover.

Code:
Sep 17 22:06:38 pve-2 pveproxy[1449722]: unable to open log file '/var/log/pveproxy/access.log' - Read-only file system
Sep 17 22:06:39 pve-2 kernel: libceph: mon1 (1)172.16.0.22:6789 socket error on write
Sep 17 22:06:39 pve-2 pmxcfs[2755]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/pve-1/local: opening '/var/lib/rrdcached/db/pve2-storage/pve-1/local': Read-only file
Sep 17 22:06:39 pve-2 ceph-mgr[1248992]: 2021-09-17 22:06:39.898 7ff3e765d700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-09-17 21
Sep 17 22:06:40 pve-2 kernel: libceph: mon1 (1)172.16.0.22:6789 socket error on write
Sep 17 22:06:40 pve-2 pve-firewall[3002]: status update error: unable to open file '/var/lib/pve-firewall/ipsetcmdlist1.tmp.3002' - Read-only file system
Sep 17 22:06:41 pve-2 pmxcfs[2755]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/pve-3/local: opening '/var/lib/rrdcached/db/pve2-storage/pve-3/local': Read-only file
Sep 17 22:06:41 pve-2 pvestatd[3005]: got timeout

--

Code:
root@pve-0:~# ceph daemon mon.pve-0 quorum_status
{
    "election_epoch": 937789,
    "quorum": [],
    "quorum_names": [],
    "quorum_leader_name": "",
    "monmap": {
        "epoch": 36,
        "fsid": "0f2890c4-3a78-4859-b7c1-43f749b127b3",
        "modified": "2021-06-19 10:27:06.492224",
        "created": "2019-11-27 22:08:14.851985",
        "min_mon_release": 14,
        "min_mon_release_name": "nautilus",
        "features": {
            "persistent": [
                "kraken",
                "luminous",
                "mimic",
                "osdmap-prune",
                "nautilus"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "pve-2",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.22:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.22:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.22:6789/0",
                "public_addr": "172.16.0.22:6789/0"
            },
            {
                "rank": 1,
                "name": "pve-0",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.20:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.20:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.20:6789/0",
                "public_addr": "172.16.0.20:6789/0"
            }
        ]
    }
}

Code:
root@pve-0:~# ceph daemon mon.pve-0 mon_status
{
    "name": "pve-0",
    "rank": 1,
    "state": "electing",
    "election_epoch": 938205,
    "quorum": [],
    "features": {
        "required_con": "2449958747315912708",
        "required_mon": [
            "kraken",
            "luminous",
            "mimic",
            "osdmap-prune",
            "nautilus"
        ],
        "quorum_con": "4611087854035861503",
        "quorum_mon": [
            "kraken",
            "luminous",
            "mimic",
            "osdmap-prune",
            "nautilus"
        ]
    },
    "outside_quorum": [],
    "extra_probe_peers": [
        {
            "addrvec": [
                {
                    "type": "v1",
                    "addr": "172.16.0.23:6789",
                    "nonce": 0
                }
            ]
        },
        {
            "addrvec": [
                {
                    "type": "v2",
                    "addr": "172.16.0.23:3300",
                    "nonce": 0
                },
                {
                    "type": "v1",
                    "addr": "172.16.0.23:6789",
                    "nonce": 0
                }
            ]
        },
        {
            "addrvec": [
                {
                    "type": "v2",
                    "addr": "172.16.0.24:3300",
                    "nonce": 0
                },
                {
                    "type": "v1",
                    "addr": "172.16.0.24:6789",
                    "nonce": 0
                }
            ]
        }
    ],
    "sync_provider": [],
    "monmap": {
        "epoch": 36,
        "fsid": "0f2890c4-3a78-4859-b7c1-43f749b127b3",
        "modified": "2021-06-19 10:27:06.492224",
        "created": "2019-11-27 22:08:14.851985",
        "min_mon_release": 14,
        "min_mon_release_name": "nautilus",
        "features": {
            "persistent": [
                "kraken",
                "luminous",
                "mimic",
                "osdmap-prune",
                "nautilus"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "pve-2",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.22:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.22:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.22:6789/0",
                "public_addr": "172.16.0.22:6789/0"
            },
            {
                "rank": 1,
                "name": "pve-0",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.20:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.20:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.20:6789/0",
                "public_addr": "172.16.0.20:6789/0"
            }
        ]
    },
    "feature_map": {
        "mon": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 1
            }
        ],
        "mds": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 2
            }
        ],
        "osd": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 24
            }
        ],
        "client": [
            {
                "features": "0x3ffddff8eeacfffb",
                "release": "luminous",
                "num": 1
            },
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 23
            }
        ],
        "mgr": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 4
            }
        ]
    }
}
 
I would first make sure the network hasn't changed MTUs or anything like that. Then you should be able to move the monitor DB out of the way on your other broken nodes, regain quorum, and wait for them to sync from the one that still works.
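For reference, the procedure the Ceph troubleshooting docs describe for a single surviving monitor is to remove the dead monitors from its monmap so it can form quorum on its own. A rough sketch only; the mon names pve-0 (surviving) and pve-2 (dead) are taken from this thread, and the commands are untested here, so adjust before running:

Code:
# on pve-0, the surviving monitor -- stop it first
systemctl stop ceph-mon@pve-0

# extract the current monmap from the surviving monitor's store
ceph-mon -i pve-0 --extract-monmap /tmp/monmap

# remove the dead monitor(s) from the map
monmaptool /tmp/monmap --rm pve-2

# inject the edited monmap and start the monitor again
ceph-mon -i pve-0 --inject-monmap /tmp/monmap
systemctl start ceph-mon@pve-0

Once pve-0 has quorum on its own, fresh monitors can be created on the other nodes and they should sync from it.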
 
I have lost all monitors at this point (2 nodes have failed). How would I go about recovering from a loss of all monitors?

Thanks for your help @AlexLup
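From what I can tell from the Ceph docs, if every monitor store really is gone, the mon store can be rebuilt from the OSDs with ceph-objectstore-tool and ceph-monstore-tool. A rough, untested sketch; the paths and especially the keyring location are assumptions for this cluster, and the keyring must contain the mon. and client.admin keys:

Code:
ms=/root/mon-store
mkdir -p $ms

# on each OSD host, with the OSDs stopped, pull the cluster maps out of every OSD store
# (with more than one OSD host, copy $ms to the next host and repeat there)
for osd in /var/lib/ceph/osd/ceph-*; do
  ceph-objectstore-tool --data-path "$osd" --no-mon-config --op update-mon-db --mon-store-path "$ms"
done

# rebuild a monitor store from the collected maps
# (keyring path is an assumption; it needs the mon. and client.admin keys)
ceph-monstore-tool $ms rebuild -- --keyring /etc/pve/priv/ceph.client.admin.keyring

# back up the old store on the monitor node and move the rebuilt one into place
mv /var/lib/ceph/mon/ceph-pve-0/store.db /var/lib/ceph/mon/ceph-pve-0/store.db.bad
mv $ms/store.db /var/lib/ceph/mon/ceph-pve-0/store.db
chown -R ceph:ceph /var/lib/ceph/mon/ceph-pve-0/store.db

The full procedure and its caveats are in the Ceph troubleshooting docs under "mon store recovery using OSDs".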
 
Aha! I was able to recover a single node and its monitor! Back to needing to recover from a single monitor. Any assistance greatly appreciated!
 
