How to recover from losing all but 1 Ceph monitor

rekahsoft

Hi all,

Today, out of the blue, all clients disconnected from my Ceph cluster. The Ceph dashboard still shows it as healthy (lies), but Proxmox shows both my VM storage (RBD-based) and CephFS in an "unknown state". When I started digging in further, I found that ceph health hangs on every node. The OSDs appear healthy in the Ceph dashboard as well as in their logs on each node.

I have roughly a 100 TB cluster with 5 nodes, each running 4 or 8 OSDs.

Looks like I've lost quorum, as I stupidly left only 2 monitors running after maintenance a while ago. I can connect to the remaining monitor over its admin socket (see the mon_status and quorum_status output below). My pve-2 host's root RAID has failed, so that node will take a while to recover. Does anyone know how to recover from a single monitor? Any help is greatly appreciated!

On one node (pve-2), journalctl -ru ceph-crash has the following output:

Code:
Sep 17 21:38:30 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:38:30 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro
Sep 17 21:28:30 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.admin failed:
Sep 17 21:28:00 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:27:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro
Sep 17 21:17:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.admin failed:
Sep 17 21:17:29 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:17:29 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro
Sep 17 21:07:29 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.admin failed:
Sep 17 21:06:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash failed: [errno 2] error conn
Sep 17 21:06:59 pve-2 ceph-crash[1475]: WARNING:__main__:post /var/lib/ceph/crash/2021-09-17_16:53:48.826384Z_9dcff0ee-8e0b-4679-87c6-a67ed29b3ae6 as client.crash.pve-2 failed: [errno 2] erro

On pve-2, the ceph-mon crashed:
/var/log/ceph/ceph-mon.pve-2.log
Code:
2021-09-17 12:48:57.470 7f87fc228700  0 mon.pve-2@0(leader) e36 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2021-09-17 12:48:57.470 7f87fc228700  0 log_channel(audit) log [DBG] : from='client.? 172.16.0.21:0/2977611991' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2021-09-17 12:53:48.822 7f87faa25700 -1 rocksdb: submit_common error: IO error: While fdatasync: /var/lib/ceph/mon/ceph-pve-2/store.db/446832.log: Input/output error code = 5 Rocksdb transaction:
Put( Prefix = p key = 'xos'0x006c6173't_committed' Value size = 8)
Put( Prefix = m key = 'nitor_store'0x006c6173't_metadata' Value size = 1414)
2021-09-17 12:53:48.822 7f87faa25700 -1 /build/ceph/ceph-14.2.19/src/mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7f87faa25700 time 2021-09-17 12:53:48.823296
/build/ceph/ceph-14.2.19/src/mon/MonitorDBStore.h: 354: ceph_abort_msg("failed to write to db")

 ceph version 14.2.19 (79ae1114a99aa887d945e367d046a9a8a1233a75) nautilus (stable)
 1: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0xdf) [0x7f880628b3a8]
 2: (MonitorDBStore::apply_transaction(std::shared_ptr<MonitorDBStore::Transaction>)+0xe31) [0x55da118f9821]
 3: (MonitorDBStore::C_DoTransaction::finish(int)+0x77) [0x55da11a19d27]
 4: (Context::complete(int)+0x9) [0x55da11939b39]
 5: (Finisher::finisher_thread_entry()+0x17f) [0x7f88063184bf]
 6: (()+0x7fa3) [0x7f880514dfa3]
 7: (clone()+0x3f) [0x7f8804cfd4cf]

2021-09-17 12:53:48.826 7f87faa25700 -1 *** Caught signal (Aborted) **
 in thread 7f87faa25700 thread_name:fn_monstore

 ceph version 14.2.19 (79ae1114a99aa887d945e367d046a9a8a1233a75) nautilus (stable)
 1: (()+0x12730) [0x7f8805158730]
 2: (gsignal()+0x10b) [0x7f8804c3b7bb]
 3: (abort()+0x121) [0x7f8804c26535]
 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b0) [0x7f880628b479]
 5: (MonitorDBStore::apply_transaction(std::shared_ptr<MonitorDBStore::Transaction>)+0xe31) [0x55da118f9821]
 6: (MonitorDBStore::C_DoTransaction::finish(int)+0x77) [0x55da11a19d27]
 7: (Context::complete(int)+0x9) [0x55da11939b39]
 8: (Finisher::finisher_thread_entry()+0x17f) [0x7f88063184bf]
 9: (()+0x7fa3) [0x7f880514dfa3]
 10: (clone()+0x3f) [0x7f8804cfd4cf]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---

I also noticed messages like this in the OSD logs on every node:
Code:
Sep 17 21:49:59 pve-0 ceph-osd[7709]: 2021-09-17 21:49:59.442 7fec64008700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-09-17 20:49

--

Looks like pve-2's root device is currently read-only; I suspect this is what caused the original issue. Now the question is how to recover.

Code:
Sep 17 22:06:38 pve-2 pveproxy[1449722]: unable to open log file '/var/log/pveproxy/access.log' - Read-only file system
Sep 17 22:06:39 pve-2 kernel: libceph: mon1 (1)172.16.0.22:6789 socket error on write
Sep 17 22:06:39 pve-2 pmxcfs[2755]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/pve-1/local: opening '/var/lib/rrdcached/db/pve2-storage/pve-1/local': Read-only file
Sep 17 22:06:39 pve-2 ceph-mgr[1248992]: 2021-09-17 22:06:39.898 7ff3e765d700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2021-09-17 21
Sep 17 22:06:40 pve-2 kernel: libceph: mon1 (1)172.16.0.22:6789 socket error on write
Sep 17 22:06:40 pve-2 pve-firewall[3002]: status update error: unable to open file '/var/lib/pve-firewall/ipsetcmdlist1.tmp.3002' - Read-only file system
Sep 17 22:06:41 pve-2 pmxcfs[2755]: [status] notice: RRD update error /var/lib/rrdcached/db/pve2-storage/pve-3/local: opening '/var/lib/rrdcached/db/pve2-storage/pve-3/local': Read-only file
Sep 17 22:06:41 pve-2 pvestatd[3005]: got timeout

--

Code:
root@pve-0:~# ceph daemon mon.pve-0 quorum_status
{
    "election_epoch": 937789,
    "quorum": [],
    "quorum_names": [],
    "quorum_leader_name": "",
    "monmap": {
        "epoch": 36,
        "fsid": "0f2890c4-3a78-4859-b7c1-43f749b127b3",
        "modified": "2021-06-19 10:27:06.492224",
        "created": "2019-11-27 22:08:14.851985",
        "min_mon_release": 14,
        "min_mon_release_name": "nautilus",
        "features": {
            "persistent": [
                "kraken",
                "luminous",
                "mimic",
                "osdmap-prune",
                "nautilus"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "pve-2",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.22:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.22:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.22:6789/0",
                "public_addr": "172.16.0.22:6789/0"
            },
            {
                "rank": 1,
                "name": "pve-0",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.20:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.20:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.20:6789/0",
                "public_addr": "172.16.0.20:6789/0"
            }
        ]
    }
}

Code:
root@pve-0:~# ceph daemon mon.pve-0 mon_status
{
    "name": "pve-0",
    "rank": 1,
    "state": "electing",
    "election_epoch": 938205,
    "quorum": [],
    "features": {
        "required_con": "2449958747315912708",
        "required_mon": [
            "kraken",
            "luminous",
            "mimic",
            "osdmap-prune",
            "nautilus"
        ],
        "quorum_con": "4611087854035861503",
        "quorum_mon": [
            "kraken",
            "luminous",
            "mimic",
            "osdmap-prune",
            "nautilus"
        ]
    },
    "outside_quorum": [],
    "extra_probe_peers": [
        {
            "addrvec": [
                {
                    "type": "v1",
                    "addr": "172.16.0.23:6789",
                    "nonce": 0
                }
            ]
        },
        {
            "addrvec": [
                {
                    "type": "v2",
                    "addr": "172.16.0.23:3300",
                    "nonce": 0
                },
                {
                    "type": "v1",
                    "addr": "172.16.0.23:6789",
                    "nonce": 0
                }
            ]
        },
        {
            "addrvec": [
                {
                    "type": "v2",
                    "addr": "172.16.0.24:3300",
                    "nonce": 0
                },
                {
                    "type": "v1",
                    "addr": "172.16.0.24:6789",
                    "nonce": 0
                }
            ]
        }
    ],
    "sync_provider": [],
    "monmap": {
        "epoch": 36,
        "fsid": "0f2890c4-3a78-4859-b7c1-43f749b127b3",
        "modified": "2021-06-19 10:27:06.492224",
        "created": "2019-11-27 22:08:14.851985",
        "min_mon_release": 14,
        "min_mon_release_name": "nautilus",
        "features": {
            "persistent": [
                "kraken",
                "luminous",
                "mimic",
                "osdmap-prune",
                "nautilus"
            ],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "pve-2",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.22:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.22:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.22:6789/0",
                "public_addr": "172.16.0.22:6789/0"
            },
            {
                "rank": 1,
                "name": "pve-0",
                "public_addrs": {
                    "addrvec": [
                        {
                            "type": "v2",
                            "addr": "172.16.0.20:3300",
                            "nonce": 0
                        },
                        {
                            "type": "v1",
                            "addr": "172.16.0.20:6789",
                            "nonce": 0
                        }
                    ]
                },
                "addr": "172.16.0.20:6789/0",
                "public_addr": "172.16.0.20:6789/0"
            }
        ]
    },
    "feature_map": {
        "mon": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 1
            }
        ],
        "mds": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 2
            }
        ],
        "osd": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 24
            }
        ],
        "client": [
            {
                "features": "0x3ffddff8eeacfffb",
                "release": "luminous",
                "num": 1
            },
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 23
            }
        ],
        "mgr": [
            {
                "features": "0x3ffddff8ffecffff",
                "release": "luminous",
                "num": 4
            }
        ]
    }
}
 
I would first make sure the network hasn't changed MTUs or anything like that. Then you should be able to move the monitor DB out of the way on your other broken nodes, regain quorum, and wait for them to sync from the one that still works.
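For reference, the procedure the Ceph troubleshooting docs describe for a single surviving monitor is to remove the dead monitors from its monmap so it can form quorum on its own. A rough sketch only; the mon names pve-0 (surviving) and pve-2 (dead) are taken from this thread, and the commands are untested here, so adjust before running:

Code:
# on pve-0, the surviving monitor -- stop it first
systemctl stop ceph-mon@pve-0

# extract the current monmap from the surviving monitor's store
ceph-mon -i pve-0 --extract-monmap /tmp/monmap

# remove the dead monitor(s) from the map
monmaptool /tmp/monmap --rm pve-2

# inject the edited monmap and start the monitor again
ceph-mon -i pve-0 --inject-monmap /tmp/monmap
systemctl start ceph-mon@pve-0

Once pve-0 has quorum on its own, fresh monitors can be created on the other nodes and they should sync from it.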
 
I have lost all monitors at this point (2 nodes have failed). How would I go about recovering from a loss of all monitors?

Thanks for your help @AlexLup
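From what I can tell from the Ceph docs, if every monitor store really is gone, the mon store can be rebuilt from the OSDs with ceph-objectstore-tool and ceph-monstore-tool. A rough, untested sketch; the paths and especially the keyring location are assumptions for this cluster, and the keyring must contain the mon. and client.admin keys:

Code:
ms=/root/mon-store
mkdir -p $ms

# on each OSD host, with the OSDs stopped, pull the cluster maps out of every OSD store
# (with more than one OSD host, copy $ms to the next host and repeat there)
for osd in /var/lib/ceph/osd/ceph-*; do
  ceph-objectstore-tool --data-path "$osd" --no-mon-config --op update-mon-db --mon-store-path "$ms"
done

# rebuild a monitor store from the collected maps
# (keyring path is an assumption; it needs the mon. and client.admin keys)
ceph-monstore-tool $ms rebuild -- --keyring /etc/pve/priv/ceph.client.admin.keyring

# back up the old store on the monitor node and move the rebuilt one into place
mv /var/lib/ceph/mon/ceph-pve-0/store.db /var/lib/ceph/mon/ceph-pve-0/store.db.bad
mv $ms/store.db /var/lib/ceph/mon/ceph-pve-0/store.db
chown -R ceph:ceph /var/lib/ceph/mon/ceph-pve-0/store.db

The full procedure and its caveats are in the Ceph troubleshooting docs under "mon store recovery using OSDs".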
 
Aha! I was able to recover a single node and its monitor! Back to needing to recover from a single monitor. Any assistance greatly appreciated!
 
