ceph Bug or fixable? terminate called after throwing an instance of 'std::invalid_argument'

steele · Dec 29, 2023

This started after a power failure...
I have 3 servers, one is NOT doing this and is alive, the other two cannot be started.
The OSDs appear to be started and running, but the monitor fails to start.
SO... ceph cluster is down until I figure out how to fix this:

Code:

Dec 28 15:11:50 pve2 systemd[1]: ceph-mon@pve2.service: Failed with result 'signal'.
Dec 28 15:11:50 pve2 systemd[1]: ceph-mon@pve2.service: Main process exited, code=killed, status=6/ABRT
Dec 28 15:11:50 pve2 ceph-mon[856941]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Dec 28 15:11:50 pve2 ceph-mon[856941]:  17: _start()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  16: __libc_start_main()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  15: /lib/x86_64-linux-gnu/libc.so.6(+0x271ca) [0x7f7feac461ca]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  14: main()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  13: (Monitor::preinit()+0x97a) [0x563038d1697a]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  12: (Monitor::refresh_from_paxos(bool*)+0x163) [0x563038ce8a23]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  11: (LogMonitor::update_from_paxos(bool*)+0x53) [0x563038d734a3]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  10: (LogMonitor::log_external_backlog()+0xe29) [0x563038d70849]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  9: (std::__throw_invalid_argument(char const*)+0x40) [0x7f7feaaa0192]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa90d8) [0x7f7feaaa90d8]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e85) [0x7f7feaaa8e85]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e1a) [0x7f7feaaa8e1a]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9d919) [0x7f7feaa9d919]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  4: abort()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  3: gsignal()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ad3c) [0x7f7feaca9d3c]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f7feac5afd0]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  ceph version 17.2.7 (e303afc2e967a4705b40a7e5f76067c10eea0484) quincy (stable)
Dec 28 15:11:50 pve2 ceph-mon[856941]:  in thread 7f7feb02ca00 thread_name:ceph-mon
Dec 28 15:11:50 pve2 ceph-mon[856941]:      0> 2023-12-28T15:11:50.992-0700 7f7feb02ca00 -1 *** Caught signal (Aborted) **
Dec 28 15:11:50 pve2 ceph-mon[856941]:  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Dec 28 15:11:50 pve2 ceph-mon[856941]:  17: _start()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  16: __libc_start_main()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  15: /lib/x86_64-linux-gnu/libc.so.6(+0x271ca) [0x7f7feac461ca]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  14: main()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  13: (Monitor::preinit()+0x97a) [0x563038d1697a]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  12: (Monitor::refresh_from_paxos(bool*)+0x163) [0x563038ce8a23]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  11: (LogMonitor::update_from_paxos(bool*)+0x53) [0x563038d734a3]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  10: (LogMonitor::log_external_backlog()+0xe29) [0x563038d70849]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  9: (std::__throw_invalid_argument(char const*)+0x40) [0x7f7feaaa0192]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa90d8) [0x7f7feaaa90d8]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e85) [0x7f7feaaa8e85]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e1a) [0x7f7feaaa8e1a]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9d919) [0x7f7feaa9d919]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  4: abort()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  3: gsignal()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ad3c) [0x7f7feaca9d3c]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f7feac5afd0]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  ceph version 17.2.7 (e303afc2e967a4705b40a7e5f76067c10eea0484) quincy (stable)
Dec 28 15:11:50 pve2 ceph-mon[856941]:  in thread 7f7feb02ca00 thread_name:ceph-mon
Dec 28 15:11:50 pve2 ceph-mon[856941]: 2023-12-28T15:11:50.992-0700 7f7feb02ca00 -1 *** Caught signal (Aborted) **
Dec 28 15:11:50 pve2 ceph-mon[856941]:  17: _start()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  16: __libc_start_main()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  15: /lib/x86_64-linux-gnu/libc.so.6(+0x271ca) [0x7f7feac461ca]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  14: main()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  13: (Monitor::preinit()+0x97a) [0x563038d1697a]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  12: (Monitor::refresh_from_paxos(bool*)+0x163) [0x563038ce8a23]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  11: (LogMonitor::update_from_paxos(bool*)+0x53) [0x563038d734a3]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  10: (LogMonitor::log_external_backlog()+0xe29) [0x563038d70849]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  9: (std::__throw_invalid_argument(char const*)+0x40) [0x7f7feaaa0192]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa90d8) [0x7f7feaaa90d8]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e85) [0x7f7feaaa8e85]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e1a) [0x7f7feaaa8e1a]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9d919) [0x7f7feaa9d919]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  4: abort()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  3: gsignal()
Dec 28 15:11:50 pve2 ceph-mon[856941]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ad3c) [0x7f7feaca9d3c]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f7feac5afd0]
Dec 28 15:11:50 pve2 ceph-mon[856941]:  ceph version 17.2.7 (e303afc2e967a4705b40a7e5f76067c10eea0484) quincy (stable)
Dec 28 15:11:50 pve2 ceph-mon[856941]:  in thread 7f7feb02ca00 thread_name:ceph-mon
Dec 28 15:11:50 pve2 ceph-mon[856941]: *** Caught signal (Aborted) **
Dec 28 15:11:50 pve2 ceph-mon[856941]:   what():  stoull
Dec 28 15:11:50 pve2 ceph-mon[856941]: terminate called after throwing an instance of 'std::invalid_argument'
Dec 28 15:11:50 pve2 systemd[1]: Started ceph-mon@pve2.service - Ceph cluster monitor daemon.
Dec 28 15:11:50 pve2 systemd[1]: Stopped ceph-mon@pve2.service - Ceph cluster monitor daemon.
Dec 28 15:11:50 pve2 systemd[1]: ceph-mon@pve2.service: Scheduled restart job, restart counter is at 3.

I have verified that /etc/ceph/ceph.conf AND /etc/pve/ceph.conf are identical on all 3 servers.
pve1 runs and is probing
pve2 throws this and won't start
pve3 throws this and won't start
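
For anyone skimming a dump like the one above, the parts that matter are the exception type, the `what():` text, and the lowest-numbered Ceph frame (frame 1 is innermost in this backtrace format). A minimal shell sketch that pulls those out of saved journal text; the function name `summarize_mon_crash` is made up for illustration and it relies only on sed, grep, sort, and head:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a ceph-mon crash from journal text on stdin.
summarize_mon_crash() {
    local text
    text=$(cat)
    # Exception type, e.g. std::invalid_argument
    printf '%s\n' "$text" \
        | sed -n "s/.*throwing an instance of '\([^']*\)'.*/exception: \1/p" \
        | head -n 1
    # what() message, e.g. stoull
    printf '%s\n' "$text" \
        | sed -n 's/.*what():[[:space:]]*\(.*\)$/what: \1/p' \
        | head -n 1
    # Named frames look like " 10: (Class::method(args)+0x...) [0x...]".
    # Drop std:: frames and keep the lowest-numbered (innermost) one.
    printf '%s\n' "$text" \
        | sed -n 's/.*[[:space:]]\([0-9][0-9]*\): (\([A-Za-z_][A-Za-z_0-9]*::[^()]*\)(.*/\1 \2/p' \
        | grep -v ' std::' \
        | sort -n \
        | head -n 1 \
        | sed 's/^[0-9]* /frame: /'
}

# Usage (not run here):
#   journalctl -u ceph-mon@pve2 -b --no-pager | summarize_mon_crash
```

On the log above this should point at `LogMonitor::log_external_backlog`, i.e. the monitor is choking on a value it tries to parse as a number while replaying its stored state during startup.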

here is pveversion -v:

Code:

root@pve2:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-7-pve)
pve-manager: 8.1.3 (running version: 8.1.3/b46aac3b42da5d15)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-4
proxmox-kernel-6.5: 6.5.11-7
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph: 17.2.7-pve1
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx7
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.7
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.2-1
proxmox-backup-file-restore: 3.1.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.2
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.3
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-2
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.1.5
pve-qemu-kvm: 8.1.2-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
root@pve2:~#

INI:

root@pve1:~# cat /etc/ceph/ceph.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.10.10.0/24
     fsid = b3445d50-80e3-405e-b3cd-a5b7251876e2
     mon_allow_pool_delete = true
     mon_host = 10.10.10.2 10.10.10.3 10.10.10.4
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.10.10.0/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring
         rbd_cache_size = 134217728

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve1]
     host = pve1
     mds_standby_for_name = pve

[mds.pve2]
     host = pve2
     mds_standby_for_name = pve

[mds.pve3]
     host = pve3
     mds_standby_for_name = pve

[mon.pve1]
     host = pve1
     public_addr = 10.10.10.2

[mon.pve2]
     host = pve2
     public_addr = 10.10.10.3

[mon.pve3]
     host = pve3
     public_addr = 10.10.10.4

gurubert · Dec 29, 2023

It looks like the two MONs' databases have been corrupted by the power failure.

You can try to repair them with the existing one.

https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/

steele · Dec 30, 2023

that would be lovely if I could even make the cluster active to do so.
I have green on everything on pve1...
Except, it won't talk to the cluster or get a quorum even after removing the other 2 monitors.
nothing I do will allow me to get 'pveceph status' to run.

this is by far the worst experience I have ever had trying to diagnose anything.

gurubert · Dec 30, 2023

Please read the Ceph manual section about troubleshooting MONs.

steele · Dec 30, 2023

RTFM... ORLY?

That's been done and the problem exceeds available information provided in the manuals.
If you had read them, you would know this.

There is a fundamental flaw here.
This is NOT a resilient distributed system, it's far too fragile for that.

I cannot diagnose simple failures, such as a failure to communicate, without just waiting for a timeout
while not a single event is thrown or logged to show which communication is actually failing.

This is a fundamental flaw with ceph.
This system is supposed to be preventing things like this from happening, not encouraging you to perform a rebuild from scratch because you mucked up how you save state in communicating.

Where is the communication being monitored, certainly not in anything you have made available to diagnose.

OSDs are green
Monitor is green (supposedly)
Cluster is unresponsive... How is anyone supposed to understand this if it is NOT in the manual, if it is, please point to it.
You seriously hold state in a DB that cannot be recovered and try to call that a resilient system?
You have binary tooling instead of standard serialization tools.
Your recommendations are to trash the system and rebuild.
I cannot even do that without cluster communication.

So read the post properly before insulting someone.

I have read the Proxmox Docs, the Redhat Docs AND the IBM Docs.
Please enlighten me on other documentation other than "read the friggin code dummy."

gurubert · Dec 30, 2023

I do not owe you anything. This is a free forum. Maybe you want to look for paid support.

Apart from that you have not described which steps you have taken and which parts of the documentation you have read.

Have you tried injecting the monmap from the last running MON into the two broken ones as described here: https://docs.ceph.com/en/reef/rados...ing-mon/#recovering-a-monitor-s-broken-monmap ?

quanto11 · Dec 30, 2023

@gurubert I'm just being curious: why do you think that the mon database is corrupted?

gurubert · Dec 31, 2023

Reportedly there was a power failure, and the MONs throw an exception when they are started. This smells like database corruption.

aaron · Dec 31, 2023

You could also export the monmap, remove the two problematic MONs and inject it into the working MON. Also remove the two non-working MONs from the /etc/pve/ceph.conf file. They should have their own sections and the IPs listed in the mon_host line.
This way the single MON should be quorate.

This part of the Ceph docs should explain the procedure: https://docs.ceph.com/en/latest/rad.../#removing-monitors-from-an-unhealthy-cluster

Then you can manually clean up the non working MONs:
systemctl disable ceph-mon@$(hostname).service
and rm -r /var/lib/ceph/mon/ceph-$(hostname)

Both commands are from memory, so please verify the names and paths. After that, you can recreate them.

Out of curiosity, on what kind of hardware are the MONs running? Especially, what kind of disks are used for the OS itself? (MONs store their data on the OS disk by default) A battery backed RAID controller, or if SSDs/NVMEs, do they have Power Loss Protection (PLP)?

And have you done a FS check to make sure there is nothing else corrupted on the host after the power failure?
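
A rough sketch of the monmap procedure described above, written as a dry run that only prints the commands. The MON ids (`pve1` kept, `pve2` and `pve3` removed) are taken from this thread; the scratch path `/tmp/monmap` and the function name `plan_monmap_cleanup` are assumptions for the example. The `ceph-mon --extract-monmap` / `--inject-monmap` and `monmaptool --rm` calls are the ones the linked Ceph docs use, but please check them against the docs for your release before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands instead of running them.
# Arguments: <surviving mon id> <scratch monmap path> <mon ids to remove...>
plan_monmap_cleanup() {
    local keep=$1 map=$2
    shift 2
    # The MON must be stopped while its store is read and rewritten.
    echo "systemctl stop ceph-mon@${keep}.service"
    echo "ceph-mon -i ${keep} --extract-monmap ${map}"
    local dead
    for dead in "$@"; do
        echo "monmaptool ${map} --rm ${dead}"
    done
    # Review the edited map before injecting it.
    echo "monmaptool --print ${map}"
    echo "ceph-mon -i ${keep} --inject-monmap ${map}"
    echo "systemctl start ceph-mon@${keep}.service"
}

plan_monmap_cleanup pve1 /tmp/monmap pve2 pve3
```

If these are run as root, it is probably worth checking afterwards that the files under the MON data directory are still owned by the user the service runs as.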

steele · Jan 1, 2024

aaron said:
You could also export the monmap, remove the two problematic MONs and inject it into the working MON. Also remove the two non-working MONs from the /etc/pve/ceph.conf file. They should have their own sections and the IPs listed in the mon_host line.
This way the single MON should be quorate.

This part of the Ceph docs should explain the procedure: https://docs.ceph.com/en/latest/rad.../#removing-monitors-from-an-unhealthy-cluster

Then you can manually clean up the non working MONs:
systemctl disable ceph-mon@$(hostname).service
and rm -r /var/lib/ceph/mon/ceph-$(hostname)

Both commands are from memory, so please verify the names and paths. After that, you can recreate them.

Out of curiosity, on what kind of hardware are the MONs running? Especially, what kind of disks are used for the OS itself? (MONs store their data on the OS disk by default) A battery backed RAID controller, or if SSDs/NVMEs, do they have Power Loss Protection (PLP)?

And have you done a FS check to make sure there is nothing else corrupted on the host after the power failure?

I have done all of this.
No one seems to be paying attention to exactly what I am saying, so let me try to explain it again.

I have a 3 node cluster that was at 100% health.
These are spinners (24 total, 8 in each server) and NOT SSDs.
The system drive (ext4) is on a server-grade SSD in a PCI slot, backed up to cephfs.
This is admittedly older equipment, but it has been functioning perfectly for a year
since the last massive problem last year (also a power failure but took out power supplies and OSDs too.)

This time, over 24 hours we had massive power failures even though I am on battery backup, it wasn't enough to keep from losing power.
It cycled at least 4 times.
When power was restored 2 of 3 servers were down.
The major problem wasn't that 2 weren't starting... It was the fact that the network protocol was not functioning properly.
We cycled the DNS and Routers and still couldn't get connections to function.
In a panic, we changed the search domain from something .local to something .net so we could fix some certificate issues at the same time, but that made everything worse...
After resetting it BACK to the original, we still couldn't connect properly (pings were failing)
IGMP Snooping was turned on for this network so we turned it off.
After yet another reset and double-checking things like network bridges and bonded adapters, we had connections.
ssh worked everywhere on both networks affecting ceph.
telnet was failing for obvious reasons, we saw the monitors down.

Network:
Ubiquiti UDM Pro rackmount
24 port Ubiquiti Switch
3 bonded 1g ethernet all working correctly over 802.3ad for a couple years.
1 dedicated 1g ethernet to ceph

Proxmox:
8.0.3, which has been upgraded to 8.1.3 to attempt to fix the networking issues, some of which have been reported as bugs.
Cluster had failed... Certs were showing as bad or not available
Checking certs showed them there, but for some reason we see permission errors in the journal.
Resetting the certs and cycling all the servers finally brought the PVE cluster back up.
We can connect and work with the GUI and see shells on all the servers with no certificate errors.
Ceph just gets a timeout (500)

pve1:
All OSDs running (except the connection to remote ones...)
System lost connection to other hosts.
Mon was running, but not responding.
Mgr was running but not responding.

Bash:

root@pve1:~# pvecm status
Cluster information
-------------------
Name:             RP1-Cluster
Config Version:   9
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan  1 12:29:30 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.3569c
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.0.0.200 (local)
0x00000002          1 10.0.0.201
0x00000003          1 10.0.0.203
root@pve1:~#

ORIGINAL Monmap:

Code:

root@pve1:~# monmaptool --print monmap
monmaptool: monmap file monmap
epoch 10
fsid b3445d50-80e3-405e-b3cd-a5b7251876e2
last_changed 2023-10-13T11:40:15.247220-0700
created 2022-10-13T13:53:32.107284-0700
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:10.10.10.3:3300/0,v1:10.10.10.3:6789/0] mon.pve2
1: [v2:10.10.10.4:3300/0,v1:10.10.10.4:6789/0] mon.pve3
2: [v2:10.10.10.2:3300/0,v1:10.10.10.2:6789/0] mon.pve1
root@pve1:~#
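
The printed monmap is easier to compare against `mon_host` and `public_addr` in ceph.conf once it is reduced to one row per monitor. A small sketch, assuming the `monmaptool --print` line format shown above; the function name `monmap_addrs` is invented for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `monmaptool --print` output (stdin) into
# "name v2-address v1-address" rows, one per monitor.
monmap_addrs() {
    sed -n 's|^[0-9][0-9]*: \[v2:\([^/]*\)/[0-9]*,v1:\([^/]*\)/[0-9]*] mon\.\(.*\)$|\3 \1 \2|p'
}

# Usage (not run here):
#   monmaptool --print monmap | monmap_addrs
```

Any address in this list that does not also appear in ceph.conf (or the other way round) is worth a second look.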

I can access some things, but not the ceph cli...
Is the ceph cli braindead without a cluster? That seems like an odd choice.
What is the point of a health server that doesn't report the network failure and where it is trying to connect?
ceph -s, ceph health, and pveceph status (which is just a wrapper) ALL fail and report nothing but a timeout.
Where do they report connection attempts? Even with debug on there is not much in journalctl -b to show what is happening
when ceph starts and tries to establish a quorum.
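
Two local checks that do not depend on a quorum may help here: asking the MON directly over its admin socket, and probing the messenger ports (3300 and 6789, as listed in the monmap). This is only a sketch. The socket path assumes the default layout and the cluster name `ceph`, and the function names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: local checks that do not need a quorum.

# Path of a MON's admin socket, assuming the default
# /var/run/ceph/<cluster>-mon.<id>.asok layout and cluster name "ceph".
mon_asok() {
    echo "/var/run/ceph/ceph-mon.$1.asok"
}

# Ask the local MON for its own view (state, rank, monmap, quorum).
# Must be run on the host where that MON is running.
mon_status() {
    ceph --admin-daemon "$(mon_asok "$1")" mon_status
}

# Report whether a TCP port accepts connections, using bash's /dev/tcp.
probe_port() {
    local host=$1 port=$2
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} open"
    else
        echo "${host}:${port} unreachable"
    fi
}

# Example (not run here): probe both ports on each address from the monmap.
#   for ip in 10.10.10.2 10.10.10.3 10.10.10.4; do
#       probe_port "$ip" 3300; probe_port "$ip" 6789
#   done
#   mon_status pve1
```

The `mon_status` output shows which state the MON thinks it is in (probing, electing, leader, ...) and which monmap it is actually using, which is often more informative than a client-side timeout.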

pve2:
all Local OSDs are green, external timeout.
Mon was failed and not starting

Determined the mon was corrupt seeing it throw `invalid_argument` failures and crash on startup.
Removed the monitor and manager.

pve3:
Exhibits the same behavior as pve2.
I have not removed the mon, as I am trying to see whether correcting the network will allow monmaptool to insert the monmap and recover this monitor.
This is a cluster failure, but it's failing to even show a single monitor.

I am leaning towards the certs still being incorrect and causing connection failures.
If I could see that attempt and failure, it would help tremendously; alas, nothing is reported to standard debugging tools.

I edited the monmap and replaced it with only a single monitor...
It still fails to connect to the network and form a quorum.
I find this behavior odd, because when first creating a cluster, this works fine.
I can get ceph health and it responds fine with only a single machine...
I mean, how else do we add monitors...

Here is the current ceph.conf

Code:

[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.10.10.0/24
     fsid = b3445d50-80e3-405e-b3cd-a5b7251876e2
     mon_allow_pool_delete = true
     mon_host = 10.0.0.200
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.0.0.0/19

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve1]
     host = pve1
     mds_standby_for_name = pve

[mon.pve1]
     host = pve1
     public_addr = 10.0.0.200


I have injected this monmap into PVE1:

Bash:

epoch 10
fsid b3445d50-80e3-405e-b3cd-a5b7251876e2
last_changed 2023-10-13T11:40:15.247220-0700
created 2022-10-13T13:53:32.107284-0700
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:10.0.0.200:3300/0,v1:10.0.0.200:6789/0] mon.pve1

PVE1 is reporting:

Bash:

ceph-mon@pve1.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Fri 2023-12-29 17:46:14 MST; 2 days ago
   Main PID: 16365 (ceph-mon)
      Tasks: 25
     Memory: 481.2M
        CPU: 1h 40min 51.854s
     CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve1.service
             └─16365 /usr/bin/ceph-mon -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Dec 31 17:03:22 pve1 sudo[1345170]:     ceph : PWD=/ ; USER=root ; COMMAND=/usr/sbin/smartctl -x --json=o /dev/nvme0n1
Dec 31 17:03:22 pve1 sudo[1345170]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=64045)
Dec 31 17:03:22 pve1 sudo[1345170]: pam_unix(sudo:session): session closed for user root
Dec 31 17:03:23 pve1 sudo[1345173]:     ceph : PWD=/ ; USER=root ; COMMAND=/usr/sbin/nvme wd_black smart-log-add --json /dev/nvme0n1
Dec 31 17:03:23 pve1 sudo[1345173]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=64045)
Dec 31 17:03:23 pve1 sudo[1345173]: pam_unix(sudo:session): session closed for user root
Jan 01 00:00:08 pve1 ceph-mon[16365]: 2024-01-01T00:00:08.186-0700 7f28faa1f6c0 -1 received  signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mi>
Jan 01 00:00:08 pve1 ceph-mon[16365]: 2024-01-01T00:00:08.186-0700 7f28faa1f6c0 -1 mon.pve1@0(leader) e12 *** Got Signal Hangup ***
Jan 01 00:00:08 pve1 ceph-mon[16365]: 2024-01-01T00:00:08.330-0700 7f28faa1f6c0 -1 received  signal: Hangup from  (PID: 1487618) UID: 0
Jan 01 00:00:08 pve1 ceph-mon[16365]: 2024-01-01T00:00:08.330-0700 7f28faa1f6c0 -1 mon.pve1@0(leader) e12 *** Got Signal Hangup ***

and ceph -s still times out.

WTH is the point of extracting the monmap and injecting it back into itself?
"You could also export the monmap, remove the two problematic MONs and inject it into the working MON"

The only way to get the monmap is to get it from the only working MON... then I inject it back? to what end?

I am about to see if injecting the monmap in pve3 helps if I change the monmap on both pve1 and 3 to include those 2 servers.
the problem is that pve3 cannot start because of an invalid argument exception and no way to strace it properly unless you know the code internals. While this may be a grand adventure, I have a cluster down for over a week that is simply caused by bad network communication and an inability to diagnose it with the given tools.

How am I supposed to debug this?

Bash:

Jan 01 13:07:49 pve3 systemd[1]: Started ceph-mon@pve3.service - Ceph cluster monitor daemon.
░░ Subject: A start job for unit ceph-mon@pve3.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit ceph-mon@pve3.service has finished successfully.
░░
░░ The job identifier is 424607.
Jan 01 13:07:49 pve3 ceph-mon[219643]: terminate called after throwing an instance of 'std::invalid_argument'
Jan 01 13:07:49 pve3 ceph-mon[219643]:   what():  stoull
Jan 01 13:07:49 pve3 ceph-mon[219643]: *** Caught signal (Aborted) **
Jan 01 13:07:49 pve3 ceph-mon[219643]:  in thread 7f70e9009a00 thread_name:ceph-mon
Jan 01 13:07:49 pve3 ceph-mon[219643]:  ceph version 17.2.7 (e303afc2e967a4705b40a7e5f76067c10eea0484) quincy (stable)
Jan 01 13:07:49 pve3 ceph-mon[219643]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f70e8c5afd0]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ad3c) [0x7f70e8ca9d3c]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  3: gsignal()
Jan 01 13:07:49 pve3 ceph-mon[219643]:  4: abort()
Jan 01 13:07:49 pve3 ceph-mon[219643]:  5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9d919) [0x7f70e8a9d919]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e1a) [0x7f70e8aa8e1a]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e85) [0x7f70e8aa8e85]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa90d8) [0x7f70e8aa90d8]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  9: (std::__throw_invalid_argument(char const*)+0x40) [0x7f70e8aa0192]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  10: (LogMonitor::log_external_backlog()+0xe29) [0x55e7b8e1f849]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  11: (LogMonitor::update_from_paxos(bool*)+0x53) [0x55e7b8e224a3]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  12: (Monitor::refresh_from_paxos(bool*)+0x163) [0x55e7b8d97a23]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  13: (Monitor::preinit()+0x97a) [0x55e7b8dc597a]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  14: main()
Jan 01 13:07:49 pve3 ceph-mon[219643]:  15: /lib/x86_64-linux-gnu/libc.so.6(+0x271ca) [0x7f70e8c461ca]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  16: __libc_start_main()
Jan 01 13:07:49 pve3 ceph-mon[219643]:  17: _start()
Jan 01 13:07:49 pve3 ceph-mon[219643]: 2024-01-01T13:07:49.268-0700 7f70e9009a00 -1 *** Caught signal (Aborted) **
Jan 01 13:07:49 pve3 ceph-mon[219643]:  in thread 7f70e9009a00 thread_name:ceph-mon
Jan 01 13:07:49 pve3 ceph-mon[219643]:  ceph version 17.2.7 (e303afc2e967a4705b40a7e5f76067c10eea0484) quincy (stable)
Jan 01 13:07:49 pve3 ceph-mon[219643]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f70e8c5afd0]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  2: /lib/x86_64-linux-gnu/libc.so.6(+0x8ad3c) [0x7f70e8ca9d3c]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  3: gsignal()
Jan 01 13:07:49 pve3 ceph-mon[219643]:  4: abort()
Jan 01 13:07:49 pve3 ceph-mon[219643]:  5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0x9d919) [0x7f70e8a9d919]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e1a) [0x7f70e8aa8e1a]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  7: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa8e85) [0x7f70e8aa8e85]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  8: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa90d8) [0x7f70e8aa90d8]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  9: (std::__throw_invalid_argument(char const*)+0x40) [0x7f70e8aa0192]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  10: (LogMonitor::log_external_backlog()+0xe29) [0x55e7b8e1f849]
Jan 01 13:07:49 pve3 ceph-mon[219643]:  11: (LogMonitor::update_from_paxos(bool*)+0x53) [0x55e7b8e224a3]

steele · Jan 1, 2024

So, it finished successfully, but threw a major exception... How does that work exactly?

ceph-crash won't write files so...
no crash reports to look at...
says it's running, but also that it couldn't connect to the cluster, so which is it? running or failed?

Code:

 ceph-crash.service - Ceph crash dump collector
     Loaded: loaded (/lib/systemd/system/ceph-crash.service; enabled; preset: enabled)
     Active: active (running) since Wed 2023-12-27 14:51:21 MST; 4 days ago
   Main PID: 1218 (ceph-crash)
      Tasks: 1 (limit: 231699)
     Memory: 23.0M
        CPU: 7h 58min 19.994s
     CGroup: /system.slice/ceph-crash.service
             └─1218 /usr/bin/python3 /usr/bin/ceph-crash

Jan 01 12:58:38 pve3 ceph-crash[1218]: [errno 2] RADOS object not found (error connecting to the cluster)
Jan 01 12:58:39 pve3 ceph-crash[1218]: WARNING:ceph-crash:post /var/lib/ceph/crash/2023-12-24T16:57:20.334107Z_7abdea8b-cdfa-477a-98d1>
Jan 01 12:58:39 pve3 ceph-crash[1218]: 2024-01-01T12:58:39.019-0700 7fa8fd7ae6c0 -1 AuthRegistry(0x7fa8f8060a50) no keyring found at />
Jan 01 12:58:39 pve3 ceph-crash[1218]: 2024-01-01T12:58:39.023-0700 7fa8fd7ae6c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.c>
Jan 01 12:58:39 pve3 ceph-crash[1218]: 2024-01-01T12:58:39.023-0700 7fa8fd7ae6c0 -1 AuthRegistry(0x7fa8f8065dc0) no keyring found at />
Jan 01 12:58:39 pve3 ceph-crash[1218]: 2024-01-01T12:58:39.023-0700 7fa8fd7ae6c0 -1 auth: unable to find a keyring on /etc/ceph/ceph.c>
Jan 01 12:58:39 pve3 ceph-crash[1218]: 2024-01-01T12:58:39.023-0700 7fa8fd7ae6c0 -1 AuthRegistry(0x7fa8fd7ad3c0) no keyring found at />
Jan 01 12:58:39 pve3 ceph-crash[1218]: [errno 2] RADOS object not found (error connecting to the cluster)
Jan 01 12:58:39 pve3 ceph-crash[1218]: WARNING:ceph-crash:post /var/lib/ceph/crash/2023-12-24T16:57:20.334107Z_7abdea8b-cdfa-477a-98d1>
Jan 01 12:58:39 pve3 ceph-crash[1218]: [errno 13] RADOS permission denied (error connecting to the cluster)

Demanding money is surely not going to help fix this, it will make me run away.
OH, I know, standard answer from marketing... buy more and better hardware, even though this has worked perfectly fine for the last year.

Revealing how to diagnose the failures would help as the logs and crash module fail to start.
Sorry, you're on your own, that is beyond a brief glance and not worthy of our time.

I rebuilt from OSDs last year.
Is this now a standard condition I need to prepare for?
Power failed...
Flush ceph, reinstall it and rebuild from OSDs.

That is a ludicrous suggestion.

This should be recoverable.

I would say the manual pages are far out of date for this.
While they do cover some basic high-level ways to try to fix this,
there are also many reports of those instructions failing.
Where are the suggestions for editing the DB?
There aren't any...
What about problems with containers?
There is extremely little documentation to even follow, and that is IF you actually know Docker internals.

Without cluster communications to the OSDs, it is simply NOT POSSIBLE to recover and I have a trash datacenter.

Not being able to see the actual root cause of the problem results in suggestions to rebuild from scratch...
Also ludicrous... Why is the monmap in an inaccessible DB and not stored in a config file? Some seriously old-style proprietary nonsense going on there.

If I cannot fix the cluster because I cannot see how the software attempts to establish it, then this is a blind, black-box system that is neither resilient nor reliable for any sort of planned disaster recovery.

Suggesting a 3rd-party `script` to even recover from OSDs is a terrible plan for telling customers how they will deal with disaster recovery on an `open` system `supposedly` designed for disaster recovery. It's obviously not; it's designed to be a distributed file system, but if there is no way to see what is going on when the distributed nodes try to form a network, then it totally fails in its design there too.

While it works, it's pretty great; when it fails, pure misery.

I am not complaining because it is hard... Show me an actual testable solution that shows real thought to mitigate disasters.
Ceph does not have one, and Proxmox wrapping things and FURTHER hiding what is happening just makes it all the worse.

I am complaining because this is something easily fixable and staff are blaming incompetent users for a fundamental programming failure.
Yes, yes, open source, make a contribution...

I plan to because no one should suffer this and then get insulted for being an idiot by staff.

steele · Jan 3, 2024

So again, this is either a BUG, as it has been reported a few times, or it is down at the CephX level. I finally found a decent way to trace this, but I don't understand the problem...

The file is there. I can't change the permissions...

BobhWasatch · Jan 3, 2024

Maybe if you scream at the staff some more they will help you fix it.

aaron · Jan 4, 2024

@steele please be aware that this is a community forum. No one here is obligated to spend time trying to help you to fix the issue. It is therefore in your best interest to keep the interaction civil and respectful. A condescending tone will only turn people away.

steele said:
WTH is the point of extracting the monmap and injecting it back into itself?
"You could also export the monmap, remove the two problematic MONs and inject it into the working MON"

The only way to get the monmap is to get it from the only working MON... then I inject it back? to what end?

The situation is that 2 out of 3 MONs throw weird errors when starting up. One MON is still good.

With the assumption that everything else is okay:

network between the nodes works
OSD services can start
MGR can start

the issue is that there are not enough MONs available to form a quorum. By exporting the monmap from the last working MON, removing the two broken MONs from it, and injecting the changed monmap again, you can get the last working MON to be quorate by itself, as it is then unaware of the other two MONs.
This way, chances are good that the cluster will be operational again.
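
The export, edit and inject steps above can be sketched as a small dry-run helper. Treat it as a sketch under assumptions, not a tested recovery procedure: the MON IDs pve1 (working), pve2 and pve3 (broken) and the path /tmp/monmap are placeholders, and the function name monmap_shrink_commands is made up for illustration. It only prints the commands (ceph-mon with --extract-monmap and --inject-monmap, monmaptool with --rm and --print) so they can be reviewed and then run by hand, with the MON stopped in between.

```python
import shlex


def monmap_shrink_commands(good_mon, bad_mons, monmap_path="/tmp/monmap"):
    """Return the shell commands that reduce the monmap to a single MON.

    Nothing is executed here; the caller reviews the list and runs it by hand.
    All MON IDs and the monmap path are placeholders (assumptions).
    """
    unit = f"ceph-mon@{good_mon}.service"
    cmds = [
        # The MON must not be running while its store is read or modified.
        ["systemctl", "stop", unit],
        ["ceph-mon", "-i", good_mon, "--extract-monmap", monmap_path],
    ]
    # Drop every broken MON from the exported map.
    for mon in bad_mons:
        cmds.append(["monmaptool", monmap_path, "--rm", mon])
    cmds += [
        # Check that only the working MON is left before injecting.
        ["monmaptool", "--print", monmap_path],
        ["ceph-mon", "-i", good_mon, "--inject-monmap", monmap_path],
        ["systemctl", "start", unit],
    ]
    return [shlex.join(c) for c in cmds]


if __name__ == "__main__":
    for line in monmap_shrink_commands("pve1", ["pve2", "pve3"]):
        print(line)
```

Printing instead of executing is deliberate: injecting a wrong monmap makes things worse, so it is worth reading the `monmaptool --print` output before the inject step.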

Before you can recreate the broken MONs from scratch, you will need to manually clean them up. See my previous comment.

I see you did inject a MONMAP with just the working MON. Did you stop the MON before you exported and injected it? The log output in the status does look okay.

You can verify what the MON sees by connecting to its admin socket and running mon_status. The Ceph docs explain how: https://docs.ceph.com/en/reef/rados...hooting-mon/#using-the-monitor-s-admin-socket

It will show the state it is in and the internal monmap. Hopefully only that mon is listed. If that is the case, restart all Ceph services in the cluster so that they try to connect to the only MON listed in the ceph.conf. For example: systemctl restart ceph.target
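
As a rough illustration of what to look for in that output, here is a short sketch that summarizes a mon_status JSON document. It assumes the JSON was saved from something like `ceph daemon mon.pve1 mon_status` (pve1 is a placeholder MON ID), the function name summarize_mon_status is made up, and the sample at the bottom is hand-written rather than real output from this cluster.

```python
import json


def summarize_mon_status(raw_json):
    """Pull the fields worth checking out of a mon_status JSON document."""
    status = json.loads(raw_json)
    # Names of all MONs in the monitor's internal monmap.
    mons = [m["name"] for m in status.get("monmap", {}).get("mons", [])]
    return {
        "name": status.get("name"),
        "state": status.get("state"),
        "quorum": status.get("quorum", []),
        "monmap_mons": mons,
        # True only if this MON is the single entry in its own monmap.
        "alone_in_monmap": mons == [status.get("name")],
    }


if __name__ == "__main__":
    # Hand-written sample, NOT real output from the cluster in this thread.
    sample = json.dumps({
        "name": "pve1",
        "rank": 0,
        "state": "leader",
        "quorum": [0],
        "monmap": {"epoch": 4, "mons": [{"rank": 0, "name": "pve1"}]},
    })
    print(summarize_mon_status(sample))
```

If `alone_in_monmap` comes back false, the broken MONs are still in the map and the remaining MON will keep waiting for a quorum it cannot reach.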
