mon.pxceph crashed on host pxceph, need help please.

nttec · Dec 20, 2019

We upgraded our Proxmox from 5.4 to 6.1.5 and Ceph from Luminous to Nautilus. Most of the issues we ran into have been solved, but there is one we can't fix.

We get a HEALTH_WARN that says:

mon.pxceph crashed on host pxceph at 2019-12-20 20:48:37.776154Z

The timestamp in the warning stays the same even after we restart the monitor.

Our logs also show these errors when the monitor restarts:

Dec 20 18:38:20 pxceph ceph-mon[142699]: 2019-12-20 18:38:20.555 7fc093ff4280 -1 WARNING: 'mon addr' config option v1:10.10.20.20:6789/0 does not match monmap file
Dec 20 18:38:20 pxceph ceph-mon[142699]: continuing with monmap configuration
Dec 20 18:38:21 pxceph ceph-mon[142699]: 2019-12-20 18:38:21.179 7fc08baf4700 -1 mon.pxceph@0(electing) e6 failed to get devid for : fallback method has serial ''but no model
Dec 20 18:38:21 pxceph ceph-mon[142699]: 2019-12-20 18:38:21.199 7fc08baf4700 -1 mon.pxceph@0(electing) e6 failed to get devid for : fallback method has serial ''but no model

We are following this guide: https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus

We also applied this part of the guide:

Enable msgrv2 protocol and update Ceph configuration
To enable the new v2 network protocol, issue the following command:

ceph mon enable-msgr2

This will instruct all monitors that bind to the old default port 6789 for the legacy v1 protocol to also bind to the new 3300 v2 protocol port. To see if all monitors have been updated run

ceph mon dump

and verify that each monitor has both a v2: and v1: address listed.
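The check described above can be scripted. What follows is only a sketch: check_msgr2 is a hypothetical helper name, and it assumes the monitor lines in the dump start with a rank number followed by a colon (for example "0: [v2:...,v1:...] mon.pxceph"). It reads the dump text on stdin and prints any monitor line that lacks a v1: or v2: address.

```shell
# Sketch: flag monitors in `ceph mon dump` output that lack a v1: or v2: address.
# check_msgr2 is a hypothetical helper, not part of Ceph; it reads the dump on stdin.
check_msgr2() {
    awk '
        BEGIN { bad = 0 }
        # Monitor entries are assumed to start with "<rank>: ".
        /^[0-9]+: / {
            if ($0 !~ /v2:/ || $0 !~ /v1:/) {
                print "missing address: " $0
                bad = 1
            }
        }
        # Exit non-zero if any monitor line was flagged.
        END { exit bad }
    '
}

# Usage on a cluster node (needs the ceph CLI and admin access):
#   ceph mon dump 2>/dev/null | check_msgr2 && echo "all monitors list v1 and v2"
```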

Updating /etc/pve/ceph.conf

For each host that has been upgraded, you should update your /etc/pve/ceph.conf file so that it either specifies no monitor port (if you are running the monitors on the default ports) or references both the v2 and v1 addresses and ports explicitly. Things will still work if only the v1 IP and port are listed, but each CLI instantiation or daemon will need to reconnect after learning the monitors also speak the v2 protocol, slowing things down a bit and preventing a full transition to the v2 protocol.

It is recommended to add all monitor ips (without port) to 'mon_host' in the global section like this:

[global]
...
mon_host = 10.0.0.100 10.0.0.101 10.0.0.102

But we end up with the same result. Can anyone please tell us what we should do?

cat /etc/ceph/ceph.conf

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
debug ms = 0/0
fsid = 672d9ca3-b4b4-4313-9ecb-1dd02e8da71d
mon allow pool delete = true
mon_host = 10.10.20.20 10.10.20.21 10.10.20.22
osd deep scrub interval = 1209600
osd scrub begin hour = 19
osd scrub end hour = 6
osd scrub sleep = 0.1
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.10.20.0/24
bluestore_block_db_size = 40000000000

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.pxceph]
host = pxceph
mon addr = 10.10.20.20:6789

[mon.pxceph2]
host = pxceph2
mon addr = 10.10.20.21:6789

[mon.pxceph3]
host = pxceph3
mon addr = 10.10.20.22:6789
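The quoted guide recommends listing monitors without a port, while this config still carries per-monitor "mon addr" entries with :6789. Whether those entries are what triggers the "does not match monmap" warning is an assumption, not something confirmed in this thread. A minimal sketch to locate them (find_pinned_mon_addr is a hypothetical helper name):

```shell
# Sketch: list "mon addr" / "mon_addr" entries that pin an explicit port.
# find_pinned_mon_addr is a hypothetical helper; pass the config path as $1.
find_pinned_mon_addr() {
    # -n prefixes each match with its line number in the file.
    grep -nE '^[[:space:]]*mon[ _]addr[[:space:]]*=.*:[0-9]+' "$1"
}

# Usage: find_pinned_mon_addr /etc/pve/ceph.conf
```

The function only reports matching lines; it does not modify the file.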

In the GUI under Ceph > Monitor, all monitors are shown as running.

sg90 · Dec 21, 2019

Sounds like you just have the crash module enabled: https://docs.ceph.com/docs/master/mgr/crash/

Run: ceph crash archive-all

This should remove the entry showing in the health output
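Archiving only acknowledges the crash reports; it does not explain why the monitor crashed, so it may be worth reading them first. A minimal sketch, assuming the ceph CLI is available on the node (review_then_archive is a hypothetical wrapper name; the ceph crash subcommands themselves come from the crash module):

```shell
# Sketch: show recorded crashes, then archive them.
# review_then_archive is a hypothetical wrapper, not a Ceph command.
review_then_archive() {
    # List recorded crash reports with their IDs; stop if the CLI call fails.
    ceph crash ls || return 1
    # To inspect one report before dismissing it, run by hand:
    #   ceph crash info <crash-id>
    # Mark all reports as archived so they stop raising the health warning.
    # Archived reports still appear in `ceph crash ls`.
    ceph crash archive-all
}
```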

nttec · Dec 21, 2019

sg90 said:
Sounds like you just have the crash module enabled: https://docs.ceph.com/docs/master/mgr/crash/

Run: ceph crash archive-all

This should remove the entry showing in the health output

Thank you, that solved it.
