ceph-crash problem

In my opinion, the better option is to run either ceph crash archive-all or ceph crash rm <crashid> on only one node.

ceph crash archive-all - archived reports are no longer considered for the RECENT_CRASH health check and no longer appear in the crash ls-new output (they will still appear in the crash ls output, so you can analyze them in the future).

ceph crash rm <crashid> - removes a specific crash report.

Both commands also invoke the refresh_health_checks() function.
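
For example, a typical cleanup session could look like this (standard Ceph CLI; substitute a real crash ID from ceph crash ls-new):
Code:
# show crashes that still count towards the RECENT_CRASH warning
ceph crash ls-new
# archive all of them in one go ...
ceph crash archive-all
# ... or remove a single report
ceph crash rm <crashid>
# archived reports remain visible for later analysis
ceph crash ls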
 
Did all of that, but it does not solve the remaining issue.
So manual deletion was the only way to get rid of the annoying messages in syslog.
 
How do I delete all of these? They are taking up all the space in /var/lib/ceph/crash/posted:


Code:
2023-03-26_14:46:57.391125Z_27286eae-78c1-4f23-ad64-42b23403cd42 2023-10-18_09:28:35.630719Z_12f663c8-de5a-4b31-be09-52dfba67e369
2023-03-26_15:03:41.483267Z_6b6b22f2-ad09-471e-a44b-b36b99de011f 2023-10-18_09:41:11.153720Z_2f66b343-8ef6-42ce-8dbf-80f7b8860e0e
2023-03-26_15:20:50.403741Z_e2da09bd-dc4f-473a-85e3-fc701a79c35a 2023-10-18_09:53:43.467781Z_4416d532-2f01-47b3-b524-d6c8e60eef41
2023-03-26_15:37:19.519444Z_5b07b0da-9207-4f89-b3bc-5b146977a956 2023-10-18_10:05:55.115032Z_7b612d81-9f57-4ecb-9796-5ef26ed1409d
2023-03-26_15:52:59.275860Z_00e43e8a-5757-4273-b3b2-6a83efdf3aaf 2023-10-18_10:19:12.930650Z_8be02204-bc95-47b8-88be-6e522370464f
2023-03-26_16:08:44.063938Z_3ac5e1d0-81c7-4e12-b94a-029ab1076a00 2023-10-18_10:34:39.344234Z_498de363-7ca0-4041-ba86-9c5afcc07d97
2023-03-26_16:24:19.822101Z_650f0bbf-6162-4dcf-8cc6-ed169037c59f 2023-10-18_10:50:39.033067Z_c16ddb7c-a15a-4adb-bfe3-7775e857b4be
2023-03-26_16:39:21.376618Z_c2e868e3-618a-4ed8-b475-e529b9fcfefe 2023-10-18_11:04:56.304781Z_f51489cd-4874-491a-8113-ebd7bbfa9fe2
2023-03-26_16:54:32.491550Z_23e274a6-8d7b-4c9c-ab63-1e4e838e966a 2023-10-18_11:17:14.777913Z_eee9ebc9-c613-4677-8769-6c87e0ba9e87
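
One possible cleanup, as a sketch (ceph crash prune takes the number of days of reports to keep; the rm -rf line is only needed if the already-posted directories still occupy local disk space afterwards, and the glob here matches the March entries above):
Code:
# drop reports older than 7 days from the cluster crash list
ceph crash prune 7
# reclaim local disk space from old, already-posted reports
rm -rf /var/lib/ceph/crash/posted/2023-03-26_*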
 
I added a separate auth for ceph-crash with ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' and added it to a dedicated keyring /etc/pve/ceph.client.crash.keyring.
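
For reference, something like this should write the key straight into the keyring file in one step (the -o option is the standard ceph output redirection; the path matches the one above):
Code:
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' -o /etc/pve/ceph.client.crash.keyring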

Then I added the following to ceph.conf to make sure ceph-crash could pick it up.
Code:
[client.crash]
        keyring = /etc/pve/$cluster.$name.keyring
I also had to change ownership of /var/lib/ceph/crash/ on all the nodes, since the posted/ subfolder was still owned by root:
Code:
chown -R ceph: /var/lib/ceph/crash/

The key will be public for all cluster nodes, but at least the client is limited to the crash profile. Crash reporting works now, including archiving and pruning.
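
If you want to verify the caps afterwards, ceph auth can show them:
Code:
ceph auth get client.crash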
 
@ahovda : With your changes I don't get the
auth: unable to find a keyring on /etc/pve/priv/ceph.client.crash.keyring:
anymore. Thanks for that.

But I still get
Code:
Jan 18 16:47:06 proxh5 ceph-crash[34121]: 2024-01-18T16:47:06.782+0100 7f8dc7e926c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
Jan 18 16:47:06 proxh5 ceph-crash[34121]: 2024-01-18T16:47:06.782+0100 7f8dc7e926c0 -1 monclient: keyring not found

Any idea what that could be? Did you have these messages as well?

[6 days later]

OK, the two messages only show up on startup of the ceph-crash service. The running service does not throw any errors anymore with the workaround, and it reports the crashes correctly.
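
In case anyone wants to reproduce the check, restarting the service and tailing its journal should show the two startup messages and then nothing further (assuming the standard systemd unit name):
Code:
systemctl restart ceph-crash.service
journalctl -u ceph-crash.service -n 50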
 
Is it advisable to upgrade Ceph from Quincy to Reef with this error? Any advice? I have a critical production cluster; will directly upgrading have any impact, compared to troubleshooting this first?
 
