ceph-crash problem

In my opinion, the better option is to run either ceph crash archive-all or ceph crash rm <crashid> on one node only.

ceph crash archive-all - the reports are no longer considered for the RECENT_CRASH health check and no longer appear in the crash ls-new output (they will still appear in the crash ls output, so you can analyze them in the future).

ceph crash rm <crashid> - removes a specific crash report.

Both commands also invoke the refresh_health_checks() function.
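
For example, on any one node (the crash ID below is a placeholder; real IDs come from ceph crash ls-new):
Code:
# list the crashes that still trigger RECENT_CRASH
ceph crash ls-new
# archive all of them at once ...
ceph crash archive-all
# ... or remove a single report for good
ceph crash rm <crashid>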
 
Done all of that, but it does not solve the remaining issue.
So manual deleting was the only way to get rid of the annoying messages in syslog.
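
In practice that means clearing the posted reports from disk on each node; roughly like this (assuming the default crash directory, and only for reports you no longer need):
Code:
# the reports ceph-crash has already submitted live here
ls /var/lib/ceph/crash/posted/
# deleting them only frees disk space; use ceph crash rm / archive-all
# to clear the cluster-side records as described above
rm -rf /var/lib/ceph/crash/posted/<crash-dir>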
 
How do I delete all of these? They are taking up all the space in /var/lib/ceph/crash/posted:


2023-03-26_14:46:57.391125Z_27286eae-78c1-4f23-ad64-42b23403cd42 2023-10-18_09:28:35.630719Z_12f663c8-de5a-4b31-be09-52dfba67e369
2023-03-26_15:03:41.483267Z_6b6b22f2-ad09-471e-a44b-b36b99de011f 2023-10-18_09:41:11.153720Z_2f66b343-8ef6-42ce-8dbf-80f7b8860e0e
2023-03-26_15:20:50.403741Z_e2da09bd-dc4f-473a-85e3-fc701a79c35a 2023-10-18_09:53:43.467781Z_4416d532-2f01-47b3-b524-d6c8e60eef41
2023-03-26_15:37:19.519444Z_5b07b0da-9207-4f89-b3bc-5b146977a956 2023-10-18_10:05:55.115032Z_7b612d81-9f57-4ecb-9796-5ef26ed1409d
2023-03-26_15:52:59.275860Z_00e43e8a-5757-4273-b3b2-6a83efdf3aaf 2023-10-18_10:19:12.930650Z_8be02204-bc95-47b8-88be-6e522370464f
2023-03-26_16:08:44.063938Z_3ac5e1d0-81c7-4e12-b94a-029ab1076a00 2023-10-18_10:34:39.344234Z_498de363-7ca0-4041-ba86-9c5afcc07d97
2023-03-26_16:24:19.822101Z_650f0bbf-6162-4dcf-8cc6-ed169037c59f 2023-10-18_10:50:39.033067Z_c16ddb7c-a15a-4adb-bfe3-7775e857b4be
2023-03-26_16:39:21.376618Z_c2e868e3-618a-4ed8-b475-e529b9fcfefe 2023-10-18_11:04:56.304781Z_f51489cd-4874-491a-8113-ebd7bbfa9fe2
2023-03-26_16:54:32.491550Z_23e274a6-8d7b-4c9c-ab63-1e4e838e966a 2023-10-18_11:17:14.777913Z_eee9ebc9-c613-4677-8769-6c87e0ba9e87
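
One way to clear both sides (the 30-day cutoff is just an example): ceph crash prune drops the cluster-side records, and the posted/ directories themselves still have to be removed by hand, as mentioned above.
Code:
# drop cluster-side crash records older than 30 days
ceph crash prune 30
# free the disk space on each node (pattern matches the 2023-03 entries above)
rm -rf /var/lib/ceph/crash/posted/2023-03-26_*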
 
I added a separate auth for ceph-crash with ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' and put it in a dedicated keyring, /etc/pve/ceph.client.crash.keyring.
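
Roughly like this (redirecting the output into /etc/pve is just one way to write the keyring file):
Code:
# create the crash client and store its key where all nodes can read it
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' \
    > /etc/pve/ceph.client.crash.keyring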

Then I added the following to ceph.conf to make sure ceph-crash could pick it up.
Code:
[client.crash]
        keyring = /etc/pve/$cluster.$name.keyring
I also had to change ownership on /var/lib/ceph/crash/ on all the nodes since the posted/ subfolder was still owned by root.
Code:
chown -R ceph: /var/lib/ceph/crash/

The key will be readable on all cluster nodes, but at least the client is limited to the crash profile. Crash reporting works now, including archiving and pruning.
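
To check it, restarting the service and looking at its recent log lines should be enough (unit name assumes the standard ceph-crash systemd unit):
Code:
systemctl restart ceph-crash.service
journalctl -u ceph-crash.service -n 50   # should be free of keyring/auth errors
ceph crash ls                            # posted reports end up here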
 
@ahovda: With your changes I no longer get the
auth: unable to find a keyring on /etc/pve/priv/ceph.client.crash.keyring:
message. Thanks for that.

But I still get
Jan 18 16:47:06 proxh5 ceph-crash[34121]: 2024-01-18T16:47:06.782+0100 7f8dc7e926c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.admin.keyring: (13) Permission denied
Jan 18 16:47:06 proxh5 ceph-crash[34121]: 2024-01-18T16:47:06.782+0100 7f8dc7e926c0 -1 monclient: keyring not found

Any idea what that could be? Did you have these messages as well?

[6 days later]

OK, the two messages only show up on startup of the ceph-crash service. The running service does not throw any errors anymore with the workaround, and it reports the crashes correctly.
 
Is it advisable to upgrade Ceph from Quincy to Reef with this error? Any advice? I have a critical production cluster; will upgrading directly, rather than troubleshooting this first, have any impact?
 
I checked in today. The two messages only show up on startup of the ceph-crash service. The running service does not throw any errors anymore (with the workaround), and it reports the crashes correctly.