PVE upgrade to PMX 6.8.12-2 and Ceph 18.2.4 - oddity

mcgarrah · Oct 8, 2024

Encountered a weird set of events after my upgrade today. Ceph and CephFS would not come back up after my reboot into the new 6.8.12-2 kernel. I have a known issue where limited disk space blocks the Ceph services from starting after upgrades, but I had that handled. After cleaning up the disk to under 77% used and rebooting to get the new kernel in place, I got a bunch of HTTP 500 errors in the web UI that I could not find in the logs. Those errors popped up whenever I looked at anything Ceph related in the web interface.
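For anyone wanting to script the disk-space check before a reboot, here is a minimal sketch. The helper name disk_under is made up for illustration, and the 77 is simply the figure mentioned above, not a documented Ceph limit.

```shell
# Hypothetical helper: succeeds when the filesystem holding $1 is below $2 percent used.
disk_under() {
    path="$1"
    limit="$2"
    # df -P prints POSIX-format output; column 5 of the second line is "NN%".
    used=$(df -P "$path" | awk 'NR==2 { sub(/%/, "", $5); print $5 }')
    [ "$used" -lt "$limit" ]
}

# 77 is the usage figure from the post, used here as an assumed threshold.
if disk_under / 77; then
    echo "root filesystem is under 77% used"
else
    echo "root filesystem is at or above 77% used"
fi
```

Point the path at whichever filesystem holds the Ceph monitor data if that is not the root filesystem on your nodes.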

Punting a bit, I tried running pveceph status from a PuTTY session, which returned this:

Code:

root@pve1:~# pveceph status
command 'ceph -s' failed: got timeout
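Since pveceph status only reports that 'ceph -s' timed out, it can help to run the underlying command yourself with an explicit limit. A rough sketch, assuming the coreutils timeout command is available; run_with_limit is a made-up helper name and the 30 seconds is an arbitrary choice.

```shell
# Hypothetical helper: run a command with a time limit and report if it timed out.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    rc=$?
    # coreutils timeout exits with status 124 when the limit is reached.
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Only try this on a node that actually has the ceph CLI installed.
if command -v ceph >/dev/null 2>&1; then
    run_with_limit 30 ceph -s || echo "ceph -s did not complete cleanly"
fi
```

An exit status of 124 means the command was still waiting when the limit ran out, which matches the "got timeout" message above.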

Digging into the system logs in the console didn't show anything useful except that my OSDs were not responding in time. Since this is just a set of test boxes I use for reviewing changes before upgrading my main cluster, I just rebooted each one a second time, and everything eventually came back up after a long cycle of the OSDs starting up piecemeal. Corosync and the PVE cluster never seemed to have any issues; only the Ceph pieces did.
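When OSDs come up slowly like this, a small polling loop can save repeatedly checking by hand. This is only a sketch; wait_for is a hypothetical helper, not something shipped with Proxmox or Ceph.

```shell
# Hypothetical helper: retry a command up to $1 times, sleeping $2 seconds between tries.
wait_for() {
    tries="$1"
    delay="$2"
    shift 2
    i=1
    while true; do
        # Succeed as soon as the command succeeds once.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Give up once the attempt budget is used.
        if [ "$i" -ge "$tries" ]; then
            return 1
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}
```

For example, wait_for 30 10 systemctl is-active --quiet ceph-osd@0 would poll one OSD service for up to about five minutes. The OSD id 0 is an assumption here, so substitute the ids that exist on your node.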

Full disclosure: This is a highly constrained three-node Dell Wyse 3040 cluster that is sliding along the lower edge of what is viable for Proxmox 8. It was also installed with Debian and converted over to Proxmox because the eMMC boot storage is not supported by the Proxmox installer. So this is not a standard cluster. I'm just mentioning this in case somebody else hits something odd.
