Upgraded one node from 7.4 to 8 and Ceph broke for whole cluster

waynevaghi

Member
Jan 11, 2021
Hi All,

I am not an experienced Linux engineer but this has become my problem to try and fix.
I upgraded one node in a 3-node cluster from 7.4.17 to 8, following this guide:
Upgrade from 7 to 8 - Proxmox VE

The upgrade was only done on one node. I followed all steps in the guide carefully. After rebooting the upgraded node, the whole site became unresponsive.
Ceph is not working even though I have 2 nodes up.

Please help me with this issue, we have a virtual router running on here so the site is down while the Proxmox system is down.

Thank you,
 
Ceph is not working even though I have 2 nodes up.
That's normal because Ceph needs at least 3 nodes, which means you were running Ceph without any redundancy. That's bad and you probably need to fix this first, before doing an in-place major upgrade. I have no experience with this, but maybe Ceph needs to be upgraded separately from Proxmox?
Please help me with this issue, we have a virtual router running on here so the site is down while the Proxmox system is down.
Maybe scrap the upgraded node, wipe it, install a fresh Proxmox 7.4 and add it to the cluster. That might give you time to investigate or practice the upgrade on a (virtual) test cluster?
Or maybe use one of your support subscription tickets for professional help, if it is that urgent?

EDIT: Maybe you can copy some error messages or add more details about the problem you got (and at which step) for other knowledgeable people to respond to?
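Something like the following might be a starting point for collecting those details from the CLI. It is only a rough sketch, not an official procedure: it assumes you are root on one of the nodes, that the standard Proxmox and Ceph command-line tools are installed, and the output path and the `run` helper are names I made up. The time limit is there because Ceph commands can block for a long time when the monitors have no quorum.

```shell
#!/bin/sh
# Sketch: gather basic cluster and Ceph state into one text file to post.
# Assumptions: run as root on a Proxmox node; OUT is an arbitrary path.

OUT="${OUT:-/tmp/ceph-diag.txt}"

# Hypothetical helper: run one command with a time limit and record its
# output under a header. Tools that are not installed are skipped.
run() {
    echo "===== $* =====" >> "$OUT"
    if command -v "$1" >/dev/null 2>&1; then
        timeout 20 "$@" >> "$OUT" 2>&1 || echo "(failed or timed out)" >> "$OUT"
    else
        echo "(command not found, skipped)" >> "$OUT"
    fi
}

: > "$OUT"                      # start with an empty file
run pveversion -v               # package versions on this node
run pvecm status                # corosync quorum and membership
run ceph -s                     # overall Ceph status
run ceph health detail          # details behind any health warning or error
run ceph versions               # Ceph daemon versions reported by the cluster
run ceph osd tree               # which OSDs are up or down
run journalctl -b -p err --no-pager -n 200   # recent errors since boot

echo "Wrote $OUT"
```

Running it on both an old node and the upgraded node and comparing the two files might show whether the versions differ between them.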
 
Thank you for the reply.
Because the cluster is down, I have to access the site through a 5G device with out-of-band management to the cluster, so I only have access to the Proxmox CLI.
Is it possible to reinstall Ceph without losing my configuration?
Or what logs would I need to post here to help see the cause of the issue?