CEPH issue during upgrade

ronnieponnie

Dec 17, 2021
Hi All,

In an attempt to upgrade a Proxmox 5 cluster to Proxmox 6 (and eventually to Proxmox 7), we are facing an issue with CEPH.
The upgrade to Proxmox 6, as a preliminary step, went smoothly and was successful.

The problem appeared during step #15 of this procedure: https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus
'ceph mon enable-msgr2'

After running the 'msgr2' command, all ceph CLI commands time out, as do pveceph and the CEPH GUI.
We first found errors about the keyring files in the monitor log files under /var/log/ceph/. We seem to have fixed those, since the errors are gone now, but CEPH is still unresponsive. The CEPH processes (OSD/MON) are running and do generate network traffic on ports 6789 and 3300 (as seen with tcpdump).
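For anyone checking the same symptom, here is a minimal sketch (Python 3, standard library only) of a reachability probe for the two monitor ports mentioned above: 3300 (msgr2) and 6789 (legacy msgr1). It only tests whether a TCP connection can be opened, which is roughly what tcpdump already suggests; an open port does not mean the monitors have formed quorum or that authentication works. The helper name `probe_ports` and the example monitor addresses are hypothetical.

```python
import socket

# Ports named in the post: 3300 is the msgr2 (v2) monitor port,
# 6789 is the legacy msgr1 (v1) monitor port.
MON_PORTS = {3300: "msgr2 (v2)", 6789: "msgr1 (v1)"}


def probe_ports(host, ports, timeout=2.0):
    """Hypothetical helper: return {port: bool} telling whether a TCP
    connection to host:port could be opened within `timeout` seconds.
    This checks plain TCP reachability only, not the Ceph protocol."""
    results = {}
    for port in ports:
        try:
            # create_connection completes the TCP handshake; close right away.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[port] = False
    return results
```

Usage, with hypothetical monitor IPs (take the real ones from your own ceph.conf): `for mon in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]: print(mon, probe_ports(mon, MON_PORTS))`. If both ports answer on every monitor, the problem is more likely above the TCP layer (for example keyrings or the monitor map) than in basic connectivity.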

At the moment, the CEPH storage is unusable, and we are not sure whether we can fix this.

Our questions are:
  1. Can we 'undo' the msgr2 command while we cannot communicate with the CEPH daemons? Is there something we can add to the ceph.conf file for this?
  2. Can we re-create the monitors from scratch while keeping the OSDs and their data (VM images)? Is there a procedure for this?


Thanks for any help or pointers to relevant procedures.


Ronnie
