[SOLVED] Ceph-mgr failing to start on first node due to mask

mbailon

New Member
May 28, 2023
While assessing Proxmox for our corporate use, I came across an error after upgrading all my test nodes to version 8 and Ceph to Reef.


I'm trying to get rid of the HEALTH_WARN on my 3-node PVE cluster with Ceph. I upgraded to 8.1 and brought Ceph to Reef, but I cannot reach a green health status. After digging around (ceph -s), it reports no active manager (HEALTH_WARN - no active mgr). When I try to start the ceph-mgr.target service manually, I get this error:

1706019732500.png

When trying to remove the mgr from the GUI, I get this error.

1706019765076.png


Any help would be great!
 
Looks like it was a basic Linux sysadmin issue: the systemd unit file was masked. I ran the command
Code:
systemctl unmask ceph-mgr.target
which unmasked the target but had the unintended side effect of deleting the target file. I then reran
Code:
apt install ceph-mgr
which reinstalled the target file, after which I was able to start the manager.
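
For anyone hitting the same thing, here is a minimal sketch of how the mask state could be checked before unmasking. It assumes `systemctl is-enabled` prints `masked` (or `masked-runtime`) for a masked unit; the helper name `suggest_action` is hypothetical and not part of Ceph or Proxmox. It only reports and does not change anything:

```shell
#!/bin/sh
# Hypothetical helper: map the state printed by `systemctl is-enabled`
# to a suggested next step. It does not modify the system.
suggest_action() {
    case "$1" in
        masked|masked-runtime) echo "unmask" ;;   # unit is masked
        "")                    echo "unknown" ;;  # no state reported (unit missing?)
        *)                     echo "none" ;;     # enabled, static, disabled, ...
    esac
}

# Only query systemd when systemctl is actually available on this host.
if command -v systemctl >/dev/null 2>&1; then
    # is-enabled exits non-zero for masked units, so only the printed state is used.
    state=$(systemctl is-enabled ceph-mgr.target 2>/dev/null)
    echo "ceph-mgr.target: ${state:-unknown} -> $(suggest_action "$state")"
fi
```

If it reports `unmask`, the `systemctl unmask ceph-mgr.target` command above applies. If it reports `unknown`, the unit file may be missing, which matches what I saw after unmasking and is where reinstalling the package helped.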

I had to do this on 2 of my 3 nodes. Oddly enough, the masking happened only on the 2 nodes that were upgraded from 7 to 8; the third node was a reinstall straight to version 8.