Trouble with ceph on PVE 5

hidalgo

I just upgraded my PVE 4.4 installation with ceph to PVE 5. Everything seemed to go well, but then I ran into trouble with ceph:
  • GUI cannot load OSDs
  • Health: HEALTH_WARN, no active mgr
  • services: mgr: no daemons active
What went wrong? Please help! Thanks
 
BTW, here are my package versions:
Code:
proxmox-ve: 5.0-16 (running kernel: 4.10.17-1-pve)
pve-manager: 5.0-23 (running version: 5.0-23/af4267bf)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.10.17-1-pve: 4.10.17-16
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-14
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-5
libpve-storage-perl: 5.0-12
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-2
pve-container: 2.0-14
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve2
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
ceph: 12.1.1-1~bpo80+1
 
what does
Code:
ceph status

say?
 
Code:
ceph status
  cluster:
    id:     e464de46-c48e-4df9-b7ce-6288d78dea5e
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 1 daemons, quorum 0
    mgr: no daemons active
    osd: 8 osds: 8 up, 8 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

But the VMs are running, and from the users' point of view everything seems OK.
 
I think I did. I have the correct sources list:
Code:
cat /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main
But what can I do to correct this error?
 
Do you have any other files in /etc/apt/sources.list.d/ which reference ceph? Or do you have an entry in /etc/apt/sources.list?

If yes, remove it and try to downgrade to our ceph version, but no guarantees this will work.
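For example, a quick way to check (just a sketch) is to grep all apt source files for ceph entries:
Code:
grep -i ceph /etc/apt/sources.list /etc/apt/sources.list.d/*.list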
 
I guess I'm a big step ahead. Now I have ceph running and the cluster fsid matches, but no OSDs.
I tried to add my OSDs to the crush map, but they don't stay there; ceph -s still says no OSDs.
ceph-disk list gives me:
Code:
/dev/sda :
 /dev/sda1 ceph data, active, cluster ceph, osd.2, journal /dev/nvme0n1p6
/dev/sdb :
 /dev/sdb1 ceph data, active, cluster ceph, osd.4, journal /dev/nvme0n1p8
Any further advice?
 
what does ceph status now say ?
and the following:

Code:
systemctl status ceph ceph-osd
ls /var/lib/ceph/osd/
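If the OSD directories are there but the daemons are simply not running, a sketch of what one might try (assuming the partitions ceph-disk listed above are intact):
Code:
# (re)start all OSD services on this node
systemctl start ceph-osd.target
# or let ceph-disk scan and activate all prepared ceph partitions
ceph-disk activate-all
# then check whether the OSDs reappear
ceph osd tree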
 

Thank you very much for your time and advice, but I gave up. I set up a new cluster and I'm now restoring from my backup.
Very strange: after a few tests I found out that my journal SSD is obsolete now. ;-)
 
I just set up a testing system with Proxmox 4.4, upgraded from Jewel to Luminous following https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous, and ran into the same situation as before.
Code:
Package versions
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
ceph: 12.1.1-1~bpo80+1
Is it possible that the howto and/or the upgrade path is wrong?
 
At the moment the Luminous packages from ceph.com are one version above ours (12.1.1 vs 12.1.0), but an update to our repository is coming soon.
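To see which repository the installed ceph packages actually come from, something like the following can help (a sketch; apt-cache policy shows the candidate versions per repository):
Code:
apt-cache policy ceph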
 
Having the exact same problem after upgrading to ceph 12.1.1. Any solution?

Yes - (re)start the ceph-mgr services, or create new ceph-mgr instances via pveceph if you don't have any yet.
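A sketch of both options (the manager instance name is normally the node's hostname; <nodename> is a placeholder here):
Code:
# restart an existing manager instance
systemctl restart ceph-mgr@<nodename>.service
# or create a new manager on this node
pveceph createmgr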
 
I ran into the same problem. I am testing the migration from PVE 4.4 with ceph Jewel to Proxmox 5.0 on a test cluster.
So I first migrated from ceph Jewel to Luminous following the documentation, then migrated from jessie to stretch.

I ended up with this ceph package, as reported above:
Code:
# pveversion -v
...
ceph: 12.1.2-1~bpo90+1

I looked at another cluster where I installed Proxmox 5.0 beta, and the repository configuration there is the following:
Code:
# cat /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main

As said in the release notes, ceph packages are now compiled directly by the Proxmox team.

The original repository configuration is the following:
Code:
# cat /etc/apt/sources.list.d/ceph.list
deb http://download.ceph.com/debian-luminous jessie main

So not only do you have to change the repository suite from jessie to stretch, you also have to change the repository address from ceph.com to proxmox.com. This is the step that is not indicated in the Proxmox 4.4 to 5.0 upgrade documentation.

The documentation for upgrading from ceph Jewel to Luminous is within 4.x, so it keeps the packages from ceph.com.
The documentation for upgrading from 4.x to 5.0 only says to change jessie to stretch, but does not say to change the repository from ceph.com to proxmox.com (/debian/ceph-luminous).

With this change I now have:
Code:
# pveversion -v
....
ceph: 12.1.2-pve1

which I think is the correct version.

Now I think I have to create a new ceph-mgr, using 'pveceph createmgr', which is now available.

P.S.: I just noticed that the change of the ceph repository address was in fact in the upgrade documentation. So my bad...

Replace ceph.com repositories with proxmox.com ceph repositories. This step is only necessary if you have a ceph cluster on your PVE installation.


echo "deb http://download.proxmox.com/debian/ceph-luminous stretch main" > /etc/apt/sources.list.d/ceph.list
 
For completeness, I managed to deal with the warning 'no active mgr' by creating a manager with pveceph.

Code:
# pveceph createmgr
creating manager directory '/var/lib/ceph/mgr/ceph-prox-nest3'
creating keys for 'mgr.prox-nest3'
setting owner for directory
enabling service 'ceph-mgr@prox-nest3.service'
Created symlink /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@prox-nest3.service -> /lib/systemd/system/ceph-mgr@.service.
starting service 'ceph-mgr@prox-nest3.service'

Then:
Code:
# ceph -s
  cluster:
    id:     486f2cf4-ca81-46cd-8947-b7e0a6e8e47e
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: prox-nest3(active)
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   2 pools, 576 pgs
    objects: 1596 objects, 6208 MB
    usage:   18757 MB used, 881 GB / 899 GB avail
    pgs:     576 active+clean

I have the curious warning 'application not enabled on 1 pool(s)', but I don't know which application it refers to. There is a bug open on the ceph tracker:
http://tracker.ceph.com/issues/20891
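For what it's worth, on Luminous this warning can usually be cleared by tagging the pool with the application that uses it; for a Proxmox RBD pool that would look roughly like this (<pool-name> is a placeholder):
Code:
# list the pools, then enable the rbd application on the affected one
ceph osd lspools
ceph osd pool application enable <pool-name> rbd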
 
