Trouble with ceph on PVE 5

hidalgo

I just upgraded my PVE 4.4 installation with ceph to PVE 5. Everything seemed to go well, but then I ran into trouble with ceph:
  • GUI cannot load OSDs
  • Health: HEALTH_WARN, no active mgr
  • services: mgr: no daemons active
What went wrong? Please help! Thanks
 
BTW, here are my package versions:
Code:
proxmox-ve: 5.0-16 (running kernel: 4.10.17-1-pve)
pve-manager: 5.0-23 (running version: 5.0-23/af4267bf)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.10.17-1-pve: 4.10.17-16
libpve-http-server-perl: 2.0-5
lvm2: 2.02.168-pve2
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-14
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-5
libpve-storage-perl: 5.0-12
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-2
pve-container: 2.0-14
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve2
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
ceph: 12.1.1-1~bpo80+1
 
what does
Code:
ceph status

say?
 
Code:
ceph status
  cluster:
    id:     e464de46-c48e-4df9-b7ce-6288d78dea5e
    health: HEALTH_WARN
            no active mgr
 
  services:
    mon: 1 daemons, quorum 0
    mgr: no daemons active
    osd: 8 osds: 8 up, 8 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:

But the VMs are running, and from the users' point of view everything seems OK.
 
I think I did. I have the correct sources list:
Code:
cat /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main
But what can I do to correct this error?
 
Do you have any other files in /etc/apt/sources.list.d/ which reference ceph? Or do you have an entry in /etc/apt/sources.list?

If yes, remove it and try to downgrade to our ceph version, but no guarantees this will work.
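For example, a quick way to check (just a sketch) is to grep all apt source files for ceph entries:
Code:
grep -i ceph /etc/apt/sources.list /etc/apt/sources.list.d/*.list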
 
I guess I'm a big step ahead. Now I have ceph running and the cluster fsid matches, but no OSDs.
I tried to add my OSDs to the crush map, but they don't stay there; ceph -s still says no OSDs.
ceph-disk list gives me:
Code:
/dev/sda :
 /dev/sda1 ceph data, active, cluster ceph, osd.2, journal /dev/nvme0n1p6
/dev/sdb :
 /dev/sdb1 ceph data, active, cluster ceph, osd.4, journal /dev/nvme0n1p8
Any further advice?
 
what does ceph status now say ?
and the following:

Code:
systemctl status ceph ceph-osd
ls /var/lib/ceph/osd/
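If the OSD directories are there but the daemons are simply not running, a sketch of what one might try (assuming the partitions ceph-disk listed above are intact):
Code:
# (re)start all OSD services on this node
systemctl start ceph-osd.target
# or let ceph-disk scan and activate all prepared ceph partitions
ceph-disk activate-all
# then check whether the OSDs reappear
ceph osd tree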
 

Thank you very much for your time and advice, but I gave up. I set up a new cluster and I'm now restoring from my backup.
Very strange: after a few tests I found out that my journal SSD is obsolete now. ;-)
 
I just set up a testing system with Proxmox 4.4, upgraded from Jewel to Luminous following https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous, and ran into the same situation as before.
Code:
Package versions
proxmox-ve: 4.4-92 (running kernel: 4.4.67-1-pve)
pve-manager: 4.4-15 (running version: 4.4-15/7599e35a)
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-52
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-95
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
ceph: 12.1.1-1~bpo80+1
Is it possible that the howto and/or the upgrade path is wrong?
 
At the moment the Luminous packages from ceph.com are one version above ours (12.1.1 vs 12.1.0), but an update to our repository is coming soon.
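To see which repository the installed ceph packages actually come from, something like the following can help (a sketch; apt-cache policy shows the candidate versions per repository):
Code:
apt-cache policy ceph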
 
Having the exact same problem after upgrading to ceph 12.1.1. Any solution?

Yes - (re)start the ceph-mgr services, or create new ceph-mgr instances via pveceph if you don't have any yet.
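A sketch of both options (the manager instance name is normally the node's hostname; <nodename> is a placeholder here):
Code:
# restart an existing manager instance
systemctl restart ceph-mgr@<nodename>.service
# or create a new manager on this node
pveceph createmgr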
 
I ran into the same problem. I am testing the migration from PVE 4.4 with ceph Jewel to Proxmox 5.0 on a test cluster.
So I first migrated from ceph Jewel to Luminous following the documentation, then migrated from jessie to stretch.

I ended up with this ceph package, as reported above:
Code:
# pveversion -v
...
ceph: 12.1.2-1~bpo90+1

I looked at another cluster where I installed Proxmox 5.0 beta, and the repository configuration there is the following:
Code:
# cat /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main

As said in the release notes, ceph packages are now compiled directly by the Proxmox team.

The original repository configuration is the following:
Code:
# cat /etc/apt/sources.list.d/ceph.list
deb http://download.ceph.com/debian-luminous jessie main

So not only do you have to change the repository suite from jessie to stretch, you also have to change the repository address from ceph.com to proxmox.com. This is the step that is not indicated in the Proxmox 4.4 to 5.0 upgrade documentation.

The documentation for upgrading from ceph Jewel to Luminous is within 4.x, so it keeps the packages from ceph.com.
The documentation for upgrading from 4.x to 5.0 only says to change jessie to stretch, but does not say to change the repository from ceph.com to proxmox.com (/debian/ceph-luminous).

With this change I now have:
Code:
# pveversion -v
....
ceph: 12.1.2-pve1

which I think is the correct version.

Now I think I have to create a new ceph-mgr, using 'pveceph createmgr', which is now available.

P.S.: I just noticed that the change of the ceph repository address was in fact in the upgrade documentation. So my bad...

Replace ceph.com repositories with proxmox.com ceph repositories. This step is only necessary if you have a ceph cluster on your PVE installation.


echo "deb http://download.proxmox.com/debian/ceph-luminous stretch main" > /etc/apt/sources.list.d/ceph.list
 
For completeness, I managed to deal with the warning 'no active mgr' by creating a manager with pveceph.

Code:
# pveceph createmgr
creating manager directory '/var/lib/ceph/mgr/ceph-prox-nest3'
creating keys for 'mgr.prox-nest3'
setting owner for directory
enabling service 'ceph-mgr@prox-nest3.service'
Created symlink /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@prox-nest3.service -> /lib/systemd/system/ceph-mgr@.service.
starting service 'ceph-mgr@prox-nest3.service'

Then:
Code:
# ceph -s
  cluster:
    id:     486f2cf4-ca81-46cd-8947-b7e0a6e8e47e
    health: HEALTH_WARN
            application not enabled on 1 pool(s)

  services:
    mon: 3 daemons, quorum 0,1,2
    mgr: prox-nest3(active)
    osd: 9 osds: 9 up, 9 in

  data:
    pools:   2 pools, 576 pgs
    objects: 1596 objects, 6208 MB
    usage:   18757 MB used, 881 GB / 899 GB avail
    pgs:     576 active+clean

I have the curious warning 'application not enabled on 1 pool(s)', but I don't know which application it refers to. There is a bug open on the ceph tracker:
http://tracker.ceph.com/issues/20891
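For what it's worth, on Luminous this warning can usually be cleared by tagging the pool with the application that uses it; for a Proxmox RBD pool that would look roughly like this (<pool-name> is a placeholder):
Code:
# list the pools, then enable the rbd application on the affected one
ceph osd lspools
ceph osd pool application enable <pool-name> rbd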
 
