after upgrade to PVE 5.0: unable to add osd

Waschbüsch

Renowned Member
Dec 15, 2014
Hi there,
after upgrading a test cluster to PVE 5.0, everything worked fine, including Ceph.
However, when I try to add another OSD, all the preparation steps succeed, but the OSD will not start.

What is immediately obvious is this:
the partition layout differs from that of the old disks.

Using gdisk to print the layout of an old disk:

Number  Start (sector)  End (sector)  Size      Code  Name
     1        20973568    7811870686  3.6 TiB   F800  ceph data
     2            2048      20971520  10.0 GiB  F802  ceph journal

The new disk looks like this:

Number  Start (sector)  End (sector)  Size       Code  Name
     1            2048        206847  100.0 MiB  F804  ceph data
     2          206848    7811870686  3.6 TiB    FFFF  ceph block
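For reference, the gdisk type codes in the two listings can be decoded roughly as follows. This is a sketch inferred from the output above: F800/F802 are the old FileStore data/journal codes, F804 is what gdisk prints for the new small data partition, and FFFF is gdisk's placeholder for a partition GUID it does not recognize (here, the BlueStore block partition):

```shell
# Rough decoder for the gdisk type codes seen in the two listings above.
# (Mapping inferred from this thread's output; FFFF is gdisk's catch-all
# for partition GUIDs it does not recognize.)
ceph_part_role() {
    case "$1" in
        F800) echo "ceph data (FileStore OSD)" ;;
        F802) echo "ceph journal (FileStore)" ;;
        F804) echo "ceph data (BlueStore, small mounted partition)" ;;
        FFFF) echo "ceph block (BlueStore, GUID unknown to gdisk)" ;;
        *)    echo "not a ceph-disk code seen here" ;;
    esac
}

ceph_part_role F800   # FileStore data partition, as on the old disk
ceph_part_role FFFF   # BlueStore block partition, as on the new disk
```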

While the old setup mounts the data partition and has a symlink to the journal,
e.g. /var/lib/ceph/osd/ceph-0/journal is a symlink to /dev/disk/by-partuuid/&lt;some-uuid&gt;,

the new layout mounts the small 100 MiB data partition and links to the block storage:
e.g. /var/lib/ceph/osd/ceph-1/block is a symlink to /dev/disk/by-partuuid/&lt;some-uuid&gt;
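A quick way to see which layout each OSD on a node uses is to list those journal/block symlinks. A small sketch; the base directory is a parameter (defaulting to the standard /var/lib/ceph/osd path) purely so it is easy to try out:

```shell
# Print the journal (FileStore) or block (BlueStore) symlink target of
# every mounted OSD directory under the given base path.
list_osd_links() {
    base="${1:-/var/lib/ceph/osd}"
    for osd in "$base"/ceph-*; do
        for link in "$osd/journal" "$osd/block"; do
            if [ -L "$link" ]; then
                printf '%s -> %s\n' "$link" "$(readlink "$link")"
            fi
        done
    done
    return 0
}

list_osd_links   # e.g. .../ceph-0/journal -> /dev/disk/by-partuuid/<some-uuid>
```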

systemctl status ceph-osd@1 is no help either:

ceph-osd@1.service - Ceph object storage daemon osd.1
Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-osd@.service.d
└─ceph-after-pve-cluster.conf
Active: activating (auto-restart) (Result: signal) since Tue 2017-07-04 23:53:57 UTC; 5s ago
Process: 10217 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 1 --setuser ceph --setgroup ceph (code=killed, signal=SEGV)
Process: 10211 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 1 (code=exited, status=0/SUCCESS)
Main PID: 10217 (code=killed, signal=SEGV)

Jul 04 23:53:57 srv03 systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
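When systemd only shows `signal=SEGV` and an auto-restart loop, one way to get more detail is to re-run the unit's ExecStart command by hand in the foreground. The invocation below is assembled from the ExecStart line shown above; -f keeps ceph-osd in the foreground and -d additionally sends debug output to stderr:

```shell
# Rebuild the failing ExecStart invocation so it can be run by hand;
# values taken from the systemd status output above.
CLUSTER=ceph
ID=1
cmd="/usr/bin/ceph-osd -f -d --cluster $CLUSTER --id $ID --setuser ceph --setgroup ceph"
echo "$cmd"
# To actually run it (needs the live cluster, so left commented out):
# eval "$cmd"
```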

Any ideas?
OK, there was still a mismatch with regard to packages: I had installed Luminous before, but with packages from the upstream Ceph repository, not from PVE. After uninstalling with --purge and reinstalling using pveceph install, everything works now.
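To spot that kind of mix-up before resorting to a purge, one can check which repository each installed ceph package came from. The helper below is a sketch that pulls the repository URL of the installed version out of `apt-cache policy` output (it assumes that command's usual `***` marker format):

```shell
# Read `apt-cache policy <pkg>` on stdin and print the repository URL of
# the currently installed version (the line following the "***" marker).
policy_origin() {
    awk '/\*\*\*/ { getline; print $2; exit }'
}

# Usage on a live system; packages coming from download.ceph.com rather
# than a Proxmox repository would indicate the mismatch described above:
# apt-cache policy ceph-osd | policy_origin
```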