[SOLVED] Ceph: Move OSD to new root

Fladi · Jan 12, 2017

Hi there,

I'm trying to set up a seperated pool with SSD storage within my Ceph. I followed the instructions here (https://elkano.org/blog/ceph-sata-ssd-pools-server-editing-crushmap/), while using pveceph whereever possible.

It works in a way that everything was set up and showed fine in the gui. Until the next reboot. After this the moved OSD will not start (marked as down/out).

Tryiing to start manually (with pveceph or via /etc/init.d/ceph start osd.12

Code:

$ pveceph start osd.12
Job for ceph-osd@12.service failed. See 'systemctl status ceph-osd@12.service' and 'journalctl -xn' for details.
command '/bin/systemctl start ceph-osd@12' failed: exit code 1

Output of systemctl:

Code:

$ systemctl status ceph-osd@12.service
● ceph-osd@12.service - Ceph object storage daemon
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: start-limit) since Thu 2017-01-12 11:13:32 CET; 1min 39s ago
  Process: 4717 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 4657 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 4717 (code=killed, signal=ABRT)

Jan 12 11:13:32 ceph001 systemd[1]: Unit ceph-osd@12.service entered failed state.
Jan 12 11:13:32 ceph001 systemd[1]: ceph-osd@12.service holdoff time over, scheduling restart.
Jan 12 11:13:32 ceph001 systemd[1]: Stopping Ceph object storage daemon...
Jan 12 11:13:32 ceph001 systemd[1]: Starting Ceph object storage daemon...
Jan 12 11:13:32 ceph001 systemd[1]: ceph-osd@12.service start request repeated too quickly, refusing to start.
Jan 12 11:13:32 ceph001 systemd[1]: Failed to start Ceph object storage daemon.
Jan 12 11:13:32 ceph001 systemd[1]: Unit ceph-osd@12.service entered failed state.
Jan 12 11:14:18 ceph001 systemd[1]: Starting Ceph object storage daemon...
Jan 12 11:14:18 ceph001 systemd[1]: ceph-osd@12.service start request repeated too quickly, refusing to start.
Jan 12 11:14:18 ceph001 systemd[1]: Failed to start Ceph object storage daemon.

Output of journalctl

Code:

$ journalctl -xn
-- Logs begin at Thu 2017-01-12 07:39:40 CET, end at Thu 2017-01-12 11:17:01 CET. --
Jan 12 11:13:32 ceph001 systemd[1]: Unit ceph-osd@12.service entered failed state.
Jan 12 11:14:18 ceph001 pveceph[5145]: <root@pam> starting task UPID:ceph001:0000141A:0013A91F:587756FA:srvstart:osd.12:root@pam:
Jan 12 11:14:18 ceph001 systemd[1]: Starting Ceph object storage daemon...
-- Subject: Unit ceph-osd@12.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-osd@12.service has begun starting up.
Jan 12 11:14:18 ceph001 systemd[1]: ceph-osd@12.service start request repeated too quickly, refusing to start.
Jan 12 11:14:18 ceph001 systemd[1]: Failed to start Ceph object storage daemon.
-- Subject: Unit ceph-osd@12.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-osd@12.service has failed.
--
-- The result is failed.
Jan 12 11:14:18 ceph001 pveceph[5146]: command '/bin/systemctl start ceph-osd@12' failed: exit code 1
Jan 12 11:14:18 ceph001 pveceph[5145]: <root@pam> end task UPID:ceph001:0000141A:0013A91F:587756FA:srvstart:osd.12:root@pam: command '/bin/systemctl start ceph-osd@12' failed: exit code 1
Jan 12 11:17:01 ceph001 CRON[6067]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 12 11:17:01 ceph001 CRON[6068]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 12 11:17:01 ceph001 CRON[6067]: pam_unix(cron:session): session closed for user root

When I delete the entry in ceph.conf an set the osd back to the default-root via

Code:

$ ceph osd crush set osd.12 0 root=default host=ceph001

I'm able to start the OSD and it starts as well after reboot.

It's a brand new installation with up-to-date proxmox and the jewel-version.

Any idea and any further information needed?

spirit · Jan 12, 2017

I think you need to specify the crush location in ceph.conf for each osd, to get them autostart in correct crusmap

https://elkano.org/blog/ceph-sata-ssd-pools-server-editing-crushmap/

Code:

[osd.35]
host = ceph-node1
osd_journal = /dev/disk/by-id/ata-INTEL_SSDSC2BB016T6_BTWA543204R11P6KGN-part1
crush_location = root=ssds host=ceph-node1-ssd

Fladi · Jan 12, 2017

I do have this set.

Fladi · Jan 12, 2017

I solved this.

I was missing

Code:

[osd]
osd crush update on start = false

in ceph.conf. Now it seems to run even without the crush_location-entry.

Search

Search

[SOLVED] Ceph: Move OSD to new root

Fladi

Renowned Member

spirit

Distinguished Member

Fladi

Renowned Member

Fladi

Renowned Member

We value your privacy