[SOLVED] Ceph: Move OSD to new root

Fladi

Renowned Member
Feb 27, 2015
Hi there,

I'm trying to set up a separate pool with SSD storage within my Ceph cluster. I followed the instructions here (https://elkano.org/blog/ceph-sata-ssd-pools-server-editing-crushmap/), using pveceph wherever possible.
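
For reference, the CRUSH changes looked roughly like this (bucket names follow the blog post; the weight 1.0 is just a placeholder):
Code:
$ ceph osd crush add-bucket ssds root
$ ceph osd crush add-bucket ceph001-ssd host
$ ceph osd crush move ceph001-ssd root=ssds
$ ceph osd crush set osd.12 1.0 root=ssds host=ceph001-ssd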

It works insofar as everything gets set up and shows up fine in the GUI. Until the next reboot, that is. After that, the moved OSD will not start anymore (it is marked as down/out).

Trying to start it manually (with pveceph or via /etc/init.d/ceph start osd.12) fails:
Code:
$ pveceph start osd.12
Job for ceph-osd@12.service failed. See 'systemctl status ceph-osd@12.service' and 'journalctl -xn' for details.
command '/bin/systemctl start ceph-osd@12' failed: exit code 1

Output of systemctl:

Code:
$ systemctl status ceph-osd@12.service
● ceph-osd@12.service - Ceph object storage daemon
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: start-limit) since Thu 2017-01-12 11:13:32 CET; 1min 39s ago
  Process: 4717 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 4657 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
Main PID: 4717 (code=killed, signal=ABRT)

Jan 12 11:13:32 ceph001 systemd[1]: Unit ceph-osd@12.service entered failed state.
Jan 12 11:13:32 ceph001 systemd[1]: ceph-osd@12.service holdoff time over, scheduling restart.
Jan 12 11:13:32 ceph001 systemd[1]: Stopping Ceph object storage daemon...
Jan 12 11:13:32 ceph001 systemd[1]: Starting Ceph object storage daemon...
Jan 12 11:13:32 ceph001 systemd[1]: ceph-osd@12.service start request repeated too quickly, refusing to start.
Jan 12 11:13:32 ceph001 systemd[1]: Failed to start Ceph object storage daemon.
Jan 12 11:13:32 ceph001 systemd[1]: Unit ceph-osd@12.service entered failed state.
Jan 12 11:14:18 ceph001 systemd[1]: Starting Ceph object storage daemon...
Jan 12 11:14:18 ceph001 systemd[1]: ceph-osd@12.service start request repeated too quickly, refusing to start.
Jan 12 11:14:18 ceph001 systemd[1]: Failed to start Ceph object storage daemon.

Output of journalctl:

Code:
$ journalctl -xn
-- Logs begin at Thu 2017-01-12 07:39:40 CET, end at Thu 2017-01-12 11:17:01 CET. --
Jan 12 11:13:32 ceph001 systemd[1]: Unit ceph-osd@12.service entered failed state.
Jan 12 11:14:18 ceph001 pveceph[5145]: <root@pam> starting task UPID:ceph001:0000141A:0013A91F:587756FA:srvstart:osd.12:root@pam:
Jan 12 11:14:18 ceph001 systemd[1]: Starting Ceph object storage daemon...
-- Subject: Unit ceph-osd@12.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-osd@12.service has begun starting up.
Jan 12 11:14:18 ceph001 systemd[1]: ceph-osd@12.service start request repeated too quickly, refusing to start.
Jan 12 11:14:18 ceph001 systemd[1]: Failed to start Ceph object storage daemon.
-- Subject: Unit ceph-osd@12.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-osd@12.service has failed.
--
-- The result is failed.
Jan 12 11:14:18 ceph001 pveceph[5146]: command '/bin/systemctl start ceph-osd@12' failed: exit code 1
Jan 12 11:14:18 ceph001 pveceph[5145]: <root@pam> end task UPID:ceph001:0000141A:0013A91F:587756FA:srvstart:osd.12:root@pam: command '/bin/systemctl start ceph-osd@12' failed: exit code 1
Jan 12 11:17:01 ceph001 CRON[6067]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 12 11:17:01 ceph001 CRON[6068]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 12 11:17:01 ceph001 CRON[6067]: pam_unix(cron:session): session closed for user root

When I delete the entry in ceph.conf and set the OSD back to the default root via

Code:
$ ceph osd crush set osd.12 0 root=default host=ceph001

I'm able to start the OSD, and it also comes up again after a reboot.

It's a brand-new installation with up-to-date Proxmox and the Jewel release of Ceph.

Any ideas? Is any further information needed?
 
I think you need to specify the crush location in ceph.conf for each OSD to get them to autostart in the correct place in the crushmap:

https://elkano.org/blog/ceph-sata-ssd-pools-server-editing-crushmap/

Code:
[osd.35]
host = ceph-node1
osd_journal = /dev/disk/by-id/ata-INTEL_SSDSC2BB016T6_BTWA543204R11P6KGN-part1
crush_location = root=ssds host=ceph-node1-ssd
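
Adapted to your osd.12 that would look something like this (I'm assuming a host bucket named ceph001-ssd; use whatever name you created in your crushmap):
Code:
[osd.12]
host = ceph001
crush_location = root=ssds host=ceph001-ssd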
 
I solved this.

I was missing
Code:
[osd]
osd crush update on start = false

in ceph.conf. Now it seems to run even without the crush_location entry.
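
For anyone finding this later: you can check that the OSD stays under the new root after a restart with something like:
Code:
$ systemctl restart ceph-osd@12
$ ceph osd tree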
 
