Is XFS the right choice for Ceph?

proxtest (Active Member) · Oct 6, 2016

Good morning guys!

I want to check the filesystem after one of my OSDs crashed, but I'm not able to unmount it!

First, I want to make sure the OSD is stopped (it is already stopped and out):
service ceph stop osd.22

That doesn't work for me: it stopped all OSDs on that node! So I started the other 5 OSDs again.

Second, I flush the SSD journal to disk:
ceph-osd -i 22 --flush-journal

Then I try umount -f /dev/sde1:

umount: /var/lib/ceph/osd/ceph-22: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)

lsof | grep /dev/sde1
lsof | grep ceph-22

fuser -m /dev/sde1
fuser -m /var/lib/ceph/osd/ceph-22/

Hmm. None of these show a process using it, but I'm still not able to unmount it! Crap.

I gave a read-only fsck a shot, hoping I'd be lucky:
xfs_repair -n /dev/sde1

But I wasn't:
xfs_repair: /dev/sde1 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
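xfs_repair refuses because the kernel still lists the filesystem as mounted read-write. To see how the kernel lists it, you can read /proc/self/mounts (the same data `cat /proc/mounts` shows). Below is a minimal Python sketch, assuming the usual whitespace-separated format in which spaces, tabs, newlines and backslashes in paths are written as octal escapes such as `\040`. `MountEntry`, `parse_mounts` and `mounts_of` are hypothetical helper names. The device is compared as a literal string, so if the filesystem was mounted through a different device path it may be listed under another name than /dev/sde1.

```python
import re
from typing import NamedTuple

# /proc/mounts escapes space, tab, newline and backslash as \040, \011, \012, \134
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


class MountEntry(NamedTuple):
    device: str
    mountpoint: str
    fstype: str
    options: tuple[str, ...]

    @property
    def writable(self) -> bool:
        return "rw" in self.options


def _unescape(field: str) -> str:
    """Turn octal escapes like \\040 back into the original characters."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> list[MountEntry]:
    """Parse /proc/mounts-style text: device mountpoint fstype options dump pass."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = (_unescape(f) for f in fields[:4])
        entries.append(MountEntry(device, mountpoint, fstype, tuple(options.split(","))))
    return entries


def mounts_of(device: str, text: str) -> list[MountEntry]:
    """All entries whose source field is exactly `device`."""
    return [e for e in parse_mounts(text) if e.device == device]
```

Usage: `with open("/proc/self/mounts") as f: print(mounts_of("/dev/sde1", f.read()))`. An entry with `writable == True` is exactly the state xfs_repair is complaining about.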

So now I have two options:
1. Don't check it and leave it out.
2. Remove the OSD from CRUSH, migrate all VMs, reboot and try to check it. Not good for a production cluster. :-(

My question now: why do we use a filesystem that can't be checked on a running system? Is this a good choice? Why don't we use ext4? It's journaled, it's "old", it's stable on millions of disks, and you can do a read-only fsck on it.

man fsck.xfs
NAME
fsck.xfs - do nothing, successfully
XFS is a journaling filesystem and performs recovery at mount(8) time if necessary, so fsck.xfs simply exits with a zero exit status.

wolfgang (Proxmox Retired Staff) · Oct 6, 2016

Hi,

Have a look whether the OSD daemon is running:

systemctl status ceph-osd@22.service

XFS fits Ceph's access pattern better, so it is faster, but you can also use ext4.

proxtest (Active Member) · Oct 6, 2016

The OSD daemon crashed last week, so there was no osd.22 process running. It was shown as stopped/out in Proxmox and in ceph osd tree, and I can't find it in ps -ax | grep osd.
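`ps -ax | grep osd` also matches the grep process itself and every other OSD on the node, so it is easy to misread. A slightly more precise check reads each process's NUL-separated argument list from /proc/<pid>/cmdline. This is a sketch that assumes the daemon was started with `-i <id>` or `--id <id>` on its command line (the same form as the `ceph-osd -i 22` call above); `osd_id_from_argv` and `find_osd_pids` are made-up names for illustration.

```python
import os
from pathlib import Path
from typing import Optional, Union


def osd_id_from_argv(argv: list[str]) -> Optional[str]:
    """Return the OSD id if argv is a ceph-osd invocation, otherwise None."""
    if not argv or os.path.basename(argv[0]) != "ceph-osd":
        return None
    for i, arg in enumerate(argv):
        if arg in ("-i", "--id") and i + 1 < len(argv):
            return argv[i + 1]
        if arg.startswith("--id="):
            return arg.split("=", 1)[1]
    return None


def find_osd_pids(osd_id: Union[int, str], proc_root: str = "/proc") -> list[int]:
    """PIDs of running ceph-osd processes whose command line names this id."""
    pids = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric entries are processes
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or cmdline unreadable
        # cmdline is NUL-separated; kernel threads have an empty one
        argv = [a.decode(errors="replace") for a in raw.split(b"\x00") if a]
        if osd_id_from_argv(argv) == str(osd_id):
            pids.append(int(entry.name))
    return sorted(pids)
```

`find_osd_pids(22)` returning `[]` would confirm that no osd.22 daemon is running, without the false hits from grep.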

It's too late for ext4 now; with 30 disks on SSD journals I don't want to change, but I will next time I build one.

Is 'service ceph stop osd.XX' the right command to stop only one OSD? Whenever I try it, it stops all OSDs on the node. :-( I don't know why.

fabian (Proxmox Staff Member) · Oct 6, 2016

ext4 is also deprecated by upstream: http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/

We currently recommend XFS for production deployments.

and

We recommend against using ext4 due to limitations in the size of xattrs it can store, and the problems this causes with the way Ceph handles long RADOS object names. Although these issues will generally not surface with Ceph clusters using only short object names (e.g., an RBD workload that does not include long RBD image names), other users like RGW make extensive use of long object names and can break.

Starting with the Jewel release, the ceph-osd daemon will refuse to start if the configured max object name cannot be safely stored on ext4.

proxtest (Active Member) · Oct 7, 2016

:-(
Maybe I should set the replica count to 5 on my cluster...

How can I check the filesystem now? Unmounting doesn't work, and an online check isn't possible.
