Is XFS the right choice for Ceph?

proxtest

Good morning guys!

I want to check my filesystem after one OSD crashed, but I'm not able to unmount it!

First I want to make sure the OSD is stopped (it is already stopped and out):
service ceph stop osd.22

That's not working for me: all OSDs on that node stopped! So I started the other 5 OSDs again.

Second, I flush the SSD journal to disk:
ceph-osd -i 22 --flush-journal

Then I try umount -f /dev/sde1:

umount: /var/lib/ceph/osd/ceph-22: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)

lsof | grep /dev/sde1
lsof | grep ceph-22

fuser -m /dev/sde1
fuser -m /var/lib/ceph/osd/ceph-22/

Hmmm, no running process on it, but still not able to unmount! Crap.
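
For completeness, a few more checks that can reveal a hidden holder when lsof and fuser show nothing (a sketch, using the device and mount point from above):

# is the filesystem still listed as mounted by the kernel?
grep ceph-22 /proc/mounts

# fuser in verbose mode against the mount point sometimes shows more than against the device
fuser -vm /var/lib/ceph/osd/ceph-22

# last resort: lazy unmount detaches the mount point now and cleans up once the
# last user is gone; note that xfs_repair will still refuse to run until the
# detach has actually completed
umount -l /var/lib/ceph/osd/ceph-22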

I gave a read-only fsck a shot, maybe I'm lucky:
xfs_repair -n /dev/sde1

But I'm not:
xfs_repair: /dev/sde1 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library

So now I have 2 options:
1. Don't check it and leave it out.
2. Remove the OSD from CRUSH, migrate all VMs, reboot, and try to check it. Not good for a production cluster. :-(

My question now: why do we use a filesystem that you can't check on a running system? Is this a good choice? Why don't we use ext4? It's journaled, it's "old" and stable on millions of disks, and you can do a read-only fs-check.

man fsck.xfs
NAME
fsck.xfs - do nothing, successfully
XFS is a journaling filesystem and performs recovery at mount(8) time if necessary, so fsck.xfs simply exits
with a zero exit status.

:)
 
Hi,

have a look at whether the OSD daemon is running:

systemctl status ceph-osd@22.service
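
If the daemon turns out to be running, the same instantiated unit can stop and start just that one OSD (a sketch, assuming a systemd-managed node where the ceph-osd@<id> units exist):

systemctl stop ceph-osd@22.service
# ... check the disk ...
systemctl start ceph-osd@22.service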

XFS fits Ceph's access pattern better, so it is faster, but you can also use ext4.
 
The OSD daemon crashed last week, so there was no osd.22 running. It was stopped/out in Proxmox and in ceph osd tree, and I don't find it in ps -ax | grep osd.

Too late for ext4 now: 30 disks with SSD journal. I don't want to change, but maybe next time I build one.

Is 'service ceph stop osd.XX' the right command to stop only one OSD? Because when I try it, it always stops all OSDs on the node. :-( Don't know why.
 
ext4 is also deprecated by upstream: http://docs.ceph.com/docs/jewel/rados/configuration/filesystem-recommendations/
We currently recommend XFS for production deployments.
and
We recommend against using ext4 due to limitations in the size of xattrs it can store, and the problems this causes with the way Ceph handles long RADOS object names. Although these issues will generally not surface with Ceph clusters using only short object names (e.g., an RBD workload that does not include long RBD image names), other users like RGW make extensive use of long object names and can break.

Starting with the Jewel release, the ceph-osd daemon will refuse to start if the configured max object name cannot be safely stored on ext4.
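
For what it's worth, the same recommendations page also documents a workaround for existing ext4 OSDs that only ever see short object names (e.g. plain RBD without RGW): capping the object name lengths in ceph.conf so the daemon will still start. A sketch based on that page:

[osd]
osd max object name len = 256
osd max object namespace len = 64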
 
:-(
Maybe I should set the replicas to 5 on my cluster....

How can I check the filesystem now? umount doesn't work, and checking it online is not possible.
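
Putting the pieces from this thread together, the offline check would look roughly like this (a sketch for OSD 22 on /dev/sde1, assuming a systemd node; noout keeps the cluster from backfilling while the OSD is down):

ceph osd set noout                           # don't rebalance while the OSD is offline
systemctl stop ceph-osd@22.service           # stop exactly this one daemon
ceph-osd -i 22 --flush-journal               # flush the SSD journal to the data disk
umount /var/lib/ceph/osd/ceph-22             # should work once the daemon is really gone
xfs_repair -n /dev/sde1                      # read-only check
mount /dev/sde1 /var/lib/ceph/osd/ceph-22
systemctl start ceph-osd@22.service
ceph osd unset noout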
 
