Good morning guys!
I want to check my filesystem after one OSD crashed, but I'm not able to unmount it!
First I want to make sure the OSD is stopped (it is already stopped and out):
service ceph stop osd.22
That didn't work for me: all OSDs on that node stopped! So I started the other 5 OSDs again.
Second, I flush the SSD journal to disk:
ceph-osd -i 22 --flush-journal
Then I want to unmount it: umount -f /dev/sde1
umount: /var/lib/ceph/osd/ceph-22: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
lsof | grep /dev/sde1
lsof | grep ceph-22
fuser -m /dev/sde1
fuser -m /var/lib/ceph/osd/ceph-22/
Hmmm. No running process on that, but I'm still not able to unmount! Crap.
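One thing lsof and fuser will not show is another filesystem mounted somewhere below the mount point; a nested mount like that also makes umount report "target is busy". Whether that is the case on this node is not known, so the following is only a sketch of how one might check. The function name nested_mounts and the MOUNTS_FILE override are made up for illustration; by default it reads /proc/mounts.

```shell
# Print every mount point that lives below the given directory.
# nested_mounts and MOUNTS_FILE are hypothetical names; MOUNTS_FILE only
# exists so the function can be tried against a sample file.
nested_mounts() {
    target=${1%/}                       # drop a trailing slash, if any
    mounts=${MOUNTS_FILE:-/proc/mounts}
    # Field 2 of /proc/mounts is the mount point; keep the ones that
    # start with "<target>/".
    awk -v t="$target/" 'index($2, t) == 1 { print $2 }' "$mounts"
}
```

Calling nested_mounts /var/lib/ceph/osd/ceph-22 prints nothing if no other filesystem is mounted underneath; any path it does print would have to be unmounted first.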
I gave a read-only fsck a shot, maybe I'm lucky:
xfs_repair -n /dev/sde1
But I'm not:
xfs_repair: /dev/sde1 contains a mounted and writable filesystem
fatal error -- couldn't initialize XFS library
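The message complains about a filesystem that is both mounted and writable, so it can help to confirm how the kernel currently lists the device. A minimal sketch, assuming the device appears in /proc/mounts under the same name (/dev/sde1) and not under a symlink; mount_state and MOUNTS_FILE are made-up names for illustration:

```shell
# Print "rw", "ro" or "unmounted" for a device, based on the mounts table.
# mount_state and MOUNTS_FILE are hypothetical names; MOUNTS_FILE only
# exists so the function can be tried against a sample file.
mount_state() {
    dev=$1
    mounts=${MOUNTS_FILE:-/proc/mounts}
    awk -v d="$dev" '
        $1 == d {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            state = "ro"
            for (i = 1; i <= n; i++) if (opts[i] == "rw") state = "rw"
        }
        # If the device is listed more than once, the last entry wins.
        END { print (state == "" ? "unmounted" : state) }
    ' "$mounts"
}
```

Here mount_state /dev/sde1 would be expected to print rw, matching what xfs_repair reports.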
So now I have 2 options:
1. Don't check it and leave it out.
2. Remove the OSD from CRUSH, migrate all VMs, reboot and try to check it. Not good for a production cluster. :-(
My question now: why do we use a filesystem that you can't check on a running system? Is this a good choice? Why don't we use ext4? It's journaled, it's "old", it's stable on millions of disks, and you can do a read-only fs check.
man fsck.xfs
NAME
fsck.xfs - do nothing, successfully
XFS is a journaling filesystem and performs recovery at mount(8) time if necessary, so fsck.xfs simply exits
with a zero exit status.