[SOLVED] New OSD doesn't come up

lifeboy

Renowned Member
I added a new OSD to a new node (pmx 4.4 cluster), but for some reason the OSD doesn't complete its startup. Please check the log below:

Code:
2018-08-16 06:57:05.321371 7f5ffc9cc880  0 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 1679
2018-08-16 06:57:05.367900 7f5ffc9cc880  0 filestore(/var/lib/ceph/osd/ceph-19) backend xfs (magic 0x58465342)
2018-08-16 06:57:05.372291 7f5ffc9cc880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2018-08-16 06:57:05.372391 7f5ffc9cc880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2018-08-16 06:57:05.408876 7f5ffc9cc880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2018-08-16 06:57:05.409010 7f5ffc9cc880  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_feature: extsize is disabled by conf
2018-08-16 06:57:05.538477 7f5ffc9cc880  0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2018-08-16 06:57:05.545377 7f5ffc9cc880  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2018-08-16 06:57:05.664349 7f5ffc9cc880  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2018-08-16 06:57:05.727383 7f5ffc9cc880  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2018-08-16 06:57:05.762939 7f5ffc9cc880  0 osd.19 20157 crush map has features 2200130813952, adjusting msgr requires for clients
2018-08-16 06:57:05.763046 7f5ffc9cc880  0 osd.19 20157 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
2018-08-16 06:57:05.763091 7f5ffc9cc880  0 osd.19 20157 crush map has features 2200130813952, adjusting msgr requires for osds
2018-08-16 06:57:05.763554 7f5ffc9cc880  0 osd.19 20157 load_pgs
2018-08-16 06:57:05.763666 7f5ffc9cc880  0 osd.19 20157 load_pgs opened 0 pgs
2018-08-16 06:57:05.766378 7f5ffc9cc880 -1 osd.19 20157 log_to_monitors {default=true}
2018-08-16 06:57:05.768155 7f5ffc9c5700  0 -- 192.168.121.35:6800/1679 >> 192.168.121.35:6789/0 pipe(0x4b0a000 sd=23 :0 s=1 pgs=0 cs=0 l=1 c=0x497ac60).fault
2018-08-16 06:57:08.774479 7f5feadc8700  0 osd.19 20157 ignoring osdmap until we have initialized
2018-08-16 06:57:08.774743 7f5feadc8700  0 osd.19 20157 ignoring osdmap until we have initialized
2018-08-16 06:57:08.775095 7f5ffc9cc880  0 osd.19 20157 done with init, starting boot process
2018-08-16 12:12:01.123995 7f5ffc9c5700  0 -- 192.168.121.35:6800/1679 >> 192.168.121.32:6789/0 pipe(0x4b0f000 sd=32 :0 s=1 pgs=0 cs=0 l=1 c=0x497b4a0).fault
2018-08-16 12:12:34.145809 7f5feadc8700  0 log_channel(cluster) log [WRN] : failed to encode map e20158 with expected crc
(I added more lines from /var/log/ceph/ceph-osd.19.log as they appeared)
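(For anyone following along: new lines can be watched as they appear with something like the following, assuming the default log path shown above.)

Code:
tail -f /var/log/ceph/ceph-osd.19.log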

There is another node that I also added to this Proxmox cluster, but it doesn't have an OSD yet. So what could be the cause of the pipe fault, and does it prevent the OSD from starting?
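For reference, a pipe ".fault" like the one above generally just means the messenger connection to that peer (here the monitor at 192.168.121.35:6789) dropped or could not be established. A quick first check, assuming nc is available on the node and using the IP/port from the log above, would be to confirm the monitor port is reachable and to compare the locally installed Ceph version against the rest of the cluster:

Code:
# is the monitor port reachable from the new node?
nc -zv 192.168.121.35 6789
# which ceph version is installed locally?
ceph -v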

(Apologies for initially posting this on the wrong forum!)
 
Further to this problem (which I now have on two new nodes already):

Code:
service ceph-osd@19 status
● ceph-osd@19.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

Warning: Unit file changed on disk, 'systemctl daemon-reload' recommended.

Which directory is not found? I added these OSDs via the GUI...
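Since the warning already hints at it, a reasonable first step (a minimal sketch, assuming standard systemd tooling) would be to reload the unit files and see what systemd actually knows about ceph-osd units:

Code:
# pick up unit files that changed on disk
systemctl daemon-reload
# does a ceph-osd@ template unit exist at all?
systemctl list-unit-files | grep ceph
# re-check the OSD unit state afterwards
systemctl status ceph-osd@19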
 

Ah, systemd is only supported since Infernalis.

Code:
/etc/init.d/ceph start osd.19
[ ok ] Starting ceph (via systemctl): ceph.service.

Nothing gets written to /var/log/ceph/ceph-osd.19.log when this command is executed. That seems strange to me.
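If the init script hands off to systemd, startup output may land in the journal or syslog rather than in the OSD log. A quick look there (assuming journald is running, as on a stock Proxmox 4.4 install) might show where the start attempt ends up:

Code:
# recent ceph messages in the systemd journal
journalctl -u ceph.service --since "10 min ago"
# or in syslog, in case output went there instead
grep ceph /var/log/syslog | tail -n 20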
 

However, if I stop and start osd.19, the same stuff gets logged:

Code:
2018-08-16 13:13:07.775413 7f5fd74a0700 -1 osd.19 20215 *** Got signal Terminated ***
2018-08-16 13:13:07.775483 7f5fd74a0700  0 osd.19 20215 prepare_to_stop starting shutdown
2018-08-16 13:13:07.775490 7f5fd74a0700 -1 osd.19 20215 shutdown
2018-08-16 13:13:07.776344 7f5fd74a0700 10 osd.19 20215 recovery tp stopped
2018-08-16 13:13:07.776544 7f5fd74a0700 10 osd.19 20215 osd tp stopped
2018-08-16 13:13:08.573477 7f5ff271b700  5 osd.19 20215 tick
2018-08-16 13:13:08.573512 7f5ff271b700 10 osd.19 20215 do_waiters -- start
2018-08-16 13:13:08.573518 7f5ff271b700 10 osd.19 20215 do_waiters -- finish
2018-08-16 13:13:16.158585 7fe5beb9f880  0 ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid 24841
2018-08-16 13:13:16.180349 7fe5beb9f880  0 filestore(/var/lib/ceph/osd/ceph-19) backend xfs (magic 0x58465342)
2018-08-16 13:13:16.184127 7fe5beb9f880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is supported and appears to work
2018-08-16 13:13:16.184240 7fe5beb9f880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2018-08-16 13:13:16.232686 7fe5beb9f880  0 genericfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2018-08-16 13:13:16.232885 7fe5beb9f880  0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-19) detect_feature: extsize is disabled by conf
2018-08-16 13:13:16.291676 7fe5beb9f880  0 filestore(/var/lib/ceph/osd/ceph-19) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2018-08-16 13:13:16.297324 7fe5beb9f880  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2018-08-16 13:13:16.404234 7fe5beb9f880  1 journal _open /var/lib/ceph/osd/ceph-19/journal fd 20: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
2018-08-16 13:13:16.425098 7fe5beb9f880  0 <cls> cls/hello/cls_hello.cc:271: loading cls_hello
2018-08-16 13:13:16.425542 7fe5beb9f880  0 osd.19 20215 crush map has features 2200130813952, adjusting msgr requires for clients
2018-08-16 13:13:16.425555 7fe5beb9f880  0 osd.19 20215 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
2018-08-16 13:13:16.425566 7fe5beb9f880  0 osd.19 20215 crush map has features 2200130813952, adjusting msgr requires for osds
2018-08-16 13:13:16.425587 7fe5beb9f880  0 osd.19 20215 load_pgs
2018-08-16 13:13:16.425642 7fe5beb9f880  0 osd.19 20215 load_pgs opened 0 pgs
2018-08-16 13:13:16.426815 7fe5beb9f880 -1 osd.19 20215 log_to_monitors {default=true}
2018-08-16 13:13:16.434311 7fe5beb98700  0 -- 192.168.121.35:6800/24841 >> 192.168.121.35:6789/0 pipe(0x580e000 sd=23 :0 s=1 pgs=0 cs=0 l=1 c=0x567ec60).fault
2018-08-16 13:13:19.431564 7fe5acf9b700  0 osd.19 20215 ignoring osdmap until we have initialized
2018-08-16 13:13:19.433081 7fe5acf9b700  0 osd.19 20215 ignoring osdmap until we have initialized
2018-08-16 13:13:19.433406 7fe5beb9f880  0 osd.19 20215 done with init, starting boot process

... and osd.19 doesn't start.
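At this point it may help to check, from a working node, how the cluster itself sees the OSD, i.e. whether osd.19 is registered but stuck marked down (assuming an admin keyring is available on that node):

Code:
# is osd.19 known to the cluster, and is it up or down?
ceph osd tree
# summary of total/up/in OSD counts
ceph osd stat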
 
After much searching, I investigated the only lead left: "log_channel(cluster) log [WRN] : failed to encode map e20158 with expected crc". It turns out this actually refers to a version mismatch?! OK, I know developers don't always mean to inform users of the finer details of what went wrong with a detailed error message, but this is next level. LOL!

The newly installed nodes were still running Hammer, while the cluster is on Jewel. I upgraded, recreated the OSD, and of course it starts!
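For anyone hitting the same message: a quick way to catch such a mismatch up front (a sketch, assuming mon.0 is a valid monitor ID in the cluster) is to compare the locally installed version with what a monitor reports:

Code:
# version of the ceph binaries on this node
ceph -v
# version the monitor is actually running
ceph tell mon.0 version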
 
