Cannot Start OSD

Deleted member 33567
Hi,

Recently my datacenter had some issues at the vRack level, which affected all services that depend on it. My Proxmox cluster was affected as well, in particular Ceph. I get this error on one of the nodes:

Code:
root@node02-sxb-pve01:~# service ceph-osd@1 status
● ceph-osd@1.service - Ceph object storage daemon
   Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled)
  Drop-In: /lib/systemd/system/ceph-osd@.service.d
           └─ceph-after-pve-cluster.conf
   Active: activating (auto-restart) (Result: signal) since Thu 2017-11-16 22:43:49 CET; 1s ago
  Process: 32003 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
  Process: 31955 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
 Main PID: 32003 (code=killed, signal=ABRT)

Nov 16 22:43:49 node02-sxb-pve01 systemd[1]: Unit ceph-osd@1.service entered failed state.
root@node02-sxb-pve01:~# journalctl -xe
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 2017-11-16 22:43:49.677122 7f25435db800 -1 *** Caught signal (Aborted) **
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: in thread 7f25435db800 thread_name:ceph-osd
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 1: (()+0x961ee7) [0x5581f8b7fee7]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 2: (()+0xf890) [0x7f2542446890]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 3: (gsignal()+0x37) [0x7f254048d067]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 4: (abort()+0x148) [0x7f254048e448]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5581f8c85fe6]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 6: (OSDService::get_map(unsigned int)+0x3d) [0x5581f86006dd]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 7: (OSD::init()+0x1fa0) [0x5581f85af770]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 8: (main()+0x2a64) [0x5581f85147d4]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 9: (__libc_start_main()+0xf5) [0x7f2540479b45]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 10: (()+0x341317) [0x5581f855f317]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 0> 2017-11-16 22:43:49.677122 7f25435db800 -1 *** Caught signal (Aborted) **
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: in thread 7f25435db800 thread_name:ceph-osd
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 1: (()+0x961ee7) [0x5581f8b7fee7]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 2: (()+0xf890) [0x7f2542446890]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 3: (gsignal()+0x37) [0x7f254048d067]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 4: (abort()+0x148) [0x7f254048e448]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x5581f8c85fe6]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 6: (OSDService::get_map(unsigned int)+0x3d) [0x5581f86006dd]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 7: (OSD::init()+0x1fa0) [0x5581f85af770]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 8: (main()+0x2a64) [0x5581f85147d4]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 9: (__libc_start_main()+0xf5) [0x7f2540479b45]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: 10: (()+0x341317) [0x5581f855f317]
Nov 16 22:43:49 node02-sxb-pve01 ceph-osd[32003]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Nov 16 22:43:49 node02-sxb-pve01 systemd[1]: ceph-osd@1.service: main process exited, code=killed, status=6/ABRT
Nov 16 22:43:49 node02-sxb-pve01 systemd[1]: Unit ceph-osd@1.service entered failed state.
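From the backtrace, the abort comes from ceph::__ceph_assert_fail() inside OSDService::get_map() while OSD::init() runs, so the daemon apparently fails while loading an osdmap epoch from its local store. To capture the exact assert message, I can run the OSD once in the foreground with verbose logging; this is just a diagnostic sketch and assumes the default cluster name "ceph":

Code:
# diagnostic run only; -d keeps the daemon in the foreground and copies
# the log to stderr, so the failed assert (file:line) shows up right
# before the backtrace
ceph-osd -d --cluster ceph --id 1 --setuser ceph --setgroup ceph --debug_osd 20 --debug_filestore 20

The same detail should also end up in /var/log/ceph/ceph-osd.1.log with the default log settings.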

Here are the PVE package versions:
Code:
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.79-1-pve: 4.4.79-95
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-53
qemu-server: 4.0-113
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
ceph: 10.2.10-1~bpo80+1

Ceph status:
Code:
root@node02-sxb-pve01:~# ceph status
    cluster 277f2e22-86ed-4486-825e-a6cabd080bb0
     health HEALTH_WARN
            256 pgs degraded
            256 pgs stuck degraded
            256 pgs stuck unclean
            256 pgs stuck undersized
            256 pgs undersized
            recovery 167815/518544 objects degraded (32.363%)
     monmap e3: 3 mons at {0=172.18.1.1:6789/0,1=172.18.2.1:6789/0,2=172.18.3.1:6789/0}
            election epoch 130, quorum 0,1,2 0,1,2
     osdmap e253: 3 osds: 2 up, 2 in; 256 remapped pgs
            flags sortbitwise,require_jewel_osds
      pgmap v5982330: 256 pgs, 2 pools, 672 GB data, 168 kobjects
            1349 GB used, 2364 GB / 3713 GB avail
            167815/518544 objects degraded (32.363%)
                 256 active+undersized+degraded
  client io 819 B/s rd, 416 kB/s wr, 0 op/s rd, 48 op/s wr
root@node02-sxb-pve01:~# ceph osd tree
ID WEIGHT  TYPE NAME                 UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.43979 root default                                               
-2       0     host node03-sxb01                                       
-3       0     host node02                                             
-4       0     host node01                                             
-5 1.81360     host node03-sxb-pve01                                   
 0 1.81360         osd.0                  up  1.00000          1.00000
-6 1.81310     host node02-sxb-pve01                                   
 1 1.81310         osd.1                down        0          1.00000
-7 1.81310     host node01-sxb-pve01                                   
 2 1.81310         osd.2                  up  1.00000          1.00000
root@node02-sxb-pve01:~#
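Since osd.0 and osd.2 are up and the PGs are only undersized/degraded, every object should still have a surviving copy, so one option I'm considering is to simply recreate osd.1 and let the cluster backfill. A rough sketch, assuming the OSD's disk is /dev/sdb (the device name is a placeholder, and this wipes osd.1's local data):

Code:
# remove the broken OSD from the cluster ...
systemctl stop ceph-osd@1
ceph osd out 1                # already out (reweight 0), kept for completeness
ceph osd crush remove osd.1
ceph auth del osd.1
ceph osd rm 1
# ... and recreate it via Proxmox (/dev/sdb is an assumption, adjust it)
pveceph createosd /dev/sdb

On a side note, the empty node01, node02 and node03-sxb01 buckets in the tree look like leftovers from renaming the hosts; if they are really unused, ceph osd crush remove <bucket> should clean them up.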
 
