Ceph OSD stopped and out

Discussion in 'Proxmox VE: Installation and configuration' started by frantek, May 15, 2019.

  1. frantek

    frantek Member

    Joined:
    May 30, 2009
    Messages:
    154
    Likes Received:
    3
    Hi,

    I have a problem with one OSD in my Ceph cluster:

    Code:
    # ceph health detail
    HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
    OSD_SCRUB_ERRORS 1 scrub errors
    PG_DAMAGED Possible data damage: 1 pg inconsistent
        pg 7.2fa is active+clean+inconsistent, acting [13,6,16]
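
From what I've read, an inconsistent PG can usually be inspected and often repaired from the healthy replicas. A sketch of what I'd try, assuming the PG id from the output above (I haven't run the repair yet):

```shell
# Inspect which objects in the PG are inconsistent (available since Luminous)
rados list-inconsistent-obj 7.2fa --format=json-pretty

# Ask Ceph to repair the PG from the healthy copies on the acting set
ceph pg repair 7.2fa
```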
    
    Code:
    # systemctl status ceph-osd@14.service
    ● ceph-osd@14.service - Ceph object storage daemon osd.14
       Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
      Drop-In: /lib/systemd/system/ceph-osd@.service.d
               └─ceph-after-pve-cluster.conf
       Active: failed (Result: signal) since Wed 2019-05-15 10:20:26 CEST; 6min ago
      Process: 3225166 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 14 --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
      Process: 3225161 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 14 (code=exited, status=0/SUCCESS)
     Main PID: 3225166 (code=killed, signal=ABRT)
    
    Mai 15 10:20:26 pve03 systemd[1]: ceph-osd@14.service: Start request repeated too quickly.
    Mai 15 10:20:26 pve03 systemd[1]: Failed to start Ceph object storage daemon osd.14.
    Mai 15 10:20:26 pve03 systemd[1]: ceph-osd@14.service: Unit entered failed state.
    Mai 15 10:20:26 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.
    Mai 15 10:23:20 pve03 systemd[1]: ceph-osd@14.service: Start request repeated too quickly.
    Mai 15 10:23:20 pve03 systemd[1]: Failed to start Ceph object storage daemon osd.14.
    Mai 15 10:23:20 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.
    Mai 15 10:24:14 pve03 systemd[1]: ceph-osd@14.service: Start request repeated too quickly.
    Mai 15 10:24:14 pve03 systemd[1]: Failed to start Ceph object storage daemon osd.14.
    Mai 15 10:24:14 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.
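
The "Start request repeated too quickly" messages look like systemd's start-rate limit kicking in after repeated crashes, not a separate failure. Clearing the failed state lets me retry a manual start and watch what happens:

```shell
# Clear systemd's start-limit / failed state for the OSD unit
systemctl reset-failed ceph-osd@14.service

# Try starting it again by hand and check the result
systemctl start ceph-osd@14.service
systemctl status ceph-osd@14.service
```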
    
    Code:
    # pveversion --verbose
    proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
    pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
    pve-kernel-4.15: 5.4-2
    pve-kernel-4.15.18-14-pve: 4.15.18-38
    pve-kernel-4.15.18-13-pve: 4.15.18-37
    pve-kernel-4.15.18-12-pve: 4.15.18-36
    pve-kernel-4.15.18-11-pve: 4.15.18-34
    pve-kernel-4.15.18-10-pve: 4.15.18-32
    ceph: 12.2.12-pve1
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-9
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-51
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-13
    libpve-storage-perl: 5.0-42
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-3
    proxmox-widget-toolkit: 1.0-26
    pve-cluster: 5.0-37
    pve-container: 2.0-37
    pve-docs: 5.4-2
    pve-edk2-firmware: 1.20190312-1
    pve-firewall: 3.0-20
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-9
    pve-i18n: 1.1-4
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 3.0.1-2
    pve-xtermjs: 3.12.0-1
    qemu-server: 5.0-51
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.13-pve1~bpo2

    How to fix this?

    TIA
     
  2. adamb

    adamb Member
    Proxmox Subscriber

    Joined:
    Mar 1, 2012
    Messages:
    990
    Likes Received:
    23
  3. frantek

    frantek Member

    Joined:
    May 30, 2009
    Messages:
    154
    Likes Received:
    3
    Hmm, the scrub errors are now gone without me doing anything. Instead I now get:

    Code:
    ~# ceph health detail
    HEALTH_WARN 1 osds down; 44423/801015 objects misplaced (5.546%)
    OSD_DOWN 1 osds down
        osd.14 (root=default,host=pve03) is down
    OBJECT_MISPLACED 44423/801015 objects misplaced (5.546%)
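
While osd.14 is down I'm keeping an eye on where it sits and how the misplaced objects are moving; a couple of commands for that:

```shell
# Confirm which OSD is down and where it lives in the CRUSH tree
ceph osd tree

# Watch recovery/backfill progress as the misplaced objects are rebalanced
ceph -s
ceph -w    # streaming view of cluster events
```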
    
    Code:
    # systemctl status ceph-osd@14.service
    ● ceph-osd@14.service - Ceph object storage daemon osd.14
       Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
      Drop-In: /lib/systemd/system/ceph-osd@.service.d
               └─ceph-after-pve-cluster.conf
       Active: activating (auto-restart) (Result: signal) since Wed 2019-05-15 19:38:15 CEST; 7s ago
      Process: 3633324 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 14 --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
      Process: 3633319 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 14 (code=exited, status=0/SUCCESS)
     Main PID: 3633324 (code=killed, signal=ABRT)
    
    Mai 15 19:38:15 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.
    
    I can start the OSD, but then it crashes with:

    --
    -- Unit pvesr.service has begun starting up.
    Mai 15 19:39:01 pve03 systemd[1]: Started Proxmox VE replication runner.
    -- Subject: Unit pvesr.service has finished start-up
    -- Defined-By: systemd
    -- Support: https://www.debian.org/support
    --
    -- Unit pvesr.service has finished starting up.
    --
    -- The start-up result is done.
    Mai 15 19:39:05 pve03 systemd[1]: ceph-osd@14.service: Service hold-off time over, scheduling restart.
    Mai 15 19:39:05 pve03 systemd[1]: Stopped Ceph object storage daemon osd.14.
    -- Subject: Unit ceph-osd@14.service has finished shutting down
    -- Defined-By: systemd
    -- Support: https://www.debian.org/support
    --
    -- Unit ceph-osd@14.service has finished shutting down.
    Mai 15 19:39:05 pve03 systemd[1]: Starting Ceph object storage daemon osd.14...
    -- Subject: Unit ceph-osd@14.service has begun start-up
    -- Defined-By: systemd
    -- Support: https://www.debian.org/support
    --
    -- Unit ceph-osd@14.service has begun starting up.
    Mai 15 19:39:05 pve03 systemd[1]: Started Ceph object storage daemon osd.14.
    -- Subject: Unit ceph-osd@14.service has finished start-up
    -- Defined-By: systemd
    -- Support: https://www.debian.org/support
    --
    -- Unit ceph-osd@14.service has finished starting up.
    --
    -- The start-up result is done.
    Mai 15 19:39:05 pve03 ceph-osd[3634089]: starting osd.14 at - osd_data /var/lib/ceph/osd/ceph-14 /var/lib/ceph/osd/ceph-14/journal
    Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
    Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
    Mai 15 19:39:08 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
    Mai 15 19:39:08 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
    Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
    Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
    Mai 15 19:39:09 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
    Mai 15 19:39:09 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
    Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Sense Key : Medium Error [current]
    Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 Add. Sense: Unrecovered read error
    Mai 15 19:39:12 pve03 kernel: sd 4:0:17:0: [sdg] tag#1 CDB: Read(10) 28 00 05 d8 4c 60 00 00 08 00
    Mai 15 19:39:12 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.065699 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) _write(3544): wri
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: In function 'v
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.065753 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) error (5) Input/
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.069017 7fccb4415e00 -1 /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/sr
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: -4> 2019-05-15 19:39:12.065699 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) _write(35
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: -3> 2019-05-15 19:39:12.065753 7fccb4415e00 -1 filestore(/var/lib/ceph/osd/ceph-14) error (5
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 0> 2019-05-15 19:39:12.069017 7fccb4415e00 -1 /mnt/pve/store/tlamprecht/sources/ceph/ceph-12
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: /mnt/pve/store/tlamprecht/sources/ceph/ceph-12.2.12/src/os/filestore/FileStore.cc: 3185: FAILED a
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x563d3983f262]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (FileStore::mount()+0x48f8) [0x563d395e4d18]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (OSD::init()+0x3e2) [0x563d3923e772]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (main()+0x3092) [0x563d391481c2]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (_start()+0x2a) [0x563d391d48ca]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: *** Caught signal (Aborted) **
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2019-05-15 19:39:12.072802 7fccb4415e00 -1 *** Caught signal (Aborted) **
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 0> 2019-05-15 19:39:12.072802 7fccb4415e00 -1 *** Caught signal (Aborted) **
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: in thread 7fccb4415e00 thread_name:ceph-osd
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 1: (()+0xa59c94) [0x563d397f6c94]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 2: (()+0x110e0) [0x7fccb1a1d0e0]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 3: (gsignal()+0xcf) [0x7fccb09e4fff]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 4: (abort()+0x16a) [0x7fccb09e642a]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x563d3983f3ee]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 6: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHand
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 7: (FileStore::_do_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 8: (JournalingObjectStore::journal_replay(unsigned long)+0xdda) [0x563d395fa2ea]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 9: (FileStore::mount()+0x48f8) [0x563d395e4d18]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 10: (OSD::init()+0x3e2) [0x563d3923e772]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 11: (main()+0x3092) [0x563d391481c2]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 12: (__libc_start_main()+0xf1) [0x7fccb09d22e1]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: 13: (_start()+0x2a) [0x563d391d48ca]
    Mai 15 19:39:12 pve03 ceph-osd[3634089]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Main process exited, code=killed, status=6/ABRT
    Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Unit entered failed state.
    Mai 15 19:39:12 pve03 systemd[1]: ceph-osd@14.service: Failed with result 'signal'.

    Code:
    Mai 15 19:39:08 pve03 kernel: print_req_error: critical medium error, dev sdg, sector 98061408
    
    Sounds like I should replace the disk ... right?
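
If it really is the disk, my rough plan would be the following (a sketch, assuming the PVE 5.x `pveceph` subcommand names and that osd.14 is on /dev/sdg as the kernel log suggests):

```shell
# Take the failing OSD out so Ceph rebalances its data onto other OSDs
ceph osd out 14

# Wait until recovery is done ("ceph -s" shows no degraded/misplaced objects),
# then stop the daemon and remove the OSD from the cluster
systemctl stop ceph-osd@14.service
pveceph destroyosd 14

# Physically swap the disk, then create a new OSD on the replacement
pveceph createosd /dev/sdg
```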

    Well, smartctl does not look good ...

    Code:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000f   091   091   006    Pre-fail  Always       -       181999400
      3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
      4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       15
      5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       584
      7 Seek_Error_Rate         0x000f   087   060   030    Pre-fail  Always       -       538350167
      9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21203 (23 243 0)
     10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       15
    184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
    187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       664
    188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
    189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
    190 Airflow_Temperature_Cel 0x0022   069   064   045    Old_age   Always       -       31 (Min/Max 29/33)
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       2
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       11
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       71
    194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 25 0 0 0)
    197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       16
    198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       16
    199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
    240 Head_Flying_Hours       0x0000   076   076   000    Old_age   Offline      -       21202 (114 252 0)
    241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       48702137992
    242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       41224941657
    254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0
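
The values that worry me are Reallocated_Sector_Ct (584), Reported_Uncorrect (664) and Current_Pending_Sector (16). A small hypothetical helper to pull just the critical raw counters out of `smartctl -A` output, for quickly checking the other disks too:

```shell
# Print critical SMART counters (raw value, column 10) when they are non-zero.
# Usage: smartctl -A /dev/sdg | check_smart
check_smart() {
  awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect)$/ && $10+0 > 0 { print $2 "=" $10 }'
}
```

Anything this prints is a reason to plan a replacement; a healthy disk should report 0 for all four attributes.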
    
     
  4. adamb

    adamb Member
    Proxmox Subscriber

    Joined:
    Mar 1, 2012
    Messages:
    990
    Likes Received:
    23

    I agree, it sounds like a bad disk. I'm new to Ceph as well, but I'm trying to keep up with threads like this so I can learn from others' experience. Let me know how you make out. Hopefully a new disk will get you up and running.
     