Dear all,
How can I interpret this error in dmesg? I have implemented a Ceph storage cluster with 3 nodes and about 24 disks per node.
I have had different disks on different nodes fail in the same way. I may have a bad batch of disks, but I suspect something else is going on.
pveversion
pve-manager/6.4-11/28d576c2 (running kernel: 5.4.124-1-pve)
ceph -v
ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)
Code:
[Tue Aug 24 23:01:53 2021] vmbr0: port 28(fwpr715p0) entered disabled state
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd4): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd5): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd6): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:59:56 2021] audit: type=1400 audit(1629860431.232:110): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/" pid=3273376 comm="(ogrotate)" srcname="/" flags="rw, rbind"
[Wed Aug 25 01:59:29 2021] rbd: rbd7: capacity 8589934592 features 0x3d
[Wed Aug 25 01:59:29 2021] EXT4-fs (rbd7): write access unavailable, skipping orphan cleanup
[Wed Aug 25 01:59:29 2021] EXT4-fs (rbd7): mounted filesystem without journal. Opts: noload
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:02:07 2021] rbd: rbd7: capacity 34359738368 features 0x3d
[Wed Aug 25 02:02:08 2021] EXT4-fs (rbd7): mounted filesystem without journal. Opts: noload
[Wed Aug 25 05:58:23 2021] blk_update_request: I/O error, dev nvme10n1, sector 3002655384 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327571, lost async page write
[Wed Aug 25 05:58:23 2021] blk_update_request: I/O error, dev nvme10n1, sector 3002655392 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327572, lost async page write
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327573, lost async page write
[Wed Aug 25 05:58:23 2021] libceph: osd23 (1)172.16.250.4:6806 socket closed (con state OPEN)
[Wed Aug 25 05:58:24 2021] libceph: osd23 (1)172.16.250.4:6806 socket closed (con state CONNECTING)
[Wed Aug 25 05:58:24 2021] libceph: osd23 down
[Wed Aug 25 05:58:56 2021] libceph: osd23 up
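The blk_update_request and Buffer I/O errors above are against nvme10n1, so the drive itself is the first suspect. A minimal health check on that device would look something like this (nvme10n1 is the name from the dmesg above; I am assuming smartmontools and nvme-cli are installed):
Code:
# SMART / health data straight from the NVMe controller
smartctl -a /dev/nvme10n1
nvme smart-log /dev/nvme10n1
# look for controller resets or PCIe/link errors around the time of the failure
dmesg -T | grep -iE 'nvme|pcie|reset'
The crash report that Ceph recorded for osd.23 around the same time is below.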
Code:
{
    "archived": "2021-08-25 15:47:43.902779",
    "assert_condition": "abort",
    "assert_file": "/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc",
    "assert_func": "virtual int KernelDevice::flush()",
    "assert_line": 435,
    "assert_msg": "/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::flush()' thread 7f79c1a4d700 time 2021-08-25T05:58:58.717456-0300\n/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc: 435: ceph_abort_msg(\"abort() called\")\n",
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "(()+0x12730) [0x7f79d27b1730]",
        "(gsignal()+0x10b) [0x7f79d22927bb]",
        "(abort()+0x121) [0x7f79d227d535]",
        "(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b2) [0x559a52e3b243]",
        "(KernelDevice::flush()+0x589) [0x559a53499d19]",
        "(BlueStore::_kv_sync_thread()+0xf9e) [0x559a533beb1e]",
        "(BlueStore::KVSyncThread::entry()+0xd) [0x559a533e76ad]",
        "(()+0x7fa3) [0x7f79d27a6fa3]",
        "(clone()+0x3f) [0x7f79d23544cf]"
    ],
    "ceph_version": "15.2.13",
    "crash_id": "2021-08-25T08:58:58.723172Z_8221aeXXXXXXX378e1e825f1f",
    "entity_name": "osd.23",
    "os_id": "10",
    "os_name": "Debian GNU/Linux 10 (buster)",
    "os_version": "10 (buster)",
    "os_version_id": "10",
    "process_name": "ceph-osd",
    "stack_sig": "53cb40b4b466f57271XXXXXXXXXb25d50703c19492cb247aaad8ccd01",
    "timestamp": "2021-08-25T08:58:58.723172Z",
    "utsname_hostname": "pve2",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.124-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.4.124-1 (Wed, 23 Jun 2021 13:47:09 +0200)"
}
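In case it helps, this is how the crash reports can be listed and how osd.23 can be mapped back to its physical device (osd.23 is taken from the report above; <crash_id> stands for the redacted ID):
Code:
# list recorded crashes and show a full report
ceph crash ls
ceph crash info <crash_id>
# confirm which physical device backs osd.23
ceph osd metadata 23 | grep -E 'devices|dev_node'
ceph device ls-by-daemon osd.23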
Could it indeed be the disks, or could it be something else, such as a networking problem?
I have had this same error with other disks at random. My impression is that it is not a problem with the disks themselves, but something else.
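To rule out the network side, a quick sanity check of the Ceph cluster network between the nodes would be something like this (172.16.250.4 is the address from the libceph lines above; iperf3 is an assumption and needs a server running on the other node):
Code:
# basic reachability on the cluster network
ping -c 5 172.16.250.4
# throughput between two nodes (run "iperf3 -s" on 172.16.250.4 first)
iperf3 -c 172.16.250.4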