Disk error or something else?

Gastondc

Well-Known Member
Dear all,

How should I interpret this error in dmesg? I have deployed a Ceph storage cluster with 3 nodes and about 24 disks per node.

I have had different disks on different nodes fail in the same way. I may have a bad batch of disks, but I suspect the real cause is something else.

pveversion
pve-manager/6.4-11/28d576c2 (running kernel: 5.4.124-1-pve)

ceph -v
ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)


Code:
[Tue Aug 24 23:01:53 2021] vmbr0: port 28(fwpr715p0) entered disabled state
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd4): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd5): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd6): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:59:56 2021] audit: type=1400 audit(1629860431.232:110): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/" pid=3273376 comm="(ogrotate)" srcname="/" flags="rw, rbind"
[Wed Aug 25 01:59:29 2021] rbd: rbd7: capacity 8589934592 features 0x3d
[Wed Aug 25 01:59:29 2021] EXT4-fs (rbd7): write access unavailable, skipping orphan cleanup
[Wed Aug 25 01:59:29 2021] EXT4-fs (rbd7): mounted filesystem without journal. Opts: noload
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:02:07 2021] rbd: rbd7: capacity 34359738368 features 0x3d
[Wed Aug 25 02:02:08 2021] EXT4-fs (rbd7): mounted filesystem without journal. Opts: noload
[Wed Aug 25 05:58:23 2021] blk_update_request: I/O error, dev nvme10n1, sector 3002655384 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327571, lost async page write
[Wed Aug 25 05:58:23 2021] blk_update_request: I/O error, dev nvme10n1, sector 3002655392 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327572, lost async page write
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327573, lost async page write
[Wed Aug 25 05:58:23 2021] libceph: osd23 (1)172.16.250.4:6806 socket closed (con state OPEN)
[Wed Aug 25 05:58:24 2021] libceph: osd23 (1)172.16.250.4:6806 socket closed (con state CONNECTING)
[Wed Aug 25 05:58:24 2021] libceph: osd23 down
[Wed Aug 25 05:58:56 2021] libceph: osd23 up
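
Right after those I/O errors on /dev/nvme10n1, osd.23 aborted; its crash report is below. To rule out the drive itself, my plan is to pull its SMART/health data, roughly like this (assuming nvme-cli and smartmontools are installed and nvme10n1 is still the right device name):

Code:
# NVMe health summary: critical warnings, media errors, percentage used
nvme smart-log /dev/nvme10n1

# Full SMART report including the device error log
smartctl -a /dev/nvme10n1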



Code:
{
    "archived": "2021-08-25 15:47:43.902779",
    "assert_condition": "abort",
    "assert_file": "/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc",
    "assert_func": "virtual int KernelDevice::flush()",
    "assert_line": 435,
    "assert_msg": "/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::flush()' thread 7f79c1a4d700 time 2021-08-25T05:58:58.717456-0300\n/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc: 435: ceph_abort_msg(\"abort() called\")\n",
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "(()+0x12730) [0x7f79d27b1730]",
        "(gsignal()+0x10b) [0x7f79d22927bb]",
        "(abort()+0x121) [0x7f79d227d535]",
        "(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b2) [0x559a52e3b243]",
        "(KernelDevice::flush()+0x589) [0x559a53499d19]",
        "(BlueStore::_kv_sync_thread()+0xf9e) [0x559a533beb1e]",
        "(BlueStore::KVSyncThread::entry()+0xd) [0x559a533e76ad]",
        "(()+0x7fa3) [0x7f79d27a6fa3]",
        "(clone()+0x3f) [0x7f79d23544cf]"
    ],
    "ceph_version": "15.2.13",
    "crash_id": "2021-08-25T08:58:58.723172Z_8221aeXXXXXXX378e1e825f1f",
    "entity_name": "osd.23",
    "os_id": "10",
    "os_name": "Debian GNU/Linux 10 (buster)",
    "os_version": "10 (buster)",
    "os_version_id": "10",
    "process_name": "ceph-osd",
    "stack_sig": "53cb40b4b466f57271XXXXXXXXXb25d50703c19492cb247aaad8ccd01",
    "timestamp": "2021-08-25T08:58:58.723172Z",
    "utsname_hostname": "pve2",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.124-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.4.124-1 (Wed, 23 Jun 2021 13:47:09 +0200)"
}
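
For reference, the report above comes from the Ceph crash module; I believe it is what the crash commands return for the archived crash of osd.23 (crash ID redacted above), i.e. something like:

Code:
# List recorded crashes, then dump the full report for one of them
ceph crash ls
ceph crash info <crash_id>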


Could it indeed be the disk, or could it be something else, such as a networking problem?

I have had this same error with other disks, seemingly at random. I get the impression that the problem is not the disks themselves, but something else.
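
To rule out the network side, this is roughly what I intend to run (a sketch; 172.16.250.4 is the osd23 address from the dmesg output, and iperf3 would need to be running in server mode on the other node):

Code:
# Overall cluster state and any flapping OSDs
ceph -s
ceph health detail

# Reachability / packet loss on the cluster network
ping -c 20 172.16.250.4

# Throughput between nodes (run 'iperf3 -s' on the remote node first)
iperf3 -c 172.16.250.4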
 
