Dear all,
How can I interpret this error in dmesg? I have implemented a Ceph storage cluster with 3 nodes and about 24 disks per node.
I have had different disks on different nodes fail in the same way. I may have a bad batch of disks, but I suspect something else is going on.
pveversion
pve-manager/6.4-11/28d576c2 (running kernel: 5.4.124-1-pve)
ceph -v
ceph version 15.2.13 (1f5c7871ec0e36ade641773b9b05b6211c308b9d) octopus (stable)
Code:
[Tue Aug 24 23:01:53 2021] vmbr0: port 28(fwpr715p0) entered disabled state
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd4): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd5): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:19:04 2021] EXT4-fs (rbd6): mounted filesystem with ordered data mode. Opts: (null)
[Tue Aug 24 23:59:56 2021] audit: type=1400 audit(1629860431.232:110): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-103_</var/lib/lxc>" name="/run/systemd/unit-root/" pid=3273376 comm="(ogrotate)" srcname="/" flags="rw, rbind"
[Wed Aug 25 01:59:29 2021] rbd: rbd7: capacity 8589934592 features 0x3d
[Wed Aug 25 01:59:29 2021] EXT4-fs (rbd7): write access unavailable, skipping orphan cleanup
[Wed Aug 25 01:59:29 2021] EXT4-fs (rbd7): mounted filesystem without journal. Opts: noload
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:13 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:01:14 2021] rbd: rbd1: no lock owners detected
[Wed Aug 25 02:02:07 2021] rbd: rbd7: capacity 34359738368 features 0x3d
[Wed Aug 25 02:02:08 2021] EXT4-fs (rbd7): mounted filesystem without journal. Opts: noload
[Wed Aug 25 05:58:23 2021] blk_update_request: I/O error, dev nvme10n1, sector 3002655384 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327571, lost async page write
[Wed Aug 25 05:58:23 2021] blk_update_request: I/O error, dev nvme10n1, sector 3002655392 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327572, lost async page write
[Wed Aug 25 05:58:23 2021] Buffer I/O error on dev dm-11, logical block 375327573, lost async page write
[Wed Aug 25 05:58:23 2021] libceph: osd23 (1)172.16.250.4:6806 socket closed (con state OPEN)
[Wed Aug 25 05:58:24 2021] libceph: osd23 (1)172.16.250.4:6806 socket closed (con state CONNECTING)
[Wed Aug 25 05:58:24 2021] libceph: osd23 down
[Wed Aug 25 05:58:56 2021] libceph: osd23 up
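The blk_update_request and Buffer I/O errors above are against nvme10n1, so the drive itself is the first suspect. A minimal health check on that device would look something like this (nvme10n1 is the name from the dmesg above; I am assuming smartmontools and nvme-cli are installed):
Code:
# SMART / health data straight from the NVMe controller
smartctl -a /dev/nvme10n1
nvme smart-log /dev/nvme10n1
# look for controller resets or PCIe/link errors around the time of the failure
dmesg -T | grep -iE 'nvme|pcie|reset'
The crash report that Ceph recorded for osd.23 around the same time is below.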
Code:
{
    "archived": "2021-08-25 15:47:43.902779",
    "assert_condition": "abort",
    "assert_file": "/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc",
    "assert_func": "virtual int KernelDevice::flush()",
    "assert_line": 435,
    "assert_msg": "/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::flush()' thread 7f79c1a4d700 time 2021-08-25T05:58:58.717456-0300\n/build/ceph/ceph-15.2.13/src/os/bluestore/KernelDevice.cc: 435: ceph_abort_msg(\"abort() called\")\n",
    "assert_thread_name": "bstore_kv_sync",
    "backtrace": [
        "(()+0x12730) [0x7f79d27b1730]",
        "(gsignal()+0x10b) [0x7f79d22927bb]",
        "(abort()+0x121) [0x7f79d227d535]",
        "(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b2) [0x559a52e3b243]",
        "(KernelDevice::flush()+0x589) [0x559a53499d19]",
        "(BlueStore::_kv_sync_thread()+0xf9e) [0x559a533beb1e]",
        "(BlueStore::KVSyncThread::entry()+0xd) [0x559a533e76ad]",
        "(()+0x7fa3) [0x7f79d27a6fa3]",
        "(clone()+0x3f) [0x7f79d23544cf]"
    ],
    "ceph_version": "15.2.13",
    "crash_id": "2021-08-25T08:58:58.723172Z_8221aeXXXXXXX378e1e825f1f",
    "entity_name": "osd.23",
    "os_id": "10",
    "os_name": "Debian GNU/Linux 10 (buster)",
    "os_version": "10 (buster)",
    "os_version_id": "10",
    "process_name": "ceph-osd",
    "stack_sig": "53cb40b4b466f57271XXXXXXXXXb25d50703c19492cb247aaad8ccd01",
    "timestamp": "2021-08-25T08:58:58.723172Z",
    "utsname_hostname": "pve2",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.124-1-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.4.124-1 (Wed, 23 Jun 2021 13:47:09 +0200)"
}
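In case it helps, this is how the crash reports can be listed and how osd.23 can be mapped back to its physical device (osd.23 is taken from the report above; <crash_id> stands for the redacted ID):
Code:
# list recorded crashes and show a full report
ceph crash ls
ceph crash info <crash_id>
# confirm which physical device backs osd.23
ceph osd metadata 23 | grep -E 'devices|dev_node'
ceph device ls-by-daemon osd.23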
Could it indeed be the disks, or could it be something else, such as a networking problem?
I have had this same error with other disks at random. My impression is that it is not a problem with the disks themselves, but something else.
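To rule out the network side, a quick sanity check of the Ceph cluster network between the nodes would be something like this (172.16.250.4 is the address from the libceph lines above; iperf3 is an assumption and needs a server running on the other node):
Code:
# basic reachability on the cluster network
ping -c 5 172.16.250.4
# throughput between two nodes (run "iperf3 -s" on 172.16.250.4 first)
iperf3 -c 172.16.250.4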