Hello,
I have installed Proxmox VE 7.3 with pve-ceph on NVMe-only storage.
After a load test (cloning and starting many VMs), the OSDs keep going down and coming back up.
I tried both Quincy and Pacific, but the problem persists on both versions.
I found the following logs.
Ceph crash
ID ENTITY NEW
2023-02-02T11:26:01.154516Z_d016a459-c8e2-4d23-8175-97a96cc4e37d osd.5
2023-02-02T11:29:06.282001Z_61261c26-6cd6-4192-aa3f-3ef79cd8cd65 osd.0
2023-02-02T11:32:57.670558Z_364bba97-b48a-4cd5-b345-625f6c8ecbab osd.3
2023-02-02T12:42:27.025134Z_a8416d51-6935-4615-b43b-3bf233279eed osd.0
2023-02-02T12:42:35.431270Z_58ebf032-8ed4-4beb-9056-24bf57537740 osd.6
2023-02-02T12:44:02.615518Z_b017c1bb-1de0-4c68-bae1-06a79abd2edb osd.5
2023-02-02T12:45:15.558282Z_a705a87e-9bb4-44db-9c57-78bf3f734958 osd.2
2023-02-02T12:45:48.229621Z_3a2c0fee-47a3-4282-adb2-1a1925abce0d osd.3
2023-02-02T12:49:55.234863Z_409d40c2-e883-4b28-9e8b-96164c75be5f osd.3
2023-02-02T12:54:48.722873Z_76916821-fa13-4cb7-b7fc-57d8d9f14648 osd.3
2023-02-02T12:59:30.636228Z_aaad7b46-d0f1-4ef9-a60f-960492d5d3d5 osd.3
2023-02-02T17:05:55.863425Z_d5b966a9-d3ba-4f32-9396-d7f41299b52e osd.5
2023-02-02T17:09:16.243641Z_6b57ae53-4e67-4ff5-9eb9-345a89379942 osd.0
2023-02-02T18:00:00.623676Z_0f90310b-1da9-4003-8cdf-457cc1478b95 osd.5
2023-02-05T16:01:28.684962Z_516c60cf-317f-492f-9305-96ae799612ab osd.18 *
2023-02-05T16:09:10.985826Z_51b1b475-896e-4525-96b7-101f11e12941 osd.14 *
2023-02-05T16:17:00.243955Z_0c7b6eb1-c5df-4bf3-8dfc-acedeacf4f88 osd.22 *
2023-02-05T16:20:18.564526Z_4b1be3c1-a5c0-47b0-8dad-272874a9258a osd.23 *
2023-02-05T17:19:51.913316Z_14548931-e9a6-4765-945f-854ba6e7cb8b osd.23 *
2023-02-05T17:31:29.503036Z_64c85da0-7fde-4187-a85f-c552dd53ec6e osd.23 *
2023-02-05T19:54:31.362256Z_c3463354-cf9a-45ef-9490-6374f3691afc osd.14 *
ceph crash info 2023-02-05T16:01:28.684962Z_516c60cf-317f-492f-9305-96ae799612ab (all OSDs show the same error)
{
"assert_condition": "abort",
"assert_file": "./src/blk/kernel/KernelDevice.cc",
"assert_func": "void KernelDevice::_aio_thread()",
"assert_line": 618,
"assert_msg": "./src/blk/kernel/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f8b54eff700 time 2023-02-05T23:01:28.679276+0700\n./src/blk/kernel/KernelDevice.cc: 618: ceph_abort_msg(\"Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!\")\n",
"assert_thread_name": "bstore_aio",
"backtrace": [
"/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f8b6185c140]",
"gsignal()",
"abort()",
"(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x18c) [0x561e3b4fd636]",
"(KernelDevice::_aio_thread()+0xe09) [0x561e3c0d2999]",
"(KernelDevice::AioCompletionThread::entry()+0xd) [0x561e3c0dc36d]",
"/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f8b61850ea7]",
"clone()"
],
"ceph_version": "17.2.5",
"crash_id": "2023-02-05T16:01:28.684962Z_516c60cf-317f-492f-9305-96ae799612ab",
"entity_name": "osd.18",
"io_error": true,
"io_error_code": -5,
"io_error_devname": "dm-7",
"io_error_length": 65536,
"io_error_offset": 38617284608,
"io_error_optype": 8,
"io_error_path": "/var/lib/ceph/osd/ceph-18/block",
"os_id": "11",
"os_name": "Debian GNU/Linux 11 (bullseye)",
"os_version": "11 (bullseye)",
"os_version_id": "11",
"process_name": "ceph-osd",
"stack_sig": "4939975fa6e0c6307f99824a995118cdb06b9b8f88ffe0d4f0df0d13901562bd",
"timestamp": "2023-02-05T16:01:28.684962Z",
"utsname_hostname": "Node02",
"utsname_machine": "x86_64",
"utsname_release": "5.15.83-1-pve",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z)"
}
Log dmesg
Node01
[Fri Feb 3 00:09:15 2023] blk_update_request: I/O error, dev nvme2n1, sector 116830656 op 0x1:(WRITE) flags 0x8800 phys_seg 32 prio class 0
[Fri Feb 3 00:09:29 2023] blk_update_request: I/O error, dev nvme2n1, sector 14646272 op 0x0:(READ) flags 0x80700 phys_seg 9 prio class 0
[Fri Feb 3 00:09:30 2023] blk_update_request: I/O error, dev nvme2n1, sector 20962688 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[Fri Feb 3 00:09:31 2023] blk_update_request: I/O error, dev nvme2n1, sector 19433216 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:31 2023] blk_update_request: I/O error, dev nvme2n1, sector 93293568 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:33 2023] blk_update_request: I/O error, dev nvme2n1, sector 82667904 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:33 2023] blk_update_request: I/O error, dev nvme2n1, sector 83051392 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:35 2023] blk_update_request: I/O error, dev nvme2n1, sector 115971840 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[Fri Feb 3 00:09:36 2023] blk_update_request: I/O error, dev nvme2n1, sector 170921728 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[Fri Feb 3 21:00:39 2023] blk_update_request: I/O error, dev nvme3n1, sector 2332544 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0
Node02
[Fri Feb 3 00:05:55 2023] blk_update_request: I/O error, dev nvme4n1, sector 213514752 op 0x1:(WRITE) flags 0x8800 phys_seg 28 prio class 0
[Fri Feb 3 00:06:12 2023] blk_update_request: I/O error, dev nvme4n1, sector 65750272 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
[Fri Feb 3 00:06:14 2023] blk_update_request: I/O error, dev nvme4n1, sector 208847104 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
[Fri Feb 3 01:00:00 2023] blk_update_request: I/O error, dev nvme4n1, sector 18178888 op 0x1:(WRITE) flags 0x8800 phys_seg 19 prio class 0
[Sun Feb 5 23:01:27 2023] blk_update_request: I/O error, dev nvme9n1, sector 75426432 op 0x1:(WRITE) flags 0x8800 phys_seg 26 prio class 0
[Sun Feb 5 23:09:09 2023] blk_update_request: I/O error, dev nvme5n1, sector 125865192 op 0x1:(WRITE) flags 0x8800 phys_seg 31 prio class 0
[Sun Feb 5 23:09:24 2023] blk_update_request: I/O error, dev nvme5n1, sector 121124992 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
[Sun Feb 5 23:09:24 2023] blk_update_request: I/O error, dev nvme5n1, sector 49280 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0
[Sun Feb 5 23:09:25 2023] blk_update_request: I/O error, dev nvme5n1, sector 7581056 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0
[Sun Feb 5 23:09:26 2023] blk_update_request: I/O error, dev nvme5n1, sector 11028608 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0
[Mon Feb 6 02:54:29 2023] blk_update_request: I/O error, dev nvme5n1, sector 26314472 op 0x1:(WRITE) flags 0x8800 phys_seg 10 prio class 0
Node03
[Fri Feb 3 02:16:08 2023] blk_update_request: I/O error, dev nvme4n1, sector 205319680 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
[Sun Feb 5 23:16:58 2023] blk_update_request: I/O error, dev nvme8n1, sector 132352152 op 0x1:(WRITE) flags 0x8800 phys_seg 25 prio class 0
[Sun Feb 5 23:20:16 2023] blk_update_request: I/O error, dev nvme9n1, sector 234724840 op 0x1:(WRITE) flags 0x8800 phys_seg 21 prio class 0
[Sun Feb 5 23:20:32 2023] blk_update_request: I/O error, dev nvme9n1, sector 37248 op 0x0:(READ) flags 0x80700 phys_seg 19 prio class 0
[Mon Feb 6 00:19:49 2023] blk_update_request: I/O error, dev nvme9n1, sector 39435520 op 0x1:(WRITE) flags 0x8800 phys_seg 25 prio class 0
[Mon Feb 6 00:31:27 2023] blk_update_request: I/O error, dev nvme9n1, sector 241601664 op 0x1:(WRITE) flags 0x8800 phys_seg 20 prio class 0
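To see at a glance which drives are affected, the dmesg lines above can be tallied per device and operation. This is only a minimal sketch in Python: the function name `count_io_errors` is made up for illustration, and the pattern assumes the `blk_update_request: I/O error, dev ..., sector ... op ...` layout shown in the logs above (other kernel versions may word the message differently).

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   blk_update_request: I/O error, dev nvme2n1, sector 116830656 op 0x1:(WRITE) ...
# The ":(" before the op name is optional, so pasted logs that lost
# those two characters are still counted.
IO_ERROR = re.compile(
    r"I/O error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+) "
    r"op 0x[0-9a-f]+(?::\()?(?P<op>[A-Z_]+)\)"
)


def count_io_errors(lines):
    """Return a Counter keyed by (device, operation) for matching lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group("dev"), match.group("op"))] += 1
    return counts
```

For example, after saving the kernel log of one node to a text file (the file name is up to you), passing the open file object to `count_io_errors` and printing the result gives one count per device and operation, which makes it easier to compare nodes.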
Another log
2023-02-02T13:14:30.342+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:30.342+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:30.342+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:31.318+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:31.318+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:31.318+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:32.346+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:32.346+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:32.346+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:33.382+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:33.382+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:33.382+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:34.430+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:34.430+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:34.430+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:35.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:35.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:35.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:36.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:36.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:36.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:37.402+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:37.402+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:37.402+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:38.374+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:38.374+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:38.374+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:39.370+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:39.370+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:39.370+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T14:35:26.769+0700 7f6f4c41d700 -1 bdev(0x556b1a082000 /var/lib/ceph/osd/ceph-5/block) _aio_thread got r=-5 ((5) Input/output error)
Can anyone help, or suggest how to test and check this?
Best regards,