OSD down and up

Suphachai

Hello,

I have installed Proxmox VE 7.3 and pve-ceph, using NVMe drives only.

However, the OSDs go down and come back up after a load test (cloning and starting many VMs).

I tried installing both Quincy and Pacific, but the problem persists.

I found the following logs.

Ceph crash
ID ENTITY NEW
2023-02-02T11:26:01.154516Z_d016a459-c8e2-4d23-8175-97a96cc4e37d osd.5
2023-02-02T11:29:06.282001Z_61261c26-6cd6-4192-aa3f-3ef79cd8cd65 osd.0
2023-02-02T11:32:57.670558Z_364bba97-b48a-4cd5-b345-625f6c8ecbab osd.3
2023-02-02T12:42:27.025134Z_a8416d51-6935-4615-b43b-3bf233279eed osd.0
2023-02-02T12:42:35.431270Z_58ebf032-8ed4-4beb-9056-24bf57537740 osd.6
2023-02-02T12:44:02.615518Z_b017c1bb-1de0-4c68-bae1-06a79abd2edb osd.5
2023-02-02T12:45:15.558282Z_a705a87e-9bb4-44db-9c57-78bf3f734958 osd.2
2023-02-02T12:45:48.229621Z_3a2c0fee-47a3-4282-adb2-1a1925abce0d osd.3
2023-02-02T12:49:55.234863Z_409d40c2-e883-4b28-9e8b-96164c75be5f osd.3
2023-02-02T12:54:48.722873Z_76916821-fa13-4cb7-b7fc-57d8d9f14648 osd.3
2023-02-02T12:59:30.636228Z_aaad7b46-d0f1-4ef9-a60f-960492d5d3d5 osd.3
2023-02-02T17:05:55.863425Z_d5b966a9-d3ba-4f32-9396-d7f41299b52e osd.5
2023-02-02T17:09:16.243641Z_6b57ae53-4e67-4ff5-9eb9-345a89379942 osd.0
2023-02-02T18:00:00.623676Z_0f90310b-1da9-4003-8cdf-457cc1478b95 osd.5
2023-02-05T16:01:28.684962Z_516c60cf-317f-492f-9305-96ae799612ab osd.18 *
2023-02-05T16:09:10.985826Z_51b1b475-896e-4525-96b7-101f11e12941 osd.14 *
2023-02-05T16:17:00.243955Z_0c7b6eb1-c5df-4bf3-8dfc-acedeacf4f88 osd.22 *
2023-02-05T16:20:18.564526Z_4b1be3c1-a5c0-47b0-8dad-272874a9258a osd.23 *
2023-02-05T17:19:51.913316Z_14548931-e9a6-4765-945f-854ba6e7cb8b osd.23 *
2023-02-05T17:31:29.503036Z_64c85da0-7fde-4187-a85f-c552dd53ec6e osd.23 *
2023-02-05T19:54:31.362256Z_c3463354-cf9a-45ef-9490-6374f3691afc osd.14 *
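To see which OSDs crash most often, the listing above can be tallied with a short Python sketch. It assumes the listing has been saved as plain text with the ID in the first column and the entity in the second; only a few rows are copied here, and the function name `crashes_per_osd` is just an illustrative choice.

```python
from collections import Counter

# A few rows copied from the listing above; columns are ID, ENTITY and an
# optional "*" marking a crash that has not been archived yet (NEW).
CRASH_LS = """\
2023-02-02T11:26:01.154516Z_d016a459-c8e2-4d23-8175-97a96cc4e37d osd.5
2023-02-02T11:29:06.282001Z_61261c26-6cd6-4192-aa3f-3ef79cd8cd65 osd.0
2023-02-05T16:20:18.564526Z_4b1be3c1-a5c0-47b0-8dad-272874a9258a osd.23 *
2023-02-05T17:19:51.913316Z_14548931-e9a6-4765-945f-854ba6e7cb8b osd.23 *
"""


def crashes_per_osd(text):
    """Count crash entries per entity (hypothetical helper, not a Ceph API)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not an OSD entry.
        if len(fields) >= 2 and fields[1].startswith("osd."):
            counts[fields[1]] += 1
    return counts


counts = crashes_per_osd(CRASH_LS)
for osd, n in counts.most_common():
    print(osd, n)
```

Feeding in the full listing would show whether the crashes cluster on a few OSDs or are spread across all of them.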

ceph crash info 2023-02-05T16:01:28.684962Z_516c60cf-317f-492f-9305-96ae799612ab (all OSDs show the same error)
{
"assert_condition": "abort",
"assert_file": "./src/blk/kernel/KernelDevice.cc",
"assert_func": "void KernelDevice::_aio_thread()",
"assert_line": 618,
"assert_msg": "./src/blk/kernel/KernelDevice.cc: In function 'void KernelDevice::_aio_thread()' thread 7f8b54eff700 time 2023-02-05T23:01:28.679276+0700\n./src/blk/kernel/KernelDevice.cc: 618: ceph_abort_msg(\"Unexpected IO error. This may suggest a hardware issue. Please check your kernel log!\")\n",
"assert_thread_name": "bstore_aio",
"backtrace": [
"/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140) [0x7f8b6185c140]",
"gsignal()",
"abort()",
"(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x18c) [0x561e3b4fd636]",
"(KernelDevice::_aio_thread()+0xe09) [0x561e3c0d2999]",
"(KernelDevice::AioCompletionThread::entry()+0xd) [0x561e3c0dc36d]",
"/lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7) [0x7f8b61850ea7]",
"clone()"
],
"ceph_version": "17.2.5",
"crash_id": "2023-02-05T16:01:28.684962Z_516c60cf-317f-492f-9305-96ae799612ab",
"entity_name": "osd.18",
"io_error": true,
"io_error_code": -5,
"io_error_devname": "dm-7",
"io_error_length": 65536,
"io_error_offset": 38617284608,
"io_error_optype": 8,
"io_error_path": "/var/lib/ceph/osd/ceph-18/block",
"os_id": "11",
"os_name": "Debian GNU/Linux 11 (bullseye)",
"os_version": "11 (bullseye)",
"os_version_id": "11",
"process_name": "ceph-osd",
"stack_sig": "4939975fa6e0c6307f99824a995118cdb06b9b8f88ffe0d4f0df0d13901562bd",
"timestamp": "2023-02-05T16:01:28.684962Z",
"utsname_hostname": "Node02",
"utsname_machine": "x86_64",
"utsname_release": "5.15.83-1-pve",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z)"
}
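To relate this crash report to the kernel log, here is a rough Python sketch that decodes the error code, converts the UTC crash timestamp to local time, and turns the byte offset into a sector number. The +07:00 offset is taken from the assert_msg timestamp. The 512-byte sector size and the 2048-sector (1 MiB) LVM data offset are assumptions on my part, not values verified on this cluster.

```python
import errno
from datetime import datetime, timedelta, timezone

# Fields copied from the crash report above.
crash = {
    "timestamp": "2023-02-05T16:01:28.684962Z",
    "io_error_code": -5,
    "io_error_offset": 38617284608,
    "io_error_length": 65536,
}

SECTOR_SIZE = 512               # assumption: kernel log counts 512-byte sectors
LVM_DATA_OFFSET_SECTORS = 2048  # assumption: LV data starts 1 MiB into the disk

# -5 is the negated errno; errno.errorcode maps 5 to its symbolic name.
errno_name = errno.errorcode[-crash["io_error_code"]]

# Convert the UTC crash time to the +07:00 zone used by dmesg and the OSD log.
utc = datetime.strptime(crash["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
utc = utc.replace(tzinfo=timezone.utc)
local = utc.astimezone(timezone(timedelta(hours=7)))

# Offset inside the OSD block device (dm-7), expressed in sectors.
lv_sector = crash["io_error_offset"] // SECTOR_SIZE
length_sectors = crash["io_error_length"] // SECTOR_SIZE
# Estimated sector on the underlying NVMe device under the assumptions above.
disk_sector = lv_sector + LVM_DATA_OFFSET_SECTORS

print(errno_name, local.isoformat(), lv_sector, length_sectors, disk_sector)
```

Under those assumptions the estimated disk sector is 75426432 at 23:01:28 local time, which lines up with the nvme9n1 WRITE error on Node02 at 23:01:27 in the dmesg log, so the OSD abort and the kernel I/O error appear to be the same event.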




dmesg log
Node01
[Fri Feb 3 00:09:15 2023] blk_update_request: I/O error, dev nvme2n1, sector 116830656 op 0x1:(WRITE) flags 0x8800 phys_seg 32 prio class 0
[Fri Feb 3 00:09:29 2023] blk_update_request: I/O error, dev nvme2n1, sector 14646272 op 0x0:(READ) flags 0x80700 phys_seg 9 prio class 0
[Fri Feb 3 00:09:30 2023] blk_update_request: I/O error, dev nvme2n1, sector 20962688 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[Fri Feb 3 00:09:31 2023] blk_update_request: I/O error, dev nvme2n1, sector 19433216 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:31 2023] blk_update_request: I/O error, dev nvme2n1, sector 93293568 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:33 2023] blk_update_request: I/O error, dev nvme2n1, sector 82667904 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:33 2023] blk_update_request: I/O error, dev nvme2n1, sector 83051392 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[Fri Feb 3 00:09:35 2023] blk_update_request: I/O error, dev nvme2n1, sector 115971840 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[Fri Feb 3 00:09:36 2023] blk_update_request: I/O error, dev nvme2n1, sector 170921728 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 0
[Fri Feb 3 21:00:39 2023] blk_update_request: I/O error, dev nvme3n1, sector 2332544 op 0x0:(READ) flags 0x0 phys_seg 2 prio class 0

Node02
[Fri Feb 3 00:05:55 2023] blk_update_request: I/O error, dev nvme4n1, sector 213514752 op 0x1:(WRITE) flags 0x8800 phys_seg 28 prio class 0
[Fri Feb 3 00:06:12 2023] blk_update_request: I/O error, dev nvme4n1, sector 65750272 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
[Fri Feb 3 00:06:14 2023] blk_update_request: I/O error, dev nvme4n1, sector 208847104 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
[Fri Feb 3 01:00:00 2023] blk_update_request: I/O error, dev nvme4n1, sector 18178888 op 0x1:(WRITE) flags 0x8800 phys_seg 19 prio class 0
[Sun Feb 5 23:01:27 2023] blk_update_request: I/O error, dev nvme9n1, sector 75426432 op 0x1:(WRITE) flags 0x8800 phys_seg 26 prio class 0
[Sun Feb 5 23:09:09 2023] blk_update_request: I/O error, dev nvme5n1, sector 125865192 op 0x1:(WRITE) flags 0x8800 phys_seg 31 prio class 0
[Sun Feb 5 23:09:24 2023] blk_update_request: I/O error, dev nvme5n1, sector 121124992 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
[Sun Feb 5 23:09:24 2023] blk_update_request: I/O error, dev nvme5n1, sector 49280 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0
[Sun Feb 5 23:09:25 2023] blk_update_request: I/O error, dev nvme5n1, sector 7581056 op 0x0:(READ) flags 0x80700 phys_seg 17 prio class 0
[Sun Feb 5 23:09:26 2023] blk_update_request: I/O error, dev nvme5n1, sector 11028608 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0
[Mon Feb 6 02:54:29 2023] blk_update_request: I/O error, dev nvme5n1, sector 26314472 op 0x1:(WRITE) flags 0x8800 phys_seg 10 prio class 0


Node03
[Fri Feb 3 02:16:08 2023] blk_update_request: I/O error, dev nvme4n1, sector 205319680 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
[Sun Feb 5 23:16:58 2023] blk_update_request: I/O error, dev nvme8n1, sector 132352152 op 0x1:(WRITE) flags 0x8800 phys_seg 25 prio class 0
[Sun Feb 5 23:20:16 2023] blk_update_request: I/O error, dev nvme9n1, sector 234724840 op 0x1:(WRITE) flags 0x8800 phys_seg 21 prio class 0
[Sun Feb 5 23:20:32 2023] blk_update_request: I/O error, dev nvme9n1, sector 37248 op 0x0:(READ) flags 0x80700 phys_seg 19 prio class 0
[Mon Feb 6 00:19:49 2023] blk_update_request: I/O error, dev nvme9n1, sector 39435520 op 0x1:(WRITE) flags 0x8800 phys_seg 25 prio class 0
[Mon Feb 6 00:31:27 2023] blk_update_request: I/O error, dev nvme9n1, sector 241601664 op 0x1:(WRITE) flags 0x8800 phys_seg 20 prio class 0
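To check whether the errors concentrate on particular drives, the dmesg lines above can be grouped by device and operation. This is only a sketch: the regular expression is written against the `blk_update_request` format shown here, and only three lines from Node02 are copied as sample input.

```python
import re
from collections import Counter

# Three lines copied from the Node02 dmesg output above.
DMESG = """\
[Sun Feb 5 23:01:27 2023] blk_update_request: I/O error, dev nvme9n1, sector 75426432 op 0x1:(WRITE) flags 0x8800 phys_seg 26 prio class 0
[Sun Feb 5 23:09:09 2023] blk_update_request: I/O error, dev nvme5n1, sector 125865192 op 0x1:(WRITE) flags 0x8800 phys_seg 31 prio class 0
[Sun Feb 5 23:09:24 2023] blk_update_request: I/O error, dev nvme5n1, sector 121124992 op 0x0:(READ) flags 0x80700 phys_seg 32 prio class 0
"""

# Captures the device name, the sector and the operation name (READ/WRITE).
PATTERN = re.compile(
    r"I/O error, dev (\S+), sector (\d+) op 0x[0-9a-f]+:\((\w+)\)"
)


def tally_io_errors(text):
    """Count I/O errors per (device, operation) pair (hypothetical helper)."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        dev, _sector, op = match.groups()
        counts[(dev, op)] += 1
    return counts


io_errors = tally_io_errors(DMESG)
for (dev, op), n in sorted(io_errors.items()):
    print(dev, op, n)
```

In the logs above, each burst appears to begin with a WRITE error and is followed by READ errors on the same device a few seconds later, so keeping the timestamps alongside the counts may also be useful.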

Another log
2023-02-02T13:14:30.342+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:30.342+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:30.342+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:31.318+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:31.318+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:31.318+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:32.346+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:32.346+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:32.346+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:33.382+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:33.382+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:33.382+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:34.430+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:34.430+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:34.430+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:35.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:35.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:35.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:36.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:36.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:36.438+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:37.402+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:37.402+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:37.402+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:38.374+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:38.374+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:38.374+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:39.370+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6804 osd.0 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:39.370+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6812 osd.1 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T13:14:39.370+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: no reply from 10.122.167.12:6820 osd.2 ever on either front or back, first ping sent 2023-02-02T13:07:29.287412+0700 (oldest deadline 2023-02-02T13:07:49.287412+0700)
2023-02-02T14:35:26.769+0700 7f6f4c41d700 -1 bdev(0x556b1a082000 /var/lib/ceph/osd/ceph-5/block) _aio_thread got r=-5 ((5) Input/output error)
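To quantify how long the peers had been silent, a heartbeat_check line can be parsed for its three timestamps. The sketch below uses the first line of the log above; the regular expression and the function name `parse_heartbeat` are illustrative and only tested against this exact message format.

```python
import re
from datetime import datetime

# First heartbeat_check line from the log above.
LINE = (
    "2023-02-02T13:14:30.342+0700 7f6f4e421700 -1 osd.5 114 heartbeat_check: "
    "no reply from 10.122.167.12:6804 osd.0 ever on either front or back, "
    "first ping sent 2023-02-02T13:07:29.287412+0700 "
    "(oldest deadline 2023-02-02T13:07:49.287412+0700)"
)

PATTERN = re.compile(
    r"^(\S+) \S+ -1 (osd\.\d+) \d+ heartbeat_check: no reply from "
    r"(\S+) (osd\.\d+) .*first ping sent (\S+) \(oldest deadline (\S+)\)"
)
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_heartbeat(line):
    """Return reporter, peer, grace period and overdue time in seconds."""
    match = PATTERN.match(line)
    if match is None:
        return None
    logged, reporter, _addr, peer, first_ping, deadline = match.groups()
    logged_at = datetime.strptime(logged, TS_FORMAT)
    first_ping_at = datetime.strptime(first_ping, TS_FORMAT)
    deadline_at = datetime.strptime(deadline, TS_FORMAT)
    return {
        "reporter": reporter,
        "peer": peer,
        # Time allowed between the first ping and the deadline.
        "grace_s": (deadline_at - first_ping_at).total_seconds(),
        # How far past the deadline the log line was written.
        "overdue_s": (logged_at - deadline_at).total_seconds(),
    }


info = parse_heartbeat(LINE)
print(info)
```

For this line the deadline was 20 seconds after the first ping, and osd.5 was still waiting for osd.0 roughly 401 seconds after that deadline.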

Can anyone help, or suggest how to test and check this?

Best Regards,
 

Attachments

  • down.JPG
