Ceph OSDs go down over time

Faryu

New Member
Dec 14, 2017
Hello,

I have a cluster of 3 PCs, each with 3 OSDs.
It works well at first, but over time (after a few hours or days) the OSDs start to go down.

The cluster has been in use for about 4 weeks, and I lose roughly one OSD per day.
Once an OSD is down, it cannot be restarted via ceph commands.
Even a (warm) reboot of the PC won't fix it.
But if I shut the PC down completely and power it on again, all OSDs come back online and the "OSD goes down" cycle starts over.

I'm using consumer-grade HDDs. What I would like to know is whether this is a hardware problem or a software problem. The fact that everything works fine for a while after a full shutdown and restart makes me doubt that the HDDs are defective. I also find it unlikely that all 9 HDDs have the same problem.

After the OSD goes down, the block device is still present in the system. I checked this with lsblk and in the /dev directory.
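
A device node that still shows up in lsblk or /dev does not necessarily mean the disk still answers I/O. As a rough sketch (not part of the original setup; the function name `probe_device` is made up for illustration), a small Python check could separate "node is missing" from "node is present but a read fails". Two caveats: a read can be served from the page cache and succeed even if the disk is gone, and a read against a hung device can block for a long time, so this is only a first hint. It also needs root to open a real disk such as /dev/sdd.

```python
import os


def probe_device(path, nbytes=4096):
    """Classify a device path as missing, unreadable, or readable.

    Hypothetical helper for illustration only. It tries to read the
    first `nbytes` bytes; any OSError (I/O error, permission denied,
    ...) is reported as "present but unreadable".
    """
    if not os.path.exists(path):
        return "missing"
    try:
        # buffering=0 avoids Python-level buffering; the kernel page
        # cache may still satisfy the read.
        with open(path, "rb", buffering=0) as dev:
            dev.read(nbytes)
    except OSError as exc:
        return f"present but unreadable ({exc.strerror or exc})"
    return "present and readable"


if __name__ == "__main__":
    # /dev/sdd is the device named in the dmesg output of this post.
    print(probe_device("/dev/sdd"))
```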

pve-manager/5.1-43/bdb08029 (running kernel: 4.13.13-5-pve)
ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)

Here is the dmesg output after an OSD failed:

[15144.958084] device tap103i0 entered promiscuous mode
[15144.963120] vmbr0: port 3(tap103i0) entered blocking state
[15144.963121] vmbr0: port 3(tap103i0) entered disabled state
[15144.963175] vmbr0: port 3(tap103i0) entered blocking state
[15144.963176] vmbr0: port 3(tap103i0) entered forwarding state
[95238.849549] ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[95238.849561] ata10.00: failed command: FLUSH CACHE EXT
[95238.849567] ata10.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 18
res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[95238.849573] ata10.00: status: { DRDY }
[95238.849578] ata10: hard resetting link
[95248.849796] ata10: softreset failed (1st FIS failed)
[95248.849808] ata10: hard resetting link
[95258.850199] ata10: softreset failed (1st FIS failed)
[95258.850210] ata10: hard resetting link
[95293.849896] ata10: softreset failed (1st FIS failed)
[95293.849910] ata10: limiting SATA link speed to 3.0 Gbps
[95293.849911] ata10: hard resetting link
[95299.025448] ata10: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
[95299.025453] ata10.00: link online but device misclassified
[95304.117429] ata10.00: qc timeout (cmd 0xec)
[95304.117438] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[95304.117439] ata10.00: revalidation failed (errno=-5)
[95304.117448] ata10: hard resetting link
[95314.117507] ata10: softreset failed (1st FIS failed)
[95314.117516] ata10: hard resetting link
[95324.118412] ata10: softreset failed (1st FIS failed)
[95324.118421] ata10: hard resetting link
[95337.141381] INFO: task bstore_kv_sync:2185 blocked for more than 120 seconds.
[95337.141390] Tainted: P O 4.13.13-5-pve #1
[95337.141394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[95337.141399] bstore_kv_sync D 0 2185 1 0x00000000
[95337.141401] Call Trace:
[95337.141407] __schedule+0x3cc/0x850
[95337.141409] schedule+0x36/0x80
[95337.141410] schedule_timeout+0x1da/0x350
[95337.141412] ? __blk_run_queue+0x3d/0x60
[95337.141414] ? blk_queue_bio+0x3d3/0x400
[95337.141415] io_schedule_timeout+0x1e/0x50
[95337.141416] ? io_schedule_timeout+0x1e/0x50
[95337.141417] wait_for_completion_io+0xb4/0x140
[95337.141418] ? wake_up_q+0x80/0x80
[95337.141420] submit_bio_wait+0x68/0x90
[95337.141421] blkdev_issue_flush+0x5c/0x90
[95337.141422] blkdev_fsync+0x35/0x50
[95337.141423] vfs_fsync_range+0x4b/0xb0
[95337.141424] do_fsync+0x3d/0x70
[95337.141425] SyS_fdatasync+0x13/0x20
[95337.141427] entry_SYSCALL_64_fastpath+0x33/0xa3
[95337.141428] RIP: 0033:0x7fa37558463d
[95337.141429] RSP: 002b:00007fa365129110 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[95337.141430] RAX: ffffffffffffffda RBX: 00005601c7ab0d80 RCX: 00007fa37558463d
[95337.141431] RDX: 0a20c08100000000 RSI: 00007fa3651290f0 RDI: 0000000000000016
[95337.141431] RBP: 0a20c0815a7d85da R08: 00005601c7ab0ed0 R09: 000000000002e21d
[95337.141432] R10: 00007fa3651290d0 R11: 0000000000000293 R12: 00007fa37650b020
[95337.141432] R13: 00005601c7ab0ed0 R14: 00005601c7dbe000 R15: 0000000000000000
[95337.141461] INFO: task vgs:9632 blocked for more than 120 seconds.
[95337.141465] Tainted: P O 4.13.13-5-pve #1
[95337.141469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[95337.141473] vgs D 0 9632 1952 0x00000000
[95337.141474] Call Trace:
[95337.141476] __schedule+0x3cc/0x850
[95337.141478] schedule+0x36/0x80
[95337.141479] io_schedule+0x16/0x40
[95337.141480] __blkdev_direct_IO_simple+0x1e7/0x320
[95337.141481] ? bdget+0x120/0x120
[95337.141482] blkdev_direct_IO+0x3e1/0x3f0
[95337.141483] ? blkdev_direct_IO+0x3e1/0x3f0
[95337.141485] ? __filemap_fdatawrite_range+0xd4/0x100
[95337.141486] generic_file_read_iter+0xcb/0x9e0
[95337.141487] ? generic_file_read_iter+0xcb/0x9e0
[95337.141489] ? do_filp_open+0xad/0x110
[95337.141490] ? _copy_to_user+0x2a/0x40
[95337.141491] ? cp_new_stat+0x152/0x180
[95337.141492] blkdev_read_iter+0x35/0x40
[95337.141494] new_sync_read+0xde/0x130
[95337.141495] __vfs_read+0x26/0x40
[95337.141496] vfs_read+0x96/0x130
[95337.141498] SyS_read+0x55/0xc0
[95337.141499] entry_SYSCALL_64_fastpath+0x33/0xa3
[95337.141500] RIP: 0033:0x7fb2ea2d7700
[95337.141500] RSP: 002b:00007fff10228278 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[95337.141501] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb2ea2d7700
[95337.141502] RDX: 0000000000001000 RSI: 000055fcab943000 RDI: 0000000000000004
[95337.141502] RBP: 00007fff102282d0 R08: 00007fb2ea596248 R09: 0000000000001000
[95337.141502] R10: 0000000000000080 R11: 0000000000000246 R12: 000055fcab943000
[95337.141503] R13: 0000000000000000 R14: 0000000000000004 R15: 00007fff10228330
[95359.118207] ata10: softreset failed (1st FIS failed)
[95359.118220] ata10: limiting SATA link speed to 1.5 Gbps
[95359.118221] ata10: hard resetting link
[95364.285325] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[95364.285330] ata10.00: link online but device misclassified
[95374.517311] ata10.00: qc timeout (cmd 0xec)
[95374.517321] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[95374.517322] ata10.00: revalidation failed (errno=-5)
[95374.517334] ata10: hard resetting link
[95384.517482] ata10: softreset failed (1st FIS failed)
[95384.517492] ata10: hard resetting link
[95394.518306] ata10: softreset failed (1st FIS failed)
[95394.518315] ata10: hard resetting link
[95429.517872] ata10: softreset failed (1st FIS failed)
[95429.517883] ata10: hard resetting link
[95434.693187] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[95434.693192] ata10.00: link online but device misclassified
[95457.977159] INFO: task bstore_kv_sync:2185 blocked for more than 120 seconds.
[95457.977169] Tainted: P O 4.13.13-5-pve #1
[95457.977173] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[95457.977177] bstore_kv_sync D 0 2185 1 0x00000000
[95457.977180] Call Trace:
[95457.977186] __schedule+0x3cc/0x850
[95457.977188] schedule+0x36/0x80
[95457.977189] schedule_timeout+0x1da/0x350
[95457.977191] ? __blk_run_queue+0x3d/0x60
[95457.977192] ? blk_queue_bio+0x3d3/0x400
[95457.977193] io_schedule_timeout+0x1e/0x50
[95457.977194] ? io_schedule_timeout+0x1e/0x50
[95457.977195] wait_for_completion_io+0xb4/0x140
[95457.977197] ? wake_up_q+0x80/0x80
[95457.977198] submit_bio_wait+0x68/0x90
[95457.977200] blkdev_issue_flush+0x5c/0x90
[95457.977201] blkdev_fsync+0x35/0x50
[95457.977202] vfs_fsync_range+0x4b/0xb0
[95457.977203] do_fsync+0x3d/0x70
[95457.977204] SyS_fdatasync+0x13/0x20
[95457.977205] entry_SYSCALL_64_fastpath+0x33/0xa3
[95457.977207] RIP: 0033:0x7fa37558463d
[95457.977208] RSP: 002b:00007fa365129110 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[95457.977209] RAX: ffffffffffffffda RBX: 00005601c7ab0d80 RCX: 00007fa37558463d
[95457.977210] RDX: 0a20c08100000000 RSI: 00007fa3651290f0 RDI: 0000000000000016
[95457.977210] RBP: 0a20c0815a7d85da R08: 00005601c7ab0ed0 R09: 000000000002e21d
[95457.977211] R10: 00007fa3651290d0 R11: 0000000000000293 R12: 00007fa37650b020
[95457.977211] R13: 00005601c7ab0ed0 R14: 00005601c7dbe000 R15: 0000000000000000
[95457.977241] INFO: task vgs:9632 blocked for more than 120 seconds.
[95457.977245] Tainted: P O 4.13.13-5-pve #1
[95457.977248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[95457.977252] vgs D 0 9632 1952 0x00000000
[95457.977253] Call Trace:
[95457.977258] __schedule+0x3cc/0x850
[95457.977260] schedule+0x36/0x80
[95457.977261] io_schedule+0x16/0x40
[95457.977262] __blkdev_direct_IO_simple+0x1e7/0x320
[95457.977263] ? bdget+0x120/0x120
[95457.977264] blkdev_direct_IO+0x3e1/0x3f0
[95457.977265] ? blkdev_direct_IO+0x3e1/0x3f0
[95457.977267] ? __filemap_fdatawrite_range+0xd4/0x100
[95457.977268] generic_file_read_iter+0xcb/0x9e0
[95457.977269] ? generic_file_read_iter+0xcb/0x9e0
[95457.977271] ? do_filp_open+0xad/0x110
[95457.977272] ? _copy_to_user+0x2a/0x40
[95457.977273] ? cp_new_stat+0x152/0x180
[95457.977274] blkdev_read_iter+0x35/0x40
[95457.977275] new_sync_read+0xde/0x130
[95457.977277] __vfs_read+0x26/0x40
[95457.977278] vfs_read+0x96/0x130
[95457.977279] SyS_read+0x55/0xc0
[95457.977281] entry_SYSCALL_64_fastpath+0x33/0xa3
[95457.977281] RIP: 0033:0x7fb2ea2d7700
[95457.977282] RSP: 002b:00007fff10228278 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[95457.977283] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb2ea2d7700
[95457.977283] RDX: 0000000000001000 RSI: 000055fcab943000 RDI: 0000000000000004
[95457.977284] RBP: 00007fff102282d0 R08: 00007fb2ea596248 R09: 0000000000001000
[95457.977284] R10: 0000000000000080 R11: 0000000000000246 R12: 000055fcab943000
[95457.977285] R13: 0000000000000000 R14: 0000000000000004 R15: 00007fff10228330
[95466.165227] ata10.00: qc timeout (cmd 0xec)
[95466.165237] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[95466.165238] ata10.00: revalidation failed (errno=-5)
[95466.165246] ata10.00: disabled
[95466.165258] ata10: hard resetting link
[95476.165476] ata10: softreset failed (1st FIS failed)
[95476.165486] ata10: hard resetting link
[95486.165106] ata10: softreset failed (1st FIS failed)
[95486.165115] ata10: hard resetting link
[95521.165018] ata10: softreset failed (1st FIS failed)
[95521.165031] ata10: hard resetting link
[95526.381011] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[95526.381015] ata10.00: link online but device misclassified
[95526.381026] ata10: EH complete
[95526.381042] sd 9:0:0:0: [sdd] tag#19 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.381044] sd 9:0:0:0: [sdd] tag#19 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[95526.381048] print_req_error: I/O error, dev sdd, sector 0
[95526.381085] sd 9:0:0:0: [sdd] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.381086] sd 9:0:0:0: [sdd] tag#20 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95526.381087] print_req_error: I/O error, dev sdd, sector 4096
[95526.381091] sd 9:0:0:0: [sdd] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.381092] sd 9:0:0:0: [sdd] tag#21 CDB: ATA command pass through(16) 85 06 2c 00 00 00 00 00 00 00 00 00 00 00 e5 00
[95526.381198] sd 9:0:0:0: [sdd] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.381200] sd 9:0:0:0: [sdd] tag#23 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af 80 00 00 00 08 00 00
[95526.381202] print_req_error: I/O error, dev sdd, sector 7814033280
[95526.381258] sd 9:0:0:0: [sdd] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.381260] sd 9:0:0:0: [sdd] tag#24 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af f0 00 00 00 08 00 00
[95526.381261] print_req_error: I/O error, dev sdd, sector 7814033392
[95526.381328] sd 9:0:0:0: [sdd] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.381330] sd 9:0:0:0: [sdd] tag#25 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95526.381331] print_req_error: I/O error, dev sdd, sector 4096
[95526.381979] sd 9:0:0:0: [sdd] tag#26 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.381980] sd 9:0:0:0: [sdd] tag#26 CDB: Read(16) 88 00 00 00 00 00 00 00 10 08 00 00 00 08 00 00
[95526.381981] print_req_error: I/O error, dev sdd, sector 4104
[95526.382459] sd 9:0:0:0: [sdd] tag#27 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.382460] sd 9:0:0:0: [sdd] tag#27 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95526.382461] print_req_error: I/O error, dev sdd, sector 4096
[95526.393092] sd 9:0:0:0: [sdd] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.393094] sd 9:0:0:0: [sdd] tag#29 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[95526.393095] print_req_error: I/O error, dev sdd, sector 0
[95526.393548] sd 9:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95526.393549] sd 9:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
[95526.393550] print_req_error: I/O error, dev sdd, sector 7814036992
[95526.393964] print_req_error: I/O error, dev sdd, sector 7814037152
[95526.524404] Buffer I/O error on dev dm-0, logical block 976753648, async page read
[95526.525126] Buffer I/O error on dev dm-0, logical block 976753648, async page read
[95536.559431] scsi_io_completion: 64 callbacks suppressed
[95536.559434] sd 9:0:0:0: [sdd] tag#28 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.559437] sd 9:0:0:0: [sdd] tag#28 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95536.559437] print_req_error: 63 callbacks suppressed
[95536.559438] print_req_error: I/O error, dev sdd, sector 4096
[95536.559881] sd 9:0:0:0: [sdd] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.559882] sd 9:0:0:0: [sdd] tag#29 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af 80 00 00 00 08 00 00
[95536.559883] print_req_error: I/O error, dev sdd, sector 7814033280
[95536.560297] sd 9:0:0:0: [sdd] tag#30 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.560299] sd 9:0:0:0: [sdd] tag#30 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af f0 00 00 00 08 00 00
[95536.560300] print_req_error: I/O error, dev sdd, sector 7814033392
[95536.560701] sd 9:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.560702] sd 9:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95536.560703] print_req_error: I/O error, dev sdd, sector 4096
[95536.561130] sd 9:0:0:0: [sdd] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.561131] sd 9:0:0:0: [sdd] tag#1 CDB: Read(16) 88 00 00 00 00 00 00 00 10 08 00 00 00 08 00 00
[95536.561132] print_req_error: I/O error, dev sdd, sector 4104
[95536.561511] sd 9:0:0:0: [sdd] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.561513] sd 9:0:0:0: [sdd] tag#2 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95536.561513] print_req_error: I/O error, dev sdd, sector 4096
[95536.579212] sd 9:0:0:0: [sdd] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.579214] sd 9:0:0:0: [sdd] tag#4 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[95536.579215] print_req_error: I/O error, dev sdd, sector 0
[95536.579585] sd 9:0:0:0: [sdd] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.579587] sd 9:0:0:0: [sdd] tag#6 CDB: Read(16) 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
[95536.579588] print_req_error: I/O error, dev sdd, sector 7814036992
[95536.579938] sd 9:0:0:0: [sdd] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.579939] sd 9:0:0:0: [sdd] tag#7 CDB: Read(16) 88 00 00 00 00 01 d1 c0 be a0 00 00 00 08 00 00
[95536.579940] print_req_error: I/O error, dev sdd, sector 7814037152
[95536.580283] sd 9:0:0:0: [sdd] tag#8 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95536.580285] sd 9:0:0:0: [sdd] tag#8 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[95536.580286] print_req_error: I/O error, dev sdd, sector 0
[95546.578153] scsi_io_completion: 24 callbacks suppressed
[95546.578158] sd 9:0:0:0: [sdd] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.578160] sd 9:0:0:0: [sdd] tag#13 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 20 00 00
[95546.578160] print_req_error: 24 callbacks suppressed
[95546.578161] print_req_error: I/O error, dev sdd, sector 4096
[95546.578520] sd 9:0:0:0: [sdd] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.578522] sd 9:0:0:0: [sdd] tag#14 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95546.578523] print_req_error: I/O error, dev sdd, sector 4096
[95546.578894] Buffer I/O error on dev dm-0, logical block 0, async page read
[95546.719591] sd 9:0:0:0: [sdd] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.719593] sd 9:0:0:0: [sdd] tag#15 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95546.719594] print_req_error: I/O error, dev sdd, sector 4096
[95546.719973] sd 9:0:0:0: [sdd] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.719975] sd 9:0:0:0: [sdd] tag#16 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af 80 00 00 00 08 00 00
[95546.719975] print_req_error: I/O error, dev sdd, sector 7814033280
[95546.720338] sd 9:0:0:0: [sdd] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.720340] sd 9:0:0:0: [sdd] tag#17 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af f0 00 00 00 08 00 00
[95546.720341] print_req_error: I/O error, dev sdd, sector 7814033392
[95546.720748] sd 9:0:0:0: [sdd] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.720749] sd 9:0:0:0: [sdd] tag#18 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95546.720750] print_req_error: I/O error, dev sdd, sector 4096
[95546.721076] sd 9:0:0:0: [sdd] tag#19 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.721078] sd 9:0:0:0: [sdd] tag#19 CDB: Read(16) 88 00 00 00 00 00 00 00 10 08 00 00 00 08 00 00
[95546.721079] print_req_error: I/O error, dev sdd, sector 4104
[95546.721399] sd 9:0:0:0: [sdd] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.721401] sd 9:0:0:0: [sdd] tag#20 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95546.721401] print_req_error: I/O error, dev sdd, sector 4096
[95546.731599] sd 9:0:0:0: [sdd] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.731601] sd 9:0:0:0: [sdd] tag#21 CDB: Read(16) 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
[95546.731602] print_req_error: I/O error, dev sdd, sector 0
[95546.731918] sd 9:0:0:0: [sdd] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95546.731920] sd 9:0:0:0: [sdd] tag#23 CDB: Read(16) 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
[95546.731921] print_req_error: I/O error, dev sdd, sector 7814036992
[95556.834306] scsi_io_completion: 26 callbacks suppressed
[95556.834309] sd 9:0:0:0: [sdd] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95556.834312] sd 9:0:0:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 10 00 00 00 00 08 00 00
[95556.834312] print_req_error: 26 callbacks suppressed
[95556.834313] print_req_error: I/O error, dev sdd, sector 4096
[95556.834614] sd 9:0:0:0: [sdd] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95556.834615] sd 9:0:0:0: [sdd] tag#1 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af 80 00 00 00 08 00 00
[95556.834616] print_req_error: I/O error, dev sdd, sector 7814033280
[95556.834916] sd 9:0:0:0: [sdd] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[95556.834918] sd 9:0:0:0: [sdd] tag#2 CDB: Read(16) 88 00 00 00 00 01 d1 c0 af f0 00 00 00 08 00 00
[95556.834919] print_req_error: I/O error, dev sdd, sector 7814033392
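
Because the log above is long and repetitive, it can help to condense it. The following is a minimal sketch, assuming the dmesg text is available as a list of lines; the function name `summarize_dmesg` and the regular expressions are illustrative and only cover the three message shapes seen in this post (ataN events, "I/O error, dev ..." lines, and hung-task warnings).

```python
import re
from collections import Counter

# Patterns tailored to the messages in this post, not a general dmesg parser.
ATA_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.+)$")
IO_ERR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")
HUNG_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")


def summarize_dmesg(lines):
    """Count ATA events per (port, message), I/O errors per device,
    and hung-task warnings per task name."""
    ata_events = Counter()
    io_errors = Counter()
    hung_tasks = Counter()
    for raw in lines:
        line = raw.strip()
        m = HUNG_RE.search(line)
        if m:
            hung_tasks[m.group(1)] += 1
            continue
        m = IO_ERR_RE.search(line)
        if m:
            io_errors[m.group(1)] += 1
            continue
        m = ATA_RE.search(line)
        if m:
            ata_events[(m.group(1), m.group(2))] += 1
    return ata_events, io_errors, hung_tasks


if __name__ == "__main__":
    import sys

    ata, io, hung = summarize_dmesg(sys.stdin)
    for (port, msg), n in ata.most_common(5):
        print(f"{port}: {msg!r} x{n}")
    for dev, n in io.items():
        print(f"{dev}: {n} I/O errors")
    for task, n in hung.items():
        print(f"{task}: blocked {n} times")
```

Run as `dmesg | python3 summarize.py` (the file name is arbitrary). On the output above, this kind of summary would show a single port (ata10) failing to reset repeatedly, followed by I/O errors on a single device (sdd).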
 
It might be something with the HDD. The Proxmox UI does not show SMART values for the failed disk; it displays UNKNOWN.
If I click on "Show S.M.A.R.T. values", I receive: "Error getting S.M.A.R.T. data: Exit code: 2 (500)"
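
If the "Exit code: 2" in that message is the raw exit status of smartctl (an assumption; the post does not confirm how the UI obtains it), it can be decoded as a bitmask. The sketch below paraphrases the bit meanings from the smartctl man page; the names `SMARTCTL_EXIT_BITS` and `decode_smartctl_status` are made up for illustration.

```python
# Bit meanings of the smartctl exit status, paraphrased from its man page.
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed, device did not return an IDENTIFY DEVICE "
       "structure, or device is in a low-power mode",
    2: "a SMART or other ATA command failed, or a SMART data structure "
       "had a checksum error",
    3: "SMART status check returned 'DISK FAILING'",
    4: "prefail attributes found at or below threshold",
    5: "SMART status OK, but some attributes were at or below threshold "
       "at some time in the past",
    6: "the device error log contains records of errors",
    7: "the device self-test log contains records of errors",
}


def decode_smartctl_status(status):
    """Return the descriptions of all bits set in a smartctl exit status."""
    return [
        desc
        for bit, desc in sorted(SMARTCTL_EXIT_BITS.items())
        if status & (1 << bit)
    ]


if __name__ == "__main__":
    for reason in decode_smartctl_status(2):
        print(reason)
```

Under that assumption, status 2 means only bit 1 is set: the device could not be opened or did not answer IDENTIFY DEVICE. That would match the "failed to IDENTIFY" lines in the dmesg output, and it says the disk was unreachable at that moment rather than that SMART reported it as failing.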
 
