By disabling or deleting the storage, this problem is solved.but yesterday I noticed that the barras under Datacenter -> Storage was still enabled
❯ zpool status -v
pool: barras
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:08:05 with 0 errors on Sun May 8 00:32:06 2022
config:
NAME STATE READ WRITE CKSUM
barras ONLINE 0 0 0
sdc ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
<0x85c5>:<0x185c5>
pool: wolfs
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 07:46:25 with 0 errors on Sun May 8 08:10:28 2022
config:
NAME STATE READ WRITE CKSUM
wolfs ONLINE 0 0 0
wwn-0x5000c500c931825e ONLINE 0 0 0
wwn-0x5000c500c9316572 ONLINE 0 0 0
errors: No known data errors
zpool scrub
Jun 04 07:22:35 proxmox-media kernel: ata5.00: exception Emask 0x0 SAct 0xe604008 SErr 0x0 action 0x0
Jun 04 07:22:35 proxmox-media kernel: ata5.00: irq_stat 0x40000008
Jun 04 07:22:35 proxmox-media kernel: ata5.00: failed command: READ FPDMA QUEUED
Jun 04 07:22:35 proxmox-media kernel: ata5.00: cmd 60/00:a8:50:7c:4f/02:00:0c:00:00/40 tag 21 ncq dma 262144 in
res 43/40:00:30:7d:4f/00:02:0c:00:00/00 Emask 0x409 (media error) <F>
Jun 04 07:22:35 proxmox-media kernel: ata5.00: status: { DRDY SENSE ERR }
Jun 04 07:22:35 proxmox-media kernel: ata5.00: error: { UNC }
Jun 04 07:22:35 proxmox-media kernel: ata5.00: configured for UDMA/133
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#21 Sense Key : Medium Error [current]
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#21 CDB: Read(10) 28 00 0c 4f 7c 50 00 02 00 00
Jun 04 07:22:35 proxmox-media kernel: blk_update_request: I/O error, dev sdc, sector 206535984 op 0x0:(READ) flags 0x700 phys_seg 3 prio class 0
Jun 04 07:22:35 proxmox-media kernel: zio pool=barras vdev=/dev/sdc1 error=5 type=1 offset=105745260544 size=262144 flags=40080c80
Jun 04 07:22:35 proxmox-media kernel: ata5: EH complete
Jun 04 07:22:35 proxmox-media zed[2099921]: eid=28 class=io pool='barras' vdev=sdc1 size=262144 offset=105745260544 priority=2 err=5 flags=0x40080c80 delay=217ms
Jun 04 07:22:35 proxmox-media kernel: ata5.00: exception Emask 0x0 SAct 0x900900 SErr 0x0 action 0x0
Jun 04 07:22:35 proxmox-media kernel: ata5.00: irq_stat 0x40000008
Jun 04 07:22:35 proxmox-media kernel: ata5.00: failed command: READ FPDMA QUEUED
Jun 04 07:22:35 proxmox-media kernel: ata5.00: cmd 60/00:40:50:7c:4f/01:00:0c:00:00/40 tag 8 ncq dma 131072 in
res 43/40:00:30:7d:4f/00:01:0c:00:00/00 Emask 0x409 (media error) <F>
Jun 04 07:22:35 proxmox-media kernel: ata5.00: status: { DRDY SENSE ERR }
Jun 04 07:22:35 proxmox-media kernel: ata5.00: error: { UNC }
Jun 04 07:22:35 proxmox-media kernel: ata5.00: configured for UDMA/133
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#8 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#8 Sense Key : Medium Error [current]
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#8 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 04 07:22:35 proxmox-media kernel: sd 4:0:0:0: [sdc] tag#8 CDB: Read(10) 28 00 0c 4f 7c 50 00 01 00 00
Jun 04 07:22:35 proxmox-media kernel: blk_update_request: I/O error, dev sdc, sector 206535984 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jun 04 07:22:35 proxmox-media kernel: zio pool=barras vdev=/dev/sdc1 error=5 type=1 offset=105745260544 size=131072 flags=188981
Jun 04 07:22:35 proxmox-media kernel: ata5: EH complete
Jun 04 07:22:35 proxmox-media zed[2099968]: eid=29 class=io pool='barras' vdev=sdc1 size=131072 offset=105745260544 priority=0 err=5 flags=0x188981 delay=232ms bookmark=34245:99781:0:290
Jun 04 07:22:35 proxmox-media zed[2099970]: eid=30 class=data pool='barras' priority=0 err=5 flags=0x8981 bookmark=34245:99781:0:290
102: 2022-06-13 03:01:56 INFO: 26% (284.6 GiB of 1.1 TiB) in 10h 8m 8s, read: 2.3 MiB/s, write: 2.3 MiB/s
102: 2022-06-13 03:01:56 ERROR: backup write data failed: command error: protocol canceled
102: 2022-06-13 03:01:56 INFO: aborting backup job
102: 2022-06-13 03:01:56 INFO: resuming VM again
102: 2022-06-13 03:01:56 ERROR: Backup of VM 102 failed - backup write data failed: command error: protocol canceled
INFO: 53% (583.6 GiB of 1.1 TiB) in 7h 2m 56s, read: 5.0 MiB/s, write: 4.1 MiB/s
ERROR: backup write data failed: command error: write_data upload error: broken pipe
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 102 failed - backup write data failed: command error: write_data upload error: broken pipe
INFO: Failed at 2022-06-26 07:19:48
[73836.893377] systemd[1]: systemd-journald.service: State 'stop-sigterm' timed out. Killing.
[73836.893518] systemd[1]: systemd-journald.service: Killing process 427185 (systemd-journal) with signal SIGKILL.
[73836.893652] systemd[1]: systemd-journald.service: Killing process 294 (systemd-journal) with signal SIGKILL.
[73836.893796] systemd[1]: systemd-journald.service: Killing process 427116 (journal-offline) with signal SIGKILL.
[73860.767621] sd 2:0:0:0: [sda] tag#201 timing out command, waited 180s
[73860.767666] sd 2:0:0:0: [sda] tag#201 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s
[73860.767674] sd 2:0:0:0: [sda] tag#201 Sense Key : Aborted Command [current]
[73860.767675] sd 2:0:0:1: [sdb] tag#9 timing out command, waited 180s
[73860.767681] sd 2:0:0:0: [sda] tag#201 Add. Sense: I/O process terminated
[73860.767688] sd 2:0:0:0: [sda] tag#201 CDB: Write(10) 2a 00 02 f9 e0 00 00 00 10 00
[73860.767694] sd 2:0:0:1: [sdb] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=889s
[73860.767693] I/O error, dev sda, sector 49930240 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0
[73860.767699] sd 2:0:0:1: [sdb] tag#9 Sense Key : Aborted Command [current]
[73860.767702] sd 2:0:0:1: [sdb] tag#9 Add. Sense: I/O process terminated
[73860.767707] sd 2:0:0:1: [sdb] tag#9 CDB: Write(10) 2a 00 62 49 40 00 00 01 68 00
[73860.767709] I/O error, dev sdb, sector 1648967680 op 0x1:(WRITE) flags 0x0 phys_seg 45 prio class 0
[73860.767767] EXT4-fs warning (device dm-3): ext4_end_bio:343: I/O error 10 writing to inode 51555441 starting block 206116608)
[73860.767785] sd 2:0:0:1: [sdb] tag#17 timing out command, waited 180s
[73860.767789] sd 2:0:0:1: [sdb] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=858s
[73860.767792] sd 2:0:0:1: [sdb] tag#17 Sense Key : Aborted Command [current]
[73860.767794] sd 2:0:0:1: [sdb] tag#17 Add. Sense: I/O process terminated
[73860.767797] sd 2:0:0:1: [sdb] tag#17 CDB: Write(10) 2a 00 67 81 8f b0 00 00 08 00
[73860.767799] I/O error, dev sdb, sector 1736544176 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0
[73860.767807] Buffer I/O error on dev dm-3, logical block 217063670, lost async page write
[73860.767887] EXT4-fs error (device dm-3): ext4_check_bdev_write_error:217: comm kworker/u8:4: Error while async write back metadata
[73860.767937] Buffer I/O error on device dm-3, logical block 206116608
[73860.767942] Buffer I/O error on device dm-3, logical block 206116609
[73860.767943] Buffer I/O error on device dm-3, logical block 206116610
[73860.767945] Buffer I/O error on device dm-3, logical block 206116611
[73860.767947] Buffer I/O error on device dm-3, logical block 206116612
[73860.767948] Buffer I/O error on device dm-3, logical block 206116613
[73860.767950] Buffer I/O error on device dm-3, logical block 206116614
[73860.767951] Buffer I/O error on device dm-3, logical block 206116615
[73860.767952] Buffer I/O error on device dm-3, logical block 206116616
[73860.767954] Buffer I/O error on device dm-3, logical block 206116617
[73860.775235] sd 2:0:0:0: [sda] tag#202 timing out command, waited 180s
[73860.775262] sd 2:0:0:0: [sda] tag#202 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s
[73860.775267] sd 2:0:0:0: [sda] tag#202 Sense Key : Aborted Command [current]
[73860.775271] sd 2:0:0:0: [sda] tag#202 Add. Sense: I/O process terminated
[73860.775276] sd 2:0:0:0: [sda] tag#202 CDB: Write(10) 2a 00 02 f9 e0 18 00 00 08 00
[73860.775279] I/O error, dev sda, sector 49930264 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[73860.778984] sd 2:0:0:0: [sda] tag#204 timing out command, waited 180s
[73860.779010] sd 2:0:0:0: [sda] tag#204 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s
[73860.779033] sd 2:0:0:0: [sda] tag#204 Sense Key : Aborted Command [current]
[73860.779037] sd 2:0:0:0: [sda] tag#204 Add. Sense: I/O process terminated
[73860.779042] sd 2:0:0:0: [sda] tag#204 CDB: Write(10) 2a 00 02 f9 e0 20 00 00 08 00
[73860.779045] I/O error, dev sda, sector 49930272 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[73860.781260] sd 2:0:0:0: [sda] tag#203 timing out command, waited 180s
[73860.781281] sd 2:0:0:0: [sda] tag#203 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s
[73860.781285] sd 2:0:0:0: [sda] tag#203 Sense Key : Aborted Command [current]
[73860.781287] sd 2:0:0:0: [sda] tag#203 Add. Sense: I/O process terminated
[73860.781290] sd 2:0:0:0: [sda] tag#203 CDB: Write(10) 2a 00 02 f9 e0 10 00 00 08 00
[73860.781292] I/O error, dev sda, sector 49930256 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[73860.784623] sd 2:0:0:0: [sda] tag#205 timing out command, waited 180s
[73860.784640] sd 2:0:0:0: [sda] tag#205 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s
[73860.784642] sd 2:0:0:0: [sda] tag#205 Sense Key : Aborted Command [current]
[73860.784644] sd 2:0:0:0: [sda] tag#205 Add. Sense: I/O process terminated
[73860.784647] sd 2:0:0:0: [sda] tag#205 CDB: Write(10) 2a 00 02 f9 e0 28 00 00 08 00
[73860.784649] I/O error, dev sda, sector 49930280 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[73860.784745] Aborting journal on device dm-2-8.
[73860.784756] EXT4-fs error (device dm-2) in ext4_reserve_inode_write:5748: Journal has aborted
[73860.784762] EXT4-fs error (device dm-2): __ext4_ext_dirty:183: inode #1574110: comm kworker/u8:0: mark_inode_dirty error
[73860.784834] EXT4-fs error (device dm-2) in ext4_mb_clear_bb:6076: Journal has aborted
[73860.784853] EXT4-fs (dm-2): Delayed block allocation failed for inode 1049937 at logical offset 3 with max blocks 2 with error 30
[73860.784865] EXT4-fs (dm-2): Delayed block allocation failed for inode 1574110 at logical offset 0 with max blocks 2 with error 30
[73860.784892] EXT4-fs (dm-2): This should not happen!! Data will be lost
[73860.785549] EXT4-fs (dm-2): This should not happen!! Data will be lost
[73860.785937] EXT4-fs error (device dm-2) in ext4_writepages:2817: Journal has aborted
[73860.786004] EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm kworker/u8:3: Detected aborted journal
[73860.786028] EXT4-fs error (device dm-2) in ext4_writepages:2817: Journal has aborted
[73860.786189] EXT4-fs error (device dm-2): __ext4_ext_dirty:183: inode #1582379: comm dockerd: mark_inode_dirty error
[73860.786256] EXT4-fs error (device dm-2) in ext4_mb_clear_bb:6076: Journal has aborted
[73860.786301] EXT4-fs error (device dm-2) in ext4_writepages:2817: Journal has aborted
[73860.786412] EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm dockerd: Detected aborted journal
[73860.822626] EXT4-fs (dm-2): Remounting filesystem read-only
[73860.822679] EXT4-fs (dm-2): ext4_writepages: jbd2_start: 1024 pages, ino 1048907; err -30
[73860.829903] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1048659, error -30)
[73860.830024] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1050465, error -30)
[73860.830085] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1048743, error -30)
[73860.830137] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1055490, error -30)
[73860.836021] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=9/KILL
[73860.846006] systemd[1]: systemd-journald.service: Failed with result 'timeout'.
[73860.846437] systemd[1]: Failed to start Journal Service.
[73860.847652] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2.
[73860.847819] systemd-coredump[427203]: Failed to get COMM: No such process
[73860.848017] systemd[1]: Stopped Journal Service.
[73860.848383] audit: type=1130 audit(1656184788.832:725): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
[73860.848390] audit: type=1130 audit(1656184788.832:726): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[73860.848396] audit: type=1131 audit(1656184788.832:727): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[73860.875678] systemd[1]: Starting Journal Service...
[73860.875874] audit: type=1334 audit(1656184788.859:728): prog-id=55 op=LOAD
[73860.875879] audit: type=1334 audit(1656184788.859:729): prog-id=56 op=LOAD
[73860.875882] audit: type=1334 audit(1656184788.859:730): prog-id=57 op=LOAD
[73860.876020] audit: type=1334 audit(1656184788.859:731): prog-id=0 op=UNLOAD
[73860.876022] audit: type=1334 audit(1656184788.859:732): prog-id=0 op=UNLOAD
[73860.876025] audit: type=1334 audit(1656184788.859:733): prog-id=0 op=UNLOAD
[73860.895312] audit: type=1305 audit(1656184788.879:734): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 res=1
[73860.923360] systemd[1]: Started Journal Service.
[73866.395298] kauditd_printk_skb: 15 callbacks suppressed
Sounds like the drive is dying. I'd suggest not using it anymore and salvaging the data on (e.g. usingOkay, it just happend again.
But this time, it happend on the other backup storage.
Backup logs:
Code:INFO: 53% (583.6 GiB of 1.1 TiB) in 7h 2m 56s, read: 5.0 MiB/s, write: 4.1 MiB/s ERROR: backup write data failed: command error: write_data upload error: broken pipe INFO: aborting backup job INFO: resuming VM again ERROR: Backup of VM 102 failed - backup write data failed: command error: write_data upload error: broken pipe INFO: Failed at 2022-06-26 07:19:48
dmesg logs:
Code:[73836.893377] systemd[1]: systemd-journald.service: State 'stop-sigterm' timed out. Killing. [73836.893518] systemd[1]: systemd-journald.service: Killing process 427185 (systemd-journal) with signal SIGKILL. [73836.893652] systemd[1]: systemd-journald.service: Killing process 294 (systemd-journal) with signal SIGKILL. [73836.893796] systemd[1]: systemd-journald.service: Killing process 427116 (journal-offline) with signal SIGKILL. [73860.767621] sd 2:0:0:0: [sda] tag#201 timing out command, waited 180s [73860.767666] sd 2:0:0:0: [sda] tag#201 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s [73860.767674] sd 2:0:0:0: [sda] tag#201 Sense Key : Aborted Command [current] [73860.767675] sd 2:0:0:1: [sdb] tag#9 timing out command, waited 180s [73860.767681] sd 2:0:0:0: [sda] tag#201 Add. Sense: I/O process terminated [73860.767688] sd 2:0:0:0: [sda] tag#201 CDB: Write(10) 2a 00 02 f9 e0 00 00 00 10 00 [73860.767694] sd 2:0:0:1: [sdb] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=889s [73860.767693] I/O error, dev sda, sector 49930240 op 0x1:(WRITE) flags 0x800 phys_seg 2 prio class 0 [73860.767699] sd 2:0:0:1: [sdb] tag#9 Sense Key : Aborted Command [current] [73860.767702] sd 2:0:0:1: [sdb] tag#9 Add. Sense: I/O process terminated [73860.767707] sd 2:0:0:1: [sdb] tag#9 CDB: Write(10) 2a 00 62 49 40 00 00 01 68 00 [73860.767709] I/O error, dev sdb, sector 1648967680 op 0x1:(WRITE) flags 0x0 phys_seg 45 prio class 0 [73860.767767] EXT4-fs warning (device dm-3): ext4_end_bio:343: I/O error 10 writing to inode 51555441 starting block 206116608) [73860.767785] sd 2:0:0:1: [sdb] tag#17 timing out command, waited 180s [73860.767789] sd 2:0:0:1: [sdb] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=858s [73860.767792] sd 2:0:0:1: [sdb] tag#17 Sense Key : Aborted Command [current] [73860.767794] sd 2:0:0:1: [sdb] tag#17 Add. Sense: I/O process terminated [73860.767797] sd 2:0:0:1: [sdb] tag#17 CDB: Write(10) 2a 00 67 81 8f b0 00 00 08 00 [73860.767799] I/O error, dev sdb, sector 1736544176 op 0x1:(WRITE) flags 0x103000 phys_seg 1 prio class 0 [73860.767807] Buffer I/O error on dev dm-3, logical block 217063670, lost async page write [73860.767887] EXT4-fs error (device dm-3): ext4_check_bdev_write_error:217: comm kworker/u8:4: Error while async write back metadata [73860.767937] Buffer I/O error on device dm-3, logical block 206116608 [73860.767942] Buffer I/O error on device dm-3, logical block 206116609 [73860.767943] Buffer I/O error on device dm-3, logical block 206116610 [73860.767945] Buffer I/O error on device dm-3, logical block 206116611 [73860.767947] Buffer I/O error on device dm-3, logical block 206116612 [73860.767948] Buffer I/O error on device dm-3, logical block 206116613 [73860.767950] Buffer I/O error on device dm-3, logical block 206116614 [73860.767951] Buffer I/O error on device dm-3, logical block 206116615 [73860.767952] Buffer I/O error on device dm-3, logical block 206116616 [73860.767954] Buffer I/O error on device dm-3, logical block 206116617 [73860.775235] sd 2:0:0:0: [sda] tag#202 timing out command, waited 180s [73860.775262] sd 2:0:0:0: [sda] tag#202 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s [73860.775267] sd 2:0:0:0: [sda] tag#202 Sense Key : Aborted Command [current] [73860.775271] sd 2:0:0:0: [sda] tag#202 Add. Sense: I/O process terminated [73860.775276] sd 2:0:0:0: [sda] tag#202 CDB: Write(10) 2a 00 02 f9 e0 18 00 00 08 00 [73860.775279] I/O error, dev sda, sector 49930264 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 [73860.778984] sd 2:0:0:0: [sda] tag#204 timing out command, waited 180s [73860.779010] sd 2:0:0:0: [sda] tag#204 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s [73860.779033] sd 2:0:0:0: [sda] tag#204 Sense Key : Aborted Command [current] [73860.779037] sd 2:0:0:0: [sda] tag#204 Add. Sense: I/O process terminated [73860.779042] sd 2:0:0:0: [sda] tag#204 CDB: Write(10) 2a 00 02 f9 e0 20 00 00 08 00 [73860.779045] I/O error, dev sda, sector 49930272 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 [73860.781260] sd 2:0:0:0: [sda] tag#203 timing out command, waited 180s [73860.781281] sd 2:0:0:0: [sda] tag#203 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s [73860.781285] sd 2:0:0:0: [sda] tag#203 Sense Key : Aborted Command [current] [73860.781287] sd 2:0:0:0: [sda] tag#203 Add. Sense: I/O process terminated [73860.781290] sd 2:0:0:0: [sda] tag#203 CDB: Write(10) 2a 00 02 f9 e0 10 00 00 08 00 [73860.781292] I/O error, dev sda, sector 49930256 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 [73860.784623] sd 2:0:0:0: [sda] tag#205 timing out command, waited 180s [73860.784640] sd 2:0:0:0: [sda] tag#205 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=913s [73860.784642] sd 2:0:0:0: [sda] tag#205 Sense Key : Aborted Command [current] [73860.784644] sd 2:0:0:0: [sda] tag#205 Add. Sense: I/O process terminated [73860.784647] sd 2:0:0:0: [sda] tag#205 CDB: Write(10) 2a 00 02 f9 e0 28 00 00 08 00 [73860.784649] I/O error, dev sda, sector 49930280 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0 [73860.784745] Aborting journal on device dm-2-8. [73860.784756] EXT4-fs error (device dm-2) in ext4_reserve_inode_write:5748: Journal has aborted [73860.784762] EXT4-fs error (device dm-2): __ext4_ext_dirty:183: inode #1574110: comm kworker/u8:0: mark_inode_dirty error [73860.784834] EXT4-fs error (device dm-2) in ext4_mb_clear_bb:6076: Journal has aborted [73860.784853] EXT4-fs (dm-2): Delayed block allocation failed for inode 1049937 at logical offset 3 with max blocks 2 with error 30 [73860.784865] EXT4-fs (dm-2): Delayed block allocation failed for inode 1574110 at logical offset 0 with max blocks 2 with error 30 [73860.784892] EXT4-fs (dm-2): This should not happen!! Data will be lost [73860.785549] EXT4-fs (dm-2): This should not happen!! Data will be lost [73860.785937] EXT4-fs error (device dm-2) in ext4_writepages:2817: Journal has aborted [73860.786004] EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm kworker/u8:3: Detected aborted journal [73860.786028] EXT4-fs error (device dm-2) in ext4_writepages:2817: Journal has aborted [73860.786189] EXT4-fs error (device dm-2): __ext4_ext_dirty:183: inode #1582379: comm dockerd: mark_inode_dirty error [73860.786256] EXT4-fs error (device dm-2) in ext4_mb_clear_bb:6076: Journal has aborted [73860.786301] EXT4-fs error (device dm-2) in ext4_writepages:2817: Journal has aborted [73860.786412] EXT4-fs error (device dm-2): ext4_journal_check_start:83: comm dockerd: Detected aborted journal [73860.822626] EXT4-fs (dm-2): Remounting filesystem read-only [73860.822679] EXT4-fs (dm-2): ext4_writepages: jbd2_start: 1024 pages, ino 1048907; err -30 [73860.829903] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1048659, error -30) [73860.830024] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1050465, error -30) [73860.830085] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1048743, error -30) [73860.830137] EXT4-fs (dm-2): failed to convert unwritten extents to written extents -- potential data loss! (inode 1055490, error -30) [73860.836021] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=9/KILL [73860.846006] systemd[1]: systemd-journald.service: Failed with result 'timeout'. [73860.846437] systemd[1]: Failed to start Journal Service. [73860.847652] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 2. [73860.847819] systemd-coredump[427203]: Failed to get COMM: No such process [73860.848017] systemd[1]: Stopped Journal Service. [73860.848383] audit: type=1130 audit(1656184788.832:725): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed' [73860.848390] audit: type=1130 audit(1656184788.832:726): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [73860.848396] audit: type=1131 audit(1656184788.832:727): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [73860.875678] systemd[1]: Starting Journal Service... [73860.875874] audit: type=1334 audit(1656184788.859:728): prog-id=55 op=LOAD [73860.875879] audit: type=1334 audit(1656184788.859:729): prog-id=56 op=LOAD [73860.875882] audit: type=1334 audit(1656184788.859:730): prog-id=57 op=LOAD [73860.876020] audit: type=1334 audit(1656184788.859:731): prog-id=0 op=UNLOAD [73860.876022] audit: type=1334 audit(1656184788.859:732): prog-id=0 op=UNLOAD [73860.876025] audit: type=1334 audit(1656184788.859:733): prog-id=0 op=UNLOAD [73860.895312] audit: type=1305 audit(1656184788.879:734): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 res=1 [73860.923360] systemd[1]: Started Journal Service. [73866.395298] kauditd_printk_skb: 15 callbacks suppressed
ddrescue
) it while you still can.Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
temperature : 33 C
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 2%
endurance group critical warning summary: 0
data_units_read : 115,701,789
data_units_written : 71,179,265
host_read_commands : 1,209,460,035
host_write_commands : 1,659,256,558
controller_busy_time : 12,462
power_cycles : 46
power_on_hours : 9,446
unsafe_shutdowns : 16
media_errors : 0
num_err_log_entries : 22
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
dmesg
log did sound rather bad, there were errors for sdb
and sda
. I made another quick search and there was a similar report on the kernel mailing list. Were the above errors also during shutdown for you? The kernel developer also suggested replacing the drive, but since it never happened again, maybe it is/was some kernel bug after all? Regularly checking the drive health is a good idea nonetheless