ZFS Faulted Device - new SSDs

jrussell05

New Member
Dec 13, 2022
Hi All,

I've been wrestling with this problem for a while: I keep getting a degraded ZFS pool due to 'too many write errors'. The pool is a RAID 1 (ZFS mirror) of two SSDs dedicated to my containers and VMs.

I've recently swapped the SSDs for Samsung PM893s and also replaced the SATA cables, with the same results.

Here's my current configuration:
  • Supermicro H11SSL-i with 64GB of memory
  • Epyc Processor: AMD EPYC 7551P 32-Core Processor
  • Latest BIOS
  • Two 100GB SSDs in Raid 1 Zpool for the boot and OS connected directly to SATA ports on MB
  • Two 480GB SSDs in Raid 1 Zpool for the KVM/LXC connected directly to SATA ports on MB
  • One 500GB HDD for DVR on Plex LXC connected directly to SATA ports on MB
  • One 500GB HDD for Proxmox KVM and LXC backup connected directly to SATA ports on MB
  • Three HDDs in PCI Passthrough to TrueNAS VM for NAS services connected directly to SATA ports on MB
Could the issue be with the SATA "controller"? I believe there isn't a separate controller on the MB and that the EPYC has a "built-in" interface.

Before I drop ~$150 on an LSI 9300 HBA card, are there any other configurations I could try? I suppose I could move away from ZFS, but I don't know enough about advantages/disadvantages of various Linux FS and Proxmox to make the right decision.

FYI, this is a home-lab system. Although it's not a "mission critical" system, if it goes down I start catching hell from the wife and kids.

Thanks in advance.
 
Are you passing just the disks or the whole SATA controller through to the TrueNAS VM? That could be where your problem is.
 
  • Two 100GB SSDs in Raid 0 Zpool for the boot and OS connected directly to SATA ports on MB
  • Two 480GB SSDs in Raid 0 Zpool for the KVM/LXC connected directly to SATA ports on MB
Also, why a raid0?
1.) you really mean a raid0 (ZFS stripe), and I don't see the point of doing that...
2.) you actually mean a raid1 (ZFS mirror)
3.) you are using the pseudo HW raid1 of your mainboard and then a "single disk" (= raid array) ZFS raid0 because it is already mirrored

In my opinion, 1 and 3 would be a bad choice.
 
Apologies. I meant raid1 (ZFS mirror).
 
There are groups of SATA ports that show up as different PCI controllers. I am only passing through the group shown by the green rectangle below. The LSI3008 doesn't exist on my version of the MB.


Screenshot 2023-04-03 175401.png
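
One way to double-check which disks sit behind which PCI controller, before deciding what to pass through, is to resolve each disk's sysfs path, since that path embeds the PCI addresses and the ATA port. This is only a minimal sketch, assuming a Linux host with the usual /sys/block layout; the function names `controller_of` and `disks_by_controller` are made up for illustration and are not part of any Proxmox tool.

```python
import os
import re

# PCI addresses look like 0000:03:00.2 (domain:bus:device.function).
PCI_ADDR = re.compile(r"\b[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\b")
ATA_PORT = re.compile(r"/(ata\d+)/")


def controller_of(sysfs_path):
    """Return (pci_address, ata_port) parsed from a resolved sysfs disk path.

    The last PCI address in the path is the device the disk hangs off.
    ata_port is None when the path contains no ataN component.
    """
    addrs = PCI_ADDR.findall(sysfs_path)
    port = ATA_PORT.search(sysfs_path)
    return (addrs[-1] if addrs else None, port.group(1) if port else None)


def disks_by_controller(sys_block="/sys/block"):
    """Group sd* disks by the PCI address of their controller."""
    groups = {}
    if not os.path.isdir(sys_block):
        return groups  # not a Linux host, nothing to report
    for name in sorted(os.listdir(sys_block)):
        if not name.startswith("sd"):
            continue
        resolved = os.path.realpath(os.path.join(sys_block, name))
        pci, port = controller_of(resolved)
        groups.setdefault(pci, []).append((name, port))
    return groups


if __name__ == "__main__":
    for pci, disks in disks_by_controller().items():
        print(pci, disks)
```

Disks that share a PCI address with the passed-through group would be affected by the passthrough; disks on a different address should not be.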
 

Here is the log output from the last time there was a fault:

Code:
Apr 01 20:12:32 pve-1 kernel: ata5.00: exception Emask 0x10 SAct 0x440 SErr 0x400000 action 0x6 frozen
Apr 01 20:12:32 pve-1 kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Apr 01 20:12:32 pve-1 kernel: ata5: SError: { Handshk }
Apr 01 20:12:32 pve-1 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 01 20:12:32 pve-1 kernel: ata5.00: cmd 61/60:30:58:e2:81/00:00:0b:00:00/40 tag 6 ncq dma 49152 out
         res 40/00:50:40:68:80/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
Apr 01 20:12:32 pve-1 kernel: ata5.00: status: { DRDY }
Apr 01 20:12:32 pve-1 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 01 20:12:32 pve-1 kernel: ata5.00: cmd 61/40:50:40:68:80/00:00:0b:00:00/40 tag 10 ncq dma 32768 out
         res 40/00:50:40:68:80/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
Apr 01 20:12:32 pve-1 kernel: ata5.00: status: { DRDY }
Apr 01 20:12:32 pve-1 kernel: ata5: hard resetting link
Apr 01 20:12:32 pve-1 kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 01 20:12:32 pve-1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Apr 01 20:12:32 pve-1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Apr 01 20:12:32 pve-1 kernel: ata5.00: configured for UDMA/133
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#6 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#6 Sense Key : Illegal Request [current]
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#6 Add. Sense: Unaligned write command
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#6 CDB: Write(10) 2a 00 0b 81 e2 58 00 00 60 00
Apr 01 20:12:32 pve-1 kernel: blk_update_request: I/O error, dev sdc, sector 193061464 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
Apr 01 20:12:32 pve-1 kernel: zio pool=kvm_lxc2 vdev=/dev/disk/by-id/ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 error=5 type=2 offset=98846420992 size=49152 flags=180880
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#10 Sense Key : Illegal Request [current]
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#10 Add. Sense: Unaligned write command
Apr 01 20:12:32 pve-1 kernel: sd 4:0:0:0: [sdc] tag#10 CDB: Write(10) 2a 00 0b 80 68 40 00 00 40 00
Apr 01 20:12:32 pve-1 kernel: blk_update_request: I/O error, dev sdc, sector 192964672 op 0x1:(WRITE) flags 0x700 phys_seg 2 prio class 0
Apr 01 20:12:32 pve-1 kernel: zio pool=kvm_lxc2 vdev=/dev/disk/by-id/ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 error=5 type=2 offset=98796863488 size=32768 flags=180880
Apr 01 20:12:32 pve-1 kernel: ata5: EH complete
Apr 01 20:12:32 pve-1 zed[1373874]: eid=160 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=32768 offset=98796863488 priority=3 err=5 flags=0x180880 delay=563ms bookmark=2442:1:1:6340
Apr 01 20:12:32 pve-1 zed[1373891]: eid=161 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=49152 offset=98846420992 priority=3 err=5 flags=0x180880 delay=555ms bookmark=2442:1:2:6
Apr 01 20:13:43 pve-1 kernel: ata5.00: exception Emask 0x10 SAct 0x12 SErr 0x400000 action 0x6 frozen
Apr 01 20:13:43 pve-1 kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Apr 01 20:13:43 pve-1 kernel: ata5: SError: { Handshk }
Apr 01 20:13:43 pve-1 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 01 20:13:43 pve-1 kernel: ata5.00: cmd 61/a0:08:a8:52:e7/00:00:0e:00:00/40 tag 1 ncq dma 81920 out
         res 40/00:08:a8:52:e7/00:00:0e:00:00/40 Emask 0x10 (ATA bus error)
Apr 01 20:13:43 pve-1 kernel: ata5.00: status: { DRDY }
Apr 01 20:13:43 pve-1 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 01 20:13:43 pve-1 kernel: ata5.00: cmd 61/08:20:48:53:e7/00:00:0e:00:00/40 tag 4 ncq dma 4096 out
         res 40/00:08:a8:52:e7/00:00:0e:00:00/40 Emask 0x10 (ATA bus error)
Apr 01 20:13:43 pve-1 kernel: ata5.00: status: { DRDY }
Apr 01 20:13:43 pve-1 kernel: ata5: hard resetting link
Apr 01 20:13:44 pve-1 kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Apr 01 20:13:44 pve-1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Apr 01 20:13:44 pve-1 kernel: ata5.00: supports DRM functions and may not be fully accessible
Apr 01 20:13:44 pve-1 kernel: ata5.00: configured for UDMA/133
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#1 Sense Key : Illegal Request [current]
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#1 Add. Sense: Unaligned write command
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#1 CDB: Write(10) 2a 00 0e e7 52 a8 00 00 a0 00
Apr 01 20:13:44 pve-1 kernel: blk_update_request: I/O error, dev sdc, sector 250041000 op 0x1:(WRITE) flags 0x700 phys_seg 14 prio class 0
Apr 01 20:13:44 pve-1 kernel: zio pool=kvm_lxc2 vdev=/dev/disk/by-id/ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 error=5 type=2 offset=128019943424 size=81920 flags=40080c80
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#4 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#4 Sense Key : Illegal Request [current]
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#4 Add. Sense: Unaligned write command
Apr 01 20:13:44 pve-1 kernel: sd 4:0:0:0: [sdc] tag#4 CDB: Write(10) 2a 00 0e e7 53 48 00 00 08 00
Apr 01 20:13:44 pve-1 kernel: blk_update_request: I/O error, dev sdc, sector 250041160 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
Apr 01 20:13:44 pve-1 kernel: zio pool=kvm_lxc2 vdev=/dev/disk/by-id/ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 error=5 type=2 offset=128020025344 size=4096 flags=180880
Apr 01 20:13:44 pve-1 kernel: ata5: EH complete
Apr 01 20:13:44 pve-1 zed[1381153]: eid=162 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=81920 offset=128019943424 priority=3 err=5 flags=0x40080c80 delay=565ms
Apr 01 20:13:44 pve-1 zed[1381155]: eid=163 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128020025344 priority=3 err=5 flags=0x180880 delay=573ms bookmark=918:1:0:368436
Apr 01 20:13:44 pve-1 zed[1381172]: eid=165 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128020000768 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368430
Apr 01 20:13:44 pve-1 zed[1381179]: eid=164 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128020017152 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368434
Apr 01 20:13:44 pve-1 zed[1381189]: eid=167 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=8192 offset=128019976192 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368426
Apr 01 20:13:44 pve-1 zed[1381183]: eid=166 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128019992576 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368428
Apr 01 20:13:44 pve-1 zed[1381199]: eid=168 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128020021248 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368435
Apr 01 20:13:44 pve-1 zed[1381206]: eid=169 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=8192 offset=128019984384 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368427
Apr 01 20:13:44 pve-1 zed[1381213]: eid=170 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128019996672 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368429
Apr 01 20:13:44 pve-1 zed[1381220]: eid=171 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128020013056 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368433
Apr 01 20:13:44 pve-1 zed[1381234]: eid=172 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128020008960 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368432
Apr 01 20:13:44 pve-1 zed[1381237]: eid=173 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=8192 offset=128019968000 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368421
Apr 01 20:13:44 pve-1 zed[1381239]: eid=174 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=8192 offset=128019959808 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368422
Apr 01 20:13:44 pve-1 zed[1381253]: eid=175 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=4096 offset=128020004864 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368431
Apr 01 20:13:44 pve-1 zed[1381255]: eid=176 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=8192 offset=128019951616 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368420
Apr 01 20:13:44 pve-1 zed[1381257]: eid=177 class=io pool='kvm_lxc2' vdev=ata-SAMSUNG_MZ7L3480HCHQ-00A07_S664NA0T802016-part1 size=8192 offset=128019943424 priority=3 err=5 flags=0x380880 bookmark=918:1:0:368419
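
The numbers in the log can be cross-checked against each other. The sketch below decodes the `Write(10)` CDB bytes from the `CDB:` lines and relates them to the `sector` and `offset` values that the kernel and ZFS report. It assumes 512-byte logical sectors and a partition that starts at sector 2048 (1 MiB); both are inferences that happen to make the arithmetic match this log, not values read from the system. The helper names are illustrative.

```python
SECTOR_BYTES = 512          # assumption: 512-byte logical sectors
PARTITION_START_LBA = 2048  # assumption: part1 begins at 1 MiB


def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes.

    Returns (lba, blocks). Bytes 2-5 hold the big-endian LBA and
    bytes 7-8 the big-endian transfer length in blocks.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


def describe(cdb_hex):
    """Summarize one failed write: size, offset in the partition, alignment."""
    lba, blocks = decode_write10(cdb_hex)
    return {
        "lba": lba,
        "bytes": blocks * SECTOR_BYTES,
        # Byte offset inside the partition, comparable to the zio offset= field.
        "partition_offset": (lba - PARTITION_START_LBA) * SECTOR_BYTES,
        # True when start and length are both multiples of 4 KiB.
        "aligned_4k": lba % 8 == 0 and blocks % 8 == 0,
    }


if __name__ == "__main__":
    for cdb in ("2a 00 0b 81 e2 58 00 00 60 00",
                "2a 00 0b 80 68 40 00 00 40 00"):
        print(describe(cdb))
```

For the two CDBs from 20:12:32, the decoded LBAs equal the reported sectors (193061464 and 192964672), the byte counts equal the zio sizes (49152 and 32768), and both writes are 4 KiB-aligned. That hints that the "Unaligned write command" sense text accompanies the link reset rather than a real alignment problem, though this is an inference and not a confirmed diagnosis.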
 
Sorry for writing in German; my English is poor.
I have the same board, a Supermicro H11SSL-i, and the same problems as you. Devices attached to SATA 8-11 and SATA 12-15 (in my case EXOS X16 drives behind a backplane) drop out with ZFS read or write errors. Everything on SATA 0-7 is fine (SSDs in my case). As a test, I have now disabled SMART on the HDDs as described in https://forum.proxmox.com/threads/zfs-fault-imme-wieder-bei-einem-system.81860/ and am testing.

Best regards,
crmspezi
 