Hello everyone,
I am currently struggling a lot with my ZFS pool (mainly SATA issues). Every now and then I get "SATA link down", "hard resetting link", and "link is slow to respond, please be patient (ready=0)". This leads to ZFS pool errors, which then degrade my whole pool. Since I thought one HDD was the cause of the whole issue, I tried replacing that HDD. But during the current resilvering, the SATA link issues still happen. I dug into the logs but just couldn't find any cause. Maybe you have an idea how to solve this. First, my setup:
- Motherboard: ASRock B450 Pro4. I already checked for Aggressive Link Power Management (didn't find this option in the BIOS) and other options that could influence the behavior. The BIOS version is 10.41. Every HDD / SSD
- CPU: Ryzen 5 5600G
- HDD: 4x SEAGATE 4TB IronWolf (these are different models)
- SSD: 2x SANDISK 1TB
- OS: Proxmox VE 9.1.1
- GPU: Intel ARC A380 (mainly for transcoding)
- Power Supply: BeQuiet! Power 11 Platinum (1000W Platinum Plus)
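Since I could not find an ALPM option in the BIOS, the policy the kernel actually applies can be read from sysfs instead. Below is a minimal sketch, assuming the usual Linux path `/sys/class/scsi_host/hostN/link_power_management_policy` is present on this kernel; the function name `read_alpm_policies` is only illustrative.

```python
from pathlib import Path


def read_alpm_policies(base="/sys/class/scsi_host"):
    """Return {host_name: policy} for every host that exposes an ALPM policy file."""
    policies = {}
    for policy_file in sorted(Path(base).glob("host*/link_power_management_policy")):
        # The file holds a single word, e.g. "max_performance" or "med_power_with_dipm".
        policies[policy_file.parent.name] = policy_file.read_text().strip()
    return policies


if __name__ == "__main__":
    for host, policy in read_alpm_policies().items():
        print(f"{host}: {policy}")
```

If any host reports something other than `max_performance`, link power management is active on that port and might be worth ruling out as a factor.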
I have been running the ZFS pool for 2 months now, and every now and then I have had issues. I already hit this problem about a month ago, then started from scratch and set up the pool again, which then worked like a charm. About two weeks ago I again got a lot of SATA link errors, which I resolved with just a scrub, and the system worked fine until now! Currently the 4 drives are connected via 3 different SATA power lines (I read that power could be an issue, but this didn't resolve anything). I also have the feeling that replacing the HDD is not really the solution to this problem, as I think the system has another issue. I also tried changing the SATA cables, without any luck (tried 3 different pairs, I think CableMatters was one of them). The drives in detail:
- lsblk: https://pastebin.com/shJn2ryK
- more detailed lsblk: https://pastebin.com/JszCL33G
- dmesg -T: https://pastebin.com/DG159WLU (interestingly, the drives operate for quite some time, then suddenly start losing the SATA connection, then operate again)
Code:
[Mon Nov 24 21:20:28 2025] audit: type=1400 audit(1764015628.258:513): apparmor="DENIED" operation="sendmsg" class="file" namespace="root//lxc-123_<-var-lib-lxc>" profile="rsyslogd" name="/run/systemd/journal/dev-log" pid=14739 comm="systemd-journal" requested_mask="r" denied_mask="r" fsuid=100000 ouid=100000
[Mon Nov 24 21:21:49 2025] ata9.00: exception Emask 0x10 SAct 0x20400 SErr 0x40002 action 0x6 frozen
[Mon Nov 24 21:21:49 2025] ata9.00: irq_stat 0x08000000, interface fatal error
[Mon Nov 24 21:21:49 2025] ata9: SError: { RecovComm CommWake }
[Mon Nov 24 21:21:49 2025] ata9.00: failed command: WRITE FPDMA QUEUED
[Mon Nov 24 21:21:49 2025] ata9.00: cmd 61/50:50:c0:47:82/00:00:2b:00:00/40 tag 10 ncq dma 40960 out res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Mon Nov 24 21:21:49 2025] ata9.00: status: { DRDY }
[Mon Nov 24 21:21:49 2025] ata9.00: failed command: WRITE FPDMA QUEUED
[Mon Nov 24 21:21:49 2025] ata9.00: cmd 61/50:88:18:48:82/00:00:2b:00:00/40 tag 17 ncq dma 40960 out res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[Mon Nov 24 21:21:49 2025] ata9.00: status: { DRDY }
[Mon Nov 24 21:21:49 2025] ata9: hard resetting link
[Mon Nov 24 21:21:49 2025] ata6.00: limiting speed to UDMA/100:PIO4
[Mon Nov 24 21:21:49 2025] ata6.00: exception Emask 0x52 SAct 0x1000 SErr 0x30c02 action 0xe frozen
[Mon Nov 24 21:21:49 2025] ata6.00: irq_stat 0x00400000, PHY RDY changed
[Mon Nov 24 21:21:49 2025] ata6: SError: { RecovComm Proto HostInt PHYRdyChg PHYInt }
[Mon Nov 24 21:21:49 2025] ata6.00: failed command: READ FPDMA QUEUED
[Mon Nov 24 21:21:49 2025] ata6.00: cmd 60/e8:60:a0:4e:82/07:00:2b:00:00/40 tag 12 ncq dma 1036288 in res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x52 (ATA bus error)
[Mon Nov 24 21:21:49 2025] ata6.00: status: { DRDY }
[Mon Nov 24 21:21:49 2025] ata6: hard resetting link
[Mon Nov 24 21:21:54 2025] ata9: link is slow to respond, please be patient (ready=0)
[Mon Nov 24 21:21:55 2025] ata6: link is slow to respond, please be patient (ready=0)
[Mon Nov 24 21:21:56 2025] ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Mon Nov 24 21:21:56 2025] ata9.00: configured for UDMA/33
[Mon Nov 24 21:21:56 2025] ata9: EH complete
[Mon Nov 24 21:21:59 2025] ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[Mon Nov 24 21:21:59 2025] ata6.00: configured for UDMA/100
[Mon Nov 24 21:21:59 2025] ata6: EH complete
[Mon Nov 24 21:25:01 2025] audit: type=1400 audit(1764015901.480:514): apparmor="DENIED" operation="sendmsg" class="file" namespace="root//lxc-123_<-var-lib-lxc>" profile="rsyslogd" name="/run/systemd/journal/dev-log" pid=14739 comm="systemd-journal" requested_mask="r" denied_mask="r" fsuid=100000 ouid=100000
[Mon Nov 24 21:25:01 2025] audit: type=1400 audit(1764015901.480:515): apparmor="DENIED" operation="sendmsg" class="file" namespace="root//lxc-123_<-var-lib-lxc>" profile="rsyslogd" name="/run/systemd/journal/dev-log" pid=14739 comm="systemd-journal" requested_mask="r" denied_mask="r" fsuid=100000 ouid=100000
- smartctl -a /dev/sdc: https://pastebin.com/fFK5Nwam
- smartctl -a /dev/sdd: https://pastebin.com/E907QRx7
- smartctl -a /dev/sde: https://pastebin.com/DvVsDxnc
- smartctl -a /dev/sdf: https://pastebin.com/9vVxc2F0
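To see whether the resets cluster on particular ports, the dmesg output can be tallied per ATA port. This is only a rough sketch: the message substrings are the ones quoted in this post, while the event labels and the function name `count_ata_events` are made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as "[Mon Nov 24 21:21:49 2025] ata9: hard resetting link".
ATA_LINE = re.compile(r"\]\s+(ata\d+)(?:\.\d+)?: (.*)$")

# Message substrings of interest mapped to short (illustrative) event labels.
EVENTS = {
    "hard resetting link": "hard_reset",
    "link is slow to respond": "slow_link",
    "interface fatal error": "fatal_error",
    "PHY RDY changed": "phy_change",
}


def count_ata_events(lines):
    """Return a Counter keyed by (port, event) for the ATA messages in `lines`."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not an ATA message (e.g. an audit line)
        port, message = match.groups()
        for needle, event in EVENTS.items():
            if needle in message:
                counts[(port, event)] += 1
    return counts
```

It takes any iterable of lines, so it can be fed an open file containing a saved `dmesg -T` dump. If the counts are spread across several ports rather than concentrated on one, that would point away from a single drive or cable.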
As mentioned, I replaced one drive, so the pool is currently resilvering, but I have the feeling this will not solve the issue (at least not for long). I also have a second pool (with SSDs, which don't cause any problems), see:
- zpool status: https://pastebin.com/ErKPvhne
- zpool status -v: https://pastebin.com/WtdX81UB
- hdparm -Tt /dev/xxx: https://pastebin.com/1BKquwK1
- zpool iostat -v 1 10: https://pastebin.com/dJNn53Z5
I know this is a lot of information / logs, but I would appreciate any kind of hint that could help me reduce these errors! If I forgot any information, please let me know. Thanks in advance!!!