Hello, for more than six months I've had an issue with a couple of Linux VMs that log the following errors during or after a backup (Proxmox Backup Server, kept up to date).
kern.log on the VM shows:
Jul 22 08:00:44 vubuntu3-srv kernel: [ 146.627177] loop6: detected capacity change from 0 to 8
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317041] ata3.00: exception Emask 0x0 SAct 0xe0 SErr 0x0 action 0x6 frozen
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317242] ata3.00: failed command: WRITE FPDMA QUEUED
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317399] ata3.00: cmd 61/08:28:80:a8:a1/00:00:03:00:00/40 tag 5 ncq dma 4096 out
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317399] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317713] ata3.00: status: { DRDY }
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317827] ata3.00: failed command: WRITE FPDMA QUEUED
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317945] ata3.00: cmd 61/10:30:00:a9:a1/00:00:03:00:00/40 tag 6 ncq dma 8192 out
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.317945] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.318191] ata3.00: status: { DRDY }
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.318320] ata3.00: failed command: WRITE FPDMA QUEUED
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.318460] ata3.00: cmd 61/08:38:00:a9:a2/00:00:03:00:00/40 tag 7 ncq dma 4096 out
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.318460] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.318707] ata3.00: status: { DRDY }
Jul 22 08:10:27 vubuntu3-srv kernel: [ 726.318868] ata3: hard resetting link
Jul 22 08:10:27 vubuntu3-srv kernel: [ 730.132665] clocksource: Long readout interval, skipping watchdog check: cs_nsec: 4291693351 wd_nsec: 4291691291
Jul 22 08:10:27 vubuntu3-srv kernel: [ 730.446145] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jul 22 08:10:27 vubuntu3-srv kernel: [ 730.446950] ata3.00: configured for UDMA/100
Jul 22 08:10:27 vubuntu3-srv kernel: [ 730.446972] ata3.00: device reported invalid CHS sector 0
Jul 22 08:10:27 vubuntu3-srv kernel: [ 730.446977] ata3.00: device reported invalid CHS sector 0
Jul 22 08:10:27 vubuntu3-srv kernel: [ 730.446979] ata3.00: device reported invalid CHS sector 0
Jul 22 08:10:27 vubuntu3-srv kernel: [ 730.447000] ata3: EH complete
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.592300] ata3.00: exception Emask 0x0 SAct 0x8000000 SErr 0x0 action 0x6 frozen
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.592536] ata3.00: failed command: WRITE FPDMA QUEUED
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.592655] ata3.00: cmd 61/10:d8:b8:e7:65/00:00:03:00:00/40 tag 27 ncq dma 8192 out
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.592655] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.592969] ata3.00: status: { DRDY }
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.593118] ata3: hard resetting link
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.930565] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.931303] ata3.00: configured for UDMA/100
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.931320] ata3.00: device reported invalid CHS sector 0
Jul 22 08:15:15 vubuntu3-srv kernel: [ 1017.931337] ata3: EH complete
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1063.647412] ata3.00: exception Emask 0x0 SAct 0x8000 SErr 0x0 action 0x6 frozen
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1063.647584] ata3.00: failed command: WRITE FPDMA QUEUED
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1063.647754] ata3.00: cmd 61/10:78:f0:e7:65/00:00:03:00:00/40 tag 15 ncq dma 8192 out
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1063.647754] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1063.648021] ata3.00: status: { DRDY }
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1063.648257] ata3: hard resetting link
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1064.022547] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1064.023182] ata3.00: configured for UDMA/100
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1064.023195] ata3.00: device reported invalid CHS sector 0
Jul 22 08:16:01 vubuntu3-srv kernel: [ 1064.023217] ata3: EH complete
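To line these failures up against the backup task log and any latency graphs from the SC4020, it may help to reduce each error-handling episode to a timestamp, a recovery time and the LBAs that were in flight. Below is a minimal Python sketch. It assumes the libata message format shown above (for WRITE FPDMA QUEUED the sector count sits in the feature registers and the nsect register carries the NCQ tag) and 512-byte sectors. The script, class and function names are hypothetical. The kernel timestamps are seconds since guest boot, and the span from `exception` to `EH complete` is only the recovery time. It does not include how long the writes were outstanding before the timeout fired.

```python
import re
import sys
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Kernel timestamp and message of a kern.log line, e.g.
# "Jul 22 08:10:27 host kernel: [ 726.317041] ata3.00: exception ..."
LINE_RE = re.compile(r"kernel: \[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# libata prints the taskfile as
# cmd <command>/<feature>:<nsect>:<lbal>:<lbam>:<lbah>/<hob_feature>:<hob_nsect>:<hob_lbal>:<hob_lbam>:<hob_lbah>/<device>
CMD_RE = re.compile(
    r"cmd (?P<cmd>[0-9a-f]{2})/(?P<lo>[0-9a-f]{2}(?::[0-9a-f]{2}){4})"
    r"/(?P<hi>[0-9a-f]{2}(?::[0-9a-f]{2}){4})/[0-9a-f]{2} tag (?P<tag>\d+)"
)

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors ("dma 4096" for 8 sectors agrees)


@dataclass
class NcqCommand:
    tag: int
    lba: int
    sectors: int


@dataclass
class Episode:
    start: float                  # timestamp of the "exception ... frozen" line
    end: Optional[float] = None   # timestamp of the matching "EH complete"
    commands: List[NcqCommand] = field(default_factory=list)


def decode_ncq(m: "re.Match") -> NcqCommand:
    # For FPDMA QUEUED the sector count lives in feature/hob_feature and
    # nsect holds the NCQ tag shifted left by 3, so nsect is ignored here.
    feature, _nsect, lbal, lbam, lbah = (int(x, 16) for x in m["lo"].split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m["hi"].split(":")
    )
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return NcqCommand(tag=int(m["tag"]), lba=lba, sectors=(hob_feature << 8) | feature)


def parse_episodes(lines: Iterable[str]) -> List[Episode]:
    # Assumes every line belongs to one ATA port (ata3 here); interleaved
    # errors from several ports would need to be keyed by port name.
    episodes: List[Episode] = []
    current: Optional[Episode] = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, msg = float(m["ts"]), m["msg"]
        if "exception Emask" in msg:
            current = Episode(start=ts)
            episodes.append(current)
        elif current is not None:
            cmd = CMD_RE.search(msg)
            if cmd:
                current.commands.append(decode_ncq(cmd))
            elif msg.endswith("EH complete"):
                current.end = ts
                current = None
    return episodes


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for ep in parse_episodes(f):
            took = f"{ep.end - ep.start:.3f} s" if ep.end is not None else "no EH complete"
            print(f"[{ep.start:.6f}] recovery {took}")
            for c in ep.commands:
                print(f"  tag {c.tag:2d}  LBA {c.lba:#x}  {c.sectors * SECTOR_SIZE} bytes")
```

For example, `python3 ata_episodes.py /var/log/kern.log` (hypothetical file name) would show that the first episode above covered tags 5, 6 and 7 at LBAs 0x3a1a880, 0x3a1a900 and 0x3a2a900, with about 4.13 s between `exception` and `EH complete`.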
This only happens when the VM disk is on the Compellent SC4020 iSCSI storage. If I move the disks to local storage, the VM stays stable during backups. fsck doesn't report any errors, and neither does the iSCSI storage. None of the updates have solved it; I'm on PVE 8.4.5 now.
I have multiple Linux and Windows VMs that run just fine on the iSCSI storage, with no backup-related issues. Only two Linux VMs show this behaviour, plus a Windows VM that resets (I think it's a BSOD), and only when their disks are on the iSCSI storage. Once moved to local storage, everything is OK.
So far I've tried:
- all possible combinations of the VM disk config (VirtIO SCSI / LSI / even PVSCSI controllers), Async IO set to threads / native / io_uring, SSD emulation, etc.
- net.ipv4.conf.all.arp_ignore = 1 and rp_filter = 1 in /etc/sysctl.conf on the host
- iSCSI header and data digests on and off on the SC4020
I'm out of ideas. Has anyone run into something similar? Any ideas on how to debug this?
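Values in /etc/sysctl.conf only reach the running kernel once they are loaded (for example with `sysctl -p`). For arp_ignore and rp_filter, the kernel applies the maximum of `conf/all` and `conf/<interface>`, so the effective per-interface setting can differ from what `conf.all` alone suggests. A minimal Python sketch for reading back what the host is actually using, assuming the standard /proc/sys layout and assuming rp_filter was set at the `all` level like arp_ignore. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional

PROC_SYS = Path("/proc/sys")

# Settings from the post; rp_filter is assumed to be set at the "all" level.
EXPECTED = {
    "net.ipv4.conf.all.arp_ignore": "1",
    "net.ipv4.conf.all.rp_filter": "1",
}


def read_sysctl(key: str, root: Path = PROC_SYS) -> Optional[str]:
    # "net.ipv4.conf.all.arp_ignore" -> /proc/sys/net/ipv4/conf/all/arp_ignore
    # (keys naming interfaces with dots, e.g. VLANs, would need "/" instead)
    path = root.joinpath(*key.split("."))
    try:
        return path.read_text().strip()
    except FileNotFoundError:
        return None


def mismatches(expected: Dict[str, str] = EXPECTED,
               root: Path = PROC_SYS) -> Dict[str, Optional[str]]:
    # Keys whose running value differs from the requested one.
    result: Dict[str, Optional[str]] = {}
    for key, want in expected.items():
        have = read_sysctl(key, root)
        if have != want:
            result[key] = have
    return result


def effective_per_interface(name: str, root: Path = PROC_SYS) -> Dict[str, int]:
    # For arp_ignore and rp_filter the kernel uses max(conf/all, conf/<iface>).
    conf = root / "net" / "ipv4" / "conf"
    all_val = int((conf / "all" / name).read_text())
    return {
        d.name: max(all_val, int((d / name).read_text()))
        for d in sorted(conf.iterdir())
        if d.is_dir() and d.name not in ("all", "default")
    }


if __name__ == "__main__":
    print("mismatches:", mismatches() or "none")
    for name in ("arp_ignore", "rp_filter"):
        print(name, effective_per_interface(name))
```

Running it on the PVE host lists any setting that did not take effect, plus the value each interface (including the iSCSI-facing ones) is really using.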
Best regards,