ata2 resets

jeanmars

Hi,
I noticed that the Proxmox web interface sometimes freezes for a few seconds. Filtering /var/log/syslog on ata2, I see:
Jul 16 18:39:36 ganymede kernel: [1805235.264998] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 16 18:39:36 ganymede kernel: [1805235.265019] ata2.00: failed command: FLUSH CACHE EXT
Jul 16 18:39:36 ganymede kernel: [1805235.265024] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 20
Jul 16 18:39:36 ganymede kernel: [1805235.265037] ata2.00: status: { DRDY }
Jul 16 18:39:36 ganymede kernel: [1805235.265044] ata2: hard resetting link
Jul 16 18:39:46 ganymede kernel: [1805245.281141] ata2: softreset failed (device not ready)
Jul 16 18:39:46 ganymede kernel: [1805245.281159] ata2: hard resetting link
Jul 16 18:39:56 ganymede kernel: [1805255.305151] ata2: softreset failed (device not ready)
Jul 16 18:39:56 ganymede kernel: [1805255.305168] ata2: hard resetting link
Jul 16 18:39:59 ganymede kernel: [1805258.605285] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 16 18:39:59 ganymede kernel: [1805258.610051] ata2.00: configured for UDMA/133
Jul 16 18:39:59 ganymede kernel: [1805258.610057] ata2.00: retrying FLUSH 0xea Emask 0x4
Jul 16 18:39:59 ganymede kernel: [1805258.620222] ata2: EH complete
...
Jul 21 22:50:38 ganymede kernel: [2252297.175925] ata2: EH complete
Jul 21 22:58:37 ganymede kernel: [2252776.517166] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jul 21 22:58:37 ganymede kernel: [2252776.517187] ata2.00: failed command: FLUSH CACHE EXT
Jul 21 22:58:37 ganymede kernel: [2252776.517192] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 23
Jul 21 22:58:37 ganymede kernel: [2252776.517206] ata2.00: status: { DRDY }
Jul 21 22:58:37 ganymede kernel: [2252776.517213] ata2: hard resetting link
Jul 21 22:58:47 ganymede kernel: [2252786.537147] ata2: softreset failed (device not ready)
Jul 21 22:58:47 ganymede kernel: [2252786.537165] ata2: hard resetting link
Jul 21 22:58:48 ganymede kernel: [2252787.012899] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jul 21 22:58:48 ganymede kernel: [2252787.017818] ata2.00: configured for UDMA/133
Jul 21 22:58:48 ganymede kernel: [2252787.017824] ata2.00: retrying FLUSH 0xea Emask 0x4
Jul 21 22:58:48 ganymede kernel: [2252787.028258] ata2: EH complete
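
The length of each freeze can be read off the bracketed kernel timestamps in the excerpt above. Here is a minimal sketch, assuming an episode runs from an "exception" line to the next "EH complete" line on the same port. The function name stall_durations and the regular expression are illustrative, not part of any tool.

```python
import re

# Matches the bracketed kernel uptime stamp and the message after the port name,
# e.g. "... kernel: [1805235.264998] ata2.00: exception Emask ..."
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+)(?:\.\d+)?: (.*)")


def stall_durations(lines, port="ata2"):
    """Return the length in seconds of each error-handling episode on `port`.

    Assumption: an episode starts at an "exception" line and ends at the
    next "EH complete" line for the same port.
    """
    durations = []
    start = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group(2) != port:
            continue
        stamp, message = float(m.group(1)), m.group(3)
        if message.startswith("exception") and start is None:
            start = stamp
        elif message.startswith("EH complete") and start is not None:
            durations.append(round(stamp - start, 3))
            start = None
    return durations


# First and last line of each episode, copied from the syslog excerpt.
sample = [
    "Jul 16 18:39:36 ganymede kernel: [1805235.264998] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Jul 16 18:39:59 ganymede kernel: [1805258.620222] ata2: EH complete",
    "Jul 21 22:58:37 ganymede kernel: [2252776.517166] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Jul 21 22:58:48 ganymede kernel: [2252787.028258] ata2: EH complete",
]
print(stall_durations(sample))
```

For the two episodes quoted here this gives roughly 23.4 and 10.5 seconds, which matches the "freezes for a few seconds" symptom. Most of that time is spent waiting on the failed soft resets, which are logged about 10 seconds apart.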

According to dmesg | grep ata2 | head, ata2 is the NVMe disk where I installed Proxmox:

[ 1.015008] ata2: SATA max UDMA/133 abar m2048@0xfcc00000 port 0xfcc00100 irq 37
[ 1.476849] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.479311] ata2.00: ATA-11: Lexar SSD NQ100 480GB, SN11873, max UDMA/133
[ 1.479388] ata2.00: 937703088 sectors, multi 1: LBA48 NCQ (depth 32), AA
[ 1.481923] ata2.00: configured for UDMA/133
I initially suspected a faulty SATA cable (I also have a SATA disk), but I guess the NVMe disk is simply faulty.
However, the SMART report is:
root@ganymede:~# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.108-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: Lexar SSD NQ100 480GB
Serial Number: NAC205R0054910S30D
LU WWN Device Id: 5 3a5a27 2050b082b
Firmware Version: SN11873
User Capacity: 480,103,981,056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: 2.5 inches
TRIM Command: Available, deterministic, zeroed
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-4 (minor revision not indicated)
SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 21 23:22:22 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 33) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 85) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0031) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 20
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x1300 100 100 010 Old_age Offline - 0
9 Power_On_Hours 0x1200 100 100 000 Old_age Offline - 876
12 Power_Cycle_Count 0x1200 100 100 000 Old_age Offline - 3
164 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 17180983304
165 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 17
166 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 4
167 Unknown_Attribute 0x2200 100 100 000 Old_age Offline - 8
194 Temperature_Celsius 0x2200 038 038 000 Old_age Offline - 38 (Min/Max 16/48)
199 UDMA_CRC_Error_Count 0x1200 100 100 000 Old_age Offline - 0
241 Total_LBAs_Written 0x3200 100 100 000 Old_age Offline - 4428
242 Total_LBAs_Read 0x3200 100 100 000 Old_age Offline - 7596

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
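
The attribute table above can be screened for the two counters usually checked first in this kind of situation: Reallocated_Sector_Ct (ID 5) and UDMA_CRC_Error_Count (ID 199, commonly read as CRC errors on the SATA link). Below is a minimal sketch, assuming the ten-column layout printed by smartctl here. The WATCHED set and the function name are illustrative choices, and since this drive is not in the smartctl database, the attribute names themselves are generic defaults.

```python
# Attribute IDs to check; this selection is an assumption, not a smartctl feature.
WATCHED = {5, 199}


def nonzero_watched(attr_lines):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in attr_lines:
        fields = line.split()
        # Expect: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Rows copied from the report above.
sample = [
    "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE",
    "5 Reallocated_Sector_Ct 0x1300 100 100 010 Old_age Offline - 0",
    "194 Temperature_Celsius 0x2200 038 038 000 Old_age Offline - 38 (Min/Max 16/48)",
    "199 UDMA_CRC_Error_Count 0x1200 100 100 000 Old_age Offline - 0",
]
print(nonzero_watched(sample))
```

Both counters are zero in this report, so nothing is flagged. That does not clear the cable or port: the failures in the syslog are command timeouts followed by link resets, and those would not necessarily show up as CRC errors.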

And when running a short self-test:
root@ganymede:~# smartctl -t short /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.108-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Jul 21 23:26:23 2023 CEST
Use smartctl -X to abort test.
root@ganymede:~# smartctl --log selftest /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.108-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 877 -

That does not look too bad. Any ideas?
Full logs/tests attached.
Thanks,
Jean
 

Attachments

  • dmesg.txt (755 bytes)
  • smartctl.txt (5.7 KB)
  • syslog.txt (12.5 KB)
If it is connected via SATA, it's not an NVMe SSD (which uses PCI Express), it's a SATA SSD. If a (long) SMART self-test does not give you any errors, then it might be the M.2 connection: maybe the drive is not inserted properly, or the slot is defective, or the drive side of the connector is defective. Try another M.2 slot, and try another SATA M.2 SSD in the same slot, to test this. Or maybe it's an incompatibility between the motherboard/chipset/CPU and the drive: update the motherboard BIOS and/or try another M.2 slot and/or another SATA M.2 SSD. Or maybe it's an issue with the Proxmox kernel driver: try a newer kernel version and/or another SATA M.2 SSD.
 
Hi again,

Oops, my bad, I got confused between the NVMe and SATA interfaces:
root@ganymede:~# fdisk -l
Disk /dev/nvme0n1: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: Lexar SSD NM610PRO 2TB
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0F5E15A7-5C67-A844-9B0A-EA6124429F38

Device Start End Sectors Size Type
/dev/nvme0n1p1 2048 3907012607 3907010560 1.8T Linux filesystem
/dev/nvme0n1p9 3907012608 3907028991 16384 8M Solaris reserved 1


Disk /dev/sda: 447.13 GiB, 480103981056 bytes, 937703088 sectors
Disk model: Lexar SSD NQ100
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8ACCE50D-5C16-45CD-92D0-AFD4CB4EC730

Device Start End Sectors Size Type
/dev/sda1 34 2047 2014 1007K BIOS boot
/dev/sda2 2048 2099199 2097152 1G EFI System
/dev/sda3 2099200 937703054 935603855 446.1G Linux LVM


Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 96 GiB, 103079215104 bytes, 201326592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-vm--100--disk--0: 200 GiB, 214748364800 bytes, 419430400 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: dos
Disk identifier: 0xb205431b

Device Boot Start End Sectors Size Id Type
/dev/mapper/pve-vm--100--disk--0-part1 * 2048 417429503 417427456 199G 83 Linux
/dev/mapper/pve-vm--100--disk--0-part2 417431550 419428351 1996802 975M 5 Extended
/dev/mapper/pve-vm--100--disk--0-part5 417431552 419428351 1996800 975M 82 Linux swap / Solaris

Partition 2 does not start on physical sector boundary.


Disk /dev/mapper/pve-vm--101--disk--0: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: dos
Disk identifier: 0xb4b6e1ea

Device Boot Start End Sectors Size Id Type
/dev/mapper/pve-vm--101--disk--0-part1 * 2048 207714303 207712256 99G 83 Linux
/dev/mapper/pve-vm--101--disk--0-part2 207716350 209713151 1996802 975M 5 Extended
/dev/mapper/pve-vm--101--disk--0-part5 207716352 209713151 1996800 975M 82 Linux swap / Solaris

Partition 2 does not start on physical sector boundary.

It turns out that /dev/sda is SATA and not NVMe, so it could be the SATA cable, unless it's the disk itself?
Any advice appreciated,
Jean
 
M.2 slots often support both NVMe and SATA. Some M.2 SSDs are NVMe and some are SATA (two notches in the connector). From your description it's not clear to me whether the problematic SSD is SATA via an M.2 slot or connected to a SATA controller. I think the same approach applies:
If a (long) SMART self-test does not give you any errors, then it might be the connection or the controller: maybe the drive is not inserted properly, or the port/controller is defective, or the drive side of the connection is defective. Try another SATA port and/or SATA controller, and try another SATA SSD on the same port/controller, to test this. Or maybe it's an incompatibility between the motherboard/chipset/CPU and the drive: update the motherboard BIOS and/or try another SATA port or SATA controller and/or another SATA SSD. Or maybe it's an issue with the Proxmox kernel driver: try a newer kernel version and/or another SATA SSD.
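
To see which controller and port the disk actually hangs off, the resolved sysfs path of the block device normally contains both the controller's PCI address and the ataN port name. Here is a minimal sketch, assuming the usual AHCI layout under /sys/block. The sample paths and PCI addresses below are hypothetical, not taken from this machine.

```python
import os
import re

# A PCI address such as 0000:00:11.0, directly followed by the libata port directory.
SYSFS_RE = re.compile(r"/([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])/(ata\d+)/")


def controller_and_port(resolved_path):
    """Return (pci_address, ata_port) found in a resolved sysfs path, or None."""
    m = SYSFS_RE.search(resolved_path)
    return (m.group(1), m.group(2)) if m else None


# Hypothetical examples of what os.path.realpath("/sys/block/<disk>") can look like.
sata_path = "/sys/devices/pci0000:00/0000:00:11.0/ata2/host1/target1:0:0/1:0:0:0/block/sda"
nvme_path = "/sys/devices/pci0000:00/0000:00:01.1/0000:01:00.0/nvme/nvme0/nvme0n1"

print(controller_and_port(sata_path))
print(controller_and_port(nvme_path))

if __name__ == "__main__":
    # On the affected host this resolves the real device; elsewhere it prints None.
    print(controller_and_port(os.path.realpath("/sys/block/sda")))
```

If the second disk-capable port sits on the same PCI address, moving the SSD only tests the port and cable, not the controller. An NVMe disk has no ataN component in its path at all, which is another quick way to tell the two kinds of drive apart.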
 
Hi again,

the machine is an ACEMAGICIAN AM06 (https://www.acemagic.com/products/am06). I replaced the original 512GB NVMe SSD with a Lexar SSD NM610PRO 2TB and added a Lexar SSD NQ100 (480GB SATA SSD).
Proxmox is installed on the Lexar SSD NQ100 (480GB SSD).
I'm running Proxmox 7.4.16; would it be better to upgrade to 8.0 for a newer kernel?
Thanks,
Jean
 
I don't understand, sorry. Please test your hardware, as this is most likely not a Proxmox configuration issue. Even if it's not a hardware problem, it's still unlikely that you can fix it with a Proxmox configuration setting.
EDIT: Maybe other people here know better than me or have more experience with this particular issue. You might want to wait for others to join in. Then again, testing your hardware might be a good idea regardless.
 
Hi,
just to give an update: there is definitely something wrong with the SATA cable/disk controller/disk. I disabled every Proxmox service I don't need (firewall, local HA, cluster HA); there are fewer disk errors, but they still occur (typically when Proxmox triggers the hourly cron job).
My plan is to remove this 480GB SATA SSD and work only with the 2TB NVMe disk.
I will have to back up my VM (fortunately only one) and reinstall Proxmox.

Thanks,
Jean
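
Whether the errors really follow an hourly job can be checked by counting the ata2 exceptions per minute of the hour: if a cron job is the trigger, the counts should pile up around one minute value. Here is a minimal sketch, assuming the syslog timestamp format shown in the first post. The function name exception_minutes is illustrative.

```python
import re
from collections import Counter

# Classic syslog prefix, e.g. "Jul 16 18:39:36 ganymede kernel: ..."
STAMP_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):(\d{2}):\d{2}\s")


def exception_minutes(lines, port="ata2"):
    """Count, per minute of the hour, how many exception lines `port` logged."""
    counts = Counter()
    for line in lines:
        if f" {port}." not in line or "exception Emask" not in line:
            continue
        m = STAMP_RE.match(line)
        if m:
            counts[int(m.group(2))] += 1
    return counts


# The two exception lines quoted in the first post.
sample = [
    "Jul 16 18:39:36 ganymede kernel: [1805235.264998] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Jul 21 22:58:37 ganymede kernel: [2252776.517166] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
]
print(sorted(exception_minutes(sample).items()))
```

The two exceptions quoted in the first post fall at minutes 39 and 58, which is far too little data to confirm or rule out an hourly pattern. Feeding the function every line of /var/log/syslog would give a more useful picture.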
 
