io-error - potential disk failing - how to identify

fpdragon

Member
Feb 24, 2022
Hi,

I have a Proxmox host running a Windows Server 2022 (WS2022) VM. This VM bundles multiple physical disks into a Storage Space.

That ran quite well until now. The VM now shows the yellow warning sign with "io-error".

Since I hear some strange noise from the disks, I guess one of them is failing, but I am not sure which one.

The VM itself stops on the io-error, so I can't use the Windows Server tools to identify the disk.

In the Proxmox syslog, two devices are mentioned:

Code:
ata2.00: failed command READ_FPDMA_QUEUED
ata2.00: cmd 60/28:58:f0:f8:7d/08:00:27:00:00/40 tag 11 ncq dma 1069056 in res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x50 (ATA bus error)

Another device also has multiple entries:

Code:
ata6: link is slow to respond, please be patient (ready=0)
ata6: COMRESET failed (errno=-16)
ata6: hard resetting link
ata6.00: exception Emask 0x50 SAct 0x870000 SErr 0x4090800 action 0xe frozen
ata6.00: irq_stat 0x00400040, connection status changed

and several others...

Proxmox reports the SMART status of all disks as "PASSED".

My question now:

How can I identify the disks?
I know all the information printed on the disk labels, including model and serial number.
I mapped the physical disks to the VM with commands like the following:
Code:
qm set 1003 -virtio21 /dev/disk/by-id/ata-WDC_anymodelnumber_serialnumber
...

So I also know how to match the "Disks" view of the Proxmox GUI with the hardware.

However, I have no clue how to match "ata2" or "ata6" with my disks.

I'm sure you can help here.
Thanks!

PS: On Windows I always had performance tools to measure disk latency, disk queue length and so on. These typically made it easy to identify a failing disk. Is there something similar in Proxmox or Linux?
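Regarding the PS: on the Linux side, `iostat -x` from the sysstat package (assuming it is installed on the host) shows per-disk await times and queue sizes. The same counters are exposed in /proc/diskstats, so a rough per-disk average latency since boot can be sketched as below. The function name `avg_latency` is made up for this example, and the numbers are cumulative averages, not a live measurement.

```shell
#!/bin/sh
# Sketch: average read/write latency per completed I/O since boot,
# computed from /proc/diskstats-formatted input on stdin.
# Fields used: $3 device name, $4 reads completed, $7 ms spent reading,
#              $8 writes completed, $11 ms spent writing.
# avg_latency is a hypothetical helper name, not an existing tool.
avg_latency() {
    awk '$3 ~ /^sd[a-z]+$/ {
        r = ($4 > 0) ? $7 / $4 : 0     # avoid division by zero on idle disks
        w = ($8 > 0) ? $11 / $8 : 0
        printf "%s read_ms=%.2f write_ms=%.2f\n", $3, r, w
    }'
}

# Only whole disks (sda, sdb, ...) are matched; partitions are skipped.
if [ -r /proc/diskstats ]; then
    avg_latency < /proc/diskstats
fi
```

A disk whose averages are far above those of its siblings of the same type is a candidate for a closer look, though a high value alone does not prove a hardware fault.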
 
Last edited:
Hello

Maybe there is something in the logs. Could you run journalctl --since '2023-11-07' > $(hostname)-journal.txt and share the output file with us?

Regards
Philipp
 
Thanks, I think I have identified the disks from the logs.

The strange thing is that one is an HDD and the other is an SSD used for storage tiering and cache.

I have replaced only the HDD for now and am currently rebuilding the parity, and it seems to work just fine. I am not sure why the SSD was also mentioned in the logs. I will wait and see how it goes. Maybe the SATA controller was blocked because of the broken HDD, and so the SSD had I/O trouble as well?

However, the question seems to be answered.
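
For anyone landing here with the same question, here is a minimal sketch of one way to match an "ataN" port from the kernel log to a disk. It relies on the resolved sysfs path of a SATA disk under /sys/block containing the port name, for example .../ata2/host1/.../block/sdb. The function names `ata_port_from_path` and `list_ata_ports` are made up for this example. In a log entry like "ata2.00", the ".00" suffix is the device on that port.

```shell
#!/bin/sh
# Sketch: map kernel "ataN" port names to sdX devices and by-id names.
# Both function names are hypothetical helpers, not existing tools.

# Extract the "ataN" path component from a resolved sysfs path.
# Prints nothing if the path has no such component (e.g. a USB disk).
ata_port_from_path() {
    printf '%s\n' "$1" | sed -n 's|.*/\(ata[0-9][0-9]*\)/.*|\1|p'
}

# Print "ataN <tab> sdX <tab> by-id name" for every sd* disk on the host.
list_ata_ports() {
    for blk in /sys/block/sd*; do
        [ -e "$blk" ] || continue              # glob did not match anything
        dev=${blk##*/}
        port=$(ata_port_from_path "$(readlink -f "$blk")")
        id=""
        for link in /dev/disk/by-id/ata-*; do
            [ -e "$link" ] || continue
            case "$link" in *-part[0-9]*) continue ;; esac   # skip partitions
            if [ "$(readlink -f "$link")" = "/dev/$dev" ]; then
                id=${link##*/}
                break
            fi
        done
        printf '%s\t%s\t%s\n' "${port:-no-ata-port}" "$dev" "${id:-no-by-id-link}"
    done
}

list_ata_ports
```

The by-id name in the last column contains the model and serial number, so it can be compared directly with the disk labels and with the paths passed to `qm set`. The PCI address that appears earlier in the resolved sysfs path also shows whether two ports sit on the same controller.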
 