DRDY ERR

ace308

New Member
Jan 25, 2024
Hello All,
I have a homelab running Proxmox from an NVMe drive, with two SSDs (Intel enterprise, 1.6 TB) passed through to a TrueNAS VM running ZFS as a mirror (RAID 1).
I believe I had a power outage, and now when I boot the homelab a screen comes up saying the SMART test failed, with F2 for setup and F12 for boot options.
Luckily I can log into Proxmox with no issues, but the TrueNAS VM doesn't boot because it can't find the SSDs.

I get a repeating error for both ssds.
ata1.00: status: {DRDY ERR}
ata1.00: error {IDNF}
Failed Command: Read DMA

Under "Disks", I can see both drives, but they are no longer in use, and the SMART tests show as 'failed' for both. They look uninitialized, the way a blank disk looks before you initialize it.

I took one drive out, put it in an external USB enclosure, and plugged it into my Windows PC. It shows up in Disk Management as a blank drive that needs to be initialized/partitioned, so I can't see what's on the drive itself. I was hoping to back everything up.

I assume the drives are fine, but I'm not sure what to do next other than wiping everything and starting over, which I would like to avoid if possible.

I am reading about e2fsck and tried a few things but I haven't had any luck yet.

How do you go about fixing this?

Thanks in advance.

[Attachment: PXL_20240510_205839050.jpg]
 
What does SMART say, exactly?
I figure there's a chance that the disk's LBA mapping is messed up. If that's the case, you have lost everything on that disk.

If the LBA mapping is intact and only a superblock got corrupted, then it is possible to recover the data, provided you remember which partition table and file system were used on the drive.
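
One quick way to see whether the partition table survived is to look for the GPT header signature on the raw disk. The sketch below is only an illustration: the function name `has_gpt_header` is made up for this post, it assumes 512-byte logical sectors unless told otherwise, and it assumes the disk was partitioned with GPT in the first place. It only reads from the device.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the GPT signature "EFI PART" is found
# at the start of LBA 1 of a disk or disk image.
# $1 = device or image path, $2 = logical sector size (assumed 512).
has_gpt_header() {
    sector=${2:-512}
    # Read one sector at LBA 1, keep its first 8 bytes, and drop NUL
    # bytes so the shell can hold the result in a variable.
    sig=$(dd if="$1" bs="$sector" skip=1 count=1 2>/dev/null \
          | head -c 8 | LC_ALL=C tr -d '\000')
    [ "$sig" = "EFI PART" ]
}

# Example (read-only; /dev/sdX is a placeholder, not a real device name):
#   has_gpt_header /dev/sdX && echo "GPT header present"
```

If the signature is there, `lsblk -f` or `blkid` should list the partitions as well. If it is missing, either the disk was never GPT-partitioned or the first sectors are no longer being read back as written.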
 
OK, I've plugged everything back in and booted up my homelab. My TrueNAS VM came back online; however, the zpool status is, of course, 'offline'.

On Proxmox, under homelab - Disks - Show S.M.A.R.T. values, I get "Drive failure expected in less than 24 hours. Save ALL data. No failed attributes found."
111.JPG

I can see the drive info using sudo smartctl -a /dev/sda, but the SMART values are offline, with this in the error log: scsi error badly formed scsi parameters.
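
As a side note, smartctl also reports what went wrong through its exit status, which is a bit mask described in its man page. The sketch below decodes it. The function name `decode_smartctl_status` is made up for illustration, and the bit meanings are paraphrased from the man page, so check them against the installed version.

```shell
#!/bin/sh
# Hypothetical helper: print one line per bit set in smartctl's exit status.
# Messages are paraphrased from the smartctl man page (bits 0 through 7).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problems flagged"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed, or a checksum error was found" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "device error log contains errors" \
        "self-test log contains errors"
    do
        # Test the current bit of the status and report it if set.
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "$msg"
        fi
        bit=$((bit + 1))
    done
    return 0
}

# Example:
#   sudo smartctl -a /dev/sda >/dev/null; decode_smartctl_status $?
```

The "Drive failure expected in less than 24 hours" message appears to be what smartctl prints when the overall health check fails, which would show up here as bit 3 (value 8).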

2.JPG

I've tried to do a smart test but it says 'self test functions not supported'

sudo smartctl -t short /dev/sda
3.JPG

What should I try next?
 
My educated guess is that something went horribly wrong with the SSD controller. It’s not even returning well-formed SMART data.
 
You sure you got a power outage, not a failing PSU?
It's a good question, I have no idea. I wasn't there when it happened. My system is a Fujitsu q558 which has an internal PSU.

What should I do next? I believe I can plug a drive into my Windows PC and use the Intel storage software (I forgot what it's called) to read the SMART data. But what I'm really trying to do is put the zpool back together and get my data back so I can back it up.
 
If the drive is still readable (that is, it still answers block read commands with sensible results, not just all zeros or all ones), you can apply standard data recovery procedures here.
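
To make "sensible results" concrete, one rough check is to sample the start of the disk and see whether it is nothing but 0x00 or 0xFF bytes. This is only a sketch: the name `classify_region` and the 1 MiB default sample size are arbitrary choices for illustration, and a single sample says nothing about the rest of the disk.

```shell
#!/bin/sh
# Hypothetical helper: classify the first N bytes of a device or image.
# Prints "all-zero", "all-ones", or "has-data".
# $1 = device or image path, $2 = bytes to sample (assumed 1 MiB).
# Note: if nothing can be read at all, the result is also "all-zero".
classify_region() {
    src=$1
    bytes=${2:-1048576}
    # Count the bytes that remain after deleting 0x00, then after 0xFF.
    nonzero=$(head -c "$bytes" "$src" | LC_ALL=C tr -d '\000' | wc -c)
    nonones=$(head -c "$bytes" "$src" | LC_ALL=C tr -d '\377' | wc -c)
    if [ "$nonzero" -eq 0 ]; then
        echo "all-zero"
    elif [ "$nonones" -eq 0 ]; then
        echo "all-ones"
    else
        echo "has-data"
    fi
}

# Example (read-only; /dev/sdX is a placeholder):
#   classify_region /dev/sdX
```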

But there is no way you are getting your zpool back, so forget about that and start looking for data recovery software.

Keep your drive powered off while you do your search.
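
When the drive does get powered on again, a common first step in data recovery is to copy it to an image file once and run any recovery tool against the copy rather than the drive. Here is a minimal sketch using plain dd. GNU ddrescue is generally the better choice for a failing disk because it records which areas could not be read; dd is used here only because it is available everywhere. The function name `image_disk` and the paths are placeholders, and the destination needs at least as much free space as the drive.

```shell
#!/bin/sh
# Hypothetical helper: copy a source disk (or file) to an image file,
# continuing past read errors.
# conv=noerror keeps going after a failed read; conv=sync pads each short
# or failed block with zeros so later data stays at the correct offset.
# $1 = source device or file, $2 = destination image path.
image_disk() {
    src=$1
    img=$2
    dd if="$src" of="$img" bs=65536 conv=noerror,sync
}

# Example (placeholders, adjust before use):
#   image_disk /dev/sdX /mnt/spare/ssd1.img
```

Because of conv=sync, the image can end up slightly larger than the source if the source size is not a multiple of the block size; the extra bytes are zero padding at the end.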

Also, I would replace the PSU, or the whole machine if the PSU is not easily replaceable.
 
