[SOLVED] Disk shows SMART errors, need to return it to vendor - how to wipe it?

andre85

New Member
Apr 16, 2024
Good morning.
I'm running a mirrored 2-bay ZFS "stack".
Unfortunately, my two-month-old Seagate 12 TB drive has been showing SMART errors for a couple of days. It was itself a replacement for another faulty HDD.
After being in contact with the vendor, they told me to return it. They want to check the drive and then send me a new one, which is going to take 5-7 workdays according to them.
Obviously I don't want them to have my data, so I'd like to wipe it securely before sending it in.
I thought about removing the HDD and wiping it with DBAN. But isn't there an option within Proxmox?

Also, what's actually the best way to replace it within the ZFS stack? I think I used zpool replace rpool last time. But is there anything I need to do prior to removing the hard drive?

I appreciate any help!

Thanks
André
 
Obviously I don't want them to have my data, so I'd like to wipe it securely before sending it in.

You have many options; here are some of them:

1. Zero-write the whole disk with dd if=/dev/zero of=/dev/sdX bs=1M (replace sdX with the actual drive - be careful!)
2. Use the shred command, as in shred -vfz /dev/sdX (replace sdX with the actual drive - be careful!)
3. If you want it done at the firmware level, use hdparm (ATA Secure Erase); you may need to unfreeze the drive first - read up on the issue, there is plenty out there. A rough sketch follows below.
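
In case you go the hdparm route, the flow looks roughly like this (just a sketch - "p" is an arbitrary temporary password & sdX is a placeholder; check the hdparm man page & your drive's security state first):

Code:
# the drive must report "not frozen" in the security section
hdparm -I /dev/sdX | grep -A8 Security
# set a temporary user password, then issue the secure erase
hdparm --user-master u --security-set-pass p /dev/sdX
hdparm --user-master u --security-erase p /dev/sdX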
 
Thank you for your reply!
I guess I'd go for the zero-write approach.
Can I just run this while it's still part of the stack? I don't want to wipe my sda drive (only sdb). Or do I need to detach it somehow?

Sorry, I'm not very experienced with this topic :(
 
Logically (I don't use ZFS myself) you don't want to write to the drive (sdb) while it's still part of the mirror. So first you'd want to detach it from the mirror & ONLY then wipe/erase it.

IDK your exact ZFS setup, but in theory (I've made up the IDs just for this small guide):

Code:
# zpool status
  pool: rzpool
 state: ONLINE
config:


        NAME                        STATE     READ WRITE CKSUM
        rzpool                      ONLINE       0     0     0
          mirror-0                  ONLINE       0     0     0
            abc-0x30000376db771f50  ONLINE       0     0     0
            abc-0x30000376db771f51  ONLINE       0     0     0

Determine which of the 2 drives is the failing one - BE CAREFUL YOU CHOOSE THE CORRECT ONE
(you probably need to corroborate that info elsewhere - see the example below)
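
For example (just a sketch - /dev/sdb is only a guess here, match it to your own system), you can map the by-id names to the kernel devices & check SMART:

Code:
# see which /dev/sdX each by-id name points to
ls -l /dev/disk/by-id/
# check the SMART health/error log of the suspect drive
smartctl -a /dev/sdb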

I'm going to choose the second one - just for the tutorial:

Code:
zpool detach rzpool /dev/disk/by-id/abc-0x30000376db771f51

Check your ZFS status again, & if all is good you can go ahead & wipe/erase the failed drive - see the sketch below.
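
For the erase itself it's safer to target the by-id path of the (now detached) drive rather than sdX - again using the made-up ID from above:

Code:
# zero out the detached drive; status=progress shows how far along it is
dd if=/dev/zero of=/dev/disk/by-id/abc-0x30000376db771f51 bs=1M status=progress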

One more thing: MAKE SURE YOU HAVE BACKUPS OF ALL LXCs & VMs BEFORE DOING ANYTHING - & ESPECIALLY BEFORE THE ABOVE
 
It was itself a replacement for another faulty HDD.
IDK the timespan of these failures - but possibly you want to check why this port/bay/enclosure keeps getting HDD failures. There may be some underlying reason: power, thermal, connector contact issues, etc.
 
Thank you so much for this write-up! You are a great help!

IDK the timespan of these failures - but possibly you want to check why this port/bay/enclosure keeps getting HDD failures. There may be some underlying reason: power, thermal, connector contact issues, etc.

I was using the first HDD in a NAS previously, where it also showed some weird behavior. When I switched to Proxmox as a home server, I started with the old HDD, but planned to replace it regardless. I bought this new HDD as a re-certified drive, so I'm not sure if something went wrong during the re-certification. Since both HDDs sit next to each other on a tray (I've got a Kolink Satellite case), I doubt this is a temperature issue (the first HDD is perfectly fine according to the reports).
 
I've just detached the HDD. zpool status still shows the previous error, but I guess that's normal, right?
This is how it looks:

Code:
root@homeserver:~# zpool status
  pool: nas
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 10.4M in 11:17:21 with 10 errors on Sun Jun  9 11:41:22 2024
config:

        NAME                                   STATE     READ WRITE CKSUM
        nas                                    ONLINE       0     0     0
          mirror-0                             ONLINE       0     0     0
            ata-ST12000VN0008-2YS101_finehdd  ONLINE       0     0    96
            ata-ST12000NE0007-2GT116_faultyhdd  ONLINE       0     0   126

errors: 13 data errors, use '-v' for a list
root@homeserver:~# zpool detach nas ata-ST12000NE0007-2GT116_faultyhdd
root@homeserver:~# zpool status
  pool: nas
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 10.4M in 11:17:21 with 10 errors on Sun Jun  9 11:41:22 2024
config:

        NAME                                 STATE     READ WRITE CKSUM
        nas                                  ONLINE       0     0     0
          ata-ST12000VN0008-2YS101_finehdd  ONLINE       0     0    96
 
zpool status still shows the previous error, but I guess that's normal, right?
Only because you have had errors on both copies of some data, which therefore cannot be recovered.
As it suggests: use zpool status -v nas for a list of the corrupted files. Delete or overwrite them (from a backup) to clear those errors.
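
Something like this (a sketch, using your pool name nas):

Code:
# list the affected files
zpool status -v nas
# after deleting/restoring the listed files, clear the error counters & re-check with a scrub
zpool clear nas
zpool scrub nas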
 
Happy you got it all worked out. As leesteken correctly points out, the errors mean that the data in question was unrecoverable from either disk.

When you get the new disk, you will use this (I've made up the _newhdd ID just for the tutorial):
Code:
zpool attach nas /dev/disk/by-id/ata-ST12000VN0008-2YS101_finehdd /dev/disk/by-id/ata-ST12000VN0008-2YS102_newhdd


Maybe prefix the thread title with the [SOLVED] tag (upper right-hand corner, under the title).
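
One more note: after the attach, ZFS resilvers the existing data onto the new disk in the background. Assuming the pool is still called nas, you can watch the progress with:

Code:
# resilver progress appears under the "scan:" line
zpool status nas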
 
Thank you so much! It's currently wiping, which obviously is going to take a while :D
Happy you got it all worked out. As leesteken correctly points out, the errors mean that the data in question was unrecoverable from either disk.

When you get the new disk, you will use this (I've made up the _newhdd ID just for the tutorial):
Code:
zpool attach nas /dev/disk/by-id/ata-ST12000VN0008-2YS101_finehdd /dev/disk/by-id/ata-ST12000VN0008-2YS102_newhdd


Maybe prefix the thread title with the [SOLVED] tag (upper right-hand corner, under the title).

I was about to ask this :)
I will mark the thread as solved.
 
