Issues Replacing Failed Drive

rbeard.js

Member
Aug 11, 2022
Good Afternoon,

I had a drive in my zpool fail, and zpool status already showed the disk as REMOVED.
I swapped the failed disk for a new one and tried to replace it with zpool replace -f <pool> <old-device> <new-device>.
That returned an error saying there was no such device in the pool. I was trying to replace /dev/sdd with /dev/dsk.
So I attempted the replace again using the old disk's actual name, "ata-MZ7LM960HMJP0D3_S37KNX0KA09710", instead of /dev/sdd, and now nothing looks right in my zpool status.

Code:
zpool status
  pool: carbon
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Dec 23 14:26:28 2024
        466G / 890G scanned at 14.1G/s, 0B / 645G issued
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME                                      STATE     READ WRITE CKSUM
        carbon                                    DEGRADED     0     0     0
          raidz2-0                                DEGRADED     0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09571    ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09715    ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09570    ONLINE       0     0     0
            replacing-3                           DEGRADED     0     0     0
              ata-MZ7LM960HMJP0D3_S37KNX0KA09710  REMOVED      0     0     0
              sdk                                 ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09567    ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09707    ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09583    ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09573    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Sun Dec  8 00:24:07 2024
config:

        NAME                                     STATE     READ WRITE CKSUM
        rpool                                    ONLINE       0     0     0
          ata-CT240BX500SSD1_2244E680A3E7-part3  ONLINE       0     0     0
          ata-CT240BX500SSD1_2244E680A528-part3  ONLINE       0     0     0


Can someone please help me fix this and perhaps explain what I did wrong in the first place?
 
Well, Proxmox might just be smarter than me, because I just got an email saying it finished resilvering, and everything looks normal now despite the drive name being out of whack.

Code:
zpool status
  pool: carbon
 state: ONLINE
  scan: resilvered 92.9G in 00:09:35 with 0 errors on Mon Dec 23 14:36:03 2024
config:

        NAME                                    STATE     READ WRITE CKSUM
        carbon                                  ONLINE       0     0     0
          raidz2-0                              ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09571  ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09715  ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09570  ONLINE       0     0     0
            sdk                                 ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09567  ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09707  ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09583  ONLINE       0     0     0
            ata-MZ7LM960HMJP0D3_S37KNX0KA09573  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Sun Dec  8 00:24:07 2024
config:

        NAME                                     STATE     READ WRITE CKSUM
        rpool                                    ONLINE       0     0     0
          ata-CT240BX500SSD1_2244E680A3E7-part3  ONLINE       0     0     0
          ata-CT240BX500SSD1_2244E680A528-part3  ONLINE       0     0     0

Any way to fix this? I would still love to know what I was doing wrong in the first place, if anyone can help me understand.
Thank you
 
rbeard.js said:
Can someone please help me fix this and perhaps explain what I did wrong in the first place?

This looks normal for resilvering a replacement drive.

rbeard.js said:
Well, Proxmox might just be smarter than me, because I just got an email saying it finished resilvering, and everything looks normal now despite the drive name being out of whack.

You used something like zpool replace carbon old-device /dev/sdk instead of zpool replace carbon old-device /dev/disk/by-id/ata-.... The only way to fix this AFAIK is to run such a command again, replacing /dev/sdk with /dev/disk/by-id/ata-... (using the right disk identifier, which you can look up in the output of ls -l /dev/disk/by-id/).

EDIT: TIL that there is a better way: https://forum.proxmox.com/threads/issues-replacing-failed-drive.159542/post-732521
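The lookup mentioned above (finding the by-id name that belongs to a kernel name such as sdk) can be sketched roughly as follows. This is only an illustration: the function name by_id_for and the BY_ID_DIR override are made up for this example, and it assumes the by-id entries are symlinks whose targets end in the kernel device name, as they normally are on Linux.

```shell
#!/bin/sh
# Hypothetical helper: print every /dev/disk/by-id link that points at the
# given kernel device name (e.g. "sdk"). Partition links (sdk1, ...) are skipped.
# BY_ID_DIR is an assumption added so the directory can be overridden for testing.
by_id_for() {
    dev="$1"
    dir="${BY_ID_DIR:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (also covers an unmatched glob).
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        # Targets look like ../../sdk; compare only the last path component.
        if [ "${target##*/}" = "$dev" ]; then
            echo "$link"
        fi
    done
}
```

Calling by_id_for sdk prints the matching path(s) without touching the pool. A disk usually has more than one by-id link (for example an ata-... and a wwn-... name), so pick the one that matches the naming already used by the other disks in the pool.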
 
I'd assume I need to remove the disk from the pool again, wipe it, then reinstall and replace it.
I wasn't aware of the by-id command. That will be helpful in the future.
 
rbeard.js said:
I'd assume I need to remove the disk from the pool again, wipe it, then reinstall and replace it.

I don't think you need to wipe it. I also think that you cannot remove it from the pool (since it is not a mirror). Just try the replace command? Maybe with -f? Or maybe mark the vdev offline first?

rbeard.js said:
I wasn't aware of the by-id command. That will be helpful in the future.

None of this is Proxmox-specific, and there is plenty of other documentation and other resources on the internet if you want to learn more about ZFS.
 
I would say you should just export the pool and then import it again by path: "zpool import carbon -d /dev/disk/by-id/"
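Spelled out, the export and re-import would look something like the sketch below. It is a dry run that only prints the commands; the helper name reimport_by_id_cmds is made up for illustration. It assumes nothing is using the pool at that moment, since an export fails while datasets are busy, and it puts the -d option before the pool name, which is the order shown in the zpool-import man page.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the commands that
# export a pool and re-import it using the names found in /dev/disk/by-id.
reimport_by_id_cmds() {
    pool="$1"
    echo "zpool export $pool"
    echo "zpool import -d /dev/disk/by-id $pool"
}

# Print the sequence for the pool from this thread.
reimport_by_id_cmds carbon
```

After running the printed commands by hand, zpool status should list the disk under its by-id name instead of sdk.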
 