Replacing a disk in a Proxmox install with a two-disk ZFS mirror

halb9

New Member
Nov 4, 2018
Hello beautiful people,

I hope you're doing well.

I tried to use my additional free time these days in a useful manner and started playing with a new Proxmox installation and the ZFS filesystem.

I wanted to install a basic Proxmox VE system with ZFS mirroring on two drives, then simulate a drive failure by unplugging one of the disks, replace the "failing" drive with a new good one, and resync the mirror so that the new drive takes the old drive's place in the pool.

I installed Proxmox onto a 500 GB hard drive and a 500 GB SSD.

01-proxmox-zfs-mirror-install.png

This worked as expected. The mirror shows up in the GUI as well as on the console.

02-GUI-ZFS-online.png
03-ssh-ZFS-online.png

I then unplugged the hard disk /dev/sda and shortly after the pool showed up as degraded with /dev/sda missing.

04-GUI-ZFS-degraded.png
05-ssh-ZFS-degraded.png

My server does not support hotplug, so I had to shut it down, replace the disk and fire it back up again.
I inserted a blank new SSD and the server booted up again, which was quite nice.
ZFS:_Tips_and_Tricks#Replacing_a_failed_disk_in_the_root_pool says it could get interesting if it's /dev/sda that is failing, so I was a bit worried here. But lucky me, this worked without any problems.

In the GUI the old disk is shown as completely missing. On the console you can see that a new /dev/sda is there, but apart from that nothing has been done to it yet.

06-GUI-ZFS-missing.png
07-ssh-ZFS-missing.png

This is the part where I honestly get a bit confused.

In ZFS:_Tips_and_Tricks#Replacing_a_failed_disk_in_the_root_pool the example is for a RAIDZ-1 (RAID5) scenario. I don't have that; I have mirroring.
Additionally, it says to install GRUB. But ZFS_on_Linux#_bootloader says that when EFI is used instead of legacy BIOS, Proxmox uses systemd-boot instead of GRUB. My server is currently configured with UEFI instead of legacy BIOS.
Therefore the steps mentioned in ZFS_on_Linux#_zfs_administration under "Changing a failed device" should apply to me, right?

If so, I have the following question.

Code:
# zpool replace -f rpool <old device> <new device>
What exactly is 'old device'? Is it /dev/disk/by-id/ata-ST9500420ASG_SERiAL-part3 ? Or is it the number 8350069619613282498 ?
What is new device? Is it /dev/sda ? Is it the new /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL ?

I'm confused. What is the logic here?

Thank you in advance

halb9
 
I haven't been in such a situation myself, but if that is what zpool reports, try using the identifier 835... in the replace command. The worst that can happen is that the command fails because it does not know what to do with it.

It's best if you identify the new disk via its /dev/disk/by-id path. That way ZFS can identify it clearly, and you will see exactly which disk is which in the zpool output. /dev/sda is a bit ambiguous.
 
Hi,

I have had your case (mirror with sda and sdb)! The best choice in any case is /dev/disk/by-id (it has worked for me in many situations without any problem), as @aaron also told you.

In a case like yours, my cookbook is:


- remove the faulty drive from the system (which you have already done)
- add the new disk (which you have already done)
- put the faulty drive on my desk (so I can see the SN printed on the label and use my smartphone to magnify it: my eyes are old ...)
- run "zpool status -v" (so I can compare the faulty disk's SN in the zpool output and copy the path)
- run ls -l /dev/disk/by-id/* so I can see the new disk's SN
- create the partitions on the new disk (or, in the case of a mirror, copy the partitions from the good disk to the new disk using sfdisk)
- write the EFI data to the new disk
- reboot
- check that you can boot from the new device (BIOS option)
- replace:

zpool replace -f rpool <old device> <new device>

- check the SN of the old/new device 3 times, and only then hit ENTER, not before ... ;)
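
To make the "ls -l /dev/disk/by-id/*" step less error-prone, here is a minimal sketch that prints every by-id name next to the device it resolves to. The function name list_by_id is made up for this example; it relies only on readlink -f, and it works on any directory of symlinks, so you can try it on a test directory first.

```shell
#!/bin/sh
# list_by_id DIR: print "<by-id name> -> <resolved target>" for each symlink in DIR.
# Hypothetical helper name; DIR defaults to /dev/disk/by-id.
list_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (this also covers an empty directory).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}
```

Calling list_by_id without arguments reads /dev/disk/by-id, so the serial number in each name can be compared with the label on the drive before anything is written.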

Good luck/Bafta
 
Ok, I did this so far.

Code:
root@hv01:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465.8G  0 disk 
sdb      8:16   0 465.8G  0 disk 
|-sdb1   8:17   0  1007K  0 part 
|-sdb2   8:18   0   512M  0 part 
`-sdb3   8:19   0 465.3G  0 part

Code:
root@hv01:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: none requested
config:

    NAME                                                     STATE     READ WRITE CKSUM
    rpool                                                    DEGRADED     0     0     0
      mirror-0                                               DEGRADED     0     0     0
        8350069619613282498                                  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST9500420ASG_SERiAL-part3
        ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2-part3  ONLINE       0     0     0

errors: No known data errors
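
As a side note, the identifier of the missing device can be pulled out of this output with a short filter instead of copying it by hand. This is only a sketch: the function name unavail_devices is made up, and it assumes the state sits in the second column, as it does in the output above. The sample text is copied from that output so the snippet runs without a pool.

```shell
#!/bin/sh
# unavail_devices: read `zpool status` text on stdin and print the first column
# of every line whose second column is UNAVAIL or FAULTED.
# Hypothetical helper name, not part of ZFS.
unavail_devices() {
    awk '$2 == "UNAVAIL" || $2 == "FAULTED" { print $1 }'
}

# Sample input copied from the status output above.
sample='    NAME                                                     STATE     READ WRITE CKSUM
    rpool                                                    DEGRADED     0     0     0
      mirror-0                                               DEGRADED     0     0     0
        8350069619613282498                                  UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST9500420ASG_SERiAL-part3
        ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2-part3  ONLINE       0     0     0'

printf '%s\n' "$sample" | unavail_devices   # prints 8350069619613282498
```

On the real system the same filter would be fed from the live command, for example zpool status rpool | unavail_devices.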

Code:
root@hv01:~# sgdisk /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2 -R /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-1
The operation has completed successfully.

sgdisk -G /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2
The operation has completed successfully.

Code:
root@hv01:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465.8G  0 disk 
|-sda1   8:1    0  1007K  0 part 
|-sda2   8:2    0   512M  0 part 
`-sda3   8:3    0 465.3G  0 part 
sdb      8:16   0 465.8G  0 disk 
|-sdb1   8:17   0  1007K  0 part 
|-sdb2   8:18   0   512M  0 part 
`-sdb3   8:19   0 465.3G  0 part

The next step should probably be this, right?
Code:
zpool replace -f rpool 8350069619613282498 /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2-part3

Followed by
Code:
pve-efiboot-tool format /dev/sda2
pve-efiboot-tool init /dev/sda2
correct?
 
Ok, I did this so far.
Why did you run the command to randomize the GUIDs on the good drive and not on the new one?
So why sgdisk -G /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2
and not sgdisk -G /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-1?
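
For reference, a sketch of the order I would expect. It assumes that SSD-2 is the healthy disk still in the pool and SSD-1 is the new blank one, which is how it looks in the zpool status output above. The function name print_partition_plan is made up, and it only prints the commands so they can be checked before anything is run.

```shell
#!/bin/sh
# print_partition_plan HEALTHY NEW: print (do not run) the two sgdisk commands.
# Hypothetical helper name; both arguments are whole-disk paths.
print_partition_plan() {
    healthy="$1"
    new="$2"
    if [ -z "$healthy" ] || [ -z "$new" ] || [ "$healthy" = "$new" ]; then
        echo "usage: print_partition_plan <healthy disk> <new disk> (must differ)" >&2
        return 1
    fi
    # Replicate the partition table of the healthy disk onto the new disk.
    printf 'sgdisk %s -R %s\n' "$healthy" "$new"
    # Randomize the GUIDs on the NEW disk, so the copy does not share them
    # with the healthy disk.
    printf 'sgdisk -G %s\n' "$new"
}

# Assumed roles: SSD-2 = healthy pool member, SSD-1 = new blank disk.
print_partition_plan \
    /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2 \
    /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-1
```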
 
zpool replace -f rpool 8350069619613282498 /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2-part3
This won't work, since you are passing the remaining disk, which is still online, as the new device. ZFS will not let you continue. In other words, you picked the wrong disk: the last argument has to be /dev/disk/by-id/new_disk_to_replace_the_old_one.
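
A sketch of what the corrected sequence might look like, printed rather than executed. It assumes SSD-1 is the new disk, that partition 3 is the ZFS partition and partition 2 is the ESP (as in the lsblk output earlier in the thread), and the function name print_replace_plan is made up. It refuses to print anything if the "new" partition is the member that is still online.

```shell
#!/bin/sh
# print_replace_plan POOL OLD NEW_ZFS_PART NEW_ESP ONLINE_MEMBER
# Print (do not run) the replace and bootloader commands. Hypothetical helper name.
print_replace_plan() {
    pool="$1"; old="$2"; new_part="$3"; new_esp="$4"; online="$5"
    # Guard: the new ZFS partition must not be the member that is still ONLINE.
    if [ "$(basename "$new_part")" = "$online" ]; then
        echo "refusing: $new_part is still an online pool member" >&2
        return 1
    fi
    printf 'zpool replace -f %s %s %s\n' "$pool" "$old" "$new_part"
    # Make the new disk bootable under UEFI (commands as quoted in the thread).
    printf 'pve-efiboot-tool format %s\n' "$new_esp"
    printf 'pve-efiboot-tool init %s\n' "$new_esp"
}

# Assumed roles: SSD-1 = new disk, SSD-2 = online member.
print_replace_plan rpool 8350069619613282498 \
    /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-1-part3 \
    /dev/disk/by-id/ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-1-part2 \
    ata-Samsung_SSD_860_EVO_500GB_SERiAL-SSD-2-part3
```

Passing the SSD-2 partition as the third argument makes the function return an error instead of printing a plan, which is the mistake in the quoted command.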
 
