Replace drives in 1 of 3 vdevs of one zpool, changing the recordsize and rebalancing

Stumpy

Active Member
Jan 2, 2020
Hi

My ZFS tank is reaching its capacity limit and needs to be expanded.
It consists of 3 RAIDZ2 vdevs, each with 8 x 10 TB HDDs.
I would now like to expand one of these vdevs from 8 x 10 TB to 8 x 26 TB.

Since I have never done this before, I wanted to ask first to make sure I don't make a mistake.
I would like to perform the replacement during operation and, if it offers a speed advantage, perform the resilvering in a non-degraded state.
This means that I connect the new HDD for the resilvering to a SATA port, start the resilvering, and when it is complete, I remove the old HDD from the front and install the new HDD there.
To free up the SATA port, I first have to export the zpool that is currently connected there.
The recordsize should also be increased from the current 128k to 1M to reduce overhead. (5.3% -> 0.3%; the tank mainly contains very large files ranging from 300MB to 50GB, no databases on the tank; https://jro.io/capacity/)
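
To illustrate where this overhead comes from, here is a rough sketch of how many sectors a single block occupies on a RAIDZ vdev (data plus parity plus padding). It assumes ashift=12 (4K sectors) and uncompressed blocks, neither of which is stated above, and the function name raidz_asize is only for illustration. The percentages quoted from the calculator may be defined differently, so treat this as showing the direction of the effect, not as reproducing those figures.

```shell
# Sectors allocated on a RAIDZ vdev for one block (data + parity + padding).
# Usage: raidz_asize <block_bytes> <disks_in_vdev> <parity> <ashift>
raidz_asize() {
    psize=$1; cols=$2; nparity=$3; ashift=$4
    # Data sectors needed for the block
    data=$(( ((psize - 1) >> ashift) + 1 ))
    # Add parity sectors: nparity per (possibly partial) stripe row
    total=$(( data + nparity * ((data + cols - nparity - 1) / (cols - nparity)) ))
    # Allocations are padded up to a multiple of (nparity + 1) sectors
    pad=$(( nparity + 1 ))
    echo $(( (total + pad - 1) / pad * pad ))
}

# Compare 128k and 1M blocks on an 8-wide RAIDZ2 (ashift=12 is an assumption)
for bs in 131072 1048576; do
    dsec=$(( bs >> 12 ))
    alloc=$(raidz_asize "$bs" 8 2 12)
    echo "block=$bs bytes: $dsec data sectors, $alloc sectors allocated"
done
```

Under these assumptions a 128k block takes 45 sectors for 32 sectors of data, while a 1M block takes 342 sectors for 256 sectors of data, so the partial-stripe parity and padding weigh much less on the larger block.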


Questions:
  1. Is it faster if I do the replacement while the old HDD is still installed? (copying the data instead of reconstructing it?)
  2. Changing the record size only affects newly written data. The “zfs-inplace-rebalancing” script copies the data and deletes the old data, which means that the new data would then be written in the new record size.
    If I now perform a rebalance with, for example, zfs rewrite, my understanding is that it can distribute the data across the VDEVs, but the data is not completely rewritten and changing the record size therefore has no effect. (https://github.com/openzfs/zfs/pull/17246)
    How do programs that recognize new data, such as Nextcloud, Jellyfin, etc., react? Would they recognize that each file is a new file and report it as such?
    What would you recommend for applying the new recordsize to all existing files?
  3. Could there be any problems/is there anything to consider if I perform the replacement on the SATA port and only then install the HDD in the front?
  4. Is my list of instructions correct?
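
For reference, the copy-then-replace idea from question 2 can be sketched per file as below. This is not the zfs-inplace-rebalancing script itself, only a minimal illustration of the same principle; the function name rewrite_file and the .rewrite.tmp suffix are made up for this example. It assumes GNU cp (as on Proxmox/Debian) and uses --reflink=never so that the data is really written again instead of being cloned. The resulting file has a new inode, and any existing snapshots keep referencing the old blocks.

```shell
# Rewrite one file by copying it and moving the copy over the original.
# Usage: rewrite_file <path>
rewrite_file() {
    src=$1
    tmp="$src.rewrite.tmp"          # hypothetical temporary name
    # -a keeps owner, mode and timestamps; --reflink=never forces a real copy
    cp -a --reflink=never -- "$src" "$tmp" || return 1
    # Only replace the original if the copy is byte-identical
    if ! cmp -s -- "$src" "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi
    mv -f -- "$tmp" "$src"
}
```

A call would look like rewrite_file /tank/media/example.mkv (the path is only a placeholder). Files with hard links or files that are open for writing would need extra care, which this sketch does not handle.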
The setup is as follows

Code:
        tank                        ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca27ec7c0b0  ONLINE       0     0     0
            wwn-0x5000cca27ec76a9f  ONLINE       0     0     0
            wwn-0x5000cca27ec80322  ONLINE       0     0     0
            wwn-0x5000cca273e825f8  ONLINE       0     0     0
            wwn-0x5000cca27ec887ef  ONLINE       0     0     0
            wwn-0x5000cca267f793e5  ONLINE       0     0     0
            wwn-0x5000cca27ec7733e  ONLINE       0     0     0
            wwn-0x5000cca27ec7a63b  ONLINE       0     0     0
          raidz2-1                  ONLINE       0     0     0
            wwn-0x5000cca27ec6a49c  ONLINE       0     0     0
            wwn-0x5000cca27ec6e6dc  ONLINE       0     0     0
            wwn-0x5000cca27ec7d65c  ONLINE       0     0     0
            wwn-0x5000cca27ec20fdb  ONLINE       0     0     0
            wwn-0x5000cca267f7e9c2  ONLINE       0     0     0
            wwn-0x5000cca27ec7fa6c  ONLINE       0     0     0
            wwn-0x5000cca27ec781b9  ONLINE       0     0     0
            wwn-0x5000cca273f436ee  ONLINE       0     0     0
          raidz2-2                  ONLINE       0     0     0
            wwn-0x5000cca27ec7acd6  ONLINE       0     0     0
            wwn-0x5000cca27ec75995  ONLINE       0     0     0
            wwn-0x5000cca267f7dc49  ONLINE       0     0     0
            wwn-0x5000cca27ec7d366  ONLINE       0     0     0
            wwn-0x5000cca27ec7202f  ONLINE       0     0     0
            wwn-0x5000cca27ec7a215  ONLINE       0     0     0
            wwn-0x5000cca27ec7c4bf  ONLINE       0     0     0
            wwn-0x5000cca27ec7c485  ONLINE       0     0     0

---------------------------------------------------------------------------------------------------------------------------------------------------------

Start of instructions

0. Preparation
0a. Enable autoexpand

zpool set autoexpand=on tank

This ensures that the pool will automatically use the full capacity of the new HDDs after the resilvering is complete. The additional capacity only becomes available once all 8 disks of the vdev have been replaced.

0b. Export scratch pool

zpool export scratch3

Cleanly detaches the pool from the system.
After exporting, the pool is not available for other operations, but the data remains intact.

0c. Change the recordsize of the tank from 128k to 1M

zfs set recordsize=1M tank

This reduces overhead and increases the usable capacity in the pool.
Only affects newly written data.

1. Install new HDD

Attach the new 26 TB HDD to a free SATA port.
Check whether the controller recognizes the disk:

lsscsi -g

Or check in the Proxmox GUI.

2. Determine the path of the new HDD

ls -l /dev/disk/by-id/ | grep <part_of_the_wwn_or_model>

3. Delete/prepare the new HDD

wipefs -a /dev/disk/by-id/<id_of_the_new_disk>

Removes old partition tables and signatures so that zpool replace works properly.
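
Since wipefs -a is destructive, it may be worth checking first that the ID about to be wiped does not belong to a disk that is already part of the pool. A minimal sketch, assuming the device names appear in zpool status the same way as in the listing above; the function name listed_in_pool is only for illustration:

```shell
# Succeeds (exit 0) if the given device name appears in the text read from stdin.
# Usage: zpool status tank | listed_in_pool <device_name>
listed_in_pool() {
    # -F: fixed string, -w: whole word, -q: quiet
    grep -qwF -- "$1"
}

# Intended use before wiping (not executed here):
#   if zpool status tank | listed_in_pool "<id_of_the_new_disk>"; then
#       echo "Disk is already a member of tank, not wiping" >&2
#   else
#       wipefs -a /dev/disk/by-id/<id_of_the_new_disk>
#   fi
```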

4. Online replacement of a 10 TB HDD

zpool replace tank <old_wwn> /dev/disk/by-id/<new_drive>

Example:

zpool replace tank wwn-0x5000cca27ec7c0b0 /dev/disk/by-id/wwn-0x6000cca27xxxxxxxx

To check the progress of the resilvering:

zpool status tank
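
Instead of checking by hand, one could poll until the resilver is finished. The sketch below relies on the "resilver in progress" text that zpool status prints in its scan line; the function names and the 60 second default interval are just illustrative choices. On OpenZFS 2.0 or newer, zpool wait -t resilver tank should achieve the same without polling.

```shell
# Succeeds if the zpool status text on stdin reports a running resilver.
resilver_active() {
    grep -q 'resilver in progress'
}

# Block until the pool no longer reports a running resilver.
# Usage: wait_for_resilver <pool> [interval_seconds]
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    while zpool status "$pool" | resilver_active; do
        sleep "$interval"
    done
}
```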

5. Replace the front HDD

Once the resilvering is complete, remove the old 10 TB disk from the front and install the new 26 TB disk.

Check the status again:

zpool status tank

Everything must be ONLINE.
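
With 24 disks it is easy to overlook a single line, so a small filter can help. This sketch reads zpool status output and prints every device whose state column is not ONLINE; the function name not_online is made up, and the list of states is the set I would expect to see, not necessarily exhaustive. A quicker overall check is zpool status -x tank, which only reports pools that have problems.

```shell
# Print "name state" for every device that is not ONLINE.
# Exit status is 0 if at least one such device was found, 1 otherwise.
# Usage: zpool status tank | not_online
not_online() {
    awk '
        # Skip the "state:" header line, look only at device rows
        $1 != "state:" && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ {
            print $1, $2
            bad = 1
        }
        END { exit (bad ? 0 : 1) }
    '
}
```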

6. Repeat

Repeat steps 1–5 until all 8 HDDs of a VDEV have been replaced.

7. Check integrity

After all replacement processes:

zpool scrub tank

Reads all data in the pool, verifies it against its checksums, and repairs any errors found.

8. Re-import scratch pool

zpool import scratch3

Check:

zpool status scratch3


Thanks for reading.
Greetings Stumpy
 
Bad idea. Your new disks are 2.6x the size of the previous ones, so after all the resilvering and rebalancing you will end up with about 2.6x as many writes and reads hitting your biggest vdev, and since it's a raidz2 you end up with the IOPS of a single disk. But nobody will hold you back. Have fun if you really want that; if your pool is already full, it will take forever until it's done.
:cool:
 
Do you mean only the rebalancing or the replacement of the vdev itself?
 
Is it faster if I do the replacement while the old HDD is still installed?

Probably not, but I am not sure. It is by far the safer way, though!

The risk always lies in the rebuild process, because the vdev has to actively read all drives to reconstruct the data for the new one. That's (a little, little bit) dangerous.

If you "replace" the drive while the old one is still connected, the data required for the new drive is available twice: once via the reconstruction mechanism as before, and a second time plainly from the old drive. (I am not sure how that replacement is actually handled...)

For a RAIDZ2 this might look like nitpicking, as there is still one drive of redundancy available when one old disk is removed. On the other hand ...Z2 is present for a reason - and I would always choose the safer path.