Cloning a filled SSD used for ZFS to a bigger one

FoxtrotZulu
Jun 8, 2023
Hi. Two days ago, after a reboot of the Proxmox server, one of my VMs didn't start back up and showed an error, which I figured out was because a 1TB SSD (ZFS) got filled. This VM was the only one using that drive as storage.

I got a new 2TB drive to replace it (I will also reuse the old one later on).

I removed the drive and cloned the 1TB SSD onto the new 2TB SSD on a different PC, knowing in advance that part of the new drive would be left unallocated; I thought I would be able to fix that later. Unfortunately, I haven't managed to solve it yet.

At the moment I have put the 1TB back in the Proxmox server, repartitioned the 2TB SSD, and added it via USB with an external enclosure. This is what lsblk shows:

root@mainsrv:~# lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0 894.3G  0 disk
├─sda1                         8:1    0 894.2G  0 part
└─sda2                         8:2    0     8M  0 part
zd0                          230:0    0   832G  0 disk
└─zd0p1                      230:1    0   832G  0 part
nvme0n1                      259:0    0 476.9G  0 disk
├─nvme0n1p1                  259:1    0  1007K  0 part
├─nvme0n1p2                  259:2    0   512M  0 part /boot/efi
└─nvme0n1p3                  259:3    0 476.4G  0 part
  ├─pve-swap                 253:0    0     8G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0    96G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   3.6G  0 lvm
  │ └─pve-data-tpool         253:4    0 349.3G  0 lvm
  │   ├─pve-data             253:5    0 349.3G  1 lvm
  │   ├─pve-vm--100--disk--0 253:6    0   100G  0 lvm
  │   ├─pve-vm--200--disk--0 253:7    0    50G  0 lvm
  │   ├─pve-vm--300--disk--0 253:8    0   150G  0 lvm
  │   └─pve-vm--400--disk--0 253:9    0    32G  0 lvm
  └─pve-data_tdata           253:3    0 349.3G  0 lvm
    └─pve-data-tpool         253:4    0 349.3G  0 lvm
      ├─pve-data             253:5    0 349.3G  1 lvm
      ├─pve-vm--100--disk--0 253:6    0   100G  0 lvm
      ├─pve-vm--200--disk--0 253:7    0    50G  0 lvm
      ├─pve-vm--300--disk--0 253:8    0   150G  0 lvm
      └─pve-vm--400--disk--0 253:9    0    32G  0 lvm


I've been reading the docs and various websites but couldn't find anything that helps me.
Any pointer on where to look will be appreciated!
Thanks in advance.
 
Hello, did you try to expand your partition with parted?

First, install parted on PVE with apt install parted

Then:

Bash:
parted /dev/<disk-id> # E.g.: /dev/sda (choose the 2TB disk)

# Run print command
(parted) print


It will probably warn you that the disk is bigger than the partition table says and ask whether it should fix the partition table accordingly. Answer "Fix" (or "F") if the message appears.
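From memory, the warning looks something like this (the block count here is just a placeholder):

Code:
Warning: Not all of the space available to /dev/sdX appears to be used, you can
fix the GPT to use all of the space (an extra <N> blocks) or continue with the
current setting?
Fix/Ignore?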

Then, check the partition number you want to expand (e.g.: "1")

Bash:
(parted) resizepart 1 100%
(parted) print
(parted) quit

After that, your partition will use the entire drive. But you still need to increase the ZFS pool. So:

First, disable "autoexpand" in your pool.

Run this command to see if it's enabled.
Bash:
zpool get autoexpand

# If it's enabled, disable it with:
zpool set autoexpand=off <pool>

Then get the drive ID using this command:
Bash:
ls -la /dev/disk/by-id/

And run:
Bash:
zpool online -e <pool> /dev/disk/by-id/<new zfs partition>
zpool list

It should expand your ZFS pool.
 
It's also strange that you got a sda1+sda2. Usually it's sda1+sda9. So you probably partitioned that disk yourself?
 
Hello, did you try to expand your partition with parted? […]
Thanks for the prompt and detailed answer.

When I run 'parted print':

(parted) print
Model: sage 3639S (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags

(parted)

If I try to make a new partition, I don't know what to enter for "End".


edit: forgot to mention that there was no message saying the disk is bigger than the partition table says, or asking whether it should fix the partition table.
 
It's also strange that you got a sda1+sda2. Usually it's sda1+sda9. So you probably partitioned that disk yourself?
After my first clone attempt, seeing that it was not working, I repartitioned the disk on a different PC.
 
Disk Flags:

Number Start End Size File system Name Flags

(parted)
Ohhh, it seems that you don't have any partitions. Could you try to use zpool replace to replace the 1TB drive with the new one in the pool?

So, you can do something like:

Bash:
# List disks
# Find new and old disk ID
zpool status
ls -la /dev/disk/by-id/

# Copy the partition table to the new drive
sgdisk /dev/disk/by-id/<old device> -R /dev/disk/by-id/<new device>
sgdisk -G /dev/disk/by-id/<new device>

# Replace ZFS partition
zpool replace -f <pool> <old zfs partition> <new zfs partition>

# Monitor resilvering process
watch zpool status -v

Then, follow my previous tutorial to expand the partition :)
 
Ohhh, it seems that you don't have any partitions. Could you try to use zpool replace […]?
zpool status:

  pool: ssd960
 state: ONLINE
  scan: scrub repaired 0B in 00:41:56 with 0 errors on Sun May 14 01:05:57 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        ssd960                                        ONLINE       0     0     0
          ata-KINGSTON_SA400S37960G_50026B778439F033  ONLINE       0     0     0

errors: No known data errors


Should I now run:
Code:
ls -la /dev/sdb
?

Also, no risk of losing anything on the original drive, correct?
 
Also, no risk of losing anything on the original drive, correct?
It's important to always have a backup :) but if something goes wrong, the 1TB SSD won't be changed, so you can reimport the zpool if necessary.
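(For reference, a reimport from the untouched 1TB disk would look something like this; just a sketch:)

Bash:
# Export the pool first if it's still imported
zpool export ssd960

# Re-import it, scanning the by-id paths
zpool import -d /dev/disk/by-id ssd960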

should I now run:

Code:
ls -la /dev/sdb
?
To replace disks, we always need to use a unique identifier for the disk. The /dev/sd* names can change if you change the internal SATA connections, so they're not reliable. The right disk ID won't change (as it's tied to the serial number of the device).
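(If you want to double-check which by-id names point at a given device, udevadm can list its symlinks; a quick sketch:)

Bash:
# Show all persistent symlinks for a device node
udevadm info -q symlink -n /dev/sdb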

So, for your case, you can use:
pool: ssd960
old device: ata-KINGSTON_SA400S37960G_50026B778439F033
new device: You need to discover using ls -la /dev/disk/by-id/
old zfs partition: ata-KINGSTON_SA400S37960G_50026B778439F033
new zfs partition: You need to discover using ls -la /dev/disk/by-id/
 
  • Like
Reactions: FoxtrotZulu
So many thanks for your time.

By the way... terrible choice for ZFS, as it not only lacks power-loss protection but also uses QLC NAND. So it's even slower and less durable than common TLC consumer SSDs.
I used one that I had; that's why I bought a Samsung 870 EVO now.
 
It's important to always have a backup :) […]
new zfs partition: You need to discover using ls -la /dev/disk/by-id/
Code:
root@mainsrv:/dev# ls -la /dev/disk/by-id/
total 0
drwxr-xr-x 2 root root 620 Jun  8 13:14 .
drwxr-xr-x 8 root root 160 Jun  8 13:14 ..
lrwxrwxrwx 1 root root   9 Jun  8 13:14 ata-KINGSTON_SA400S37960G_50026B778439F033 -> ../../sda
lrwxrwxrwx 1 root root  10 Jun  8 13:14 ata-KINGSTON_SA400S37960G_50026B778439F033-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Jun  8 13:14 ata-KINGSTON_SA400S37960G_50026B778439F033-part2 -> ../../sda2
lrwxrwxrwx 1 root root   9 Jun  8 13:30 ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T -> ../../sdb
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-name-pve-root -> ../../dm-1
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-name-pve-swap -> ../../dm-0
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-name-pve-vm--100--disk--0 -> ../../dm-6
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-name-pve-vm--200--disk--0 -> ../../dm-7
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-name-pve-vm--300--disk--0 -> ../../dm-8
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-name-pve-vm--400--disk--0 -> ../../dm-9
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-uuid-LVM-sLCgVzFpXLWRFau2zaymp2qdWyHUP3VgahHdZz9aYcV3lE4Ys334PTTNPR22tPf8 -> ../../dm-1
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-uuid-LVM-sLCgVzFpXLWRFau2zaymp2qdWyHUP3VgBOVBSwfLh9Hq4jMaY9CDp0XGrR66neAl -> ../../dm-9
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-uuid-LVM-sLCgVzFpXLWRFau2zaymp2qdWyHUP3Vgd63fMHWWFddh2HDiIjnSpcZ0CHiCcckm -> ../../dm-0
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-uuid-LVM-sLCgVzFpXLWRFau2zaymp2qdWyHUP3VggEhSc6fK1I40fpmI5Gy28f95c84Ndc3C -> ../../dm-8
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-uuid-LVM-sLCgVzFpXLWRFau2zaymp2qdWyHUP3VgOLT1LB1NVK0G2BE3RACNfvxXMH476UBR -> ../../dm-6
lrwxrwxrwx 1 root root  10 Jun  8 13:14 dm-uuid-LVM-sLCgVzFpXLWRFau2zaymp2qdWyHUP3VgZ3ZH30OIK1qx10OwCKUVADuNqrwqkzeE -> ../../dm-7
lrwxrwxrwx 1 root root  15 Jun  8 13:14 lvm-pv-uuid-3P2ysl-hI6G-E0r9-4pPU-WSaV-ef3C-ikcchj -> ../../nvme0n1p3
lrwxrwxrwx 1 root root  13 Jun  8 13:14 nvme-eui.002538d22149df1e -> ../../nvme0n1
lrwxrwxrwx 1 root root  15 Jun  8 13:14 nvme-eui.002538d22149df1e-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 Jun  8 13:14 nvme-eui.002538d22149df1e-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 Jun  8 13:14 nvme-eui.002538d22149df1e-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root  13 Jun  8 13:14 nvme-SAMSUNG_MZVLQ512HBLU-00BH1_S671NJ0T217231 -> ../../nvme0n1
lrwxrwxrwx 1 root root  15 Jun  8 13:14 nvme-SAMSUNG_MZVLQ512HBLU-00BH1_S671NJ0T217231-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root  15 Jun  8 13:14 nvme-SAMSUNG_MZVLQ512HBLU-00BH1_S671NJ0T217231-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 Jun  8 13:14 nvme-SAMSUNG_MZVLQ512HBLU-00BH1_S671NJ0T217231-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root   9 Jun  8 13:30 wwn-0x5002538f332221ca -> ../../sdb
lrwxrwxrwx 1 root root   9 Jun  8 13:14 wwn-0x50026b778439f033 -> ../../sda
lrwxrwxrwx 1 root root  10 Jun  8 13:14 wwn-0x50026b778439f033-part1 -> ../../sda1
lrwxrwxrwx 1 root root  10 Jun  8 13:14 wwn-0x50026b778439f033-part2 -> ../../sda2

So I ran:

Code:
root@mainsrv:/dev# sgdisk /dev/disk/by-id/KINGSTON_SA400S37960G_50026B778439F033 -R /dev/disk/by-id/ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T
Problem opening /dev/disk/by-id/KINGSTON_SA400S37960G_50026B778439F033 for reading! Error is 2.
The specified file does not exist!

??
 
Problem opening /dev/disk/by-id/KINGSTON_SA400S37960G_50026B778439F033 for reading! Error is 2. […]
You missed the ata- prefix for the ata-KINGSTON_SA400S37960G_50026B778439F033 device :)
 
:rolleyes::rolleyes::rolleyes:

thanks
Code:
root@mainsrv:/dev# sgdisk /dev/disk/by-id/ata-KINGSTON_SA400S37960G_50026B778439F033 -R /dev/disk/by-id/ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T
The operation has completed successfully.
root@mainsrv:/dev# sgdisk -G /dev/disk/by-id/ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T
The operation has completed successfully.

How do I know the new ZFS pool name?

Code:
root@mainsrv:/dev# zpool status
  pool: ssd960
 state: ONLINE
  scan: scrub repaired 0B in 00:41:56 with 0 errors on Sun May 14 01:05:57 2023
config:

        NAME                                          STATE     READ WRITE CKSUM
        ssd960                                        ONLINE       0     0     0
          ata-KINGSTON_SA400S37960G_50026B778439F033  ONLINE       0     0     0

errors: No known data errors
 
How do I know the new ZFS pool name? […]

The zpool name doesn't change :)
For your case, the replace command will be:

Bash:
zpool replace -f ssd960 ata-KINGSTON_SA400S37960G_50026B778439F033 ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T

Note: I noticed your drive has 2 partitions and the ZFS pool is on partition 1. If you continue, you will probably face the same issue, because between the ZFS partition and the free space there is an 8MB partition.

I'm not sure if you can do it, but you can try to delete partition 2 after the resilvering process; if someone has more expertise on this, they can confirm. But you can try it on your side: if it fails, you still have the 1TB drive unchanged :)
If it's possible, you can delete partition 2 and expand the first partition over the freed space, following my first tutorial :)
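(Roughly, that would look like this. Just a sketch, assuming the Samsung shows up as /dev/sdb and the 8MB partition is number 2:)

Bash:
parted /dev/sdb
(parted) rm 2                 # remove the 8MB partition
(parted) resizepart 1 100%    # grow the ZFS partition to the end of the disk
(parted) quit

# Then tell ZFS to claim the new space
zpool online -e ssd960 /dev/disk/by-id/ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T-part1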
 
I used one that I had; that's why I bought a Samsung 870 EVO now.
Still no power-loss protection and not great TBW/DWPD, so not recommended for ZFS, but at least not as bad since it has TLC NAND.
I'm not sure if you can do it, but you can try to delete partition 2 after the resilvering process. […]
I usually boot a GParted ISO and use that to move partition 2 to the end of the disk, so that all the unallocated space sits between partition 1 and partition 2 instead of after partition 2, and you can then extend partition 1.
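(If you'd rather stay on the CLI: the small second partition is the Solaris reserved partition and, as far as I know, holds no data, so instead of moving it you could delete it and recreate it at the very end of the disk with sgdisk. A sketch, assuming the disk is /dev/sdb:)

Bash:
# Delete the 8MB reserved partition (assumed to be number 2)
sgdisk -d 2 /dev/sdb

# Recreate it in the last 8MiB of the disk (bf07 = Solaris Reserved 1)
sgdisk -n 2:-8M:0 -t 2:bf07 /dev/sdb

# The free space now sits right after partition 1, which can be grown with parted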
 
The zpool name doesn't change :) For your case, the replace command will be: […]
Seems I will keep bothering you!

Code:
root@mainsrv:/dev# zpool replace -f ssd960 ata-KINGSTON_SA400S37960G_50026B778439F033 ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T
invalid vdev specification
the following errors must be manually repaired:
/dev/disk/by-id/ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T-part1 is part of active pool 'ssd960'
 
Still no power-loss protection and not great TBW/DWPD, so not recommended for ZFS, but at least not as bad since it has TLC NAND. Recommended would be something like a Samsung PM883/PM893/PM897.

I usually boot a GParted ISO and use that to move partition 2 to the end of the disk […]
What would be a good choice?
 
invalid vdev specification […] /dev/disk/by-id/ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W213686T-part1 is part of active pool 'ssd960'
Could you run zpool status?
 
