Replacing a software RAID disk with regular and ZFS partitions, PVE 9.1.5

Sami Vakkuri

New Member
Jun 5, 2025
Finland
www.atva.fi
One of the two SSDs in our server died, and I did not realize it at first because everything kept working without problems (the server was not under heavy load, so there was no noticeable performance drop). Interestingly enough, Proxmox did not notify me about the failure in any way. I was just checking the node settings and noticed that the ZFS pool was in a "DEGRADED" state, and under the details there was a log entry saying that one SSD "has been removed" - it turned out the SSD had simply died.
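For reference, the degraded state can also be confirmed from the shell instead of the GUI; a minimal check (the pool name `rpool` below is an assumption, substitute your own):

```shell
# Report only pools with problems; prints "all pools are healthy" if everything is fine
zpool status -x

# Full detail for one pool, including which member device is missing/removed
# ("rpool" is an assumed pool name - list yours with "zpool list")
zpool status -v rpool
```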

Anyhow, the situation is now fixed on the hardware side: a new identical SSD has replaced the broken one, and after powering the server back up the new drive was detected by Proxmox.

However, when I tried to find out how to actually attach this new drive as a RAID member so that Proxmox can use it like the old one, I could only find answers for setups where the whole system is installed on ZFS. In our case only the VMs run from ZFS: the Proxmox root, swap and boot partitions are regular ones (see attached images). I know I can replace the ZFS pool disk with the zpool replace command, but how do I re-create all the partitions on the new disk before the pool replacement?

From the GUI I was only able to "Initialize Disk with GPT" and have no clue how to re-create the other partitions. As an example, I attached an image from another server with an identical hardware/partition structure (marked in green).
 

Attachments

  • pve-raid-replace-ssd-1.png (67.2 KB)
  • pve-raid-replace-ssd-2.png (73.1 KB)
  • pve-raid-replace-ssd-3.png (83.8 KB)
  • pve-raid-replace-ssd-4.png (46 KB)
PVE should email the root user about such ZFS events. Is the email address of the root user correct? Maybe zfs-zed is not installed or the configuration is incorrect?
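A quick way to check the zed side from the shell (paths are the defaults on Debian/PVE; on recent PVE versions the built-in notification system is also involved, as discussed below):

```shell
# Is the ZFS event daemon installed and running?
systemctl status zfs-zed

# zed sends event mails to the address configured here; the default is "root"
grep ZED_EMAIL_ADDR /etc/zfs/zed.d/zed.rc
```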
Your old drive (and the still existing one) uses partition 5 of the disk for the pool. Who did the partitioning, and why is it partitioned this way? Usually you use either the whole drive or the third partition (typically a boot pool). I cannot really help with such a non-standard setup.
After partitioning, you can use this section of the manual to replace the broken/missing device: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_change_failed_dev
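For the partitioning step itself, a common approach is to copy the GPT layout from the surviving disk with sgdisk and then do the replacement. A sketch, assuming /dev/nvme0n1 is the healthy disk and /dev/nvme1n1 is the new blank one (verify your actual device names with lsblk first - getting them backwards would destroy the good disk):

```shell
# Copy the partition table from the healthy disk to the new disk
# (note the argument order: --replicate=<target> <source>)
sgdisk --replicate=/dev/nvme1n1 /dev/nvme0n1

# Give the new disk unique disk/partition GUIDs so they do not clash with the source
sgdisk --randomize-guids /dev/nvme1n1

# Attach partition 5 of the new disk in place of the missing pool member
# (take the old device name or GUID from "zpool status"; "rpool" is an assumed pool name)
zpool replace rpool <old-device-or-guid> /dev/nvme1n1p5
```

Since root, swap and boot are plain partitions in this setup, the non-ZFS partitions on the new disk also need their contents re-created separately (e.g. mkswap for the swap partition, and rebuilding the software RAID/boot partitions as appropriate for how the system was originally set up).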
 
Oops, regarding emails: it turned out I had configured a custom Notification Target (I am not using an email address for the root user, since I send emails via a Microsoft 365 account), but I forgot to change the Notification Matcher to use this custom target! So this is now fixed.

Partitioning came from OVHcloud template, however according to my own documentation I did make changes to partitioning:
  • Customized partition configuration
    • Removed 4th partition (/var/lib/vz)
    • Changed swap partition size 1024 > 8192
    • Added partition, zfs (/var/lib/vz)
This was probably because I wanted to reserve more swap for the system. Note, however, that I did not change the file system of the original 4th partition; it was already set to ZFS.

Since this server has now been emptied of production services because of the incident, I can start from scratch and reinstall the system. I attached an image of the partitioning section of the OVHcloud installation wizard (nothing changed from the defaults yet). What would you recommend changing here for PVE 9?

Server hardware:
AMD Ryzen 9 5900X - 12c/24t - 3.7 GHz/4.8 GHz
128 GB ECC 2666 MHz
2×512 GB SSD NVMe, Soft RAID (Samsung MZVL2512HCJQ-00B07)
 

Attachments

  • pve9-ovhcloud.png (48.2 KB)