Replacing a faulty HDD that is part of a ZFS data pool
I have used VMware for years, but since the takeover by Broadcom I am finding my way over to Proxmox.
The first step is running an older HPE ML30 Gen10 (non-hot-swap) for private use.
The ML30 is built with:
64 GB of memory
1x NVMe (HPE VR000480KXLX), 480 GB, for PVE with periodic backup
2x SSD (Seagate_IronWolf_ZA2000NM10002) in a ZFS pool for the VMs' OS disks
2x HDD (ST6000NT001-3M1101) in a ZFS pool used only for data
Running one Linux LXC container and three Windows VMs
PVE
prompt:~# pveversion
pve-manager/9.1.6/71482d1833ded40a (running kernel: 6.17.13-1-pve)
Running on the NVMe
ZFS
ARC memory capped at 8 GiB with
options zfs zfs_arc_max=8589934592
No minimum (zfs_arc_min) set
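As a quick sanity check on that number: zfs_arc_max is given in bytes, and the sketch below only shows that 8 GiB works out to the value above. The /sys path is where the ZFS kernel module normally exposes the active setting; the read is guarded so the snippet also runs on a machine without ZFS loaded.

```shell
# zfs_arc_max is specified in bytes; 8 GiB expressed in bytes:
arc_max_bytes=$((8 * 1024 * 1024 * 1024))
echo "$arc_max_bytes"

# On the PVE host the value currently in effect can be read from the
# module parameters (skipped if the ZFS module is not loaded).
if [ -r /sys/module/zfs/parameters/zfs_arc_max ]; then
    cat /sys/module/zfs/parameters/zfs_arc_max
fi
```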
One of the HDDs (ST6000NT001) is definitely failing, with 2215 reallocated sectors and 212 pending sectors, and needs to be replaced.
I have ordered a new HDD and am waiting for delivery. But then, how do I replace the disk (Linux newbie)?
I searched the forum and the internet in general and found bits and pieces, so I asked Claude for advice (I know, never trust AI).
Claude produced the following manual based on the information I sent. I would sincerely appreciate it if someone would check it.
Thanks in advance, Edwin.
1. Verify Current Pool and Backup Status
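Confirm pool health and that all storages are active (pvesm status does not show backup timestamps; check the date of the last backup in the GUI or on the backup storage):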
zpool status
pvesm status
Expected: VMDATA-HDD-ZFS shows ONLINE with both mirror drives present and errors: No known data errors.
!! Perform and verify a backup of PVE, the LXC container and the VMs before continuing. !!
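The "Expected" check above can be scripted. This is only a sketch: pool_is_healthy is a made-up helper name, and it simply looks for the two strings mentioned in this step in the text of zpool status. With a drive that is already failing, the pool may show DEGRADED instead, in which case the check fails and zpool status should be read by hand.

```shell
# Hypothetical helper: reads `zpool status <pool>` text on stdin and succeeds
# only if the pool state is ONLINE and no data errors are reported.
pool_is_healthy() {
    status_text=$(cat)
    printf '%s\n' "$status_text" |
        grep -Eq '^[[:space:]]*state:[[:space:]]+ONLINE[[:space:]]*$' &&
    printf '%s\n' "$status_text" |
        grep -Fq 'errors: No known data errors'
}

# Usage on the PVE host (not executed here):
#   zpool status VMDATA-HDD-ZFS | pool_is_healthy && echo "pool looks healthy"
```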
2. Gracefully Shut Down All VMs and Containers
The ML30 Gen10 does not support hot-swap. All VMs and containers must be stopped before powering down the server.
# List all VMs with their status
qm list
# Shutdown each running VM (repeat for each VMID)
qm shutdown <VMID>
# List all containers with their status
pct list
# Shutdown each running container (repeat for each CTID)
pct shutdown <CTID>
Wait until all VMs and containers report stopped before proceeding.
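With only a handful of guests the commands above are easy to run by hand, but the "repeat for each ID" part can also be looped. A minimal sketch, assuming the default column layout of qm list (VMID NAME STATUS ...) and pct list (VMID Status Lock Name); running_vmids and running_ctids are made-up helper names.

```shell
# Print the IDs of running VMs from `qm list` output on stdin.
# Assumes the default columns: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
running_vmids() {
    awk 'NR > 1 && $3 == "running" { print $1 }'
}

# Print the IDs of running containers from `pct list` output on stdin.
# Assumes the default columns: VMID Status Lock Name
running_ctids() {
    awk 'NR > 1 && $2 == "running" { print $1 }'
}

# Usage on the PVE host (not executed here):
#   for id in $(qm list | running_vmids); do qm shutdown "$id"; done
#   for id in $(pct list | running_ctids); do pct shutdown "$id"; done
```

Afterwards qm list and pct list should show every guest as stopped.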
3. Power Down the Server
Once all VMs and containers are stopped, shut down the server:
shutdown -h now
Wait for the server to fully power off before opening the chassis.
!! DO NOT pull the drive while the server is running — the ML30 Gen10 does not support hot-swap and doing so risks data loss or hardware damage. !!
4. Physically Replace the Drive
With the server powered off:
1. Locate the failing drive (sdc while the system was running), the Seagate ST6000NT001 with S/N xxxxxx, by the serial number printed on its label
2. Remove the failing drive from its bay
3. Insert the new 6 TB (or larger) replacement drive
4. Power the server back on
5. Identify the New Drive
After the server has booted, identify the device name assigned to the new drive:
lsblk -o NAME,SIZE,TYPE,ROTA,TRAN,MODEL
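The new drive appears as a disk with no partitions. It may take over the old name (sdc) or get a new one, and the letters of the other disks can shift after a reboot, so check the model and size before using the device name in the commands below
(example: /dev/sde).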
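Because sdX letters are not stable across reboots, it helps to look up the persistent name of the new drive under /dev/disk/by-id, which contains the model and serial number. A minimal sketch; by_id_names is a made-up helper name, and the second argument exists only so the function can be pointed at a different directory.

```shell
# Hypothetical helper: print the by-id link names that point at a device.
# $1 = device node (e.g. /dev/sde)
# $2 = directory holding the links (defaults to /dev/disk/by-id)
by_id_names() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}

# Usage on the PVE host (not executed here):
#   by_id_names /dev/sde
```

The serial number in the printed name should match the label on the new drive.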
6. Wipe the New Drive
To be safe, clear any old partition tables or filesystem signatures (including old ZFS labels) from the new drive. This is destructive, so double-check the device name first:
wipefs -a /dev/sdX
Replace sdX with your actual new drive device name.
7. Replace the Failing Drive in ZFS
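Because the old drive has already been removed, zpool status now shows the pool as DEGRADED and the old device as UNAVAIL or REMOVED. Issue the ZFS replace command with the old device exactly as zpool status lists it (here the serial-based identifier), and give the new drive by its /dev/disk/by-id path rather than /dev/sdX, so the pool does not depend on a device letter that can change between boots:
zpool replace VMDATA-HDD-ZFS ata-ST6000NT001-3M1101_WX00WVSL /dev/disk/by-id/<new-disk-id>
ZFS immediately begins resilvering, copying all data from the healthy mirror member to the new drive. The pool stays online and accessible during this process, but it has no redundancy until the resilver completes.
!! If zpool status lists the missing drive by a long numeric GUID instead of its name, use that GUID as the old device: !!
zpool replace VMDATA-HDD-ZFS <GUID> /dev/disk/by-id/<new-disk-id>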
8. Monitor Resilver Progress
Watch the resilver in real time:
watch -n 5 zpool status
With ~3.9 TB of data to copy, expect roughly 10–14 hours. Avoid powering off the server during resilvering: current OpenZFS versions normally resume an interrupted resilver rather than starting over, but the pool has no redundancy until it finishes.
!! Resilvering should end with a scan line like this: scan: resilvered X.XG in HH:MM:SS with 0 errors on <date> !!
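Instead of watching the screen for hours, the scan line can be checked by a script. A minimal sketch that only pattern-matches the text of zpool status; resilver_state is a made-up helper name. Recent OpenZFS versions also have zpool wait -t resilver <pool>, which blocks until the resilver is finished.

```shell
# Hypothetical helper: classify the scan line of `zpool status` text on stdin.
# Prints one of: in-progress, done-clean, unknown
resilver_state() {
    status_text=$(cat)
    if printf '%s\n' "$status_text" | grep -q 'resilver in progress'; then
        echo in-progress
    elif printf '%s\n' "$status_text" |
            grep -Eq 'resilvered .* with 0 errors'; then
        echo done-clean
    else
        echo unknown
    fi
}

# Usage on the PVE host (not executed here):
#   zpool status VMDATA-HDD-ZFS | resilver_state
```

Anything other than done-clean at the end means zpool status should be read in full before going on to the scrub.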
9. Run a Full Scrub After Resilver Completes
Once resilvering is complete, run a scrub to verify full data integrity across both drives:
zpool scrub VMDATA-HDD-ZFS
Monitor scrub progress:
watch -n 10 zpool status
A clean scrub with 0 errors confirms your mirror is fully healthy and protected again.
10. Optional: Upgrade Pool Features ( !! NOT SURE IF I SHOULD DO THIS !! )
The pool reports some supported features are not yet enabled. After a successful scrub you may optionally upgrade:
zpool upgrade VMDATA-HDD-ZFS
Note: After upgrading, the pool may not be importable on older ZFS versions. Only do this if you will not need to import it on older systems.
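To see what an upgrade would actually change before deciding, the features that are still disabled can be listed first. A minimal sketch that filters the output of zpool get all; disabled_features is a made-up helper name. Running zpool upgrade without a pool name only reports which pools lack features and changes nothing.

```shell
# Hypothetical helper: print the names of pool features that are still
# disabled, from `zpool get all <pool>` output on stdin.
# Assumes the default columns: NAME PROPERTY VALUE SOURCE
disabled_features() {
    awk '$2 ~ /^feature@/ && $3 == "disabled" {
        sub(/^feature@/, "", $2)
        print $2
    }'
}

# Usage on the PVE host (not executed here):
#   zpool get all VMDATA-HDD-ZFS | disabled_features
```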