ProxMox booting and zfs HD replacement

RainerM

Member
May 12, 2024
Hi all,
I'm having several problems replacing an HD that ProxMox hands over to an OpenMediaVault VM (VM500), which manages the HDs with zfs.

When ProxMox (8.4.19) boots, on the console I get the message:
Bash:
Port3: ST12000VN0007-2GS116
       S.M.A.R.T Status Bad, Backup and Replace
       Press F1 to continue
(That's how I first noticed the problem.)

After pressing F1 boot continues without showing any problems.

On ProxMox: Datacenter ==> svProx1 ==> Disks:
/dev/sdd ... ST12000VN0007-2GS116 ... with Serial ZJV5JAHP <== Shows S.M.A.R.T FAILED!
But no details about the failure are shown and the S.M.A.R.T. values look more or less normal.

Any idea how to find the (detailed) reason for the disk failure?



FYI: Below, svProx1 is the name of the ProxMox server and sv4000 is the name of the OpenMediaVault (VM500) server.

On svProx1, 5 HDs are handed over to OpenMediaVault, which manages them exclusively with zfs.
ProxMox handed the 5 disks over with
Bash:
qm set 500 -scsi1 /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV54B6L
qm set 500 -scsi2 /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV205SW
qm set 500 -scsi3 /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV56CPE
qm set 500 -scsi4 /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP   <== This HD is faulty now
qm set 500 -scsi5 /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5LWM0

On OpenMediaVault (sv4000) all these HDs were included in one zfs pool 'ZFSRaidZ2'
Code:
sv4000 => ZFS => Add pool ZFSRaidZ2 (RAIDZ2 Pool)
Name:        ZFSRaidZ2
Devices:     (all disks)
Mount Point: /ZFSRaidZ2

###--- First idea. I already tried to replace the faulty HD:

I simply swapped the faulty HD for a new one, hoping zfs would resilver onto it automatically.
But the OpenMediaVault VM (VM500) didn't start anymore and only displayed:
HTML:
Error: start failed: QEMU Exited with code 1
Details: (kvm: -drive file=/dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP,if=none,id=drive-scsi4,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open '/dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP': No such file or directory
TASK ERROR: start failed: QEMU exited with code 1)

I then swapped the faulty disk back in and booted again.
Everything booted, but my first attempt to replace the disk had failed.



###--- Second idea. My new plan to replace the HD:
According to Oracle's documentation, the command to replace a disk is simply:
Bash:
zpool replace ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4

Since I'm not 100% familiar with zfs, I'd rather ask for advice before damaging the zfs pool.

I think that would be the correct way if the zfs pool were handled directly by ProxMox.

Would it also be the correct way to replace an HD on a ProxMox server where the HDs are handed over to OpenMediaVault, which manages them with zfs?
That would be the easiest way if it works, but I'm afraid I need more manual steps.
Could I damage the zfs pool that way?
Any suggestions?



Here is what I did to identify the faulty zfs HD on svProx1 and sv4000:

# Identify the HD on svProx1 by the serial number shown in the svProx1 UI.
Bash:
root@svProx1:~# lsblk -o NAME,FSTYPE,FSVER,LABEL,FSAVAIL,PARTUUID,PTUUID,SERIAL
NAME     FSTYPE      FSVER  LABEL      FSAVAIL  PARTUUID                              PTUUID                                SERIAL
sdd                                                                                   9be172dd-2d12-1440-a227-48b5e8af1123  ZJV5JAHP  <== defective disk
├─sdd1   zfs_member  5000   ZFSRaidZ2           5faa4d32-79a5-864b-9ebf-261e0d890e32  9be172dd-2d12-1440-a227-48b5e8af1123            <== PARTUUID of sdd1
└─sdd9                                          6af2a07c-c072-2a48-8a39-203b99406825  9be172dd-2d12-1440-a227-48b5e8af1123            <== PARTUUID of sdd9

# Identify the device name on sv4000 by PTUUID.
Bash:
root@sv4000:~# lsblk -o NAME,FSTYPE,FSVER,LABEL,FSAVAIL,PARTUUID,PTUUID
NAME     FSTYPE      FSVER  LABEL      FSAVAIL  PARTUUID                              PTUUID
sde                                                                                   9be172dd-2d12-1440-a227-48b5e8af1123
├─sde1   zfs_member  5000   ZFSRaidZ2           5faa4d32-79a5-864b-9ebf-261e0d890e32  9be172dd-2d12-1440-a227-48b5e8af1123  <== PARTUUID of sde1
└─sde9                                          6af2a07c-c072-2a48-8a39-203b99406825  9be172dd-2d12-1440-a227-48b5e8af1123  <== PARTUUID of sde9

On sv4000 the defective HD appears as sde.

Bash:
# On sv4000 identify the disk-id used by zfs by its device name
root@sv4000:~# ls -l /dev/disk/by-id | grep 'sde[19]'
lrwxrwxrwx 1 root root 10 Jun  2 15:17 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4-part1 -> ../../sde1                            <== Name: scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 of sde1
lrwxrwxrwx 1 root root 10 Jun  2 15:17 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4-part9 -> ../../sde9                            <== Name: scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 of sde9

scsi disk-id is scsi4

Bash:
# Show zfs status on sv4000 (Sadly HDs internal serial numbers are not shown)
ZPOOL_SCRIPTS_AS_ROOT=1 zpool status -c serial
  pool: ZFSRaidZ2
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 538M in 00:26:06 with 0 errors on Mon May 11 12:40:13 2026
config:

        NAME                                      STATE     READ WRITE CKSUM  serial
        ZFSRaidZ2                                 ONLINE       0     0     0
          raidz2-0                                   ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi1  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi2  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi3  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi4  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi5  ONLINE       0     0     0       -

errors: No known data errors

Any suggestions ?
Any better ways ?



###--- Third idea to replace faulty HD by manual steps:

# On svProx1 identify the qm handed over scsi disk, by checking the VM500 configuration
Bash:
root@svProx1: more /etc/pve/qemu-server/500.conf
#http://172.16.1.4/#/login
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
cpu: x86-64-v2-AES
ide2: none,media=cdrom
memory: 16384
meta: creation-qemu=8.1.5,ctime=1717161677
name: sv4000
net0: virtio=BC:24:11:30:8E:AB,bridge=vmbr0,firewall=1
net1: virtio=BC:24:11:55:18:BF,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-500-disk-0,iothread=1,size=32G
scsi1: /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV54B6L,size=11176G
scsi2: /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV205SW,size=11176G
scsi3: /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV56CPE,size=11176G
scsi4: /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP,size=11176G                                       <= Faulty disk identified by serial number
scsi5: /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5LWM0,size=11176G
scsihw: virtio-scsi-single
smbios1: uuid=d242a868-6726-40fb-a9bc-ebd0a8da5dcb
sockets: 1
vmgenid: 86da8a55-11a5-40b2-b93f-4ec31902b01a
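
Since zpool status inside the VM cannot show the physical serial numbers, the VM config on the host is what ties each scsiN slot to a drive serial. The following is only a minimal sketch of that lookup. The function name map_passthrough_disks is made up for illustration, and it assumes the serial is the part after the last "_" in the by-id name and that each entry carries options after a comma, as in the excerpt above.

```shell
#!/bin/sh
# Sketch: list "guest device name -> host disk serial" for every disk that a
# Proxmox VM config passes through by /dev/disk/by-id path.
# Assumption: the serial is the text after the last "_" of the by-id name.
map_passthrough_disks() {
    # $1: path to the VM config, e.g. /etc/pve/qemu-server/500.conf
    sed -n 's|^scsi\([0-9][0-9]*\): /dev/disk/by-id/[^,]*_\([^_,]*\),.*$|scsi-0QEMU_QEMU_HARDDISK_drive-scsi\1 \2|p' "$1"
}
```

On svProx1 this would be called as map_passthrough_disks /etc/pve/qemu-server/500.conf; entries such as scsi0 that point to a storage volume instead of a by-id path are skipped.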

# Replace HD by these steps:
# First step: Detach HD from sv4000 zfs
Bash:
root@sv4000: zpool detach ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4

# Then remove the HD from the svProx1 handover to sv4000 (with force if necessary)
Bash:
root@svProx1: qm unlink 500 --idlist scsi4 [--force 1]

# ==> shut down svProx1
# ==> swap the ZJV5JAHP disk for the new disk
# ==> boot and identify the new disk by serial number as in the steps above
# ==> hand the new HD over to sv4000

# Hand the new disk over to sv4000 with qm
Bash:
root@svProx1: qm set 500 -scsi<N> /dev/disk/by-id/ata-... (new HD)

# Attach the newly handed-over disk to 'ZFSRaidZ2'
Bash:
root@sv4000: zpool attach ZFSRaidZ2 scsi-... new HD
# DO NOT USE add - root@sv4000: zpool add ZFSRaidZ2 scsi-... (new HD) - this would add the disk to the pool as a new top-level vdev.

Are any steps missing?
Could this damage the ZFSRaidZ2 pool?
Any suggestions or better ways?

Thanks
Rainer
 
Please edit your post and put all your commands and output in CODE blocks. That makes it much easier to read.

If your system has a bad drive but that drive is not required to boot, it will throw an error and continue booting.

You can confirm the status of the disk with smartctl -a /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP in Proxmox.
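
To pick the failing attributes out of that output, a small filter along these lines might help. It is only a sketch: failing_smart_attrs is a made-up helper name, and it assumes the usual column layout of the smartctl attribute table (WHEN_FAILED in column 9).

```shell
#!/bin/sh
# Sketch: read `smartctl -A` (or -a) output on stdin and print only the
# attributes whose WHEN_FAILED column is not "-".
# Assumed columns:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
failing_smart_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && $7 ~ /^(Pre-fail|Old_age)$/ && $9 != "-" {
        printf "%s %s %s value=%s worst=%s thresh=%s\n", $1, $2, $9, $4, $5, $6
    }'
}
```

Usage on the Proxmox host: smartctl -A /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP | failing_smart_attrs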

If you replace the disk but the VM config still references the old disk, which is no longer in the system, the VM will not start (your first idea).
So also replace the old disk with the new one in the VM config, and the VM should start with the new disk instead of the old one.
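
A quick pre-flight check before starting the VM could look like the sketch below. The helper name missing_passthrough_disks is hypothetical; it simply lists every /dev/disk/by-id path in a VM config that does not exist on the host, which is the situation behind the "No such file or directory" error.

```shell
#!/bin/sh
# Sketch: print every /dev/disk/by-id path referenced by a VM config that is
# not present on the host. A VM with such an entry cannot open the drive.
missing_passthrough_disks() {
    # $1: path to the VM config, e.g. /etc/pve/qemu-server/500.conf
    sed -n 's|^[a-z]*[0-9][0-9]*: \(/dev/disk/by-id/[^,]*\).*$|\1|p' "$1" |
    while IFS= read -r dev; do
        [ -e "$dev" ] || printf '%s\n' "$dev"
    done
}
```

If missing_passthrough_disks /etc/pve/qemu-server/500.conf prints nothing, every passed-through disk path resolves on the host.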

As your pool is RAIDZ2, it will start, but with the old disk missing (UNAVAIL).
Double-check with zpool status in the VM.
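
For that check, a small filter over the zpool status output can list everything that is not ONLINE. This is a sketch only; non_online_vdevs is a made-up name, and it assumes the usual config table with NAME, STATE and the three error counters.

```shell
#!/bin/sh
# Sketch: read `zpool status <pool>` output on stdin and print every entry of
# the config table whose STATE is not ONLINE (e.g. DEGRADED, UNAVAIL, FAULTED).
non_online_vdevs() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # table header
        $1 == "errors:"               { in_cfg = 0 }         # end of the table
        # rows with error counters have at least 5 fields; spares do not
        in_cfg && NF >= 5 && $2 != "ONLINE" { print $1, $2 }
    '
}
```

Usage inside the VM: zpool status ZFSRaidZ2 | non_online_vdevs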

You can then replace this missing disk with the replacement disk in the VM: zpool replace ZFSRaidZ2 <olddisk> <newdisk>
<olddisk> and <newdisk> should be the disks/devices as they are known in the VM.
 
I modified my first post by inserting 'CODE' sections. Thank you for the advice.

I checked the smartctl status of the HD and attached the output as a text file (svProx - smartctl - ZJV5JAHP.txt).
Is the HD completely damaged, or is there a way to 'restore' it somehow?
Any idea?

I wonder whether the faulty HD is still in use by zpool on the sv4000 OpenMediaVault server as the status is:

Bash:
root@sv4000:~# zpool status
  pool: ZFSRaidZ2
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 538M in 00:26:06 with 0 errors on Mon May 11 12:40:13 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        ZFSRaidZ2                                 ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi1  ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi2  ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi3  ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi4  ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi5  ONLINE       0     0     0

errors: No known data errors

To me it looks as if only svProx1 wouldn't start on its own, but after being helped along with F1, all VMs run without seeing any problem.
What do you think?



I'm sorry, I wasn't clear about the installations and boot behaviour of svProx1 and sv4000 in my first post.
svProx1 is the ProxMox server. It booted without any problem (well, sometimes only after pressing F1), both after I swapped the failing HD for a new one and after I swapped it back.

sv4000 is a VM (ID 500) on svProx1 and is installed on a separate 10G HD on svProx1.
This HD is used to hold all the VMs and CTs.

The 5 SATA HDs are passed through to sv4000 with 'qm', and only sv4000 manages them with zfs.
They are used exclusively by sv4000. The original plan was to be able to move this data storage seamlessly to another OpenMediaVault if this ProxMox server fails.

Here is what happened after I simply replaced the failed HD with a new, similar one:
After the disk replacement, svProx started up and the boot problem only occurred on sv4000, which failed to start and displayed the following message in the svProx UI:
HTML:
Error: start failed: QEMU exited with code 1
Details: (kvm: -drive file=/dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP,if=none,id=drive-scsi4,format=raw,cache=none,aio=io_uring,detect-zeroes=on: Could not open '/dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP': No such file or directory
TASK ERROR: start failed: QEMU exited with code 1)

I then swapped the failed HD back in, and sv4000 also started without any problems.



Now to your suggestions regarding the HD replacement.

Since I'm quite sure scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 is the faulty HD on the sv4000 VM, you suggested:
So, also replace the old disk with the new one in the VM config and the VM should start with the new disk instead of the old one.

How can I replace the HD in the VM config on a running system and make sv4000 start up without the problem described above?
I think you mean the VM config (/etc/pve/qemu-server/500.conf, please see the output in my first post) on svProx1. Correct?
Please advise.

You then suggest these steps to replace the HD in the zpool:
1 - root@sv4000: zpool replace ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 <newdisk>
2 - shut down svProx
3 - swap the old disk for the new one (how, when the HD was already swapped? Or is this no longer necessary?)
4 - start svProx
5 - (auto)start sv4000
6 - check zpool status

Again, how can I make sure sv4000 will then boot with the new HD (see above)?
Can the 'zpool replace' command damage the pool?
Is there any way to stop the 'replace' command in case sv4000 fails to boot again and I have to swap the faulty hard drive back?

Also, how can I know the name of the new passed-through drive in advance?
It might become scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 again, or it could be assigned the name scsi-0QEMU_QEMU_HARDDISK_drive-scsi6.

Or should I just issue:
1 - root@sv4000: zpool replace ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4
But how will zpool identify the new drive to use, since none is specified?

Please advise.
Thanks
Rainer
 


Is the HD completely damaged or is there a way to 'restore' it somehow ?

Code:
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.

1 Raw_Read_Error_Rate 0x000f   042   031   044    Pre-fail  Always FAILING_NOW 67378645
This disk is toast, no way to restore.

To me it looks as if only svProx1 wouldn't start on its own, but after being helped along with F1, all VMs run without seeing any problem.
Proxmox (svProx1) shows you the error and wants you to hit F1 before it continues, but as the Proxmox Operating System itself is not installed on the failing drive, it can continue booting.

In the VM (sv4000) all looks fine because it cannot access the SMART data through KVM, it only sees the virtualised scsi disk.

Again, how can I make sure sv4000 will then boot with the new HD (see above) ?

I don't know if your system supports hotswappable disks, so I assume you'll have to turn it off to physically replace the disk.

1. Set "Options/Start at boot" to "No" for the sv4000 VM so it doesn't start when you boot svProx.
2. Shut down svProx and physically replace the faulty disk with the replacement disk.
3. Boot svProx.
It should boot without asking for F1 because there is no faulty disk anymore.
It should not autostart sv4000 because you disabled that.
4. Remove the faulty disk scsi4 from the sv4000 VM configuration (either in the GUI or in /etc/pve/qemu-server/500.conf).
5. Add the replacement disk as scsi6 in the sv4000 VM configuration (either in the GUI or in /etc/pve/qemu-server/500.conf).
You should now have scsi1, 2, 3, 5 and 6 in your sv4000 VM config.
6. Start sv4000 VM manually.

OpenMediaVault may have options to restore the zfs array from here, but I am not familiar with OMV.
Otherwise:

7. Have a look at zpool status inside sv4000.
It will be DEGRADED, with scsi1, 2, 3 and 5 present and scsi4 UNAVAIL.
8a. Replace the disk:
zpool replace ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 scsi-0QEMU_QEMU_HARDDISK_drive-scsi6
Your pool will start resilvering.
8b. Alternatively, add scsi6 as a hot spare and ZFS will do the replacing for you:
zpool add ZFSRaidZ2 spare scsi-0QEMU_QEMU_HARDDISK_drive-scsi6

9. Turn "Options/Start at boot" back to "Yes" so sv4000 starts automatically on the next boot.
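
The Proxmox-side part of the steps above (1, 4, 5, 6 and 9) can also be done with qm on the command line instead of the GUI. The sketch below only prints the commands so they can be reviewed first. plan_disk_swap is a made-up helper, and the new disk path is a placeholder because the real by-id name depends on the replacement drive.

```shell
#!/bin/sh
# Sketch: print (not execute) the host-side qm commands for the disk swap.
# Steps 2 and 3 (physical swap, reboot) happen between the first and second
# command; step 9 should wait until the resilver inside the VM has finished.
plan_disk_swap() {
    vmid=$1; old_slot=$2; new_slot=$3; new_disk=$4
    echo "qm set $vmid --onboot 0"              # step 1: no autostart
    echo "qm set $vmid --delete $old_slot"      # step 4: drop the old disk entry
    echo "qm set $vmid --$new_slot $new_disk"   # step 5: pass the new disk through
    echo "qm start $vmid"                       # step 6: start the VM manually
    echo "qm set $vmid --onboot 1"              # step 9: restore autostart
}

# Placeholder path: replace with the real by-id name of the new drive.
plan_disk_swap 500 scsi4 scsi6 '/dev/disk/by-id/ata-<new-disk>'
```

Printing the plan first keeps the change reviewable; the resulting config can then be compared with /etc/pve/qemu-server/500.conf or the GUI before the VM is started.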


Is there any way to stop the 'replace' command in case sv4000 fails to boot again and I have to swap the faulty hard drive back?
You should never have to swap the faulty drive back in once you have removed it from the sv4000 configuration (step 4).

Just for the record: while the resilver is running, you can detach the new disk from the pool again. This stops the resilvering immediately.
zpool detach ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi6

Having backups is obviously always preferred.
If you are uncertain, post the output of 7. zpool status before doing anything to the pool.
 
Thank you very much for this very good explanation.
Really makes sense.

I will try to replace the HD some time next week, after first rsyncing all content to another server.

I just had a look into 500.conf and scsi4 is shown with a max. size of 11176G.
Bash:
root@svProx1:~# more /etc/pve/qemu-server/500.conf | grep ZJV5JAHP
scsi4: /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP,size=11176G

According to what I found online, the size value is filled in by ProxMox itself.
I couldn't find a way to calculate it manually.
Bash:
smartctl -a /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV56CPE | grep 11176
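
One possible explanation, offered as an assumption and not confirmed anywhere in this thread, is that size= is simply the disk capacity expressed in GiB (bytes divided by 1024^3). smartctl reports the capacity in bytes and in decimal TB, which would explain why grepping for 11176 finds nothing. The sketch below does the conversion. bytes_to_gib is a made-up helper, and the byte count is an example value commonly reported for 12 TB drives, not one read from this disk.

```shell
#!/bin/sh
# Sketch: convert a disk capacity in bytes to whole GiB (1 GiB = 1024^3 bytes).
# Assumption: the size= field in 500.conf is the capacity in GiB.
bytes_to_gib() {
    echo $(( $1 / 1073741824 ))
}

# Example value only; the real capacity can be read on the host with e.g.
#   blockdev --getsize64 /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV56CPE
bytes_to_gib 12000138625024   # prints 11176
```

If the new disk reports a slightly different byte count, the same calculation gives the value to expect in its config entry.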

My new disk is not identical, only a similar one of about the same size.
So I intend to follow your suggestions, but add and remove the handover to sv4000 with qm.

First, hand the new disk over to sv4000 as scsi6 with
Bash:
qm set 500 -scsi6 /dev/disk/by-id/ata-... (new HD)

In that order I avoid reusing scsi4 as a handover to sv4000, even by accident.
Then use qm
Bash:
qm unlink 500 --idlist scsi4 [--force 1]
to remove the scsi4 entry from 500.conf, or do it manually.

After sv4000 then hopefully starts again, I will check that the new HD is handed over to sv4000 and then check the status of the zpool.
If everything looks as you described, I will enter the 'zpool replace' command and check the zpool status again.

But just out of curiosity:
What would actually happen if I handed the new disk over to sv4000 as scsi4 again?

After rebooting sv4000, would zfs automatically start resilvering the new HD, since it shows up at the old location in the pool but is a new disk?

Or would I at least need to start that by entering
Bash:
zpool replace ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4

Or would I run into problems?
Any idea?

Thank you very much for your great support.
I'll keep you posted.
 
What would actually happen if I handed the new disk over to sv4000 as scsi4 again?
As the replacement disk is empty, zfs doesn't know what you want it to do with it. You might be able to trick the pool into resilvering scsi4 directly if you first create zfs partitions on the drive, but I wouldn't bet my data on it.

After the resilver using replace or hotspare as described before completes, you should be able to relabel the scsi devices back to scsi1,2,3,4,5 again, giving the same end result.

zpool replace ZFSRaidZ2 scsi-0QEMU_QEMU_HARDDISK_drive-scsi4
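zpool replace is best given 2 arguments: the drive to replace and the drive to replace it with. If the second argument is omitted, zpool uses the first device name for the new disk as well, which only fits when the new disk shows up under the old name.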
 
For myself, I prepared a document (sv4000 & svProx - HD Replacement steps.txt) with all the steps to replace the faulty HD, which is handed over to OpenMediaVault zfs running on ProxMox.
(Beforehand, I made sure all files stored on zfs were backed up via rsync to another server.)

Going through my document, I first checked the HDs in the svProx1 UI.
But the HD now actually shows 'PASSED' again.

smartctl now shows 'In_the_past' in the 'WHEN_FAILED' field:
Bash:
smartctl -a /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   046   031   044    Pre-fail  Always   In_the_past 166409619
...

whereas the 'WHEN_FAILED' field showed 'FAILING_NOW' when the HD showed 'FAILED' in the UI:
Bash:
smartctl -a /dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP
...
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   042   031   044    Pre-fail  Always   FAILING_NOW 67378645
...
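
The two outputs are consistent with a simple rule, stated here as my understanding of smartctl and not taken from this thread: an attribute counts as FAILING_NOW while its normalized VALUE is at or below THRESH, and as In_the_past when only WORST is at or below THRESH. A minimal sketch of that rule, with the made-up helper name smart_attr_state:

```shell
#!/bin/sh
# Sketch of how the WHEN_FAILED column can be derived from the normalized
# VALUE, WORST and THRESH columns (assumed rule, see the text above).
smart_attr_state() {
    # $1 = VALUE, $2 = WORST, $3 = THRESH (leading zeros are fine)
    awk -v v="$1" -v w="$2" -v t="$3" 'BEGIN {
        if (t + 0 == 0)          print "-"             # no threshold defined
        else if (v + 0 <= t + 0) print "FAILING_NOW"   # current value at or below threshold
        else if (w + 0 <= t + 0) print "In_the_past"   # only the worst value was
        else                     print "-"
    }'
}

smart_attr_state 042 031 044   # earlier report: FAILING_NOW
smart_attr_state 046 031 044   # current report: In_the_past
```

With THRESH at 044, the move from VALUE 042 to 046 is only a few points above the threshold, so the margin is small.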

I then checked zpool status -c serial:
Bash:
 ZPOOL_SCRIPTS_AS_ROOT=1 zpool status -c serial
  pool: ZFSRaidZ2
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 538M in 00:26:06 with 0 errors on Mon May 11 12:40:13 2026                             <== suddenly resilvered
config:

        NAME                                      STATE     READ WRITE CKSUM  serial
        ZFSRaidZ2                                 ONLINE       0     0     0
          raidz2-0                                ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi1  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi2  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi3  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi4  ONLINE       0     0     0       -
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi5  ONLINE       0     0     0       -

errors: No known data errors
I found that zfs had meanwhile resilvered a large section.

Just an idea, but maybe zfs became aware of the problem when I copied all files to another server?

I searched the Internet but didn't find anything on whether to keep or replace the problematic HD.
Any idea whether the HD is now in a state to be used further?
 


Great that ZFS was able to resilver the pool!

Looking at the SMART output, the drive has corrected the errors by reallocating the bad sectors.
If the numbers remain stable you might be able to keep using the drive without problems. If the numbers go up, this is a clear indication of a failing drive.
Just make sure you have a spare drive ready and meticulously do your backups (as you should anyway).

If you don't feel comfortable with this, you can obviously still replace the disk.
 
Thanks for your fast reply.
It does indeed make sense to monitor the SMART reports for the time being and wait to see if anything changes.

My spare hard drive is ready, and the steps for replacing the hard drive are set out in my document, so I won’t need to look them up again.
Once a week all data is rsynced to a similar server.
All prepared.

I created CheckHD.sh
Bash:
#!/bin/bash
# Save a dated SMART report; the four key attributes come first for easy diffing.
DISK=/dev/disk/by-id/ata-ST12000VN0007-2GS116_ZJV5JAHP
REPORT=$(smartctl -a "$DISK")   # query the drive only once
{
  printf '%s\n' "$REPORT" | grep -E 'Reallocated_Sector_Ct|Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable'
  echo '#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-'
  printf '%s\n' "$REPORT"
} > "ZJV5JAHP-$(date +%Y%m%d).txt"
That makes it easy to just diff the files to get the numbers and their differences.
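
As a possible next step, the comparison itself could be scripted. The following is only a sketch: compare_smart_reports is a made-up name, and it assumes both files contain the standard smartctl attribute table, with the attribute name in field 2 and RAW_VALUE in field 10.

```shell
#!/bin/sh
# Sketch: print the raw values of the four watched attributes from two saved
# reports side by side and flag the ones that changed.
compare_smart_reports() {
    # $1 = older report, $2 = newer report
    awk '
        $1 ~ /^[0-9]+$/ &&
        $2 ~ /^(Reallocated_Sector_Ct|Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable)$/ {
            if (FNR == NR) { old[$2] = $10; next }   # first file: remember value
            if (!($2 in seen)) {                     # second file: report once
                seen[$2] = 1
                printf "%s %s -> %s%s\n", $2, old[$2], $10, (old[$2] == $10 ? "" : " CHANGED")
            }
        }
    ' "$1" "$2"
}
```

It would be called with two of the dated files, for example compare_smart_reports ZJV5JAHP-<older date>.txt ZJV5JAHP-<newer date>.txt, and any line marked CHANGED is a reason to look at the drive more closely.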