[SOLVED] NVMe disk "Available Spare" problem.

Hi all!

I installed Proxmox in 2021 and it has just worked ever since. My system is PVE 6.4-15 - the last release of the 6.4 branch. I know it is obsolete, but for my needs (a home server) it works without any problem. It's there and just works... I almost forgot that I have it. Operation and performance are great, but the hardware...

The boot disk (/dev/nvme0n1 - Samsung SSD 970 EVO) is signaling that there is only 26% of "Available Spare" (spare blocks, I presume) left on it:

[screenshot: SMART values for /dev/nvme0n1 showing "Available Spare" at 26%]

This disk is the system/boot disk only. For the VMs I have a separate ZFS raidz pool (4 SSDs in a RAID-like setup) and also one additional SSD (connected via USB) just for VM backups. The other disks are doing fine and show no problems (below). But I believe I need to replace the boot device (/dev/nvme0n1) ASAP.

[screenshots: SMART values for the other disks, all without problems]

My plan: buy a new NVMe disk - probably the same model as this one (if that is not a good idea, please suggest a better model) - and then try to binary-duplicate the current disk to the new one. Then put the new one into the server and everything should be fine. This would be optimal, but I don't know if it will work. Are there any problems that can occur during that process? Or what is the best and proven way to replace a Proxmox NVMe system disk?

Thanks!
 
What I would do after you get a new NVMe at least as large as the old one:


1. Shutdown node.
2. Attach the new NVMe to the node (another slot, or maybe with an NVMe-to-USB adapter, etc.)
3. Boot up with live Linux media (almost anything: GParted Live, SystemRescue, etc.) without mounting any HDD/SSD.
4. dd from the old NVMe device to the new one. MAKE SURE YOU CORRECTLY IDENTIFY OLD & NEW (if you don't, it's probably game over!).
5. Shutdown node.
6. Remove old NVMe (DO NOT DISCARD/ERASE in case something goes wrong + then you can simply reinsert).
7. Insert new NVMe
8. Boot node & you should be good to go.

I've never had success with Clonezilla when dealing with PVE - so just use dd - done so myself many times successfully.
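For step 4, a minimal sketch of the clone command - the device names below are just examples, identify your own old and new disks first:
Code:
# List disks with model/serial so the old & new NVMe can be told apart
lsblk -o NAME,SIZE,MODEL,SERIAL

# Clone the old NVMe (example: /dev/nvme0n1) onto the new one (example: /dev/nvme1n1)
dd if=/dev/nvme0n1 of=/dev/nvme1n1 bs=32M status=progress conv=sync,noerror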

Alternatively you could also make a zipped dd image of the original NVMe - as then you would always in the future be able to fully revert back to your currently working PVE OS, I do this regularly.
 
Regarding whether or not you need to change the drive - IDNK

What did available spare show last time you looked?

I would think the important part is Percentage used - which shows 0% - so at least S.M.A.R.T. believes it's still got a full life ahead!

I do know you can NEVER (accurately) rely on S.M.A.R.T. data.
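If you want to pull those values on the CLI instead of from screenshots, something like this should work (assuming smartmontools is installed and the boot disk is /dev/nvme0n1):
Code:
smartctl -a /dev/nvme0n1 | grep -iE 'available spare|percentage used|data units written'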
 
> My plan is: I need to buy a new NVME disc - probably same as this one (if that is no ok, please suggest a better model) and then try to binary duplicate current disc to new one

I would go with a Pro model instead of EVO, and look at the TBW rating - the higher the better.

Otherwise check eBay and see if they have a refurb enterprise SSD that fits your sizing needs.
 
> Regarding whether or not you need to change the drive - IDNK

I checked the SMART stats on my 2x (new) NVMes and they're both at 100% for spare threshold, so yeah, it's probably a good idea to replace the drive. If you have a free slot or an adapter, you can still put it into secondary-storage use for backups or whatev until it dies.

PROTIP - if you're not running a cluster, turn off the cluster services. This will limit writes to the OS drive. You may also want to set up zram and log2ram.
Also set 'noatime' on all filesystems and 'atime=off' on ZFS.
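A rough sketch of what that could look like on a standalone node - the service names and the pool name are assumptions, adjust to your own setup:
Code:
# Disable HA/cluster services on a single, non-clustered node (assumed service names)
systemctl disable --now pve-ha-lrm pve-ha-crm corosync

# Add noatime to the options column of the root filesystem in /etc/fstab, then remount or reboot

# Turn off atime on ZFS (replace "tank" with your actual pool/dataset name)
zfs set atime=off tank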
 
dd if=/dev/zero of=/root/zeroes bs=$((1024*1024)); rm -f /root/zeroes
fstrim -v /

Then wait 5 minutes or so and reboot once, just to be sure, and check with smartctl again :)
It has only 778 GB written; that SSD is basically brand new, lol
 
> Regarding whether or not you need to change the drive - IDNK
>
> What did available spare show last time you looked?
>
> I would think the important part is Percentage used - which shows 0% - so at least S.M.A.R.T. believes it's still got a full life ahead!
>
> I do know you can NEVER (accurately) rely on S.M.A.R.T. data.

Yes, I also don't know whether something needs to be done, but googling tells me that when that metric falls below 10 it is critical. I was aware of SSD wear when I installed Proxmox (switching from ESXi), so I regularly watch those numbers (taking screenshots of the Disks section in the ProxMon Android app). Over the last few years it went like this (whenever I looked and the value differed from the last noted one):

2022-10-07: 77 %
2023-10-15: 52 %
2023-11-10: 48 %
2023-12-22: 47 %
2024-01-16: 39 %
2024-04-11: 26 %

So I believe I need to do something now, since I really need that server to keep running.

> Alternatively you could also make a zipped dd image of the original NVMe - as then you would always in the future be able to fully revert back to your currently working PVE OS, I do this regularly.

If I understand correctly, by doing that I would make a snapshot of the current NVMe boot disk and have it ready for later restoring to a new disk? If yes, can you please tell me a little bit more (how can I do that)? Can it be done without a live CD (my server does not have a CD/DVD unit)? I would like to do it ASAP and store it. Then I will buy a new disk, restore that snapshot to it and try switching it physically in the server. That sounds like an almost perfect solution :)

Thank you.
 
> dd if=/dev/zero of=/root/zeroes bs=$((1024*1024)); rm -f /root/zeroes
> fstrim -v /
>
> Then wait 5 minutes or so and reboot once, just to be sure, and check with smartctl again :)
> It has only 778 GB written; that SSD is basically brand new, lol

Yes, it was brand new when I put it in the computer (it is now three and a half years old and running 24/7). Only Proxmox was installed on it. If I understand you correctly, you suggest I run the following commands:
  • dd if=/dev/zero of=/root/zeroes bs=$((1024*1024))
  • rm -f /root/zeroes
  • fstrim -v /
  • shut down (& power off the server completely, wait a few minutes, then restart it and check again)
If that is correct, I can do that later today.
 
> So I believe I need to do something now, since I really need that server to keep running.
It looks like it will soon fail.

IN MY OPINION YOU NEED TO BACK UP STRAIGHT AWAY.
I WOULDN'T DO ANYTHING ELSE. (DO NOT DO WHAT IS SUGGESTED IN THE POST/S ABOVE OF TESTING THE DRIVE FURTHER - YOU MAY KILL THE DRIVE WITH THESE TEST/S.)

> Can it be done without a live CD (my server does not have a CD/DVD unit)? I would like to do it ASAP and store it.
No, you need some live Linux media; a USB stick will do. (The same way you installed Proxmox in the first place - without a CD/DVD unit?) You can only do it when the NVMe is unmounted & not in use by the OS.

You'll need another storage medium (besides the live Linux media) on which to store the zipped image from the failing NVMe.

So what you should do:

1. Shut down node.
2. Boot up with Live media.
3. Attach extra storage media to node.

Then issue the following command(s):
Code:
mount /dev/xxx /mnt
#(mount extra storage device)


tmux
#(only optional - to enable leaving the process running)


dd if=/dev/YYY bs=32M status=progress conv=sync,noerror | gzip -c > /mnt/prxmx_node_name$(date +'%Y_%m_%d_%I_%M_%p').img.gz
#(YYY is your failing NVMe system os disk)

This will create a zipped image (the name is time-stamped for future reference) of your failing NVMe & store it at your (mounted) storage location.
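For later reference, restoring that image onto the replacement disk would roughly be the reverse (a sketch - ZZZ stands for the new NVMe, double-check the device name before writing):
Code:
gunzip -c /mnt/prxmx_node_name<timestamp>.img.gz | dd of=/dev/ZZZ bs=32M status=progress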

Good luck.
 
Thank you! Will do that today after work and will post the result here. Also, I will order a replacement. I just need to create a bootable USB. I don't remember how I installed it, honestly :) I have a SuperMicro board with IPMI; maybe I mounted an external CD or similar, I really don't remember.
 
> Yes, it was brand new when I put it in the computer (it is now three and a half years old and running 24/7). Only Proxmox was installed on it. If I understand you correctly, you suggest I run the following commands:
> • dd if=/dev/zero of=/root/zeroes bs=$((1024*1024))
> • rm -f /root/zeroes
> • fstrim -v /
> • shut down (& power off the server completely, wait a few minutes, then restart it and check again)
> If that is correct, I can do that later today.
Don't do the dd and rm -f commands separately; do it as one command, exactly as I posted above,
because the first command will write zeroes to your drive (into the zeroes file) until there is absolutely no space left, and the second will delete the zeroes file to make space again.

So basically, as one command your drive will be completely full for less than a second, which shouldn't cause any issues with services or anything that writes to the root partition, like logs... no "out of space" errors...
If you do it separately, your drive will simply be "out of space" for a longer time, until you manually delete the zeroes file. So the period where the drive is full is simply longer. If you wait too long, it can happen that some service fails, etc.

fstrim afterwards marks the blocks where the zeroes were (basically all the free space) as empty.
fstrim sends the discard command directly to the firmware of your NVMe, which then cleans/marks the blocks as empty by itself.
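If you want to verify first that the device actually advertises TRIM/discard support, a quick check (assuming the boot disk is /dev/nvme0n1):
Code:
lsblk --discard /dev/nvme0n1   # non-zero DISC-GRAN / DISC-MAX values mean TRIM is supported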

TBH, the available spare degradation could have several reasons; in my opinion 778 GB written is simply not enough for wearout.
Or in other words, if it is indeed wearing, then the drive has been defective from the beginning.
A refresh cycle to keep the data alive shouldn't cause wearout. Reading doesn't cause wearout either.
It could even be a firmware issue; you may need to update the firmware.
But TBH, I think the most likely case is simply that there is data that was deleted but the drive was never trimmed, or something like that.
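To see which firmware revision the drive is currently running before looking for an update, something like this (assuming smartmontools and nvme-cli are installed):
Code:
smartctl -i /dev/nvme0n1 | grep -i firmware
nvme fw-log /dev/nvme0   # firmware slot information (nvme-cli)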

However, like others say, spare degradation can indeed also be a sign of real drive degradation. (I would blindly confirm that if the drive had 50+ TB written, but not with 778 GB written.)
But as there is a chance that this could indeed be degradation, you should back it up, just to be sure.

Cheers
 
Please, just to clarify this and then I will stop 'bugging' you :)

My disks are:

[screenshot: list of disks from the Proxmox Disks panel]

The problematic disk is: /dev/nvme0n1

Disks /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd - these are used in the ZFS raidz volume.

Backup disk: /dev/sde is already an SSD, connected via a USB enclosure. Can I mount it and save the backup there? It already contains VM backups. Is that OK or can it be a problem? If it is fine, then the commands below would be:

Code:
mount /dev/sde1 /mnt
#(mount extra storage device)

tmux
#(only optional - to enable leaving the process running)

dd if=/dev/nvme0n1 bs=32M status=progress conv=sync,noerror | gzip -c > /mnt/prxmx_node_name$(date +'%Y_%m_%d_%I_%M_%p').img.gz
#(nvme0n1 is my failing NVMe system OS disk)

Is that correct to use? I'm asking because I have a problem with the device naming (not so familiar with it):
  • /dev/nvme0n1
or
  • /dev/nvme0
Thank you again!

P.S. - when I installed Proxmox, I used the "Mount Virtual Drive" BIOS option to mount an ISO as a bootable drive. I will use that again to mount a live Linux ISO.
 
  • /dev/nvme0n1
    -> That's the namespace of the NVMe, i.e. the actual disk that the data/partitions live on.
  • /dev/nvme0
    -> That's the raw device (the controller) itself. You can split it into multiple namespaces, if the disk supports it, for passthrough for example - so imagine it as the PCIe port itself, and the namespaces on it are like the disks themselves.
That's the easiest description I could write.
So for mounting/formatting/dd/etc. - whatever you do on "usual" drives - you use the namespace; in your case nvme0n1.
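A quick way to see both naming levels in practice (assuming the nvme-cli package is installed):
Code:
nvme list            # lists namespaces such as /dev/nvme0n1 with model and capacity
lsblk /dev/nvme0n1   # shows the partitions (nvme0n1p1, ...) on that namespace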

It's the same as with NICs that support SR-IOV; you've probably seen those, they start with names like enp0f0s1: enp0 is the whole PCIe NIC itself, f0 is the primary function, i.e. the port of the NIC, and np0 or s1 is the (virtual) function, because you can split the NIC port into multiple virtual functions if you want to pass a VF through to a VM.
You can do that to avoid using a virtio NIC, for example; a virtual function of the NIC is faster than an emulation layer like virtio.
But there are some downsides to that as well, e.g. the vmbr won't be able to communicate with a virtual function without some tweaks.

Same thing for NVMes - you could split them for passthrough reasons as well, but I don't know anyone who is doing that.
Cheers
 
Thanks! I will use nvme0n1 for cloning/backing up the current disk. I also ordered a replacement (a SAMSUNG 970 EVO PLUS 500GB SSD instead of a SAMSUNG 980 PRO 500GB SSD - the latter is PCIe v4 and my board only supports v3; my hardware seller spotted that). It will be here next week.

Best regards!
 
> As I have already pointed out to the OP above, I wouldn't do anything like you have suggested - this itself could contribute to drive failure!
Let's simply see in a week or so, after he has his drive and a backup.
Then he can do that without any fear and check smartctl again, or in the worst case replace the drive.
 
