Bad storage choice for home 'cluster' - fixable or start over?

iGadget

Member
Apr 9, 2020
Having started with Proxmox VE for my home / SMB environment very enthusiastically on old consumer hardware (the clustering feature is AWESOME, (live) migrating VMs and containers is just pure MAGIC), I'm now forced to reconsider, as ZFS is (presumably) killing my SSDs.
The current situation is a 3-node PVE 'cluster', with each node containing a single SSD (varying between 120GB and 1TB). I chose ZFS when installing the nodes because of the presumed advantages: snapshots, ease of expandability, etc.

The setup actually runs pretty well for my use case (about 6 VMs / containers currently), but the 250GB Samsung EVO SSD in my first node already has a wear level count of 22% (of which the last 2% accrued in a matter of weeks). So it seems obvious the current setup will kill the SSDs pretty quickly.

The question is: should I start over completely, or can I gradually fix/reconfigure (or reinstall) each node without killing the cluster?
Unfortunately, I don't have the budget for enterprise-grade hardware, so any config suggestions that would make my consumer SSDs last longer in a small PVE cluster would be very welcome.
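For reference, this is how I've been watching the wear and the write volume (assuming smartmontools is installed, the SSD is /dev/sda and the pool has the default name rpool; the attribute numbers are for Samsung consumer drives and may differ per vendor):
Code:
# Full SMART attribute table; on Samsung consumer SSDs attribute 177
# (Wear_Leveling_Count) and 241 (Total_LBAs_Written) are the ones to watch
smartctl -A /dev/sda

# Quick filter for the interesting lines
smartctl -a /dev/sda | grep -iE 'wear|written|percent'

# Rough feel for how much ZFS itself is writing (updates every 60s)
zpool iostat -v rpool 60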
 
Look out for second-hand enterprise SSDs. There are models with extreme durability out there, and because of that you can buy them with nearly all of their life left for cheap (I got my 200GB SSDs, with 3560 TBW left, for €30 each). A new 250GB Evo is only rated for 75 or 150 TBW and should die even faster, because it can't handle sync writes without huge write amplification.
 
On average, the rated TBW is exceeded by 3-5x before errors occur.


No need to start over: just add a second disk, clone the partition layout from the old disk and randomize the UUIDs.

Add the new ZFS partition to the existing pool to convert it to a mirror (RAID1) and let it resilver.

Set up ESP sync for the EFI bootloader:

https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_systemd_boot_setup

This can all be done while the server keeps running.
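
Roughly, that looks like this. Just a sketch: I'm assuming the old disk is /dev/sda, the new one is /dev/sdb, the pool is the default rpool, and the standard PVE layout with the ESP on partition 2 and ZFS on partition 3. Adjust to your actual layout.
Code:
# Clone the partition layout to the new disk and give it new random GUIDs
sgdisk /dev/sda -R /dev/sdb
sgdisk -G /dev/sdb

# Attach the new ZFS partition to the existing single-disk vdev -> mirror
zpool attach rpool /dev/sda3 /dev/sdb3
zpool status rpool          # wait until the resilver has finished

# Format and register the new ESP so the node can also boot from the new disk
proxmox-boot-tool format /dev/sdb2
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool status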


There are some ways to increase lifetime, but they come with other issues.
 
Look out for second-hand enterprise SSDs. There are models with extreme durability out there, and because of that you can buy them with nearly all of their life left for cheap (I got my 200GB SSDs, with 3560 TBW left, for €30 each). A new 250GB Evo is only rated for 75 or 150 TBW and should die even faster, because it can't handle sync writes without huge write amplification.
Thanks for your reply. While I've so far been unable to find the same sizes / prices, I did find a reseller offering used Samsung SM863a (480GB) drives for €100 a piece. Do you think that's a good price? And would I get away with one of those per node (which would set me back €400 for 4 nodes)?
Alternatively, the same reseller is also offering Intel DC D3-S4510 (480GB) drives for €75 a piece. The Samsung has a better TBW rating (3.1 PB vs. 1.2 PB for the Intel), but would that be worth the extra €25 for my use case?
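For my own comparison, per terabyte of rated endurance the SM863a works out to roughly €100 / 3100 TB ≈ €0.03 per TB written, while the S4510 comes to €75 / 1200 TB ≈ €0.06 per TB written. So if endurance is what I'm paying for, the Samsung is actually the cheaper drive per TB.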

Update: I got 4 of the Samsung SM863a (480GB) disks. Now for the migration:

On average, the rated TBW is exceeded by 3-5x before errors occur.
No need to start over: just add a second disk, clone the partition layout from the old disk and randomize the UUIDs.
Add the new ZFS partition to the existing pool to convert it to a mirror (RAID1) and let it resilver.
Set up ESP sync for the EFI bootloader:
https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_systemd_boot_setup
This can all be done while the server keeps running.
There are some ways to increase lifetime, but they come with other issues.
Great, but I don't mind taking each node down one by one to keep things simple. Since the cluster can keep running on 2 machines, that won't cause any downtime either. In the setup you propose, I would keep using the old consumer SSD and be limited by the size of that drive in a RAID1 setup, right? Since I'm running a cluster (and switching to enterprise drives), I'm not that worried about single-disk failures.

To avoid complexity, my current plan would be to:
1. Take down the node and add the second (enterprise) drive
2. Boot an external tool (Live USB, e.g. Clonezilla) and clone the current partitions to the new drive
(2a. Randomize UUIDs, if this is really needed? If so, how do I do that? My best guess is sketched below.)
3. Shut down and remove the old drive
4. Boot the node to confirm a working PVE
5. Perform PVE disk maintenance tasks (which ones are required for this scenario?)
6. Wipe the old SSD
7. Optionally, re-add the old SSD (depending on the size of the enterprise drives), but use it only for non-write-intensive tasks.

Do you see any downsides / impossibilities in this alternative plan?
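
For step 2a, my best guess (assuming the new disk shows up as /dev/sdb while both disks are still attached):
Code:
# Give the cloned GPT and its partitions fresh random GUIDs so they don't
# clash with the old disk while both are connected
sgdisk -G /dev/sdb

# List both disks; note that filesystem UUIDs (e.g. the cloned ESP's)
# stay identical after cloning, only the disk/partition GUIDs change
blkid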

And as for non-write-intensive tasks, am I correct in assuming that those are:
- VM template storage
- Backups
- VM storage (depending on the VM's task itself)?
 
Well... My plan certainly is not going to work as I had in mind - the system does not boot from the new SSD after cloning.
I've searched and read a lot of articles and posts on migrating PVE from one disk to another, but none seem to describe my specific situation:

An existing full ZFS setup on a UEFI machine, migrating to a larger SSD.

I suspect UEFI is part of the issue, since with the old SSD I could choose 'linux boot menu' as one of the boot options in the UEFI setup, whereas with the new SSD I can only select 'Samsung SM863a'.
So what portion of the old SSD wasn't cloned? Or is there some UEFI setting which also needs to be updated?
Or am I completely missing the point and are there other issues I should focus on?
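
In the meantime, this is what I plan to check once I'm back at the machine. Just my working assumption, with the new disk as /dev/sda and the ESP on partition 2:
Code:
# Which boot entries does the firmware actually know about?
efibootmgr -v

# Is the ESP on the new disk registered with the Proxmox boot tooling?
proxmox-boot-tool status

# If not: (re)format the cloned ESP, register it and copy the kernels over
# (format may need --force because the clone already contains a filesystem)
proxmox-boot-tool format /dev/sda2
proxmox-boot-tool init /dev/sda2
proxmox-boot-tool refresh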
 
