Bad storage choice for home 'cluster' - fixable or start over?

iGadget

Member
Apr 9, 2020
Having started with Proxmox VE for my home / SMB environment very enthusiastically on old consumer hardware (the clustering feature is AWESOME, (live) migrating VMs and containers is just pure MAGIC), I'm now forced to reconsider, as ZFS is (presumably) killing my SSDs.
The current situation is a 3-node PVE 'cluster', with each node containing a single SSD (varying between 120GB and 1TB). I chose ZFS when installing the nodes because of the presumed advantages: snapshots, ease of expandability, etc.

The setup actually runs pretty well for my use case (about 6 VMs / containers currently), but the 250GB Samsung EVO SSD in my first node already has a wear level count of 22% (of which the last 2% accrued in a matter of weeks). So it seems obvious the current setup will kill the SSDs pretty quickly.

The question is: should I start over completely, or can I gradually fix/reconfigure (or reinstall) each node without killing the cluster?
Unfortunately, I don't have the budget for enterprise-grade hardware, so any config suggestions that would make my consumer SSDs last longer in a small PVE cluster would be very welcome.
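For reference, this is how I've been watching the wear and the write volume (assuming smartmontools is installed, the SSD is /dev/sda and the pool has the default name rpool; the attribute numbers are for Samsung consumer drives and may differ per vendor):
Code:
# Full SMART attribute table; on Samsung consumer SSDs attribute 177
# (Wear_Leveling_Count) and 241 (Total_LBAs_Written) are the ones to watch
smartctl -A /dev/sda

# Quick filter for the interesting lines
smartctl -a /dev/sda | grep -iE 'wear|written|percent'

# Rough feel for how much ZFS itself is writing (updates every 60s)
zpool iostat -v rpool 60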
 
Look out for second-hand enterprise SSDs. There are models with extreme durability out there, and because of that you can buy them with nearly all of their life left for cheap (I got my 200GB SSDs, with 3560 TBW left, for €30 each). A new 250GB Evo is only rated for 75 or 150 TBW and should die even faster, because it can't handle sync writes without huge write amplification.
 
On average, the rated TBW is exceeded by 3-5x before errors occur.


No need to start over: just add a second disk, clone the partition layout from the old disk and randomize the UUIDs.

Add the new ZFS partition to the existing pool to convert it to a mirror (RAID1) and let it resilver.

Set up ESP sync for the EFI bootloader:

https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_systemd_boot_setup

This can all be done while the server keeps running.
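
Roughly, that looks like this. Just a sketch: I'm assuming the old disk is /dev/sda, the new one is /dev/sdb, the pool is the default rpool, and the standard PVE layout with the ESP on partition 2 and ZFS on partition 3. Adjust to your actual layout.
Code:
# Clone the partition layout to the new disk and give it new random GUIDs
sgdisk /dev/sda -R /dev/sdb
sgdisk -G /dev/sdb

# Attach the new ZFS partition to the existing single-disk vdev -> mirror
zpool attach rpool /dev/sda3 /dev/sdb3
zpool status rpool          # wait until the resilver has finished

# Format and register the new ESP so the node can also boot from the new disk
proxmox-boot-tool format /dev/sdb2
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool status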


There are some ways to increase lifetime, but they come with other issues.
 
Look out for second-hand enterprise SSDs. There are models with extreme durability out there, and because of that you can buy them with nearly all of their life left for cheap (I got my 200GB SSDs, with 3560 TBW left, for €30 each). A new 250GB Evo is only rated for 75 or 150 TBW and should die even faster, because it can't handle sync writes without huge write amplification.
Thanks for your reply. While I've so far been unable to find the same sizes / prices, I did find a reseller offering used Samsung SM863a (480GB) drives for €100 a piece. Do you think that's a good price? And would I get away with one of those per node (which would set me back €400 for 4 nodes)?
Alternatively, the same reseller is also offering Intel DC D3-S4510 (480GB) drives for €75 a piece. The Samsung has a better TBW rating (3.1 PB vs. 1.2 PB for the Intel), but would that be worth the extra €25 for my use case?
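For my own comparison, per terabyte of rated endurance the SM863a works out to roughly €100 / 3100 TB ≈ €0.03 per TB written, while the S4510 comes to €75 / 1200 TB ≈ €0.06 per TB written. So if endurance is what I'm paying for, the Samsung is actually the cheaper drive per TB.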

Update: I got 4 of the Samsung SM863a (480GB) disks. Now for the migration:

On average, the rated TBW is exceeded by 3-5x before errors occur.
No need to start over: just add a second disk, clone the partition layout from the old disk and randomize the UUIDs.
Add the new ZFS partition to the existing pool to convert it to a mirror (RAID1) and let it resilver.
Set up ESP sync for the EFI bootloader:
https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_systemd_boot_setup
This can all be done while the server keeps running.
There are some ways to increase lifetime, but they come with other issues.
Great, but I don't mind taking each node down one by one to keep things simple. Since the cluster can keep running on 2 machines, that won't cause any downtime either. In the setup you propose, I would keep using the old consumer SSD and be limited by the size of that drive in a RAID1 setup, right? Since I'm running a cluster (and switching to enterprise drives), I'm not that worried about single-disk failures.

To avoid complexity, my current plan would be to:
1. Take down the node and add the second (enterprise) drive
2. Boot an external tool (Live USB, e.g. Clonezilla) and clone the current partitions to the new drive
(2a. Randomize UUIDs, if this is really needed? If so, how do I do that? My best guess is sketched below.)
3. Shut down and remove the old drive
4. Boot the node to confirm a working PVE
5. Perform PVE disk maintenance tasks (which ones are required for this scenario?)
6. Wipe the old SSD
7. Optionally, re-add the old SSD (depending on the size of the enterprise drives), but use it only for non-write-intensive tasks.

Do you see any downsides / impossibilities in this alternative plan?
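
For step 2a, my best guess (assuming the new disk shows up as /dev/sdb while both disks are still attached):
Code:
# Give the cloned GPT and its partitions fresh random GUIDs so they don't
# clash with the old disk while both are connected
sgdisk -G /dev/sdb

# List both disks; note that filesystem UUIDs (e.g. the cloned ESP's)
# stay identical after cloning, only the disk/partition GUIDs change
blkid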

And as for non-write-intensive tasks, am I correct in assuming that those are:
- VM template storage
- Backups
- VM storage (depending on the VM's task itself)?
 
Well... My plan certainly is not going to work as I had in mind - the system does not boot from the new SSD after cloning.
I've searched and read a lot of articles and posts on migrating PVE from one disk to another, but none seem to describe my specific situation:

An existing full ZFS setup on a UEFI machine, migrating to a larger SSD.

I suspect UEFI is part of the issue, since with the old SSD I could choose 'linux boot menu' as one of the boot options in the UEFI setup, whereas with the new SSD I can only select 'Samsung SM863a'.
So what portion of the old SSD wasn't cloned? Or is there some UEFI setting which also needs to be updated?
Or am I completely missing the point and are there other issues I should focus on?
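
In the meantime, this is what I plan to check once I'm back at the machine. Just my working assumption, with the new disk as /dev/sda and the ESP on partition 2:
Code:
# Which boot entries does the firmware actually know about?
efibootmgr -v

# Is the ESP on the new disk registered with the Proxmox boot tooling?
proxmox-boot-tool status

# If not: (re)format the cloned ESP, register it and copy the kernels over
# (format may need --force because the clone already contains a filesystem)
proxmox-boot-tool format /dev/sda2
proxmox-boot-tool init /dev/sda2
proxmox-boot-tool refresh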
 
