SN850X + rsync

olk

New Member
Nov 7, 2024
Hello,
I read that server-grade SSDs are recommended - but those drives are expensive.

I plan to install the PVE system (used for development) on a 1TB WD SN850X and the data/images on a 4TB WD SN850X.
The configuration and the images/snapshots are backed up each day to a separate NAS (with RAID) via rsync.
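
For context, a minimal sketch of what such a daily rsync job could look like (the NAS hostname and target paths below are placeholders, not my actual setup):

```bash
#!/bin/sh
# /etc/pve is the pmxcfs FUSE mount, so this copies the current host/cluster config.
rsync -aH --delete /etc/pve/         backup-nas:/backups/pve/config/
# Default vzdump directory of the "local" storage - adjust to where your backups/images live.
rsync -aH --delete /var/lib/vz/dump/ backup-nas:/backups/pve/dump/
```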

The WD SN850X is a consumer-grade SSD with 2 GB of DRAM and a 2400 TBW rating. Do you know of any configuration options that extend the lifetime of the SSD?

ty
Oliver
 
You could buy used enterprise SSDs. They will still last longer than a typical consumer drive, which puts the cost in a different light. For example, on servershop24.de you can find enterprise SSDs starting at around 240 GB for 30-40 Euro, which is more than enough for the boot partition.
You can also use tools like folder2ram or log2ram to put the directories /var/log, /var/lib/pve-cluster and /var/lib/rrdcached on a RAM disk, but be aware that you might run into trouble in case of a power loss.

ZFS also takes a higher toll on the disks, but it offers features like bitrot protection, compression, support for storage replication etc. So you might want to use another filesystem for the boot drive (like XFS or ext4).
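
If you do go with ZFS for the data disks, compression is cheap to enable and easy to verify. A small sketch, assuming the default PVE pool name rpool (adjust to your pool name):

```bash
# Enable lz4 compression and check the resulting ratio.
zfs set compression=lz4 rpool
zfs get compression,compressratio rpool
# Watch the actual write activity on the pool in 5-second intervals.
zpool iostat -v rpool 5
```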

This being said, and to put things in perspective: in an older German thread, Mira from the Proxmox team pointed out that although enterprise disks are recommended for production servers, the developers also have consumer SSDs in some of their developer workstations: https://forum.proxmox.com/threads/richtige-ssd-und-festplatten-konfiguration-für-proxmox-cache-swap-usw.100014/
Mira also pointed out that for OS drives an HDD would be okay as well (although not recommended), but not for the VMs, since VMs won't have good performance on an HDD.


However, if you use consumer SSDs or NVMe drives you need to plan to replace the boot drive more often, so again the question is whether a used enterprise SSD is actually that much more expensive.

Regards, Johannes
 
However, if you use consumer SSDs or NVMe drives you need to plan to replace the boot drive more often, so again the question is whether a used enterprise SSD is actually that much more expensive.

When should I consider replacing my drive, and what factors support this recommendation?
I ask because I’ve been using a Samsung 960 NVMe as my system drive (with F2FS) on my development machine since 2017, and I haven’t encountered any issues so far.
 
You can monitor the SMART status and wearout of your disks in PVE. Basically, if SMART reports errors or the wearout value nears 100%, the disk has reached a state where, according to the manufacturer, it might fail at any moment. In other words: at that point you should replace the SSD, or expect a failure and be prepared for it.
So, as always, it's a good idea to have backups and to test restores.
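
For a quick check outside the GUI, smartctl shows the same counters; a short sketch for an NVMe drive (the device name is just an example):

```bash
# Overall health verdict.
smartctl -H /dev/nvme0
# Full report: look at "Percentage Used" (wear estimate),
# "Data Units Written" and "Media and Data Integrity Errors".
smartctl -a /dev/nvme0
```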
 
You can also use tools like folder2ram or log2ram to put the directories /var/log, /var/lib/pve-cluster and /var/lib/rrdcached on a RAM disk, but be aware that you might run into trouble in case of a power loss.

After doing some research (aka Google ;) ) I think it's not a good idea to put anything but /var/log on a RAM disk, since the other directories are used to save the current state of the PVE host and cluster. So in case of a power loss one might end up with a broken setup, which would require a potentially time- and nerve-consuming restore from backup or even a reinstall.
/var/log should be safe though: although log files might be important for troubleshooting, they are not needed for the system to work.
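
A conservative folder2ram setup would therefore only cover /var/log. A minimal sketch, assuming folder2ram's usual config file location and command names (double-check against its README):

```bash
# Add to /etc/folder2ram/folder2ram.conf (one "tmpfs <directory>" line per folder):
#   tmpfs   /var/log
# Then generate and activate the tmpfs mounts:
folder2ram -configure
folder2ram -mountall
```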

I'm not quite sure about rrdcached: as far as I know it's only used for the CPU/RAM usage graphs in the GUI, so it's basically like /var/log. On the other hand, the whole point of rrdcached is to cache this data in RAM before writing it to disk (see https://oss.oetiker.ch/rrdtool/doc/rrdcached.en.html and https://oss.oetiker.ch/rrdtool/doc/rrdtool.en.html ), so putting its results on a RAM disk seems a little redundant. I found an older thread which discusses how to reconfigure rrdcached to reduce its writes (with the risk of losing data in case of a power loss): https://forum.proxmox.com/threads/reducing-rrdcached-writes.64473/
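
For completeness, the options discussed there boil down to raising rrdcached's flush intervals. A minimal sketch, assuming the service reads /etc/default/rrdcached (check your systemd unit; the flags themselves are from the rrdcached man page):

```bash
# /etc/default/rrdcached (assumed location - verify that your rrdcached service reads it)
# -w: seconds values stay cached before being written to disk
# -z: random delay to spread the writes out
# -f: interval for flushing old values entirely
OPTS="-w 3600 -z 1800 -f 7200"
```

After a service restart this trades up to an hour of graph data (in case of a power loss) for far fewer writes.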
 
I found this HOWTO that reduces the write ops and thus keeps the SSD alive a little bit longer.
Yes, that basically repeats what's already in the forum and in this thread. The tutorial for disabling the services assumes that one doesn't want to set up a cluster; as soon as you want to use two or more servers together, this isn't feasible any more.
log2ram, folder2ram etc. should be more robust (as long as you don't have a power loss).
 
After reading this forum and other sources, I decided to replace the 4TB WD SN850X with two 2TB SOLIDIGM P4511 (PLI - power loss imminent protection) drives at nearly the same price.

These two SOLIDIGM drives will be configured in a ZFS-RAID1 setup, and I plan to follow the instructions provided in "Proxmox VE NAS and Gaming VMs - 03 - Proxmox Tweaks" since I’m using a single Proxmox node for development.

While the total TBW (terabytes written) of the SN850X and the P4511 are similar, the sync-write performance of PLP (Power Loss Protection) SSDs, such as the P4511, is significantly superior to that of consumer SSDs without PLP, as highlighted in the article "Ceph, etcd, and the Sync-hole".
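
The difference is easy to reproduce with a small fio sync-write test; a sketch (the file path and sizes are arbitrary, run it on the datastore you want to compare):

```bash
# 4k writes with an fsync after every write - the pattern where PLP drives shine.
# Delete the test file afterwards.
fio --name=sync-write --filename=/tank/fio-test.bin --size=1G \
    --rw=write --bs=4k --ioengine=psync --fsync=1 --numjobs=1 \
    --runtime=60 --time_based
```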

For a single Proxmox instance (without clustering or HA), the SN850X would perform well—especially when using ext4 (rather than ZFS), as detailed in the aforementioned tweaks guide.
 
I've installed PVE 8.2 (ZFS RAID1) on two 2TB SOLIDIGM P4511 SSDs.
Only 2.6 GB are in use on the drives, but 200 GB have already been written (write ops), and Intel's tool gives an estimated lifetime (based on the write ops) of 0.19 years for both SSDs.

This is astonishing! I'll remove ZFS and instead install Debian with F2FS on an mdadm RAID1, with PVE on top.
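
To double-check numbers like these independently of the vendor tool, the host writes can be read from the NVMe SMART log; one "data unit" is 1000 x 512 bytes = 512,000 bytes per the NVMe spec (the device name is an example):

```bash
# Requires the nvme-cli package.
nvme smart-log /dev/nvme0n1 | grep -i "data units written"
# Multiply the reported value by 512,000 to get bytes:
# ~390,625 data units correspond to roughly 200 GB written.
```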
 
Warning: do not format the F2FS partitions with the `extra_attr` option - the PVE kernels are built without support for it. You would get errors while mounting the partition during boot: "can't find valid checkpoint"
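
For anyone taking the same route, a rough sketch of the mdadm/F2FS part (partition names are examples; whether extra_attr ends up enabled can depend on your f2fs-tools version, so inspect the created superblock, e.g. with dump.f2fs, before booting the PVE kernel):

```bash
# Create a RAID1 mirror from two (example) partitions.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1p3 /dev/nvme1n1p3
# Format without requesting extra_attr via -O, then check which features got enabled.
mkfs.f2fs -l pvedata /dev/md0
dump.f2fs /dev/md0
```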