I am going to be setting up another Proxmox node on a device that doesn't have the option of using ECC memory (HP Elitedesk 805 G6 with a Ryzen processor). I have a few questions:
1. Without ECC memory, am I better off going with ZFS or some other file system for the installation (ext4? BTRFS?), and why?
2. If I install with a different file system, what Proxmox-specific features and functionality do I give up/lose in a single node environment? This will not be in a cluster, and I will be running some plain vanilla VMs (Nextcloud, Wordpress, Docker, etc.) and storing data/doing VM backups to a separate Synology NAS.
Thanks in advance
This is my personal advice. 45 years in IT, but all over, not a storage expert...
There have been long discussions about this; a good source is the TrueNAS forums.
I've always tried to put ECC into my bigger and more important systems, so when power consumption and economy pushed me into NUC and Atom territory, I too worried about the lack of ECC on these smaller systems, which still have 32-64GB of volatile RAM.
My main question was whether ZFS was designed for eager use of RAM, e.g. for dedup/compression dictionaries, where cosmic rays flipping bits could do the worst damage: a single bit flip in a Huffman dictionary could potentially trash a lot of data, and likewise bit flips in file system control structures could do a lot of damage...
My take on the responses was that ZFS doesn't increase the risk of running non-ECC memory compared to other file systems.
When it comes to dirty buffers, the response was clear: buffers are written eagerly to the log (NV-RAM or SSD preferred), and the log is then emptied transactionally, not super lazily. While ZFS will use as much RAM as it can get to cache written data, the data is still written out before it is purged. It's not like LRU in paging, where data is only written to disk as a last resort when RAM gets tight. Actually, I'm not even sure this is a ZFS-specific thing rather than the general Unix/Linux buffer cache paradigm, which does "lazy" writes, but not "latest possible" writes.
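If it helps to picture it, here's a tiny toy model in Python of that "append to a log eagerly, then flush transactionally" pattern. It's purely my own sketch of the general idea, nothing to do with ZFS's actual ZIL/transaction-group code:

```python
class TinyLoggedStore:
    """Toy model of 'write to a log eagerly, flush transactionally later'.
    Purely illustrative -- not ZFS's ZIL/TXG implementation."""

    def __init__(self):
        self.log = []      # stands in for the intent log (NV-RAM/SSD in the real thing)
        self.cache = {}    # recently written data kept in RAM for reads
        self.stable = {}   # stands in for the main on-disk pool

    def write(self, key, data):
        # The write lands in the log *before* we acknowledge it, so it never
        # sits only in volatile RAM waiting for memory pressure.
        self.log.append((key, data))
        self.cache[key] = data
        return "ack"

    def flush_txg(self):
        # Periodic transactional flush: apply the whole log as one group, then empty it.
        for key, data in self.log:
            self.stable[key] = data
        self.log.clear()

store = TinyLoggedStore()
store.write("block-1", b"hello")
store.flush_txg()
print(store.stable["block-1"])   # b'hello'
```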
When it comes to in-memory control structures, ZFS is simply no different from, or worse than, any other file system.
When it comes to dedup and compression dictionaries, again ZFS is no different from, or worse than, any other dedup and compression implementation (e.g. VDO).
But it also doesn't seem to be any better. And some of the ECC recommendations might have originated from the fact that compression and dedup add certain inherent risks.
Dedup basically comes nearly free when you already do compression and just add a strong hash: if you have two blocks with the same (strong) hash, that means the second is a duplicate, so instead of storing that block again, you just store a reference. If a flipped bit has you misidentify blocks, 100 logical copies of that block will be read back wrong, where without dedup only one would be.
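To make that fan-out concrete, here's a toy Python sketch of hash-based dedup. It's purely illustrative and not how ZFS actually stores its dedup table; all the names are made up:

```python
import hashlib

class TinyDedupStore:
    """Toy hash-based dedup: store each unique block once, keyed by a strong hash."""

    def __init__(self):
        self.blocks = {}   # hash -> block data (one physical copy)
        self.files = {}    # filename -> list of block hashes (references)

    def put(self, name, blocks):
        refs = []
        for block in blocks:
            digest = hashlib.sha256(block).hexdigest()
            # If the hash is already known, we only store a reference, not the block.
            self.blocks.setdefault(digest, block)
            refs.append(digest)
        self.files[name] = refs

    def get(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

store = TinyDedupStore()
common = b"A" * 4096
for i in range(100):
    store.put(f"vm-{i}.img", [common])       # 100 logical copies, 1 physical block

# A single corrupted entry in the block table now affects all 100 readers:
digest = hashlib.sha256(common).hexdigest()
store.blocks[digest] = b"B" * 4096           # simulate the bit-flip fan-out
print(store.get("vm-0.img") == common)       # False -- and the same for every other copy
```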
The main benefit of dedup can be questioned anyway, because tons of duplicate data indicate an issue elsewhere. I've always thought that this type of dedup was sold and justified with VDI use cases mainly because it came basically for free with compression.
But if the RAM bit flip occurs within an extremely popular dedup block or within its hash entry, its fan-out or knock-on effect will also be "popular".
For compression, the short version is that you maintain a dictionary mapping bit strings to other bit strings, and the compression comes from using shorter output strings for frequent inputs and longer translations for really rare ones. Again, if a bit flips in that dictionary, it impacts every replacement being done: near-zero effect for a rare string, near-complete garbage for the highest-frequency short string.
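Again a toy illustration, just to show how one corrupted dictionary entry poisons every occurrence of the string it encodes (ZFS really uses LZ4/zstd-style compressors, not a static table like this):

```python
# Toy dictionary coder: frequent strings get short integer codes.
dictionary = {0: b"the quick brown fox", 1: b"zfs", 2: b"x"}
reverse = {v: k for k, v in dictionary.items()}

def compress(tokens):
    # Replace each known input string by its (shorter) dictionary code.
    return [reverse[t] for t in tokens]

def decompress(codes, table):
    return [table[c] for c in codes]

data = [b"zfs"] * 5 + [b"the quick brown fox"]
codes = compress(data)

# Correct round trip:
print(decompress(codes, dictionary) == data)   # True

# Now flip "a bit" in the entry for the most frequent string:
corrupted = dict(dictionary)
corrupted[1] = b"zgs"
print(decompress(codes, corrupted) == data)    # False -- every occurrence comes back wrong
```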
I believe it's these "nightmare scenarios" that made the ZFS designers glad to have ECC RAM, which Sun servers most likely had anyway.
Now in both cases the additional risk of these "forever" data structures in RAM could be compensated for by periodically emptying them and rebuilding them from scratch, e.g. during idle times. That would cause some overhead and perhaps even some efficiency loss, e.g. if the new compression dictionary isn't quite as effective as the old one. It could also result in a temporary loss of dedup detection, but that doesn't cause any functional issues, as these file system dedups aren't guaranteed to detect all duplicates anyway. But it would reduce the risk from RAM bit rot, and that's why on a system without ECC I'd like to have it for peace of mind. Even then, ECC isn't totally immune to bit flips or row-hammer attacks either.
AFAIK, that sort of thing is not being done in ZFS (or VDO), but I haven't read the source code to check, either.
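Just to show what I mean by periodically rebuilding, a toy sketch of the idea (my own notion of it, again not anything ZFS or VDO actually implement):

```python
import hashlib

# The on-disk blocks are treated as authoritative (they're protected by checksums);
# the in-memory dedup table is disposable and can be rebuilt from them at any time.
on_disk_blocks = [b"A" * 4096, b"B" * 4096]

def rebuild_dedup_table(blocks):
    # Rehash everything from scratch; any bit-flipped entry in the old table
    # simply disappears instead of living in RAM "forever".
    return {hashlib.sha256(b).hexdigest(): b for b in blocks}

dedup_table = rebuild_dedup_table(on_disk_blocks)
# ... serve I/O for a while; an entry may get silently corrupted in RAM ...
dedup_table = rebuild_dedup_table(on_disk_blocks)   # periodic rebuild, e.g. during idle time
print(len(dedup_table))   # 2
```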
But then bitrot isn't just a RAM thing, it's also a disk (or SSD) issue, and there ZFS is leading with checksums for on-disk structures, both metadata and file data. I believe that among the current Unix file systems only btrfs is trying to catch up; the usual ext4 or xfs don't do any of that.
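The checksum-on-read idea itself is simple enough to sketch. This toy version keeps a strong hash next to each block and verifies it on every read; the real ZFS keeps the checksum in the parent metadata block and can repair from mirror/parity copies, which this toy obviously doesn't:

```python
import hashlib

def write_block(block):
    # Keep a strong checksum alongside the data.
    return {"data": bytearray(block), "checksum": hashlib.sha256(block).hexdigest()}

def read_block(stored):
    # Verify on every read, so silent disk/SSD bitrot is detected instead of returned.
    if hashlib.sha256(stored["data"]).hexdigest() != stored["checksum"]:
        raise IOError("checksum mismatch: bitrot detected")
    return bytes(stored["data"])

rec = write_block(b"important data" * 256)
rec["data"][7] ^= 0x01        # simulate a single flipped bit on disk
try:
    read_block(rec)
except IOError as e:
    print(e)                  # the corruption is caught; ZFS would now try another copy
```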
But I believe CEPH does, and on top of that it gives you scale-out storage cluster facilities that ZFS can't provide unless you add something like Lustre on top (my impression is that that's not a home-lab thing, while CEPH starts working with just three nodes).
tl;dr
ECC RAM is wonderfully cheap these days; getting hardware that enables its use can still be a major headache [expletives deleted].
Single system: no reason not to use ZFS; you may want to skip compression and dedup if you don't have ECC.
Dual systems: ZFS allows near real-time forwarding of data to a standby, which can do wonders for your point-in-time recovery.
Three or more nodes: CEPH can give you true fault tolerance, which beats ZFS in that regard.