Reducing ZFS write amplification

kyprijan

New Member
Jun 8, 2023
Hi, I have this equipment:

4× 2 TB HDD
2× 1 TB consumer Samsung SSD
1× 256 GB consumer SSD

The two 1 TB SSDs are combined into a mirror and the system is installed on them.
The four HDDs are combined into RAID10, with the 256 GB SSD attached to that pool as an L2ARC cache.

When I put this setup together I did not know about write amplification. In 3 months the wearout reached 4%, even though the server load was minimal. Over the long run that is not much and I could live with it, but I would like to improve the situation. The system is constantly writing something, so what if I install PVE on the HDD RAID10 with the SSD as L2ARC, and use the SSD mirror as storage for the virtual machines?

Would that layout lose performance? Would the SSD storage be as efficient as in the current build?

Yes, I know that you can turn off atime and redirect logs to RAM, but that will not help much to reduce write amplification.
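For reference, a minimal sketch of those two mitigations, assuming the default Proxmox root pool name rpool and systemd-journald as the log daemon:

# disable access-time updates on the whole pool (child datasets inherit the setting)
zfs set atime=off rpool

# keep the systemd journal in RAM only: edit /etc/systemd/journald.conf,
# set Storage=volatile in the [Journal] section, then restart the daemon
# (the journal is lost on reboot)
systemctl restart systemd-journald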
 
The Proxmox installation itself does not need anything fast and does indeed write logs and graphs constantly. Running it on (very small) HDDs will be fine and you don't need a cache or anything fancy.
The I/O of the VMs and CTs depends very much on what's running inside them. Enterprise SSDs with PLP are preferred since they can handle the many IOPS and (sync) writes properly.
I disabled both pve-ha services on my single node to reduce the writes and the many not-very-useful log messages.
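In case anyone wants to do the same, this is roughly how it can be done on a standalone node (the two services are pve-ha-lrm and pve-ha-crm; only do this if you don't use HA or clustering):

# stop and disable the HA local resource manager and cluster resource manager
systemctl disable --now pve-ha-lrm pve-ha-crm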
 
I did a lot of testing over the years and never got the ZFS write amplification down significantly.
- try to avoid encryption if possible (it doubles write amplification for whatever reason)
- try to avoid CoW on top of CoW
- try to avoid nested filesystems
- don't use consumer SSDs without PLP, as these can't cache sync writes, so the SSDs can't optimize the writes for less wear
- a raidz1/2/3 isn't great as VM storage (fewer IOPS and problems with padding overhead), but total write amplification will be lower, as not everything has to be written twice (a 5-disk raidz1 only writes an additional +25% of parity data instead of the +100% for a full copy of everything)
- the biggest problem is small random sync writes, so try to avoid running databases (see the zpool iostat example after this list for a way to check how much sync I/O your pool actually gets)
- write amplification... amplifies... so every small bit of data that you avoid writing will save tons of wear
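To see how much of the pool's traffic really consists of small sync writes, zpool iostat can print request-size histograms split into sync and async I/O; the pool name tank below is just a placeholder:

# request size histograms per vdev, separated into sync/async reads and writes,
# refreshed every 10 seconds
zpool iostat -r tank 10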

So there isn't really much you can do except trying to avoid writes in the first place (disable logging and so on). If you are fine with the performance, I would just continue using them, and as soon as one of the disks fails I would get a pair of proper (mixed-workload enterprise) SSDs that can handle those writes. Write amplification isn't really a big concern anymore when your 1 TB enterprise SSD is rated to survive 20750 TB of writes instead of, for example, just the 360 TB that a 1 TB consumer QLC SSD is rated for.
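If you want to put a rough number on your own write amplification, one way (just a sketch; the device name and SMART attribute names are examples and differ per vendor) is to compare what the SSD reports as lifetime writes via SMART with what your guests think they write:

# lifetime data written as reported by the SSD (Samsung SATA drives usually
# expose Total_LBAs_Written; NVMe drives report Data Units Written instead)
smartctl -A /dev/sda | grep -i -e total_lbas_written -e 'data units written' -e wear

# bandwidth the pool is currently writing to each vdev, refreshed every 60 seconds
zpool iostat -v 60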
 
