Thanks for the input.
Do you have separate SSD ZFS pools for Proxmox and the VMs/LXCs? I'm wondering what the wear on the Proxmox SSDs would be vs. the SSDs holding the VMs/LXCs (with separate HDD pool for the actual storage).
Yes, Proxmox has its own pair of SSDs, and the writes there are not that bad. But I'm also not using ZFS for my boot drives because I want encryption and ZFS can't do that for boot drives, so I'm using LUKS and an mdraid RAID1 instead. Like someone already said, Proxmox itself writes around 30 GB per day. That's roughly 11 TB per year, which shouldn't be a problem even for small consumer SSDs. It's also the reason you shouldn't install Proxmox onto a USB stick: 11 TB per year is way too much for microSD cards/USB sticks.
What really creates writes are all the VMs/LXCs and the swap partitions on top of ZFS. Virtualization and ZFS are great for reliability and security, but they come at the price of high write amplification. With my setup, for every 1 MB of data written inside a VM, around 20-30 MB end up written to the NAND flash of the SSDs to store that 1 MB.
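To put rough numbers on that, here is a quick Python sketch; the 30 GB/day and the 20-30x factor are just the figures from this thread, not measurements from your hardware:

```python
# Rough NAND-write estimate: logical writes multiplied by write amplification.
# Both input numbers are the example figures from this thread, not measured values.
logical_gb_per_day = 30            # what host + guests write "logically" per day
for waf in (20, 25, 30):           # end-to-end write amplification (VM + ZFS + SSD)
    nand_tb_per_year = logical_gb_per_day * waf * 365 / 1000
    print(f"WAF {waf}: ~{nand_tb_per_year:.0f} TB hitting the NAND per year")
```

So even a modest logical write load turns into hundreds of TB of NAND writes per year once the amplification is factored in.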
Maybe I'm misunderstanding here, but if the monitoring tool is responsible for so much of the writes, and thus the wear on the SSDs, why do you use it?
Not using monitoring isn't a good idea. If you run a lot of VMs and LXCs, you just can't manually check every single VM/LXC on a daily basis.
You need to:
- check the logs for errors
- check the logs for signs of attackers
- check that you are not running out of storage capacity
- check that enough RAM is free
- check that all services are running
- check if packages need an important security fix
- check if the network is working
- ...
Do that for 20 VMs/LXCs and you waste two hours every day just verifying that everything is working. That's why I use Zabbix and Graylog to collect metrics and logs. They don't just collect the data, they also analyse it for anything of interest. When something happens I get an email and look at the dashboard, which gives me a nice list of all problems across all VMs/LXCs on a single page. It's way easier when a monitoring tool automates all of that (see the sketch below for the kind of checks I mean).
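Just to illustrate the idea (this is not what Zabbix/Graylog do internally, only a minimal Python sketch; the thresholds and service names are placeholders, and it assumes psutil is installed):

```python
# Minimal health-check sketch: free disk space, free RAM, service status.
# Thresholds and service names are placeholders; a real monitoring stack
# (Zabbix/Graylog) also does log analysis, alerting and history.
import subprocess
import psutil

DISK_MIN_FREE = 0.15                 # warn below 15 % free space on /
RAM_MIN_FREE = 0.10                  # warn below 10 % available RAM
SERVICES = ["ssh", "cron"]           # placeholder service names

problems = []

disk = psutil.disk_usage("/")
if disk.free / disk.total < DISK_MIN_FREE:
    problems.append(f"low disk space: {disk.free / disk.total:.0%} free")

mem = psutil.virtual_memory()
if mem.available / mem.total < RAM_MIN_FREE:
    problems.append(f"low RAM: {mem.available / mem.total:.0%} available")

for svc in SERVICES:
    state = subprocess.run(["systemctl", "is-active", svc],
                           capture_output=True, text=True).stdout.strip()
    if state != "active":
        problems.append(f"service {svc} is {state}")

print("\n".join(problems) if problems else "all checks passed")
```

Multiply that by 20 guests plus log analysis and alerting, and it becomes obvious why a dedicated monitoring stack pays off.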
My 200 GB Intel DC S3700s that I use for the system have a TBW of 3700, and the 1.92 TB SM883s for the disks have >10000, each paired with a much smaller page size. I don't really care about all the small writes, knowing that the disks were designed to endure that kind of load.
My 970 PROs, on which I plan to run the VMs/LXCs, have a TBW of 600. The 870 EVOs I was planning to use for the Proxmox install only have a TBW of 150, though.
I also removed all my consumer drives (like my 970 EVOs) and replaced them with second-hand Intel S3710 drives. They have over 30 times the write endurance, so 750 TB written per year isn't a problem anymore when the SSDs can handle petabytes of writes. And second-hand they weren't more expensive than my new consumer drives. Drives like the Intel S4610 aren't that expensive even if you buy them new, and they still have a lot of write endurance (though not as much as the S3710/S3700).
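As a rough sanity check, divide the rated endurance by the yearly writes; the 750 TB/year is the figure from my setup above, and the S3710 value is only assumed here as roughly 30x a consumer drive, so treat the numbers as illustrative:

```python
# Expected SSD lifetime = rated endurance (TBW) / TB written per year.
# 750 TB/year is the figure from my setup; the S3710 TBW is an assumed
# "~30x consumer" value for illustration, not an official rating.
tb_written_per_year = 750

drives = {
    "Samsung 970 PRO 512GB (600 TBW)": 600,
    "Intel S4610 480GB (3000 TBW)": 3000,
    "Intel S3710 (assumed ~18000 TBW)": 18000,
}

for name, tbw in drives.items():
    years = tbw / tb_written_per_year
    print(f"{name}: ~{years:.1f} years at {tb_written_per_year} TB/year")
```

At that kind of write load a consumer drive is gone in under a year, while the enterprise drives last for years.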
New Intel S4610 480GB: 124,50€ (3000 TBW, with PLP)
New Samsung 870 EVO 500GB: 62,40€ (300 TBW, no PLP)
New Samsung 970 PRO 512GB: 127,90€ (600 TBW, no PLP)
Buying a pair of S4610s would have been a far better deal than the 970 PROs at the same price, even though the S4610 only uses SATA.
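Or, put in terms of price per TB of rated endurance (just the prices and TBW figures listed above):

```python
# Price per TB of rated write endurance, using the figures listed above.
drives = [
    ("Intel S4610 480GB",     124.50, 3000),
    ("Samsung 870 EVO 500GB",  62.40,  300),
    ("Samsung 970 PRO 512GB", 127.90,  600),
]

for name, price_eur, tbw in drives:
    print(f"{name}: {price_eur / tbw * 100:.1f} cents per TB of endurance")
```

The S4610 works out to about 4 cents per TB of endurance, the two Samsung consumer drives to about 21 cents, so roughly a factor of five.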
I'm beginning to wonder if I should skip my plan to use Proxmox and just go for an Ubuntu server with Docker containers instead, if it's actually the case that Proxmox wears out SSDs this fast.
The problem isn't Proxmox. It's virtualization, server workloads, parallelization, ZFS, and small random or sync writes. An Ubuntu server with Docker isn't much better as long as you run the same services and want the same level of reliability and security.
The point is simply that you bought consumer drives, which are designed to be used by one person at a time, doing only one or two things simultaneously, without advanced features like copy-on-write filesystems such as ZFS.
Enterprise/datacenter-grade drives are designed for server workloads and can handle that. You will pay two or three times more for such an SSD, but if it lasts 10 to 30 times longer, that's the "cheaper" option in the long run.
If you haven't opened your SSDs yet, I would try to sell them as new and get more appropriate SSDs. If you have already used them, you can do a test setup and monitor the SMART values. If you see that they will die way too fast, you can replace them later and use them in another system (I moved six consumer SSDs from my servers to my gaming PC) or try to sell them to get at least some money back.
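For the monitoring part, something like this reads the total host writes so far (a sketch only; it needs root, and the attribute name and unit differ between vendors, so adjust it for your drives):

```python
# Rough host-write check via smartctl. Assumes the Samsung-style
# Total_LBAs_Written attribute in 512-byte LBAs; other vendors report
# writes under different attribute names/units.
import subprocess

DEVICE = "/dev/sda"   # placeholder device node

out = subprocess.run(["smartctl", "-A", DEVICE],
                     capture_output=True, text=True).stdout

for line in out.splitlines():
    if "Total_LBAs_Written" in line:
        lbas = int(line.split()[-1])          # raw value is the last column
        print(f"{DEVICE}: ~{lbas * 512 / 1e12:.2f} TB written so far")
        break
else:
    print("Attribute not found; this drive reports writes differently.")
```

Check the value every few days and the difference tells you your real TB/day, which makes the "will these drives last" question a simple division.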