Proxmox on tiny computer, only two drives: 1 NVMe, 1 SATA

cc_lab

New Member
Sep 6, 2020
Wanted to get advice on how best to set up drives and partitions for my needs. Right now the NVMe drive is the main disk, set up with the installer, with local and local-zfs on the NVMe. The SATA SSD is currently unused.

I want to have a zfs pool for nas storage as well, running on the same box.

I am leaning towards keeping everything on the NVMe and using the SATA SSD as backup. For this, I don't think I can shrink the existing zpool that takes the full disk after the default install, so I would probably need to remove it and repartition to get multiple zpools. I'm also open to other ideas if it makes sense to go a completely different way with it.

Any advice on how best to use the two disks?
 
Since you mention NVMe: are you using consumer SSDs? Be aware that ZFS might kill a consumer SSD within months. If it is a consumer SSD I would use XFS or another non copy-on-write file system so the SSD lasts longer. At the very least, monitor the SSDs with smartctl and check how much data is really being written to the NAND flash per day.
 
"Within months" is a concern for me. I will keep an eye on this. I had heard it would shorten the drive's life, but was not concerned, thinking I would still get a few years out of it.

Any suggestions for alternatives? XFS on lvmthin?
 
The problem is write amplification. For example, I have 3 VMs running that do mainly sync writes. Combined, they write only about 1 MB/s inside the VMs to their virtual ext4 filesystems, but because I use ZFS on the host (which adds copy-on-write, journaling, caching of sync writes on disk, parity and so on), around 10 MB/s is written to the SSDs to store that 1 MB/s. And when the SSD has to store 10 MB/s, it writes even more than that to the flash, because the drive has internal write amplification of its own. My enterprise SSDs write 1.8 GB to NAND for every 1 GB I send them. So the 1 MB/s from the VMs ends up as 18 MB/s to the NAND flash. 18 MB/s is about 568 TB per year (while basically idling), which would kill a consumer SSD like the 1 TB Samsung 970 Evo M.2 in around a year. And with a consumer SSD the write amplification should be much higher, because my enterprise SSDs are built for low write amplification, so it might only last some months or weeks.
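The arithmetic above can be sketched quickly. Note the 10x host-side factor and the 1.8x drive-internal factor are the figures from this post, not universal constants:

```shell
#!/bin/sh
# Sketch of the write-amplification arithmetic above. The 10x (ZFS) and
# 1.8x (drive-internal) factors are this post's measured values, not
# universal constants.
out=$(awk 'BEGIN {
    guest_mbs = 1                        # MB/s written inside the VMs
    nand_mbs  = guest_mbs * 10 * 1.8     # after both amplification layers
    printf "%.0f MB/s to NAND, ~%.0f TB/year", nand_mbs, nand_mbs * 86400 * 365 / 1e6
}')
echo "$out"
```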
I switched to used Intel S3710 enterprise SSDs. You can get a used 200 GB version (99% life left) for around 30€, and five of them can handle 18 petabytes of writes before failing, not just the 0.6 petabytes of the 1 TB Samsung 970 Evo M.2.

Consumer SSDs are really bad when they need to write small files or many small sync writes (like a database does). It is not unusual for them to write 20 MB to store a 1 KB file change if they can't cache because a sync write is required. If that happens once a second (as with a database), it will kill the SSD really fast. Good enterprise SSDs use far more durable SLC/MLC flash instead of the TLC/QLC flash in consumer SSDs. The Intel S3710 200 GB actually contains 360.8 GB of MLC flash, so there is 80% more flash you can't see, there to extend the lifetime of the drive; consumer SSDs may only carry 10-20% spare flash. Enterprise SSDs also have a capacitor for power-loss protection, so data in the RAM cache survives a power-supply failure. That lets them cache even sync writes, which reduces write amplification. Consumer SSDs can't cache sync writes, because everything in RAM would be lost if the power supply fails.

You can try your SSDs; maybe you don't have many sync writes and they will be fine. But you really should run smartctl once a week, write down how much data the SSDs have actually written, compare that to the next week, and make a prediction of when the TBW rating of your SSD will be exceeded.
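A rough weekly check could look like this sketch. The counter values are example numbers (real ones would come from `smartctl -A /dev/nvme0`, the device path being an assumption), the 600 TBW rating is the Samsung 970 Evo 1 TB figure mentioned above, and one NVMe "data unit" is 1000 * 512 = 512,000 bytes per the NVMe spec:

```shell
#!/bin/sh
# Sketch: predict when a drive's TBW rating will be exceeded from two
# weekly "Data Units Written" readings. All numbers are example values;
# real ones would come from `smartctl -A /dev/nvme0` (device path is an
# assumption). One NVMe data unit is 1000 * 512 = 512,000 bytes.
week1=3564074        # data units written, first reading
week2=3810000        # data units written, one week later (example)
tbw_tb=600           # drive's TBW rating in TB (970 Evo 1TB: ~600 TBW)

out=$(awk -v a="$week1" -v b="$week2" -v tbw="$tbw_tb" 'BEGIN {
    per_week = (b - a) * 512000 / 1e12      # TB written this week
    used     = b * 512000 / 1e12            # TB written in total
    printf "%.2f TB/week, ~%.0f weeks until TBW exceeded", per_week, (tbw - used) / per_week
}')
echo "$out"
```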

But be aware that Proxmox itself writes some small files for HA to the disk once a minute. It is just a few KB of data each time, but if your SSD turns each of those into 10 or 20 MB of flash writes, that easily adds up to tens of gigabytes of writes per day.

If you use a storage format with snapshot capabilities like qcow2 or ZFS, or an additional abstraction layer like LVM, you get copy-on-write and journaling and therefore a lot of extra write amplification. Plain XFS without LVM/RAID/snapshots might be an option for your SSD, combined with a daily/weekly backup of your VMs to a local HDD or a network share so you can restore them if something happens.

I also tried using HDDs for my VMs to avoid the SSD wear problem, but because of the high write amplification the number of IOPS was also multiplied by about 10, and the HDDs just weren't able to handle all the small writes.

If you are using XFS, use the "noatime" mount option to prevent a lot of unnecessary writes. If you use ext4, use "noatime" and "nodiratime", and maybe even disable journaling (but I wouldn't do that).
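For illustration, the mount options above would go in /etc/fstab like this. The device and mount point are placeholders for your own setup:

```
# /etc/fstab (example line; /dev/sda1 and /mnt/vmdata are placeholders)
/dev/sda1  /mnt/vmdata  xfs  defaults,noatime  0  2
```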
 
Looks like there is indeed a lot of write amplification going on, considering the drive is only 60 GB full. I might be okay for a little while though; Data Units Written is growing by maybe 10-100 GB/day with my current usage.

Data Units Read: 1,261,057 [645 GB]
Data Units Written: 3,564,074 [1.82 TB]
Host Read Commands: 12,174,277
Host Write Commands: 97,199,075
Controller Busy Time: 144
Power Cycles: 24
Power On Hours: 1,046
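For reference, the [1.82 TB] figure follows from the NVMe definition of a data unit (1000 * 512-byte sectors = 512,000 bytes). A small sketch of the conversion, using the counter value from the output above:

```shell
#!/bin/sh
# Convert the NVMe "Data Units Written" SMART counter to terabytes.
# Per the NVMe spec, one data unit is 1000 * 512 = 512,000 bytes.
units=3564074    # value from the smartctl output above
tb=$(awk -v u="$units" 'BEGIN { printf "%.2f", u * 512000 / 1e12 }')
echo "Data Units Written: $units = $tb TB"
```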
 
Data Units Written: 3,564,074 [1.82 TB]
You should verify that your SSD model uses "Data Units Written" to show the real amount of data written to the NAND flash, and not just the data written to the SSD (before the internal write amplification caused by erasing blocks kicks in). For example, my Intel S3700 only reports "Host_Writes_32MiB", which measures data written without the internal amplification; there is no way to see the real amount written to the flash. Only the follow-up model S3710 has an additional SMART value, "NAND_Writes_32MiB", which shows the real amount after amplification.
You could use iostat to monitor how much data Proxmox writes to the SSD and compare it to the values you get from smartctl. If both numbers are nearly equal, the SSD is probably not showing you the NAND writes, and the real writes could be a multiple of that.
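A rough sketch of that comparison. The numbers are placeholders (on a real system the host total would be accumulated from iostat over some period, and the SMART total would be the counter delta over the same period), and the 1.2x threshold is an arbitrary assumption for illustration:

```shell
#!/bin/sh
# Sketch: compare host-side writes (e.g. summed from iostat) with the
# drive's own SMART write-counter delta over the same period. The
# numbers and the 1.2x threshold are placeholder assumptions.
host_gb=100      # GB written by the host, per iostat (example)
smart_gb=105     # GB delta from the drive's SMART write counter (example)
verdict=$(awk -v h="$host_gb" -v s="$smart_gb" 'BEGIN {
    if (s / h < 1.2)
        print "counter likely shows host writes, not NAND writes"
    else
        printf "counter is ~%.1fx host writes, may include NAND amplification", s / h
}')
echo "$verdict"
```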