NVME SSD durability under Proxmox

HomeLabNerd

Sep 9, 2023
Probably asked before, but I'd like to know how NVMe SSD durability holds up under Proxmox in a homelab. Is it, for example, wise to separate the OS from VM storage because of the writes? So two separate NVMe SSDs? I do not run Ceph or ZFS.
 
Is it, for example, wise to separate the OS from VM storage
Yes, sure. The historical reasoning is still valid: with separate disks you can, for example, reinstall the OS and the other datastore is not affected at all.

But... before doing that I would really prefer to have some form of redundancy --> ZFS mirrors. This is much more important (to me!) and much more valuable than having a non-redundant "OS" disk and a non-redundant "VM" disk, possibly without bit-rot detection and on (too) cheap devices.
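
If you ever go that route, a two-way mirror is quick to set up. A rough sketch from the command line - the pool name and device paths are placeholders, and in practice the Proxmox installer or the GUI (Disks > ZFS) can do the same for you:

Code:
# create a mirrored pool from two NVMe devices (placeholder name and IDs!)
zpool create -o ashift=12 tank mirror \
    /dev/disk/by-id/nvme-DISK_A /dev/disk/by-id/nvme-DISK_B
# verify both devices show up as ONLINE in the mirror
zpool status tank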

...because of the writes?
No, that would make no difference - the same data has to be written somewhere either way.


Disclaimer: opinionated post - as most posts are. For context: I always use ZFS, as long as it is technically feasible. And yes, the recommendation to use "Enterprise-class" SSDs has a serious background.
 
Yes, sure. The historical reasoning is still valid: with separate disks you can, for example, reinstall the OS and the other datastore is not affected at all.

Good point!

But... before doing that I would really prefer to have some form of redundancy --> ZFS mirrors. This is much more important (to me!) and much more valuable than having a non-redundant "OS" disk and a non-redundant "VM" disk, possibly without bit-rot detection and on (too) cheap devices.

Maybe in the future I will add a ZFS mirror pool. It is still a work in progress and very educational. I really like Proxmox.

No, that would make no difference - the same data has to be written somewhere either way.

True.

Disclaimer: opinionated post - as most posts are. For context: I always use ZFS, as long as it is technically feasible. And yes, the recommendation to use "Enterprise-class" SSDs has a serious background.

Thank you for the reply and your insight!
 
If you're not running ZFS, you should be fine. Turn off atime everywhere (including inside the VMs), run e.g. log2ram and possibly zram for swap, and turn off the cluster services if you're on a single node.

I've disabled access-time updates with noatime, but I could only find this setting for LXC containers. How is this done with VMs?

I have to look into log2ram. I understand that logs are written to RAM and then flushed to disk daily. I have to check on zram too. I assume that these packages are installed via APT?
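
For the VM side of the noatime question: the guest OS controls atime itself, so - assuming a Linux guest - the option would go into the guest's own /etc/fstab. A minimal sketch with placeholder values:

Code:
# /etc/fstab inside the Linux guest - placeholder UUID, adjust to your layout
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  defaults,noatime  0  1

A reboot (or `mount -o remount,noatime /`) applies it.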

It's always a good idea to separate OS + data, especially since the Proxmox installer wipes the target disk.

Good point too.

Thank you for your input!
 
Probably asked before, but I'd like to know how NVMe SSD durability holds up under Proxmox in a homelab. Is it, for example, wise to separate the OS from VM storage because of the writes? So two separate NVMe SSDs? I do not run Ceph or ZFS.
As long as you use reasonably good-quality NVMe drives (e.g. Samsung, Kingston), you should be fine.
 
Hi,

You have to look at the endurance specs of your SSDs, especially the DWPD (full Drive Writes Per Day) or the TBW (Terabytes Written).
To keep the budget low, I preferred to use only one SSD for everything, but opted for a quite high-end consumer one with 2400 TBW: https://shop.sandisk.com/products/ssd/internal-ssd/wd-black-sn850x-nvme-ssd?sku=WDS400T2X0E-00BCA0

For an SSD running 24x7, you can convert between the two with a formula like:
Code:
TBW = DWPD × Drive Capacity (TB) × 365 × Warranty Years
Or use "Service years" instead of Warranty Years if you want to use them for different periods.

Once you have selected your drives, you must then regularly watch the SMART data. I had an unsuitable SSD (at the same price) that I had to replace after a few months. So don't save 50€ on your SSDs, or you might have to replace them :)

Ensure the SSDs have a DRAM cache (for reads) and some SLC write cache. That's why I opted for the SN850X. My previous SSD had neither, so it was terribly slow under load (concurrency) and aged very quickly (around 2% per month).


Real use case - after 10 months of use in a home game server & lab setup - intensive use + Ceph + CephFS on a 3-node cluster with a full-mesh network for the Ceph storage daemons, on a 4TB WD_BLACK:

Code:
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        40 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    5%
Data Units Read:                    260,749,322 [133 TB]
Data Units Written:                 523,270,552 [267 TB]
Host Read Commands:                 2,919,761,751
Host Write Commands:                13,178,553,837
Controller Busy Time:               32,990
Power Cycles:                       92
Power On Hours:                     6,705
Unsafe Shutdowns:                   85
Media and Data Integrity Errors:    0
Error Information Log Entries:      2
Warning  Comp. Temperature Time:    2
Critical Comp. Temperature Time:    0


I wrote 267 TB, which is a bit more than 10% of the 2400 TBW, and the Percentage used is only 5%, which is VERY good.

The Unsafe Shutdowns count might look odd, but there is an explanation: I had a poor Tuya remote power plug from AliExpress that was power-cycling this server every 5 or 10 minutes, and it took me some time to find the problem and replace the faulty plug.

6,705 power-on hours is about 280 days.
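
Extrapolating roughly from those numbers (and assuming the write rate stays similar):

Code:
267 TB / 280 days ≈ 0.95 TB written per day  (≈ 0.24 DWPD on a 4TB drive)
2400 TBW / 0.95 TB per day ≈ 2500 days ≈ 7 years of endurance headroom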

So, whatever SSD you choose, and even before deciding how many SSDs you'll put in your servers, make sure to:
* Use SSDs with a high TBW or DWPD.
* Use SSDs with a real cache (DRAM), not a pseudo-cache provided through the driver/host.
* Use SSDs with a power-safe SLC (Single-Level Cell) write cache. This is a small zone of the SSD reserved for those operations.
* Run away from QLC (Quad-Level Cell) if you do not want to lose your money (and your data).
* Monitor your disks (see the smartctl example just below).
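
For the monitoring part: a health report like the one above is what smartmontools prints for an NVMe device; the device node below is a placeholder, adjust it to your system.

Code:
apt install smartmontools
smartctl -a /dev/nvme0        # full SMART/health report, like the output above
# alternatively, with nvme-cli:
nvme smart-log /dev/nvme0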

Professional SSDs will be 3-5x more expensive, but they offer more consistent throughput under high load, with better reliability and predictability:
* A much faster / bigger SLC write cache
* A faster controller to operate under higher loads
* 2-3x more TBW or DWPD (at least 3 DWPD)

In today's market, and since many people are not familiar with those characteristics, you can find crappy SSDs at the same price as excellent ones. Watch out.
 
So you advise the SN850X? I guess you also separated the OS from the VM storage? I'm also interested in setting up Ceph in the future.

I now have a Lexar 4TB NVMe in my Proxmox node. It has TLC and a high TBW, but no DRAM, I think.
 
Hi,
I'm very satisfied with the SN850X, definitely good value. But it's probably not the only option; check around and compare reviews online, in videos or articles...
The 4TB SN850X has 4GB of DRAM (for the read cache), pSLC for the write cache (its size is usually not announced), and is TLC-based. It also exists in 8TB, which makes it very interesting if you're looking for performance, since performance drops once you fill more than 50% of the disk on all this consumer-grade hardware.

Maybe first, check some documents like this one:
https://theoverclockingpage.com/202...f-an-nvme-ssd-benchmarks-and-results/?lang=en

Regards,
 
