NVME SSD durability under Proxmox

HomeLabNerd

Sep 9, 2023
Probably asked before, but I'd like to know how NVMe SSD durability holds up under Proxmox in a homelab. Is it, for example, wise to separate the OS from VM storage because of the writes? So two separate NVMe SSDs? I do not run Ceph or ZFS.
 
Is it, for example, wise to separate the OS from VM storage
Yes, sure. The historical reasoning is still valid: with separate disks you can, for example, reinstall the OS and the other datastore is not affected at all.

But... before doing that I would really prefer to have some form of redundancy --> ZFS mirrors. This is much more important (to me!) and much more valuable than having a non-redundant "OS" disk and a non-redundant "VM" disk, possibly without bit-rot detection and on (too) cheap devices.
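
If you ever go that route, a two-way mirror is quick to set up. A rough sketch from the command line - the pool name and device paths are placeholders, and in practice the Proxmox installer or the GUI (Disks > ZFS) can do the same for you:

Code:
# create a mirrored pool from two NVMe devices (placeholder name and IDs!)
zpool create -o ashift=12 tank mirror \
    /dev/disk/by-id/nvme-DISK_A /dev/disk/by-id/nvme-DISK_B
# verify both devices show up as ONLINE in the mirror
zpool status tank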

...because of the writes?
No, that would make no difference - the same data has to be written somewhere either way.


Disclaimer: opinionated post - as most posts are. For context: I always use ZFS, as long as it is technically feasible. And yes, the recommendation to use "Enterprise-class" SSDs has a serious background.
 
Yes, sure. The historical reasoning is still valid: with separate disks you can, for example, reinstall the OS and the other datastore is not affected at all.

Good point!

But... before doing that I would really prefer to have some form of redundancy --> ZFS mirrors. This is much more important (to me!) and much more valuable than having a non-redundant "OS" disk and a non-redundant "VM" disk, possibly without bit-rot detection and on (too) cheap devices.

Maybe in the future I will add a ZFS mirror pool. It is still a work in progress and very educational. I really like Proxmox.

No, that would make no difference - the same data has to be written somewhere either way.

True.

Disclaimer: opinionated post - as most posts are. For context: I always use ZFS, as long as it is technically feasible. And yes, the recommendation to use "Enterprise-class" SSDs has a serious background.

Thank you for the reply and your insight!
 
If you're not running ZFS, you should be fine. Turn off atime everywhere (including inside the VMs), run e.g. log2ram and possibly zram for swap, and turn off the cluster services if you're on a single node.

I've disabled access-time updates with noatime, but I could only find this setting for LXC containers. How is this done with VMs?

I have to look into log2ram. I understand that logs are written to RAM and then flushed to disk daily. I have to check on zram too. I assume that these packages are installed via APT?
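
For the VM side of the noatime question: the guest OS controls atime itself, so - assuming a Linux guest - the option would go into the guest's own /etc/fstab. A minimal sketch with placeholder values:

Code:
# /etc/fstab inside the Linux guest - placeholder UUID, adjust to your layout
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  defaults,noatime  0  1

A reboot (or `mount -o remount,noatime /`) applies it.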

It's always a good idea to separate OS + data, especially since the Proxmox installer wipes the target disk.

Good point too.

Thank you for your input!
 
Probably asked before, but I'd like to know how NVMe SSD durability holds up under Proxmox in a homelab. Is it, for example, wise to separate the OS from VM storage because of the writes? So two separate NVMe SSDs? I do not run Ceph or ZFS.
As long as you use reasonably good-quality NVMe drives (e.g. Samsung, Kingston), you should be fine.
 
Hi,

You have to look at the endurance specs of your SSDs, especially the DWPD (full Drive Writes Per Day) or the TBW (Terabytes Written).
To keep the budget low, I preferred to use only one SSD for everything, but opted for a quite high-end consumer one with 2400 TBW: https://shop.sandisk.com/products/ssd/internal-ssd/wd-black-sn850x-nvme-ssd?sku=WDS400T2X0E-00BCA0

For an SSD running 24x7, you can convert between the two with a formula like:
Code:
TBW = DWPD × Drive Capacity (TB) × 365 × Warranty Years
Or use "Service years" instead of Warranty Years if you want to use them for different periods.

Once you have selected your drives, you must then regularly watch the SMART data. I had an unsuitable SSD (at the same price) that I had to replace after a few months. So don't save 50€ on your SSDs, or you might have to replace them :)

Ensure the SSDs have a DRAM cache (for reads) and some SLC write cache. That's why I opted for the SN850X. My previous SSD had neither, so it was terribly slow under load (concurrency) and aged very quickly (around 2% per month).


Real use case - after 10 months of use in a home game server & lab setup - intensive use + Ceph + CephFS on a 3-node cluster with a full-mesh network for the Ceph storage daemons, on a 4TB WD_BLACK:

Code:
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        40 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    5%
Data Units Read:                    260,749,322 [133 TB]
Data Units Written:                 523,270,552 [267 TB]
Host Read Commands:                 2,919,761,751
Host Write Commands:                13,178,553,837
Controller Busy Time:               32,990
Power Cycles:                       92
Power On Hours:                     6,705
Unsafe Shutdowns:                   85
Media and Data Integrity Errors:    0
Error Information Log Entries:      2
Warning  Comp. Temperature Time:    2
Critical Comp. Temperature Time:    0


I wrote 267 TB, which is a bit more than 10% of the 2400 TBW, and the Percentage used is only 5%, which is VERY good.

The Unsafe Shutdowns count might look odd, but there is an explanation: I had a poor Tuya remote power plug from AliExpress that was power-cycling this server every 5 or 10 minutes, and it took me some time to find the problem and replace the faulty plug.

6,705 power-on hours is about 280 days.
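
Extrapolating roughly from those numbers (and assuming the write rate stays similar):

Code:
267 TB / 280 days ≈ 0.95 TB written per day  (≈ 0.24 DWPD on a 4TB drive)
2400 TBW / 0.95 TB per day ≈ 2500 days ≈ 7 years of endurance headroom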

So, whatever SSD you choose, and even before deciding how many SSDs you'll put in your servers, make sure to:
* Use SSDs with a high TBW or DWPD.
* Use SSDs with a real cache (DRAM), not a pseudo-cache provided through the driver/host.
* Use SSDs with a power-safe SLC (Single-Level Cell) write cache. This is a small zone of the SSD reserved for those operations.
* Run away from QLC (Quad-Level Cell) if you do not want to lose your money (and your data).
* Monitor your disks (see the smartctl example just below).
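
For the monitoring part: a health report like the one above is what smartmontools prints for an NVMe device; the device node below is a placeholder, adjust it to your system.

Code:
apt install smartmontools
smartctl -a /dev/nvme0        # full SMART/health report, like the output above
# alternatively, with nvme-cli:
nvme smart-log /dev/nvme0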

Professional SSDs will be 3-5x more expensive, but they offer more consistent throughput under high load, with better reliability and predictability:
* A much faster / bigger SLC write cache
* A faster controller to operate under higher loads
* 2-3x more TBW or DWPD (at least 3 DWPD)

In today's market, and since many people are not familiar with those characteristics, you can find crappy SSDs at the same price as excellent ones. Watch out.
 
So you advise the SN850X? I guess you also separated the OS from the VM storage? I'm also interested in setting up Ceph in the future.

I now have a Lexar 4TB NVMe in my Proxmox node. It has TLC and a high TBW, but no DRAM, I think.
 
Hi,
I'm very satisfied with the SN850X, definitely good value. But it's probably not the only option; check around and compare reviews online, in videos or articles...
The 4TB SN850X has 4GB of DRAM (for the read cache), pSLC for the write cache (its size is usually not announced), and is TLC-based. It also exists in 8TB, which makes it very interesting if you're looking for performance, since performance drops once you fill more than 50% of the disk on all this consumer-grade hardware.

Maybe first, check some documents like this one:
https://theoverclockingpage.com/202...f-an-nvme-ssd-benchmarks-and-results/?lang=en

Regards,
 
