Proxmox and SSDs

tgx

New Member
Jun 17, 2024
We are nearing a decision on which platform we will choose to purge VMware from our infrastructure. We are down to two products, with Proxmox still in the running. One area of concern we have found during our tests is the rapid degradation of SSDs when running Proxmox. After only a few days of service, the SMART wear indicator jumped 20%. We have had these same model SSDs in service with other installed systems for years and have never seen such rapid degradation. Does anyone have insight into possible configuration issues that might cause this phenomenon? Being able to see the SMART registers is a very useful feature, but it is yielding some alarming results.
 
That is nothing specific to Proxmox VE; it is the additional write amplification from ZFS through the small checksum, fill, and parity writes.
Use enterprise SSDs, or ZFS mirrors instead of RAIDZ configs, when using ZFS.
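To make the difference concrete, here is a rough sketch of the two pool layouts (pool name and device paths are placeholders, not taken from this thread); a mirror simply writes each block twice, while RAIDZ adds parity and padding to every small record:

Code:
# higher write amplification for small (e.g. 8-16k) blocks: RAIDZ1
zpool create -o ashift=12 tank raidz1 /dev/sdb /dev/sdc /dev/sdd /dev/sde

# usually gentler on SSDs for VM workloads: striped mirrors
zpool create -o ashift=12 tank mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde

# watch how much actually hits each member while your VMs run
zpool iostat -v tank 60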
 
Does anyone have insight into possible configuration issues that might cause this phenomenon?

Are you using:

1) ZFS on root;
2) Clusters (which need to satisfy the extended virtual synchrony for the shared filesystem);
3) HA, which is constantly evaluating what's going on and updating it in the underlying filesystem mirrored onto the SSD?

Experiment with the combinations and see if this is the source of your issue (a few quick checks are sketched below).
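A few quick ways to check each of those on a node (assuming a standard PVE install; adjust to your setup):

Code:
findmnt -no FSTYPE,SOURCE /   # 1) is the root filesystem ZFS?
pvecm status                  # 2) is this node in a cluster, and is corosync quorate?
ha-manager status             # 3) are any HA resources defined and being tracked?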
 
We are nearing a decision on which platform we will choose to purge VMware from our infrastructure. We are down to two products, with Proxmox still in the running. One area of concern we have found during our tests is the rapid degradation of SSDs when running Proxmox. After only a few days of service, the SMART wear indicator jumped 20%. We have had these same model SSDs in service with other installed systems for years and have never seen such rapid degradation. Does anyone have insight into possible configuration issues that might cause this phenomenon? Being able to see the SMART registers is a very useful feature, but it is yielding some alarming results.
As has already been said, this is not due to Proxmox.
Only consumer SSDs without PLP are subject to such quick wear.
Were the SSDs previously operated on a RAID controller with a battery-backed cache? Such a controller protects the SSDs considerably and writes the I/O in an optimized way to keep wear to a minimum. If you then operate the same SSDs with RAIDZ1 or RAIDZ2, for example, you have the highest possible write amplification and therefore much more wear.
If you want to turn your existing ESXi servers into PVE hosts and have cheap boot SSDs on a RAID controller, simply leave the RAID1 in place and install PVE with ext4.
For the VMs I recommend datacenter NVMe drives; they have no problem with wear-out, even with RAIDZ setups.
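If you want to put a number on the wear rather than just watch the percentage, you can track how much the drives actually write over a day or two. Attribute names vary by vendor and interface, so treat these as examples:

Code:
smartctl -A /dev/nvme0n1   # NVMe: note "Percentage Used" and "Data Units Written"
smartctl -A /dev/sda       # SATA: look for attributes like Wear_Leveling_Count or Total_LBAs_Written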
 
As has already been said, this is not due to Proxmox.

@tgx You will notice on this forum that Proxmox VE is an entirely flawless product. ;)

Only consumer SSDs without PLP are subject to such quick wear.

@tgx If you are curious: apt install iotop, then create e.g. 10 nodes (you can completely virtualise this) and 10 resources, even just containers, ideally with shared storage off the nodes; something that just sits there idling is fine. Then activate High Availability on them. On any single node that you won't be taking down during this exercise, run iotop -oP (once interactive, press a for cumulative results), watch for pmxcfs amongst others, and start migrating those resources around a bit (you can simulate some nodes dying in the process, bring them back up, etc.). You can compare this with what's going through to systemd-journald. Then make up your mind and imagine how it scales.

Were the SSDs previously operated on a RAID controller with a battery-backed cache? Such a controller protects the SSDs considerably and writes the I/O in an optimized way to keep wear to a minimum. If you then operate the same SSDs with RAIDZ1 or RAIDZ2, for example, you have the highest possible write amplification and therefore much more wear.

@tgx And combine that with this piece of information and the choice of ZFS.

If you want to turn your existing ESXi servers into PVE hosts and have cheap boot SSDs on a RAID controller, simply leave the RAID1 in place and install PVE with ext4.

@tgx Yes. And check whether you mind the iotop numbers you get with the above experiment.
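If you don't want to keep iotop open, a rough alternative for watching what pmxcfs alone writes (pidstat comes from the sysstat package; the interval is in seconds):

Code:
pidstat -d -p $(pidof pmxcfs) 60     # kB written per second by pmxcfs, sampled every minute
cat /proc/$(pidof pmxcfs)/io         # cumulative write_bytes since the process started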
 
We are nearing a decision on which platform we will choose to purge VMware from our infrastructure. We are down to two products, with Proxmox still in the running. One area of concern we have found during our tests is the rapid degradation of SSDs when running Proxmox. After only a few days of service, the SMART wear indicator jumped 20%.
I suspect that you ran VMware with a hardware RAID5 (with BBU?) and Proxmox with RAIDZ1, but those are entirely different in performance and usable space. There are better ZFS configurations (with PLP drives) for VMs, which will give you less write amplification. Or, if you want, you can run Proxmox with the same hardware RAID5 (with/without BBU) and your existing drives.
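Two settings worth checking on an existing pool, since they largely drive ZFS write amplification for VM disks (the dataset name below is just an example following the default PVE naming scheme):

Code:
zpool get ashift rpool                          # should match the drives' physical sector size
zfs get volblocksize rpool/data/vm-100-disk-0   # a small volblocksize on RAIDZ inflates parity/padding overhead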
 
