Thanks for the hint on the SMART wearout status. I'll keep an eye on it.
I have an Ubuntu server running 24/7 with a handful of docker containers. Some of them can be pretty active though.
The other vms are a couple of Windows machines that I actively turn on when I need them. One of them is a development machine that feels incredibly sluggish because of this ssd issue.
Besides the much higher TBW rating of the enterprise-grade SSDs you have, are you seeing a "much higher" speed compared to your previous consumer SSDs?
Not sure, I didn't run benchmarks back then. But in general, consumer SSDs are designed with reads in mind rather than writes, and they are optimized for short, high bursts of IO instead of medium but sustained 24/7 IO. So on paper an enterprise SSD might look slower, with lower bandwidth and IOPS, but a consumer SSD may only deliver that high speed for some seconds or minutes until the RAM cache and SLC cache fill up and performance drops to terrible values. An enterprise SSD's performance shouldn't drop that low.
Right now I've got 10 enterprise SSDs in my homeserver (paid 10-30€ per SSD), and each SSD has a 4GB DDR3 RAM chip for caching. So all the SSDs together have more RAM than your complete server, and those are just the small 100 and 200GB models. The bigger, higher-capacity models have even more RAM.
And a big difference is sync writes. Enterprise SSDs have power-loss protection (a built-in backup "battery", usually capacitors), so they can quickly save cached data from the volatile RAM into the non-volatile NAND if a power outage occurs. Consumer/prosumer SSDs don't have such a backup "battery", so all data in the SSD's internal RAM cache would be lost. If an application needs to make sure that important data is really safely stored, it does a sync write instead of an async write. A consumer SSD knows it would lose all cached data on a power outage, so it can't acknowledge sync writes from RAM and has to write them directly to the NAND cells without any caching, so the data can't be lost.
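From the application side, the difference between an async and a sync write comes down to whether you force the data to stable storage before continuing. A minimal Python sketch (file path is made up for the demo):

```python
import os
import tempfile

# Hypothetical demo file; any path on the SSD in question would do.
path = os.path.join(tempfile.gettempdir(), "sync_demo.bin")

with open(path, "wb") as f:
    # Async write: after write(), the data may still sit in Python's
    # buffer, the OS page cache, and the SSD's volatile RAM cache.
    # A power loss here could lose it.
    f.write(b"important payload")

    # Sync write: flush() pushes Python's buffer to the OS, and
    # os.fsync() forces the OS and the drive to commit the data to
    # stable storage before returning. An enterprise SSD with
    # power-loss protection can acknowledge this from its RAM cache;
    # a consumer SSD must push it all the way to NAND, which is slow.
    f.flush()
    os.fsync(f.fileno())

print(os.path.getsize(path))  # 17 bytes are now safely on disk
```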
But now remember what I wrote earlier how SSDs work.
Let's say the SSD reports 4K blocks to the host, but internally uses 16K blocks for reads/writes and a 128K row for erasing.
Now you want to sync-write 32x 4K blocks. An enterprise SSD can use its RAM cache: it stores the 32x 4K in RAM and immediately reports back "securely written", even though it hasn't done a single write to the NAND yet. Then it merges those 32x 4K blocks in RAM into 8x 16K blocks, erases one 128K row and writes the 8x 16K blocks all at once. So in total it erased 128K, wrote 128K and read 0K to store 128K (32x 4K) of data.
Here is how a consumer SSD handles this: because it can't cache the data in RAM, it has to write each of the 32 4K blocks one after another.
Read 128K from NAND into RAM, erase 128K, write 128K: all that to store a single 4K block. Only then does it report back "I saved the first block, send me the next one". The host sends the second 4K block, the SSD again reads, erases and writes 128K and reports back... this happens 32 times until all 32 blocks have been written. So in total the consumer SSD reads 4M (32x 128K), erases 4M and writes 4M to store only 128K (32x 4K) of data.
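The arithmetic above can be written out as a toy calculation (numbers taken straight from the example, not measured from real hardware):

```python
# Toy write-amplification calculation: 32 sync writes of 4K each on a
# drive with a 128K erase row. Numbers are from the example above,
# not from a real drive. All figures in KB.
block = 4            # logical block size reported to the host
erase_row = 128      # NAND erase-row size
n_writes = 32
payload = n_writes * block              # 128K of actual data

# Enterprise SSD: batches all 32 blocks in its protected RAM cache,
# then does one erase + one write of the full row.
ent_read, ent_erase, ent_write = 0, erase_row, erase_row

# Consumer SSD: without a safe cache it does a full read-erase-write
# cycle of the 128K row for every single 4K block.
con_read = n_writes * erase_row
con_erase = n_writes * erase_row
con_write = n_writes * erase_row

print(ent_write // payload)   # write amplification: 1
print(con_write // payload)   # write amplification: 32
print(con_write)              # 4096K = 4M written for 128K of data
```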
So you get reaaaally bad read and write amplification here.
That's why consumer SSDs are so terrible as DB storage: databases mainly do small 8K/16K/32K sync writes.
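You can see this pattern with SQLite (via Python's stdlib sqlite3 module): by default it syncs to disk at every commit, so each small transaction becomes exactly the kind of small sync write described above. A minimal sketch, using a throwaway database file:

```python
import os
import sqlite3
import tempfile

# Throwaway demo database; the path is made up for this example.
path = os.path.join(tempfile.gettempdir(), "syncwrite_demo.db")

con = sqlite3.connect(path)
con.execute("CREATE TABLE IF NOT EXISTS t (v TEXT)")
con.execute("INSERT INTO t VALUES ('row')")
# By default SQLite runs with PRAGMA synchronous=FULL, so this commit
# ends in a sync to stable storage. On a consumer SSD without
# power-loss protection, every such commit triggers the slow
# read-erase-write cycle from the example above.
con.commit()

# The unit of those sync writes is the database page size.
page_size = con.execute("PRAGMA page_size").fetchone()[0]
print(page_size)  # commonly 4096 on modern SQLite builds
con.close()
```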