RAID-1 ZFS in Proxmox with Crucial MX500 (2TB) drives gives 500 IOPS - terrible

jsengupta

We are trying to run MSSQL 2017 on Windows Server 2016. We do not have a RAID controller. Instead we have an H330 Mini HBA controller in our Dell R730xd.

We have created a ZFS mirror with 2 Crucial MX500 (2TB) drives with the following settings:
ashift = 12
compression = off
arcsize = 48 Gig
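
For readers who want the settings above as raw numbers, here is a minimal sketch. It assumes that "ashift = 12" carries the usual OpenZFS meaning (sector size of 2**ashift bytes) and that "arcsize = 48 Gig" means 48 GiB, as one would write into the `zfs_arc_max` module parameter; the helper names are made up for illustration.

```python
# Assumption: ashift is the base-2 exponent of the pool's sector size, and
# the 48 "Gig" ARC limit is meant as GiB (binary gigabytes).


def ashift_to_sector_bytes(ashift: int) -> int:
    """Return the sector size in bytes implied by an ashift value."""
    return 1 << ashift


def gib_to_bytes(gib: int) -> int:
    """Convert GiB to bytes, the unit zfs_arc_max expects."""
    return gib * 1024**3


if __name__ == "__main__":
    print("sector size:", ashift_to_sector_bytes(12), "bytes")
    print("ARC limit:  ", gib_to_bytes(48), "bytes")
```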

The IOPS we are getting from the Windows 2016 server are only around 500 - 800, which is terrible. Can anyone please explain why this is happening with ZFS?
 
Correct me if I'm wrong, but this is a consumer SSD. Search the forum or read up on ZFS and consumer/prosumer SSDs.
An SSD without capacitors for internal power-loss protection (PLP) cannot safely acknowledge a sync write from its volatile cache, so every sync write and cache flush that ZFS issues has to be fully committed to flash first. That drops your IOPS to near nothing. Consumer SSDs have no PLP, so forget it. Using them for a database server is not an option.
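
The cost of committing every write can be observed with a rough sketch like the one below. It is not a replacement for a proper benchmark such as fio, and the function name and defaults are hypothetical; it only compares buffered writes against writes that are each followed by `os.fsync()`. Results depend heavily on the filesystem holding the temporary file (a RAM-backed tmpfs, for example, will show little difference).

```python
import os
import tempfile
import time


def measure_write_iops(sync_each_write: bool, n_writes: int = 200,
                       block_size: int = 4096) -> float:
    """Write n_writes blocks to a temp file and return writes per second.

    With sync_each_write=True, os.fsync() is called after every block, so
    each write has to reach stable storage before the next one starts.
    """
    payload = b"\0" * block_size
    with tempfile.NamedTemporaryFile() as tmp:
        fd = tmp.fileno()
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, payload)
            if sync_each_write:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    return n_writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    print("buffered writes/s:", round(measure_write_iops(False)))
    print("fsync'd writes/s: ", round(measure_write_iops(True)))
```

On a drive without PLP the second number is typically far lower than the first, which is the effect described in this post.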
 
We are planning to purchase Kingston DC500M drives and discard the Crucials. Will they work with ZFS and provide the IOPS we are expecting?

However, the same Crucial MX500 drives are giving 5K IOPS in hardware RAID5 mode on another host with an H730 controller.
 
Hardware RAID is a different implementation than ZFS, as are the possibilities. I usually recommend hardware RAID if you don't need the storage replication from ZFS.
 
Actually, our controller does not support RAID; it only has HBA mode. So we have no other option but to stick with ZFS.

My question is whether the Kingston DC500M will deliver good IOPS in zfs-RAID5/RAID10/RAID1.
 
Kingston claims the DC500M is built for datacenter use and has PLP.
It is intended for VMs, databases and so on, so yes, chances are good it will work well with ZFS.
 
I am wondering how the Kingston DC500M is working out in your setup. I have used the Intel D3-4610 and the S4510, in 240GB and 960GB, in a dual raidz setup.
I've also used an ADATA XPG 8200 Pro and the Crucial MX500. The MX500 is certainly wearing down.
 
Correct me if I'm wrong, but this is a consumer SSD. Search the forum or read up on ZFS and consumer/prosumer SSDs.
An SSD without capacitors for internal power-loss protection (PLP) cannot safely acknowledge a sync write from its volatile cache, so every sync write and cache flush that ZFS issues has to be fully committed to flash first. That drops your IOPS to near nothing. Consumer SSDs have no PLP, so forget it. Using them for a database server is not an option.

Crucial MX500s have their own PLP implementation that doesn't use capacitors. I'm not sure I'd trust it 100%, and I would rather have a pair of data centre NVMe drives to hold writes and flush to SSD in the background, but it's better than having nothing.
 
They also don't call it "power-loss protection" like all the other manufacturers using powercaps but "integrated power-loss immunity". ;)
 
Is this different from DRAM-less SSDs, where there is no need for PLP because everything is always written to flash, and write amplification is therefore even worse (and speed/IOPS is also lower)?
 
They also don't call it "power-loss protection" like all the other manufacturers using powercaps but "integrated power-loss immunity". ;)

Yeah, it's not the same thing, but looking into it I don't see any downsides to having it. I get concerned about SSDs with a DRAM cache, although that's a bit off topic, so perhaps it needs its own thread.
 
Is this different from DRAM-less SSDs, where there is no need for PLP because everything is always written to flash, and write amplification is therefore even worse (and speed/IOPS is also lower)?

Yes it's different.
 
Looking into this some more, it appears that async writes are simply lost on the MX500 when power goes out, and sync writes are not cached (so they are still slow and still cause write amplification). The only thing they "guarantee" is that existing data at rest is not damaged. Good to know that such a basic thing is worthy of being a special feature and therefore probably not present on other drives/brands. Thank you for bringing this warning to my attention.
 
Good to know that such a basic thing is worthy of being a special feature and therefore probably not present on other drives/brands.
Yes...that this is still a thing...
I think these days this is normal, but if you look at early SSD generations it wasn't that uncommon for a power loss to corrupt the internal tables, losing all the stored data. The data was still there but scrambled, and without knowing how to put it back in the correct order it was useless. I guess SSD internal journaling wasn't a thing back then ;)
 
Looking into this some more, it appears that async writes are simply lost on the MX500 when power goes out, and sync writes are not cached (so they are still slow and still cause write amplification). The only thing they "guarantee" is that existing data at rest is not damaged. Good to know that such a basic thing is worthy of being a special feature and therefore probably not present on other drives/brands. Thank you for bringing this warning to my attention.
Could you provide links to this information? If their "power loss immunity" functionality is essentially a lie, I'd like to know more details, as I have a few of these drives and am/was planning to buy a few more.
 
Just some forum posts, which are just opinions of strangers on the internet. Developing the advanced wear-leveling and trimming algorithms probably took some time before they were able to guarantee that data "at rest", which is also shuffled around in the background, is always safe.
Yes it's different.
If you know more about how it works, please let me know. I still assume you need either power backup (battery or capacitor) or write everything to flash (or some other persistent cache or fast flash). And only power backup can reduce write amplification (and cache sync writes), which is almost essential with modern journal and CoW filesystems.
Yes...that this is still a thing...
I think these days this is normal, but if you look at early SSD generations it wasn't that uncommon for a power loss to corrupt the internal tables, losing all the stored data. The data was still there but scrambled, and without knowing how to put it back in the correct order it was useless. I guess SSD internal journaling wasn't a thing back then ;)
The only time I lost a ZFS mirror was during a power loss while the pool (both "prosumer" MLC drives) was trimming, which is probably a re-ordering case exactly like that.
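
The write-amplification point made earlier in this post (only a power-backed cache lets the drive hold sync writes and combine them) can be illustrated with a toy model. This is a sketch under stated assumptions, not a description of how the MX500 or any other real controller works: flash is assumed to be programmed in whole pages, and the 16 KiB page size and function names are hypothetical.

```python
import math


def flash_bytes_written(write_sizes, page_size=16384, coalesce=False):
    """Toy model of flash bytes programmed for a series of host sync writes.

    coalesce=False: each sync write is programmed on its own, rounded up
                    to whole pages (no power-backed cache).
    coalesce=True:  writes are buffered in a power-backed cache and packed
                    together before being programmed.
    page_size is a hypothetical value, not taken from any datasheet.
    """
    if coalesce:
        return math.ceil(sum(write_sizes) / page_size) * page_size
    return sum(math.ceil(size / page_size) * page_size for size in write_sizes)


def write_amplification(write_sizes, page_size=16384, coalesce=False):
    """Ratio of flash bytes programmed to host bytes written."""
    host_bytes = sum(write_sizes)
    if host_bytes == 0:
        raise ValueError("no host writes given")
    return flash_bytes_written(write_sizes, page_size, coalesce) / host_bytes


if __name__ == "__main__":
    writes = [4096] * 8  # eight 4 KiB sync writes
    print("without cache:", write_amplification(writes))
    print("with cache:   ", write_amplification(writes, coalesce=True))
```

In this model, eight 4 KiB sync writes cost eight full pages without a cache but only two pages when they can be combined.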
 
