ZFS on Debian or mdadm SoftRAID? Stability and reliability of ZFS

The first thing that comes to my mind when talking about md/dm raid is this: The default caching mode we use for VMs is "none", that is, the VMs access their disks with the O_DIRECT flag. In this case the MD/DM raid implementations simply forward a pointer to the memory region to each individual block device, and each of those copies the data from memory separately. If a second thread is writing to that data at the same time, the underlying disks will sooner or later end up with different data, immediately corrupting the raid.[1]
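
To make that failure mode concrete, here is a minimal, hypothetical sketch (not the md/dm code itself): two ordinary files stand in for the two legs of a RAID1, each "leg" copies the same live user buffer at its own moment in time, and a second thread keeps modifying that buffer in the meantime, which is exactly the situation an O_DIRECT write from a VM creates. The file names and the toy two-pwrite "mirror" are my own simplification.

```c
/* Toy illustration of the divergence described above (not md/dm internals).
 * Build on Linux with: cc -O2 -pthread -o odirect_race odirect_race.c
 * Run it on a filesystem that supports O_DIRECT (ext4/xfs, not tmpfs).
 * Two files play the role of the two RAID1 legs; each pwrite() copies the
 * live, still-changing user buffer independently, so the "legs" can differ. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096                   /* O_DIRECT wants aligned, block-sized I/O */

static volatile int stop;
static unsigned char *buf;         /* shared buffer, mutated while "in flight" */

static void *mutator(void *arg)    /* stands in for the guest's second thread */
{
    (void)arg;
    while (!stop)
        memset(buf, rand() & 0xff, BLK);
    return NULL;
}

int main(void)
{
    int a = open("leg-a.img", O_CREAT | O_WRONLY | O_DIRECT, 0644);
    int b = open("leg-b.img", O_CREAT | O_WRONLY | O_DIRECT, 0644);
    if (a < 0 || b < 0) { perror("open"); return 1; }
    if (posix_memalign((void **)&buf, BLK, BLK)) return 1;
    memset(buf, 0, BLK);

    pthread_t t;
    pthread_create(&t, NULL, mutator, NULL);

    /* Like md with O_DIRECT: each leg reads the *live* user buffer for itself,
     * at its own point in time, instead of a single stable kernel-side copy. */
    pwrite(a, buf, BLK, 0);
    pwrite(b, buf, BLK, 0);

    stop = 1;
    pthread_join(t, NULL);
    close(a); close(b);

    /* Read both "legs" back and compare: a mismatch is the silent divergence. */
    unsigned char x[BLK], y[BLK];
    a = open("leg-a.img", O_RDONLY);
    b = open("leg-b.img", O_RDONLY);
    read(a, x, BLK);
    read(b, y, BLK);
    printf("legs %s\n", memcmp(x, y, BLK) ? "DIFFER" : "match");
    return 0;
}
```

Run it a few times; whenever the mutator wins the race, the two "legs" end up with different contents even though only a single logical write was issued.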

In [1] I also mentioned a real case where this can happen: an in-progress write to swap for memory that is simultaneously freed, so the swap entry is already discarded while the disk I/O is still in flight, leaving the raid degraded.

Ideally the kernel would just ignore O_DIRECT here, since it is in fact documented as *trying* to minimize cache effects... not forcibly skipping caches, consistency be damned, while completely disregarding the one job that, for example, a RAID1 has: actually writing the *same* data to both disks...

And yes, writing data which is being modified *normally* means you need to expect garbage on disk. However, the point of a RAID is to at least have the *same* garbage on *both* disks, not give userspace a trivial way to degrade the thing.

If you take care not to use this cache mode, you'll be fine with md/dm raid, though you'll be using somewhat more memory.
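
If you suspect an existing mirror has already picked up such a mismatch, md's documented sysfs interface lets you start a "check" scrub and read back how many sectors disagree between the legs. Below is a minimal sketch, assuming the array is md0 (the device name is my assumption) and root privileges; a shell one-liner against the same sysfs files would do the same job.

```c
/* Minimal sketch: kick off an md "check" scrub and print mismatch_cnt.
 * Assumes the array is md0 (adjust the sysfs paths) and that we run as root.
 * Uses only the documented md sysfs files:
 *   /sys/block/md0/md/sync_action   - write "check" to start a scrub
 *   /sys/block/md0/md/mismatch_cnt  - sectors that differed between the legs
 * The scrub runs asynchronously; re-read mismatch_cnt once it has finished. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/block/md0/md/sync_action", "w");
    if (!f) { perror("sync_action"); return 1; }
    fputs("check\n", f);                 /* start a read-and-compare pass */
    fclose(f);

    f = fopen("/sys/block/md0/md/mismatch_cnt", "r");
    if (!f) { perror("mismatch_cnt"); return 1; }
    long mismatches = 0;
    if (fscanf(f, "%ld", &mismatches) != 1)
        mismatches = -1;
    fclose(f);

    /* Anything above zero after the scrub means the legs hold different data. */
    printf("mismatch_cnt: %ld\n", mismatches);
    return 0;
}
```

Note that on a RAID1 a non-zero mismatch_cnt only tells you the copies differ, not which copy is the "right" one; md has no way to know.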

[1] https://bugzilla.kernel.org/show_bug.cgi?id=99171
@wbumiller:
What cache mode would be recommended if "none" isn't an option with mdadm raid? "Writeback" because writes are cached? What about "writethrough"?
And what about LXCs where I can't choose any cache mode?
I would like to run my Zabbix LXC on top of LVM-Thin on top of a LUKS-encrypted mdadm raid1. It looks like it is working fine so far, but if that might kill the raid, then I would need to think about skipping the raid1 and using just LVM-Thin on a single partition. But I really like that redundancy.
ZFS isn't an option here; all the other guests use it, but ZFS wears the SSDs about 3 times faster than LVM-Thin, and Zabbix will spam the MySQL DB with millions of metrics that would wear the SSD by hundreds of GB per day...
 

@Dunuin Since your question was never answered, I think you deserve at least this pointer:
https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-3#post-711588

This was the earliest mention of the issue I managed to locate. And going down the rabbit hole, I could not reproduce it myself.
 
