ZFS on Debian or mdadm SoftRAID? Stability and reliability of ZFS

The first thing that comes to my mind when talking about md/dm raid is this: the default caching mode we use for VMs is "none", that is, the VMs will access disks with the O_DIRECT flag. The md/dm raid implementations, in this case, will simply forward a pointer to the memory region to each individual block device, and each of those will copy the data from memory separately. If a second thread is writing to that memory at the same time, the underlying disks will sooner or later write different data, immediately corrupting the raid.[1]
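To make the race concrete, here is a minimal userspace sketch of my own (not the reproducer from the bug report; the /dev/md0 path and block size are just placeholders): one thread issues an O_DIRECT write while a second thread keeps modifying the very buffer being written. With md/dm RAID1 underneath, each mirror leg copies the buffer independently and can therefore end up with different contents.

/* Sketch only: a buffer is handed to the kernel with O_DIRECT while a
 * second thread keeps modifying it.  On md/dm RAID1 each mirror leg
 * copies the buffer on its own, so the legs can end up differing.
 * The target path and sizes are made-up examples. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 4096                 /* one aligned block */

static char *buf;
static volatile int writing = 1;

/* Second thread: scribbles over the buffer during the in-flight write. */
static void *scribbler(void *arg)
{
    (void)arg;
    while (writing)
        memset(buf, rand() & 0xff, BUF_SIZE);
    return NULL;
}

int main(void)
{
    /* O_DIRECT requires an aligned buffer and block-sized I/O. */
    if (posix_memalign((void **)&buf, 4096, BUF_SIZE))
        return 1;
    memset(buf, 0, BUF_SIZE);

    /* Hypothetical md device; a file on a filesystem on top of it
     * behaves the same way as far as the race is concerned. */
    int fd = open("/dev/md0", O_WRONLY | O_DIRECT);
    if (fd < 0)
        return 1;

    pthread_t t;
    pthread_create(&t, NULL, scribbler, NULL);

    /* The kernel reads straight from buf for each mirror leg, so the
     * legs may pick up different snapshots of the changing buffer. */
    pwrite(fd, buf, BUF_SIZE, 0);

    writing = 0;
    pthread_join(t, NULL);
    close(fd);
    free(buf);
    return 0;
}

Compile with gcc -pthread, and only ever point it at a scratch device: the whole point is that it can leave the mirror inconsistent.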

In [1] I also mentioned a real case where this can happen: an in-progress write to swap for memory that is simultaneously freed, so the swap entry is already discarded while the disk I/O is still in flight, leaving the raid degraded.

Ideally the kernel would just ignore O_DIRECT here, since it is in fact documented as *trying* to minimize cache effects... not as forcibly skipping caches, consistency be damned, and completely disregarding the one job that, for example, a RAID1 has: actually writing the *same* data to both disks...

And yes, writing data which is being modified *normally* means you need to expect garbage on disk. However, the point of a RAID is to at least have the *same* garbage on *both* disks, not give userspace a trivial way to degrade the thing.

If you take care not to use this cache mode, you'll be fine with md/dm raid, though you will be using somewhat more memory for caching.
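For example, assuming a VM with ID 100 whose disk is the volume vm-100-disk-0 on a storage named local-lvm (all hypothetical names, check your VM config first), the drive could be switched away from cache=none roughly like this:

qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=writeback

When re-specifying the volume like this, carry over any other drive options already present in the VM's config line, or simply change the cache mode in the GUI's disk edit dialog instead.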

[1] https://bugzilla.kernel.org/show_bug.cgi?id=99171
@wbumiller:
What cache mode would be recommended if "none" isn't an option with mdadm raid? "Writeback" because writes are cached? What about "writethrough"?
And what about LXCs where I can't choose any cache mode?
Would like to run my Zabbix LXC on top of LVM-Thin on top of a LUKS-encrypted mdadm raid1. Looks like it is working fine so far, but if that might kill the raid then I would need to think about skipping the raid1 and using just LVM-Thin on a single partition. But I really like that redundancy.
ZFS isn't an option here. All the other guests use it, but ZFS wears the SSDs about 3 times faster compared to LVM-Thin, and Zabbix will spam the MySQL DB with millions of metrics that would wear the SSDs by hundreds of GB per day...

@Dunuin Since your question was never answered, I think you deserve at least this pointer:
https://forum.proxmox.com/threads/proxmox-4-4-virtio_scsi-regression.31471/page-3#post-711588

That is the earliest instance I managed to locate that mentions the same issue. And going down the rabbit hole, it is not reproducible for me.