Disk Cache wiki documentation

IsThisThingOn

Member
Nov 26, 2021
Hi everyone

I was reading through the wiki page
https://pve.proxmox.com/wiki/Performance_Tweaks#Disk_Cache
and have some questions and maybe even some suggestions for improvement. Sometimes it also contradicts points from the article it references.

I'm no expert by any means and not a native speaker, so please don't take these suggestions as an insult. I could be very wrong on some of the basics :)
Because it is a complex topic, I'd like to number the different points.

1. Maybe start with a disclaimer that for a secure writeback you need the barrier-passing feature. That feature is enabled by default on ext4.
2. Do NTFS or ZFS guests also support write barriers by default?
3. The wiki says this only applies to RAW, but I was unable to find good documentation on how these caching modes behave for qcow2. Does anyone have some input?
4. The table is a little bit confusing to newcomers. I think it would be simpler to understand if the Host Page Cache column were split into read and write, plus a column for what data can be lost in a power failure. I would also replace "power loss" with "unexpected shutdown", because that covers more of the incidents that can happen.
5. Rename the mode "none" to "no cache" or "Default (no cache)", because that is how the GUI shows it
6. The description for cache=none says: "Because the actual storage device may report a write as completed when placed in its write queue only". For which devices is that true? Are we talking about the drive firmware lying? I had this discussion in a TrueNAS forum, where I asked how we can trust HDDs not to keep writes in a queue, and the general consensus was that no drive, SSD or HDD, lies about writes being written while they are still in its cache. Otherwise the outrage would be even bigger than the SMR fiasco from WD. That is why I would ditch the "in case of power failure you can lose data"
7. "cache=none seems to be the best performance": can that really be true? Compared to the other modes it just skips the cache and talks directly to the disk, so how can that be the best performance?
8. Reorder the modes to match the order of the GUI
9. According to the notes, directsync is the safest option. But why? Write through behaves the same but adds a read cache, so how is that less safe?


Here is how I, as a beginner, would write that wiki entry:



Mode | Proxmox page cache read | Proxmox page cache write | Disk write | Data loss from unexpected shutdowns? | Notes
default (No Cache) | disabled | disabled | normal behavior | only async writes | Normal read performance. Normal write performance. Safe with enabled host barrier support.
Direct Sync | disabled | disabled | forces sync | no | Normal read performance. Slow write performance, because even async writes are written as sync writes. Safe even without host barrier support.
Write through | enabled | disabled | forces sync | no | Good read performance because of the cache. Slow write performance, because even async writes are written as sync writes. Safe even without host barrier support.
Write back | enabled | enabled | normal behavior | only async writes | Good read performance. Normal write performance. Safe with enabled barrier support.
Write back (unsafe) | enabled | enabled | ignores flushes | yes | Good read performance. Good write performance for sync writes. Very unsafe! Not recommended!

default (No Cache) is the default since Proxmox 2.X.
  • host page cache is not used
  • In case of a power failure, you could lose async writes
  • You need to use the barrier option in your Linux guest to avoid FS corruption in case of power failure.
This mode causes qemu-kvm to interact with the disk image file or block device with O_DIRECT semantics, so the host page cache is bypassed and I/O happens directly between the qemu-kvm userspace buffers and the storage device. The guest is expected to send down flush commands as needed to manage data integrity. Equivalent to direct access to your host's disk, performance-wise.
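
As an aside (not necessarily something for the wiki itself): here is a minimal sketch I wrote of what O_DIRECT access roughly looks like from userspace, so it is clearer what "bypassing the host page cache" means. The file name and block size are placeholders I made up.

Code:
#define _GNU_SOURCE          /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* cache=none: O_DIRECT bypasses the host page cache */
    int fd = open("scratch.img", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT needs buffers aligned to the logical block size (512 or 4096) */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) { close(fd); return 1; }
    memset(buf, 0xAB, 4096);

    /* the write goes straight to the storage stack, not into the page cache */
    if (write(fd, buf, 4096) != 4096)
        perror("write");

    /* without O_DSYNC the data may still sit in the device's write cache,
       so the guest/application has to flush explicitly for durability */
    if (fdatasync(fd) != 0)
        perror("fdatasync");

    free(buf);
    close(fd);
    return 0;
}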

Direct Sync
  • host page cache is not used
  • guest disk cache mode is writethrough
  • similar to Write through, an fsync is made for each write.
This mode causes qemu-kvm to interact with the disk image file or block device with both O_DSYNC and O_DIRECT semantics, where writes are reported as completed only when the data has been committed to the storage device, and when it is also desirable to bypass the host page cache. Like cache=writethrough, it is helpful to guests that do not send flushes when needed. It was the last cache mode added, completing the possible combinations of caching and direct access semantics.

Write through
  • host page cache is used as read cache
  • guest disk cache mode is writethrough
  • similar to Direct Sync, an fsync is made for each write.
This mode causes qemu-kvm to interact with the disk image file or block device with O_DSYNC semantics, where writes are reported as completed only when the data has been committed to the storage device. The host page cache is used in what can be termed a writethrough caching mode. Guest virtual storage adapter is informed that there is no writeback cache, so the guest would not need to send down flush commands to manage data integrity. The storage behaves as if there is a writethrough cache.

Write Back
  • host page cache is used as read & write cache
  • guest disk cache mode is writeback
  • In case of a power failure, you could lose async writes
  • You need to use the barrier option in your Linux guest to avoid FS corruption in case of power failure.
This mode causes qemu-kvm to interact with the disk image file or block device with neither O_DSYNC nor O_DIRECT semantics, so the host page cache is used and writes are reported to the guest as completed when placed in the host page cache, and the normal page cache management will handle commitment to the storage device. Additionally, the guest's virtual storage adapter is informed of the writeback cache, so the guest would be expected to send down flush commands as needed to manage data integrity.

Write Back (unsafe)
  • as Write Back, but ignores flush commands from the guest!
  • Warning: No data integrity even if the guest is sending flush commands. Not recommended for production use.
 
Thanks for the feedback. It is always good if someone reads it critically. I'll see how I can incorporate them :)

Some answers to your questions:

First off, QEMU might not interact with a disk directly, separated only by a thin file system layer. There can be quite a few other storage layers in between, such as ZFS, Ceph, Gluster, iSCSI, … you name it, and each can have its own quirks in how it implements access semantics.

The different cache modes are all possible combinations of O_DIRECT and O_DSYNC semantics (see man 2 open).
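
To illustrate (a rough sketch of my own, not QEMU code, and the helper name is made up), the combinations map roughly like this:

Code:
#define _GNU_SOURCE          /* for O_DIRECT */
#include <fcntl.h>
#include <string.h>

/* hypothetical helper, only to show how the cache modes map to open(2) flags */
static int open_like_cache_mode(const char *path, const char *mode)
{
    int flags = O_RDWR;

    if (strcmp(mode, "none") == 0)
        flags |= O_DIRECT;                /* bypass host page cache */
    else if (strcmp(mode, "directsync") == 0)
        flags |= O_DIRECT | O_DSYNC;      /* bypass cache and sync every write */
    else if (strcmp(mode, "writethrough") == 0)
        flags |= O_DSYNC;                 /* host cache for reads, sync writes */
    /* "writeback" sets neither flag: a write completes once it is in the
       host page cache. "unsafe" is writeback that also ignores guest flushes. */

    return open(path, flags);
}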

6. The description for cache=none says: "Because the actual storage device may report a write as completed when placed in its write queue only". For which devices is that true? Are we talking about the drive firmware lying? I had this discussion in a TrueNAS forum, where I asked how we can trust HDDs not to keep writes in a queue, and the general consensus was that no drive, SSD or HDD, lies about writes being written while they are still in its cache. Otherwise the outrage would be even bigger than the SMR fiasco from WD. That is why I would ditch the "in case of power failure you can lose data"
Do you have a link to that TrueNAS forum thread?

This thread on twitter is quite interesting in that regard: https://twitter.com/xenadu02/status/1495693475584557056
Scrolling through the responses is worth it ;)

Consumer SSDs definitely lie about data being written in a way that would survive the power being cut at that exact moment. For example, if you benchmark them for 10 minutes without sync=1, they will, as expected, show performance that is close to the specs. If you enable sync, some will drop considerably in bandwidth, while others will keep the same speeds… Without real power loss protection (capacitors), where the RAM cache can still be written to non-volatile memory if power is cut, they have no way to actually achieve these sustained speeds. In my small test sample, the more reputable vendors were the ones dropping in speed once sync was enabled.
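
If you want to reproduce that kind of test, a rough sketch like the following shows the effect (a simplification of what fio does with sync=1; the file name and iteration count are made up, and the absolute numbers will vary a lot):

Code:
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* write the same 4k block N times and return the achieved IOPS */
static double bench(const char *path, int extra_flags, int iterations)
{
    int fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0644);
    if (fd < 0) { perror("open"); exit(1); }

    char buf[4096];
    memset(buf, 0x55, sizeof buf);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) { perror("pwrite"); break; }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    close(fd);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return iterations / secs;
}

int main(void)
{
    /* without O_DSYNC the writes land in the page cache and look very fast */
    printf("buffered: %.0f IOPS\n", bench("bench.img", 0, 10000));
    /* with O_DSYNC every write must be acknowledged as durable by the device;
       a drive without power loss protection should drop sharply here */
    printf("O_DSYNC:  %.0f IOPS\n", bench("bench.img", O_DSYNC, 10000));
    return 0;
}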


7. "cache=none seems to be the best performance": can that really be true? Compared to the other modes it just skips the cache and talks directly to the disk, so how can that be the best performance?
The guest OS usually has its own page cache, in which it stores non-sync writes until they are written down to disk. Doing that twice comes with a cost. But, as almost always in life, it depends. There are situations where you might see an improvement if you use the host's page cache.

9. According to the notes, directsync is the safest option. But why? Write through behaves the same but adds a read cache, so how is that less safe?
With directsync you actually read the data from the disk/storage and not from the cache. It is not 100% out of the question that the data gets corrupted somewhere on its way from the kernel down to the disk. In that case the cache reflects what you want the data to be, but not what is actually on the storage.
 
Hi arron, thanks for taking the time to read it and write a response.
The different cache modes are all possible combinations of O_DIRECT and O_DSYNC semantics (see man 2 open).
Not sure if I really understand the man page.
It says:

Code:
To guarantee synchronous I/O,
O_SYNC must be used in addition to O_DIRECT

So to be 100% sure all data gets written and nothing is lost in an outage, I have to use Direct Sync as the cache mode, because it is the only one that uses both O_SYNC and O_DIRECT?
I have a hard time believing this. Does that mean that to get the same behavior as a bare-metal machine (a guaranteed write for sync writes), I have to use Direct Sync and force my async writes to be sync?


Do you have a link to that TrueNAS forum thread?
Sure. If you don't wanna read the whole thread, here is the gist of it: https://www.truenas.com/community/t...-pool-power-loss-protection.87100/post-721914

This thread on twitter is quite interesting in that regard: https://twitter.com/xenadu02/status/1495693475584557056
Scrolling through the responses is worth it ;)
Wow, not surprised by the Sabrent, but SK Hynix looks bad.

Consumer SSDs definitely lie about data being written in a way that would survive the power being cut at that exact moment. For example, if you benchmark them for 10 minutes without sync=1, they will, as expected, show performance that is close to the specs. If you enable sync, some will drop considerably in bandwidth, while others will keep the same speeds… Without real power loss protection (capacitors), where the RAM cache can still be written to non-volatile memory if power is cut, they have no way to actually achieve these sustained speeds. In my small test sample, the more reputable vendors were the ones dropping in speed once sync was enabled.
In my experience, most consumer SSDs are in fact way slower and will drop in performance if you enable sync because they don't lie. Even your Twitter link I read as:
"1 out of 7 reputable consumer(!) SSDs (Samsung, WD, Intel, Kingston, Seagate, Crucial, SK Hynix) doesn't lose data"

The guest OS usually has its own page cache, in which it stores non-sync writes until they are written down to disk. Doing that twice comes with a cost. But, as almost always in life, it depends. There are situations where you might see an improvement if you use the host's page cache.
That is a great point I have not thought about. So would you say that most of the time, it is not really worth bothering with cache? Can you explain a scenario where there are benefits of using cache?
 
So to be 100% sure all data gets written and nothing is lost in an outage, I have to use Direct Sync as the cache mode, because it is the only one that uses both O_SYNC and O_DIRECT?
I have a hard time believing this. Does that mean that to get the same behavior as a bare-metal machine (a guaranteed write for sync writes), I have to use Direct Sync and force my async writes to be sync?
Even on a bare-metal machine, you would need to issue writes with direct & sync to get the ACK only once the data is fully written down. Since that would mean terrible performance, caching is used all the time. Typically, the OS flushes writes down periodically, every few seconds.

Applications that need good write assurances, for example databases, issue their writes with sync. For everything else, "we" have settled that cached writes are a good balance between write assurances and performance.
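
As a small illustration (my own sketch, the file name and record are made up), this is essentially the difference between a cached write and what a database does for its transaction log:

Code:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("journal.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char record[] = "COMMIT 42\n";

    /* "async" write: completes as soon as the data is in the page cache;
       it may be lost if power is cut before the kernel flushes it */
    if (write(fd, record, sizeof record - 1) < 0)
        perror("write");

    /* what a database does for its transaction log: force the data down to
       stable storage (barriers permitting) before acknowledging the commit */
    if (fsync(fd) != 0)
        perror("fsync");

    close(fd);
    return 0;
}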

In my experience, most consumer SSDs are in fact way slower and will drop in performance if you enable sync because they don't lie.
I got my hands on some NVMe SSDs last year: a Samsung 980 Pro, a WD Black SN 770 and a Crucial P5 Plus, all 2 TiB.

The Samsung was the only one that dropped to a few hundred IOPS once sync was enabled for the FIO 4k benchmarks. The other two were still at a few thousand IOPS with sync. I suspect that they lie about the speed and are still using a cache when ACKing the sync writes. The way the graphs oscillate kind of looks like it. Unfortunately, I did not have any means to connect them in a way where I could detach them right after a flush / sync write test to see if the data was actually written.


That is a great point I have not thought about. So would you say that most of the time, it is not really worth bothering with cache? Can you explain a scenario where there are benefits of using cache?
Unless you have performance issues, leave the defaults (no cache). If you do experience issues, writeback is the one I would try. Ceph (RBD) would be an example where it can improve the overall performance. RBD doesn't perform too well with many tiny writes, so using the writeback cache to additionally bunch them into fewer but larger writes can help.

While we are at it, Ceph is really happy if it gets SSDs with power loss protection, as they will ACK once the data is in their cache. That is okay, because the drive has enough power stored to write everything from the cache to the non-volatile memory.
 
Even on a bare-metal machine, you would need to issue writes with direct & sync to get the ACK only once the data is fully written down. Since that would mean terrible performance, caching is used all the time. Typically, the OS flushes writes down periodically, every few seconds.

Applications that need good write assurances, for example databases, issue their writes with sync. For everything else, "we" have settled that cached writes are a good balance between write assurances and performance.
That is how I understand it to be handled. The default behavior is to write only the stuff sync that the application requests to be sync, and to write everything else async. But now I get what this doc means by "safest"! In this context it means "even safer than bare metal, because unlike bare metal, even stuff that did not ask for sync gets synced and thus will not be lost". To play devil's advocate here, one could argue that this is not really safer, because these things did not expect to be safely written to begin with. But I think I get the argument.

So for me, no cache basically acts the same as bare metal.

The Samsung was the only one that dropped to a few hundred IOPS once sync was enabled for the FIO 4k benchmarks.
Strange, the twitter thread explicitly states that your Crucial P5 Plus and the previous SN 750 handle it correctly. Maybe this should be a point of discussion here and in the TrueNAS forums; this could be pretty dangerous.

Unless you have performance issues, leave the defaults (no cache).
Will do. Is the host page cache also in RAM, and will ARC therefore do some read caching anyway?

Thank you guys for your inputs!
 
That is how I understand it to be handled. The default behavior is to write only the stuff sync that the application requests to be sync, and to write everything else async. But now I get what this doc means by "safest"! In this context it means "even safer than bare metal, because unlike bare metal, even stuff that did not ask for sync gets synced and thus will not be lost". To play devil's advocate here, one could argue that this is not really safer, because these things did not expect to be safely written to begin with. But I think I get the argument.
Yep, depending on the cache mode selected, you can force stricter / safer behavior than the guest OS / application might choose.

Strange, the twitter thread explicitly states that your Crucial P5 Plus and the previous SN 750 handle it correctly.
Again, take it with a grain of salt, as I could not actually test it. It is just my suspicion! And please keep in mind that consumer SSDs are known to change over time while keeping their name. Different firmware, different controllers, different memory chips: it has all happened, unfortunately. This even goes so far that some review sites put a disclaimer on older reviews, because the same model that can currently be bought has barely anything in common with the one reviewed.
Differently sized SSDs of the same model might also differ. The TL;DR is that in the consumer SSD space it is unfortunately really easy to buy something that is not what you expected if you don't pay close attention.

Will do. Is the host page cache also in RAM, and will ARC therefore do some read caching anyway?
If you have the disk images on a ZFS pool, then yeah, using writeback (or writethrough) for a disk could probably lead to double caching.
 
So for me, no cache basically acts the same as bare metal.
At the risk of being strung up and skinned alive for resurrecting an old thread... I thought it was worth replying to clarify one point that I think may be causing some confusion.

When you say "bare metal", you are implying that is the standard, the bar to aim for in terms of data safety.... but you are forgetting one critical point:
The final caching is done by the drive itself and that is almost always *enabled* by default.

That is why you see the verbiage about "data can still be lost in a power outage even if you disable your OS or (shudder... RAID card) write-back cache".

Until you disable the actual DRIVE cache (or caches, if RAID), you are always running a de facto "write-back" storage configuration.
Once you disable the drive cache (and all OS caching with direct/sync), only then are you truly safe from fs corruption / data loss due to unexpected power failure or system crash.

Hope this helps.
 
Thanks for the clarification. I have learned some stuff since then, so I hope you don't mind my follow-up questions.
The final caching is done by the drive itself and that is almost always *enabled* by default.
Sure, but only for async writes. So there is no difference between bare metal and a VM with no cache. At least that is how I understand it.

That is why you see the verbiage about "data can still be lost in a power outage even if you disable your OS or (shudder... RAID card) write-back cache".
But that is only unimportant data, because it was not sync, right?
only then are you truly safe from fs corruption
Should fs corruption not be prevented by the fact that NTFS and ext4 are journaled?
data loss
Data loss of unimportant data, because it was async. Just like on every "normal PC".


To play devil's advocate:
it would be unreasonable to do every write on a bare-metal Windows office PC as sync, and thus it is unreasonable to go for writeback in a "normal" Windows VM.
 
Once you disable the drive cache (and all OS caching with direct/sync), only then are you truly safe from fs corruption / data loss due to unexpected power failure or system crash.
Hi,

Most modern filesystems have journals, so it is very difficult to get fs corruption. Also, any decent DB has a transaction log, so I have not seen any corruption cases in my environment.

Some years ago I made several tests and cut the power manually, several times, with no corruption events and no DB data loss!

Good luck / Bafta !
 
The final caching is done by the drive itself and that is almost always *enabled* by default.
That's why decent RAID controllers use a BBU and enterprise SSDs use PLP, to deal with data in flight during unexpected power loss.
Most modern filesystems have journals, so it is very difficult to get fs corruption. Also, any decent DB has a transaction log, so I have not seen any corruption cases in my environment.
There are regular threads by people who lost ZFS pools or LVM volumes because of unexpected power loss. I assume that mostly (only?) happens when they used consumer SSDs, which I believe are more vulnerable (no PLP, often host RAM cache) than HDDs.
 
There are still some very dangerous misconceptions here.
Yes, journaling helps avoid filesystem corruption* (by default).
No, it is not *perfect* and you can still get a corrupted filesystem under certain (very rare) conditions... however, that is not what I am talking about.

I'm talking about *data* corruption, which absolutely can happen when your drive(s) (consumer or server, spinners or SSDs) have their internal caches enabled and you lose power, for example between the journal being written and the data write completing.
Your filesystem will be able to recover and avoid corruption, but your DATA is going to be corrupt. EVEN WITH SYNCHRONOUS transactions.

To go a step further, if you are unlucky enough (and over time, we all are) to lose power and/or crash in the middle of the journal being written, you will have corrupted data and most likely some file system corruption as well.

The ONLY way to guarantee atomic filesystem transactions (i.e. to be assured your data is fully intact on disk) is to DISABLE any form of write-back caching (OS, RAID card, and disks), use SYNC writes, and use FULL journaling, as opposed to just metadata journaling.

Here is a great explanation (and I am somewhat lazy, so why re-type it?) by someone who is absolutely spot-on, from serverfault.

To be clear, I've been designing, building and operating high performance, redundant storage systems for several very large companies for over 25 years.

Here is a great synopsis from "BaronSamedi1958"
Data corruption when disk write caching is enabled


"
The most common type of journaling, called metadata journaling, only protects the integrity of the file system, not of data. This includes xfs, and ext3/ext4 in the default data=ordered mode.


If a non-journaling file system suffers a crash, it will be checked using fsck on the next boot. fsck scans every inode on the file system, looking for blocks that are marked as used but are not reachable (i.e. have no file name), and marks those blocks as unused. Doing this takes a long time.


With a metadata journaling file system, instead of doing an fsck, it knows which blocks it was in the middle of changing, so it can mark them as free without searching the whole partition for them. i.e. your data is NOT (entirely) written to the disk. i.e. it is corrupt.


There is a less common type of journaling, called data journaling, which is what ext3 does if you mount it with the data=journal option.


It attempts to protect all your data by writing not just a list of logical operations, but also the entire contents of each write to the journal. But because it's writing your data twice, it can be much slower.


As others have pointed out, even this is not a guarantee, because the hard drive might have told the operating system it had stored the data, when in fact it was still in the hard drive's cache. (unless you specifically disable it... hence, my original post. --infinisean)


For more information, take a look at the Wikipedia Journaling File System article and the Data Mode section of the ext4 documentation.
 
Also, any decent DB has a transaction log, so I have not seen any corruption cases in my environment.
Yes, you are correct. The DB transaction log is another level of "journaling", and it is necessary to ensure "atomic" commits.
The more layers of protection, the better.

Ultimately, plan on losing your data at some point (because you will, eventually) and have backups.
Offsite.
Multiple copies.

Trust me :)
 
To play devil's advocate:
it would be unreasonable to do every write on a bare-metal Windows office PC as sync, and thus it is unreasonable to go for writeback in a "normal" Windows VM.
I don't understand your point here...
Guaranteed Data Integrity comes at the cost of performance. Period. Always will.
Whether that tradeoff is worth it (to you) for a "normal" Windows VM is your choice... to anyone who can't bear to lose any data, it certainly wouldn't be "unreasonable".

The only thing that is unreasonable is to expect to be able to get perfect data integrity without it costing you any performance (or money, but that's another topic).

The age old saying goes "You want it Fast, Cheap, or Reliable? Pick any two..."
Holds true for just about everything, in my experience.
 
The ONLY way to guarantee atomic filesystem transactions (i.e. to be assured your data is fully intact on disk) is to DISABLE any form of write-back caching (OS, RAID card, and disks), use SYNC writes, and use FULL journaling, as opposed to just metadata journaling.
Hi,

Your discussion is fine. But you do not take into account the probability that such a disaster event will occur. As I see it, in real life you do not have any guarantees.

Each level of protection has a cost (money, manpower, time, performance, and so on). So any wise admin will try to find the optimum point between cost, performance and data safety (you can get 2 out of 3 at the same time).

In any environment you need to strike a very difficult balance between these 3 items. Security is another big point, which has changed a lot of things, at least over the last 5 years.

These decisions make a great ADMIN, or a stupid one. I know very good technical admins, but with no capability for balance!

Good luck / Bafta !
 
But you do not take into account the probability that such a disaster event will occur. As I see it, in real life you do not have any guarantees.
What?? I DO take into account the probabilities... I specifically said "in rare circumstances"; that is a probability.
You made statements that were "absolute", and I was pointing out that your assumptions about "prevention" were not guaranteed the way you thought they were.

And yes, you definitely CAN ensure data integrity 100%, and I laid out how. It takes many layers of redundancy and conscious configuration by someone who understands the system in question from top (software stack / kernel) to bottom (the individual disks).
And money. It takes money, because the most important thing is redundancy (which adds up quickly in financial terms), not just at the hardware/disk/server level, but also in backups across geographically redundant locations and proper administration.

That is the only way to "guarantee" your data.

Lastly, I'm not sure what point you are trying to make about "balance"... or whether your comment about "stupid admins" is directed at me or not (or what value it adds to this conversation, for that matter), since that is *exactly* the point I made in my last post: that there is a "tradeoff" (or balance) when it comes to "Performance, Cost and Reliability" (and you choose two, as the saying goes).

Good luck with your data.
 
I'm talking about *data* corruption, which absolutely can happen when your drive(s) (consumer or server, spinners or SSDs) have their internal caches enabled
Yes, if drives lie about the sync writes that are still in cache, that is dangerous. But even with writeback you can't prevent that. To my knowledge, that is a hardware problem, not a software problem.
As others have pointed out, even this is not a guarantee, because the hard drive might have told the operating system it had stored the data, when in fact it was still in the hard drive's cache.
Yes, and what software could have prevented that? I would argue not even ZFS.
That is why this https://x.com/xenadu02/status/1495693475584557056 is so dangerous.
 
