Like the title says, I am trying to understand the implications of using cache=unsafe for some VMs.
Don't you worry, I did my homework: I am well aware of what the PVE documentation says about it (i.e. https://pve.proxmox.com/wiki/Performance_Tweaks), as well as what numerous other sources say (e.g. http://www.ilsistemista.net/index.p...s-on-red-hat-enterprise-linux-62.html?start=2).
Unfortunately, I am still not 100% sure how "unsafe" this setting really is in the end for my VMs.
I have this naive mental model of how things work at the block device level:
a) we have block devices (which may or may not have their write caches enabled), and
b) we have operating systems with (page) caches that absorb write operations until either the cache fills up or some operating system component issues a "flush" command on the cache, at which point the cached data is finally written to the block device (a minimal sketch of this write path follows below).
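To make b) concrete, here is a minimal sketch of that write path in Python (the file path is just a placeholder):

```python
import os

# Minimal sketch of the write path described in b):
# data first sits in a cache and only reaches the device on a flush.
with open("/tmp/flush-demo", "wb") as f:  # hypothetical test file
    f.write(b"x" * 4096)   # lands in the application buffer / OS page cache
    f.flush()              # pushes the user-space buffer into the kernel page cache
    os.fsync(f.fileno())   # the actual "flush": forces the data down to the block device
```

That final os.fsync() is exactly the kind of request that turns into a "flush" at the block layer.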
So, as the documentation says, "cache=unsafe" means that the QEMU block device simply ignores that "flush" command. But what then? I've read that a flush is ultimately issued once the VM shuts down, but on the other hand the cache can't just grow forever, because of capacity limits. If I have a write-intensive application producing maybe gigabytes of data, that data can't possibly fit into the host's page cache (where cache=unsafe buffers writes) or any other cache without ever being flushed.
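For context, this is where the option lives; a sketch with a hypothetical VM ID, storage name, and paths:

```
# /etc/pve/qemu-server/100.conf  (hypothetical VM ID and storage)
scsi0: local-lvm:vm-100-disk-0,cache=unsafe

# roughly equivalent on a raw QEMU command line (illustrative path)
qemu-system-x86_64 ... -drive file=/var/lib/vz/images/100/vm-100-disk-0.raw,cache=unsafe
```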
So in terms of the "danger of losing data", how much data are we actually talking about? There must be some point at which the data still gets flushed, right?
The reason I am investigating this is that my tests have shown cache=unsafe provides a really huge I/O performance boost for the block devices (measured with fio).
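Not my exact test, but for illustration, the kind of fio job where I'd expect the difference to show up most dramatically is one that issues frequent flushes, since those are precisely the requests cache=unsafe discards (file name and size are placeholders):

```
# run inside the guest; --fsync=1 issues a flush after every write
fio --name=synctest --filename=/tmp/fio-test --size=1G \
    --rw=randwrite --bs=4k --ioengine=libaio --iodepth=1 --fsync=1
```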