Like the title says, I am trying to understand the implications of using cache=unsafe for some VMs.
Don't you worry, I did my homework: I am well aware of what the PVE documentation says about it (i.e. https://pve.proxmox.com/wiki/Performance_Tweaks), as well as what numerous other sources say (e.g. http://www.ilsistemista.net/index.p...s-on-red-hat-enterprise-linux-62.html?start=2).
Unfortunately, I am still not 100% sure how "unsafe" this setting really is in the end for my VMs.
I have this naive mental model of how things work at the block device level:
a) we have block devices (which may or may not have their write caches enabled), and
b) we have operating systems with (page) caches that absorb write operations until either the cache fills up or some operating system component issues a "flush" command on the cache, at which point the cached data is finally written to the block device (a minimal sketch of this write path follows below).
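To make b) concrete, here is a minimal sketch of that write path in Python (the file path is just a placeholder):

```python
import os

# Minimal sketch of the write path described in b):
# data first sits in a cache and only reaches the device on a flush.
with open("/tmp/flush-demo", "wb") as f:  # hypothetical test file
    f.write(b"x" * 4096)   # lands in the application buffer / OS page cache
    f.flush()              # pushes the user-space buffer into the kernel page cache
    os.fsync(f.fileno())   # the actual "flush": forces the data down to the block device
```

That final os.fsync() is exactly the kind of request that turns into a "flush" at the block layer.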
So, as the documentation says, "cache=unsafe" means that the QEMU block device simply ignores that "flush" command. But what then? I've read that a flush is ultimately issued once the VM shuts down, but on the other hand the cache can't just grow forever, because of capacity limits. If I have a write-intensive application producing maybe gigabytes of data, that data can't possibly fit into the host's page cache (where cache=unsafe buffers writes) or any other cache without ever being flushed.
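For context, this is where the option lives; a sketch with a hypothetical VM ID, storage name, and paths:

```
# /etc/pve/qemu-server/100.conf  (hypothetical VM ID and storage)
scsi0: local-lvm:vm-100-disk-0,cache=unsafe

# roughly equivalent on a raw QEMU command line (illustrative path)
qemu-system-x86_64 ... -drive file=/var/lib/vz/images/100/vm-100-disk-0.raw,cache=unsafe
```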
So in terms of the "danger of losing data", how much data are we actually talking about? There must be some point at which the data still gets flushed, right?
The reason I am investigating this is that my tests have shown cache=unsafe provides a really huge I/O performance boost for the block devices (measured with fio).
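Not my exact test, but for illustration, the kind of fio job where I'd expect the difference to show up most dramatically is one that issues frequent flushes, since those are precisely the requests cache=unsafe discards (file name and size are placeholders):

```
# run inside the guest; --fsync=1 issues a flush after every write
fio --name=synctest --filename=/tmp/fio-test --size=1G \
    --rw=randwrite --bs=4k --ioengine=libaio --iodepth=1 --fsync=1
```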