[TUTORIAL] Understanding QCOW2 Risks with QEMU cache=none in Proxmox

bbgeek17

Hey everyone,

A few recent developments prompted us to examine QCOW2’s behavior and reliability characteristics more closely:

1. Community feedback
  • There are various community discussions questioning the reliability of QCOW2. We have customers (predating our native integration) interested in using QCOW on LVM.
2. Integrity testing failures with QCOW/LVM snapshots
  • When we ran our data integrity tests against the tech preview of QCOW2/LVM snapshots, we observed consistent failures starting immediately after the first snapshot was taken.
3. Confusing Documentation
  • The existing resources documenting the behavior and semantics of the various cache modes lack clarity.

After extensive lab testing, we now have a clear understanding of QEMU and QCOW2 behavior, as well as the inherent risks.

Lessons Learned:

Compared to physical storage devices, QCOW2 exhibits unusual write semantics due to delayed metadata updates.

The integrity issues with LVM snapshots arose from a common misconception that cache=none disables write caching entirely. In reality, that assumption only holds for RAW disks. With QCOW2, QEMU defers metadata updates and keeps cached metadata structures that remain volatile far longer than expected, even across guest reboots!
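For context, the cache=... setting only controls how QEMU opens and flushes the image; it does not change how QCOW2 handles its own metadata in memory. The mapping below follows the QEMU documentation, and the last line is an illustrative (not Proxmox-generated) invocation showing the explicit form of cache=none: cache.direct=on only bypasses the host page cache for data I/O, while QCOW2 metadata still lives inside the QEMU process until the guest issues a flush.
Code:
# Cache mode shortcuts -> underlying QEMU flags (per the QEMU documentation):
#   writethrough : cache.writeback=off  cache.direct=off  cache.no-flush=off
#   none         : cache.writeback=on   cache.direct=on   cache.no-flush=off
#   writeback    : cache.writeback=on   cache.direct=off  cache.no-flush=off
#   directsync   : cache.writeback=off  cache.direct=on   cache.no-flush=off
#   unsafe       : cache.writeback=on   cache.direct=off  cache.no-flush=on
# Illustrative only: explicit equivalent of cache=none for a qcow2 drive.
qemu-system-x86_64 -m 256 -drive file=disk.qcow2,format=qcow2,if=virtio,cache.writeback=on,cache.direct=on,cache.no-flush=off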

Subcluster allocation in the new snapshot-chain feature ("Allow Snapshots as Volume-Chain") significantly increases metadata churn, amplifying the risk of torn writes and data inconsistency after power loss or unplanned guest termination.
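If you want to see subcluster allocation in isolation, you can create a standalone image with the same format options that show up in the snapshot log later in this post. This is just a sketch; the path and size are examples.
Code:
# 128 KiB clusters split into 32 subclusters of 4 KiB each (extended_l2=on).
qemu-img create -f qcow2 -o cluster_size=131072,extended_l2=on /tmp/subcluster-demo.qcow2 1G
# "extended l2: true" appears under "Format specific information".
qemu-img info /tmp/subcluster-demo.qcow2
# Allocation map; with extended L2 entries, allocation is tracked per subcluster.
qemu-img map --output=json /tmp/subcluster-demo.qcow2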

Technical Report:

We've published a technical article summarizing what we've learned, including a reproducible experiment that demonstrates the semantics leading to corruption on power loss:
Please feel free to ask questions, and we'll do our best to answer. If you spot a gap in our understanding, let us know.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
We have received several requests to provide a procedure to reproduce our results. Here are the steps you can use in your lab:

  • Create a Linux VM (we will use Alpine for a smaller footprint)
Code:
qm create 100 --name vm100 --memory 256 --sockets 1 --boot c --bootdisk scsi0 \
  --onboot no --scsihw virtio-scsi-single --net0 virtio,bridge=vmbr0,firewall=1 --ide2 local-lvm:cloudinit \
  --agent enabled=1 --sshkeys /root/.ssh/authorized_keys --serial0 socket --vga serial0 \
  --cicustom meta=local:snippets/alpine.metadata.100.701528,user=local:snippets/alpine.userdata.100.701528 \
  --ipconfig0 ip=dhcp \
  --scsi0 local-lvm:0,aio=native,iothread=1,import-from=/mnt/pve/nfs/template/iso/nocloud_alpine-3.22.0-x86_64-bios-cloudinit-r0.qcow2
  • Create a 3GB "data" disk on an LVM storage pool with "Allow Snapshots as Volume-Chain" enabled. Assign the disk to the VM:
Code:
pvesm alloc testvg 100 '' 3G
qm disk rescan --vmid 100
qm set 100 --scsi1 testvg:vm-100-disk-0.qcow2,cache=none
Note that "cache=none" is default and will not be visible if you examine the VM config later.
  • Boot the VM and record the disk names. In our case, the "data" disk is /dev/sdb
  • Fill the disk with ones: doas badblocks -w -b 4096 -p 1 -t 0x11111111 /dev/sdb
  • Examine the disk's data and confirm the expected pattern: doas hexdump -C -n $((1024*18024)) /dev/sdb
Code:
doas hexdump -C -n $((1024*18024)) /dev/sdb
00000000  11 11 11 11 11 11 11 11  11 11 11 11 11 11 11 11  |................|
*
0119a000
  • Simulate hardware or power failure by killing the process: pkill -9 -f 'kvm -id 100'
  • Boot the VM and confirm that the disk's data is as expected (still all ones): doas hexdump -C -n $((1024*18024)) /dev/sdb
  • Initiate a VM snapshot: qm snapshot 100 snapshot1
Code:
snapshotting 'drive-scsi0' (local-lvm:vm-100-disk-0)
Logical volume "snap_vm-100-disk-0_snapshot1" created.
snapshotting 'drive-scsi1' (testvg:vm-100-disk-0.qcow2)
external qemu snapshot
Creating a new current volume with snapshot1 as backing snap
Renamed "vm-100-disk-0.qcow2" to "snap_vm-100-disk-0_snapshot1.qcow2" in volume group "tesvg"
Rounding up size to full physical extent 3.00 GiB
Logical volume "vm-100-disk-0.qcow2" created.
Formatting '/dev/tesvg/vm-100-disk-0.qcow2', fmt=qcow2 cluster_size=131072 extended_l2=on preallocation=metadata compression_type=zlib size=3221225472 backing_file=snap_vm-100-disk-0_snapshot1.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
blockdev replace current by snapshot1
blockdev-snapshot: reopen current with snapshot1 backing image

  • Write a new, recognizable pattern to the disk: doas badblocks -w -b 4096 -p 1 -t 0xDEADBEEF /dev/sdb
  • Examine the disk to confirm you can read the expected data: doas hexdump -C -n $((1024*18024)) /dev/sdb
Code:
00000000  de ad be ef de ad be ef  de ad be ef de ad be ef  |................|
*
0119a000
  • Kill the VM: pkill -9 -f 'kvm -id 100'
  • After restarting the VM, examine the data and observe that it has reverted to the pre-write state; all of the newly written data has been lost:
Code:
doas hexdump -C -n $((1024*18024)) /dev/sdb
00000000  11 11 11 11 11 11 11 11  11 11 11 11 11 11 11 11  |................|
*
0119a000
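If you want to see where the lost writes ended up (or rather, did not), you can inspect the snapshot chain from the host while the VM is stopped. The device paths below follow the names printed by the snapshot log above; adjust the volume group path and make sure the LVs are active in your lab.
Code:
# Overlay plus its backing snapshot (paths follow the snapshot log above).
qemu-img info --backing-chain /dev/testvg/vm-100-disk-0.qcow2
# Which clusters/subclusters the overlay actually has allocated.
qemu-img map --output=json /dev/testvg/vm-100-disk-0.qcow2
# Metadata consistency check of the overlay.
qemu-img check /dev/testvg/vm-100-disk-0.qcow2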



 
Thanks a lot. I have read quite a few of your website articles, and while I'm not a Blockbridge customer and probably never will be (much smaller setup here), I have appreciated your KB articles a lot.

But now a question has arisen: Is LVM-Thin as dangerous as QCOW2 is? I mean, does it also never flush metadata writes unless it's instructed to do so by the guest file system?
 
But now a question has arisen: Is LVM-Thin as dangerous as QCOW2 is? I mean, does it also never flush metadata writes unless it's instructed to do so by the guest file system?
Hi @Kurgan,

Thanks for the great question. LVM is considerably more sophisticated than QEMU/QCOW, though I'm not an expert in its internal architecture. My assumption is that it uses a mix of demand-based and timer-based flushing. I'll try to dig into this and follow up with more detail after the holiday.


 
Hi @Kurgan,

Based on a quick review, LVM-thin appears to offer better durability characteristics than QCOW2 for a few reasons:

LVM-thin stores its metadata in a B-tree and applies updates transactionally (though not via a traditional journal). This design tends to preserve metadata ordering during writeback, reducing the risk of inconsistent or reordered indirect mappings after a power loss.

LVM-thin also flushes its metadata at regular intervals, keeping the in-flight metadata window small and limiting the amount of state that can be lost during an unexpected shutdown.

Unlike QCOW2, LVM-thin does not implement sub-cluster allocation. Sub-cluster-sized writes are handled through copy-on-write at the block level, which avoids much of the metadata churn associated with QCOW2’s finer-grained allocation. This reduces, although does not entirely eliminate, the likelihood of torn or partially updated data.

In summary, LVM-thin is generally safer than QCOW2 in these failure scenarios, but it still falls short of the guarantees provided by enterprise-grade storage systems.
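For anyone who wants to poke at this themselves, here are a couple of starting points; the pool and volume group names below are the defaults from a stock PVE install and are only placeholders for your setup:
Code:
# Thin-pool status includes the current metadata transaction id and
# used/total metadata blocks (see the thin-pool target in `man dmsetup`).
dmsetup status pve-data-tpool
# Per-LV data and metadata usage.
lvs -a -o name,segtype,data_percent,metadata_percent pve
# Offline check of the thin metadata device (pool must be inactive,
# or point thin_check at a metadata snapshot instead).
thin_check /dev/mapper/pve-data_tmeta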


 
Hi @Alwin Antreich,

Good question. I’d expect both cache=directsync and cache=writethrough to offer stronger consistency guarantees, but I’ll run some tests and report back.

For what it’s worth, some of the confusion around cache=none stems from the link you shared (we also reference it in our article). It describes how QEMU interacts with storage and explains why a guest advertises a write-back cache, but the reasoning isn’t entirely complete. When QCOW is the backend, the behavior is driven much more by QCOW’s volatile metadata than by the underlying storage device itself.
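For anyone who wants to repeat the reproduction above with a stricter mode, switching the data disk is a one-liner (same volume naming as in the procedure):
Code:
qm set 100 --scsi1 testvg:vm-100-disk-0.qcow2,cache=directsync
# or
qm set 100 --scsi1 testvg:vm-100-disk-0.qcow2,cache=writethrough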


 