Possible bug with qemu-img zeroinit and LVM

yobert
Feb 2, 2026
Hello there! After recently upgrading to PVE 9.1.4, we have started seeing an interesting import failure. We're importing the Debian 13 cloud image (stored on directory storage) into a new VM on LVM storage. I've tracked it down to what looks like a bug in the zeroinit patch to QEMU. When Proxmox tries to import the image via this command:

/usr/bin/qemu-img convert -p -n -f raw -O raw /dev/vm-vg/vm-118-disk-0 zeroinit:/dev/vm-vg/vm-120-disk-0

It ends up with a corrupted partition table. But if I manually run the same command without the zeroinit prefix, it works:

/usr/bin/qemu-img convert -p -n -f raw -O raw /dev/vm-vg/vm-118-disk-0 /dev/vm-vg/vm-120-disk-0

I can supply more troubleshooting information if anyone is interested. Mostly I'm just curious if anyone else has hit this or if it's just me?

EDIT: This has nothing to do with the version upgrade; that was just random chance. This has to do with existing bytes on the hard drive.
 
I've tracked this down further! I think there is a bug in how Proxmox combines LVM with the zeroinit block filter in qemu-img.

Proxmox creates the VM storage with these commands:

/sbin/lvcreate -aly -V 4096k --name vm-107-cloudinit --thinpool vm-vg/vm-pool
/sbin/lvcreate -aly -V 3145728k --name vm-107-disk-0 --thinpool vm-vg/vm-pool

This will NOT fill the newly created volumes with zeros. They will contain whatever bytes were left over on your hard drive. (Could be random. Could be real data. Could be an existing partition table.)
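A quick way to check this claim is to compare a fresh thin LV against /dev/zero. This is a sketch using the thread's pool name with a hypothetical LV name `scratch` (requires root and GNU coreutils):

```shell
# Create a throwaway thin LV like the ones above
lvcreate -aly -V 3145728k --name scratch --thinpool vm-vg/vm-pool

# cmp -n limits the comparison to the first 3 GiB; exit status 0
# means every byte of the device read back as zero
cmp -n $((3 * 1024 * 1024 * 1024)) /dev/vm-vg/scratch /dev/zero \
  && echo "reads back as zeros" \
  || echo "nonzero (leftover) bytes found"

lvremove -y vm-vg/scratch
```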

Then, proxmox attempts to write the source import image this way:

/usr/bin/qemu-img convert -p -n -f raw -O raw /var/lib/vz/import/debian13-cloud-daily.raw zeroinit:/dev/vm-vg/vm-107-disk-0

Because the initialized disk is NOT filled with zeros, the new image is corrupted. The source image has many ranges that are filled with zeros, and the zeroinit filter correctly skips them.

I suspect the fix is to always pass the argument "--zero y" to lvcreate for thin pool LVM provisioning.
 
I thought about this more and realized: this is a security issue.

If you delete a VM and then provision a new VM from an import image containing ranges of zeros, the newly created VM can contain bytes from the deleted VM in those ranges. That leaks data that should never be visible to the new VM.
 
I cannot comment on what you have found as I know too little about it, but maybe report it here: https://pve.proxmox.com/wiki/Security_Reporting
EDIT: Or maybe that is only for (externally) exploitable security issues?
Thanks! I emailed them! I think it could be a big issue since many proxmox setups are multi-tenant. (One customer on one VM, and another customer on another VM with an expectation of data being private.)
 
Hmmm... Something more complex is going on here, because right after running the lvcreate command, the volume is indeed filled with zeros. But when I diff the newly written volume against the source import image, many of the ranges that should be zeros are filled with random garbage. Something else is at play here. Looking deeper now.
 
Okay, I think the bug is not in how Proxmox calls lvcreate; the bug is in zeroinit. Zeroinit somehow detects that the target block device has thin extents and then does something wrong. Here is how to reproduce it:

lvcreate -aly -V 3145728k --name bug --thinpool vm-vg/vm-pool

At this point, /dev/vm-vg/bug is 3 GiB, and if you inspect it, it appears to be filled with zeros.

/usr/bin/qemu-img convert -p -n -f raw -O raw /var/lib/vz/import/debian13-cloud-daily.raw zeroinit:/dev/vm-vg/bug

At this point, the destination is not identical to the source image debian13-cloud-daily.raw. With a binary diff tool, I can see huge ranges of zeros in the source image that correspond to random bytes on the destination block device. Somehow zeroinit is handling extents incorrectly: it assumes they read back as zeros, but they don't.
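The comparison can be done with standard tools; this is a sketch using the paths from the thread (the length cap matters because the LV is usually larger than the image):

```shell
SRC=/var/lib/vz/import/debian13-cloud-daily.raw
DST=/dev/vm-vg/bug
SIZE=$(stat -c %s "$SRC")

# Byte-wise compare over the source's length only
cmp -n "$SIZE" "$SRC" "$DST" || echo "destination differs from source"

# Or compare checksums over the same range
sha256sum < "$SRC"
head -c "$SIZE" "$DST" | sha256sum
```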
 
Unfortunately, this is a known issue that we pointed out a while back :

https://kb.blockbridge.com/technote.../#data-consistency-and-information-leak-risks



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks for linking that article! That helps me understand what is expected. But this bug is severe: this isn't even a power-loss scenario. This is data corruption during a standard, healthy VM provisioning. zeroinit simply doesn't work right here. It should probably just be disabled.
 
Thanks for linking that article! That helps me know what is expected.
You are welcome. You may also find this article interesting: https://kb.blockbridge.com/technote/proxmox-qemu-cache-none-qcow2


I suspect the fix is to always pass the argument "--zero y" to lvcreate for thin pool LVM provisioning.
I do not believe this option will help you, based on the man page description:
Code:
-Z|--zero y|n
       Controls zeroing of the first 4 KiB of data in the new LV. Default is y.
       Snapshot COW volumes are always zeroed. For thin pools, this controls
       zeroing of provisioned blocks. LV is not zeroed if the read only flag
       is set. Warning: trying to mount an unzeroed LV can cause the system
       to hang.


 
Ah, yeah. For a minute I thought --zero was an option for the new thin-provisioned volume, but apparently it's a flag for the pool itself, and I have that set already (Proxmox sets it by default). That's why this is such a strange bug. After creating the LVM volume, if I read all the data, it is indeed filled with zeros; LVM is doing the right thing. But after qemu-img copies the data from the source file to the destination, the result is corrupt. The checksums don't match: zero ranges in the input file become garbage on the output volume.
 
That's why this is such a strange bug
TBH, layering QCOW on a newly added LVM extent without manually zeroing it first seems like a recipe for trouble. This approach is unpopular because it slows provisioning, but it is the safer option.

When QCOW is placed on a filesystem, zeroing is implicitly handled by the filesystem through sparse file support. The remaining risk in this case is substantially smaller.


 
I'm not using QCOW; this is all raw. The source image is a raw 3 GiB image that you can boot directly without conversion.

I finally understand the root of the problem. LVM does something interesting when you make a thin pool with --zero n. If you read the block device of a new thin volume, it will indeed return all zeros. But if you seek to a random position and write a single byte, garbage data appears around the write: the newly provisioned pool chunk is mapped in without being zeroed first, so the stale pool contents around your byte become readable. I think that's ugly behavior on LVM's part, but I'm guessing it's a performance trade-off. (Since most consumers read the disk, see zeros, and then write a fresh partition table, it usually doesn't matter.) Zeroinit triggers this pathological behavior because it skips around while writing, which makes the garbage visible; in my case, provisioning fails because the exposed garbage happens to qualify as a valid (though wrong) GPT partition table.
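The pattern above can be sketched like this (run as root; the LV name `probe` is hypothetical, and this assumes a pool created with --zero n):

```shell
lvcreate -aly -V 1048576k --name probe --thinpool vm-vg/vm-pool

# Read an unprovisioned 4k block: it comes back as zeros
dd if=/dev/vm-vg/probe bs=4k skip=1000 count=1 status=none | xxd | head -n 2

# Write a single byte somewhere inside that block...
printf 'X' | dd of=/dev/vm-vg/probe bs=1 seek=$((1000 * 4096 + 100)) \
    conv=notrunc status=none

# ...then re-read the surrounding area. With --zero n, the rest of the
# newly provisioned pool chunk can now show stale data instead of zeros.
dd if=/dev/vm-vg/probe bs=4k skip=1000 count=1 status=none | xxd

lvremove -y vm-vg/probe
```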

So I'd call this a bug in Proxmox in the sense that it should default to creating LVM thin pools with --zero y instead of --zero n. It's the only way to avoid getting garbage back at unpredictable times.
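For reference, enabling that looks like this (sizes and names are illustrative; -Z/--zero is documented for both lvcreate and lvchange on thin pools):

```shell
# Create the thin pool with zeroing of provisioned blocks enabled
lvcreate --type thin-pool --zero y -L 100G --name vm-pool vm-vg

# Or enable it on an existing pool
lvchange --zero y vm-vg/vm-pool
```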
 
Aha. I think the right fix is to change Proxmox's block-device attribute detection so that qemu-img uses zeroinit only when the thin pool was created with the zero flag enabled, and skips it otherwise. That way it stays optimal when it is safe, and does the right thing when it is not.
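One possible shape for that detection, sketched in shell (names and paths are from the thread; the lvs `zero` report field and `--binary` flag exist in lvm2, but this is only an illustration, not Proxmox's actual code):

```shell
SRC=/var/lib/vz/import/debian13-cloud-daily.raw
LV=vm-vg/vm-107-disk-0

# --binary makes lvs print 1/0 for the pool's zeroing flag
if [ "$(lvs --noheadings --binary -o zero vm-vg/vm-pool | tr -d '[:space:]')" = "1" ]; then
    DST="zeroinit:/dev/$LV"   # pool zeroes new blocks: skipping zero writes is safe
else
    DST="/dev/$LV"            # pool may expose stale data: write zeros explicitly
fi
qemu-img convert -p -n -f raw -O raw "$SRC" "$DST"
```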