Corrupted qcow2 images

anders_eken

New Member
Mar 13, 2024
We use Proxmox Backup Server and Client v3.3.1 to back up our KVM guests. The guests use qcow2 disks with ext4 as the file system.

To ensure that the backed-up images are valid, the backed-up disks are mounted via proxmox-backup-client map after uploading to the Proxmox Backup Server and checked with qemu-img check.

We have now noticed that some of the uploaded disk images are corrupt:

Code:
qemu-img: Could not open '/dev/loop0': Could not read L1 table: Input/output error

We investigated further and found that the image is indeed corrupt. Even after various repair attempts, the data could only be partially recovered. To get to the bottom of this and rule out other causes, we shut down the VM and backed up the disk directly. Verification of the original source image before the backup succeeded, but verification of the backed-up image failed.
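For context on the error above: the L1 table is the top level of qcow2's two-level cluster mapping, and its size and offset are stored in the image header, so a damaged header or L1 table makes the whole image unreadable. A minimal Python sketch of what sits at the front of every qcow2 file, using the v2 header layout from the qcow2 specification (the sample values below are made up for illustration):

```python
import struct

# qcow2 v2 header layout, all fields big-endian, per the QEMU qcow2 spec:
# magic, version, backing_file_offset, backing_file_size, cluster_bits,
# size, crypt_method, l1_size, l1_table_offset, refcount_table_offset,
# refcount_table_clusters, nb_snapshots, snapshots_offset
QCOW2_V2_HEADER = struct.Struct(">IIQIIQIIQQIIQ")
QCOW2_MAGIC = 0x514649FB  # the bytes b"QFI\xfb"

def parse_qcow2_header(buf: bytes) -> dict:
    """Decode the fixed v2 portion of a qcow2 header."""
    (magic, version, _bf_off, _bf_size, cluster_bits, size, _crypt,
     l1_size, l1_table_offset, _rc_off, _rc_clusters, _nb_snap,
     _snap_off) = QCOW2_V2_HEADER.unpack_from(buf)
    if magic != QCOW2_MAGIC:
        raise ValueError("not a qcow2 image")
    return {
        "version": version,
        "cluster_size": 1 << cluster_bits,
        "virtual_size": size,
        "l1_size": l1_size,
        "l1_table_offset": l1_table_offset,
    }

# Synthetic header for a 20 GiB image with 64 KiB clusters: each L2 table
# then maps 512 MiB of guest data, so the L1 table needs 40 entries.
hdr = QCOW2_V2_HEADER.pack(QCOW2_MAGIC, 2, 0, 0, 16, 20 << 30,
                           0, 40, 0x30000, 0x10000, 1, 0, 0)
info = parse_qcow2_header(hdr)
```

If the table stored at l1_table_offset cannot be read, every guest cluster lookup fails, which is why qemu-img check aborts immediately with the "Could not read L1 table" error.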

Configuration of the disk:

XML:
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' discard='unmap'/>
      <source file='/path/to/disk.qcow2'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>


We were also able to reproduce the behavior on a hypervisor with Debian 12 and Ubuntu 24.04.

In the course of further analysis, we discovered that primarily images from freshly provisioned KVM guests were affected. This means the following behavior can now be reliably reproduced (we can provide a valid qcow2 image that reliably triggers the error):

1. Create a new KVM Guest
2. Backup the KVM Guest
3. qemu-img check fails

Workaround:

1. Create a new KVM Guest
2. Write data in the KVM guest (e.g. with 'dd if=/dev/urandom of=/tmp/testfile bs=1M count=2048')
3. Backup the KVM Guest
4. qemu-img check succeeds

Is this behavior known? What is the cause of it? Thank you in advance!
 
Hi and thanks for the report.

However, I can not reproduce this issue on my local setup. Here is what I tested for completeness:
Code:
qemu-img create -f qcow2 test.img 20G
proxmox-backup-client backup test.img:./test.img
proxmox-backup-client map <snapshot> test.img
qemu-img check /dev/loop0

# Result:
No errors were found on the image.
Image end offset: 262144

So please provide more information:
1. Create a new KVM Guest
What tooling did you use and which version?
2. Backup the KVM Guest
What exact command did you use to back up the guest?
2. Write data in the KVM guest (e.g. with 'dd if=/dev/urandom of=/tmp/testfile bs=1M count=2048')
What guest OS was used and what is mounted on /tmp?

Configuration of the disk:

    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none' discard='unmap'/>
      <source file='/path/to/disk.qcow2'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
This does not look like it is related to a Proxmox tooling config, so what was used? Is this libvirt or the like?

Also, if you are backing up a running system, you must first freeze the guest filesystems using the qemu-guest-agent, then create a snapshot of the qcow2 image [0], and finally unfreeze the filesystems.

[0] https://wiki.qemu.org/Documentation/CreateSnapshot

Edit: Also I forgot: Did you verify the backup snapshot successfully on PBS after the backup run?
 
Hi and thanks for the report.
Hi Chris! Thank you for your reply!

However, I can not reproduce this issue on my local setup. Here is what I tested for completeness:
Code:
qemu-img create -f qcow2 test.img 20G
proxmox-backup-client backup test.img:./test.img
proxmox-backup-client map <snapshot> test.img
qemu-img check /dev/loop0

# Result:
No errors were found on the image.
Image end offset: 262144

So please provide more information:

What tooling did you use and which version?
We use a libvirt-based setup (without Proxmox PVE) with QEMU 8.2.2 and libvirt 10.0.0 on Ubuntu 24.04; on Debian 12, QEMU 7.2 and libvirt 9.0.0.

The provisioning is based on a qcow2 image that was generated with Hashicorp Packer.

What exact command did you use to backup the guest?
The backup is done with this command:
Bash:
FILESTOPUSH="qemu-server.conf:/path/to/dumped/libvirt.xml sdz.img:/path/to/cloudinit/config.iso sda.img:/path/to/disk/sda.qcow2"
proxmox-backup-client backup ${FILESTOPUSH} --backup-type vm --backup-id ${ENTITYTOPUSH} --rate ${PBC_MAXUPLOAD_RATE:-100000000} --keyfile ${PBS_CERTIFICATE} --backup-time ${UPLOADTIMESTAMP}

What guest OS was used and what is mounted on /tmp?
The affected guests are running Debian 12. The /tmp folder in the example VM is located on the root partition /dev/sda1 with ext4 (no tmpfs). The guest agent is running.
This does not look like related to Proxmox tooling config, so what was used? Is this libvirt or the like?
Yes, libvirt. The output comes from virsh dumpxml <domain>
Also, if you are doing a backup of a running system, you must freeze the guest filesystems using the qemu-guest-agent first, then create a snapshot of the qcow2 image [0], unfreeze the filesystem.
On running VMs, we first run
virsh snapshot-create-as --quiesce --disk-only --atomic --domain ${ENTITYTOBACKUP} --name ${SNAPSHOTNAME} ${DISKSPEC};

With --quiesce libvirt will try to freeze and unfreeze the guest virtual machine’s mounted file system(s), using the guest agent.

We initially suspected a connection here. For further tests, we therefore shut down the VMs before the backup in order to rule this out.

[0] https://wiki.qemu.org/Documentation/CreateSnapshot

Edit: Also I forgot: Did you verify the backup snapshot successfully on PBS after the backup run?
Yes. The PBS snapshot verification is always successful, regardless of whether the validation with qemu-img check is successful or not.

As mentioned above, we can provide a qcow2 image that reliably reproduces the issue.
 
Hi,
thanks for providing more details. Helps to get the bigger picture ;)

Can you try to calculate and compare the sha256sum of the qcow2 files for the following states:
  1. The qcow2 source file before starting the backup process with proxmox-backup-client
  2. The qcow2 source after the proxmox-backup-client run finished
  3. The qcow2 after restoring from a backup snapshot with proxmox-backup-client restore ...
Also, did you try to restore the snapshot from backup and then run qemu-img check on the restored qcow2 file, so that issues with the loop device mapping can be excluded?
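The three checksums can be collected with a small helper instead of repeated sha256sum calls; a sketch (a hypothetical helper, not part of any Proxmox tooling):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks,
    so multi-GiB disk images need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Comparing the digests of the source before backup, the source after backup, and the restored file narrows down at which stage the bytes change.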

As mentioned above, we can provide a qcow2 image that reliably reproduces the issue.
I would rather not take in images from unknown sources, but instead help you debug further. I hope you understand.
 
Hi,
thanks for providing more details. Helps to get the bigger picture ;)

Can you try to calculate and compare the sha256sum of the qcow2 files for the following states:

Hi, thank you, please see the output below:

1. The qcow2 source file before starting the backup process with proxmox-backup-client
Code:
sha256sum sda.qcow2
4d4c90426bf43639855437b83d18bb86afeeb40069b3ae06673474f21841a1bc sda.qcow2
2. The qcow2 source after the proxmox-backup-client run finished
Code:
sha256sum sda.qcow2
4d4c90426bf43639855437b83d18bb86afeeb40069b3ae06673474f21841a1bc sda.qcow2

3. The qcow2 after restoring from a backup snapshot with proxmox-backup-client restore ...
Bash:
proxmox-backup-client restore 'vm/<UUID>/2025-01-24T09:22:37Z' "sda.img" - --keyfile ${PBS_CERTIFICATE} > restored.qcow2
sha256sum ./restored.qcow2
4d4c90426bf43639855437b83d18bb86afeeb40069b3ae06673474f21841a1bc  ./restored.qcow2
qemu-img check restored.qcow2
No errors were found on the image.
24705/163840 = 15.08% allocated, 1.38% fragmented, 0.00% compressed clusters
Image end offset: 2961965056

Restoring with proxmox-backup-client restore works. The problem appears to occur with proxmox-backup-client map:

Bash:
proxmox-backup-client map <snapshot> test.img

sha256sum /dev/loop1
sha256sum: /dev/loop1: Input/output error

Copy image:
cp /dev/loop1 restored.qcow2 (also fails with an I/O error)

sha256sum restored.qcow2
f834330b0ea93e79e557078c50ee6eed9811b833064b55454555465e25309e36 restored.qcow2
 
Well, just realized: the qcow2 is not a raw image, so mapping it as a block device will not work. This is the same as trying to create a loop device directly from the qcow2 file with losetup -f <path-to-qcow2>: that will not work either. If you want to access it like this, you will have to convert it to raw first, e.g. with qemu-img convert <path-to-qcow2> <path-to-raw-output>. That raw image you can then set up as a loop device via losetup -f <path-to-raw-image>.
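A quick sanity check for what the mapped device (or a file copied from it) actually contains is to look at the first four bytes: a qcow2 container starts with the magic bytes QFI\xfb, while the guest's raw data does not. A sketch (the function name is mine):

```python
def detect_format(path: str) -> str:
    """Guess whether a file is a qcow2 container or raw data
    by checking for the qcow2 magic bytes b"QFI\\xfb"."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return "qcow2" if magic == b"QFI\xfb" else "raw"
```

If this reports "qcow2" for the mapped device, you are looking at the container file itself, not at the guest-visible disk contents.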

So nothing is wrong here? Testing the qcow2 image for corruption via the mapped device will, however, not work.

Edit: You could convert the image to raw before uploading it to PBS; then the mapping should work and you could run the check. This will, however, require converting back to qcow2 on restore, if wanted.
 
Thanks for the quick response. That sounds reasonable.

What I don't understand in this context is that this procedure works in most cases and, as far as we can tell at the moment, only fails with (almost) freshly provisioned VMs holding only a small amount of data. Apart from that, qemu-img check works and the checksums match. We were also able to create a loop device with losetup -f restore.qcow2.
 
What I don't understand in this context is the fact that this procedure works in most cases and - as far as we can tell at the moment - only occurs with (almost freshly provisioned) VMs with only a small amount of data
I'm guessing here, but it might be that the image is not aligned to the block size: https://linux-kernel-labs.github.io/refs/heads/master/labs/block_device_drivers.html
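One way to test this guess is to check the image end offset reported by qemu-img check against the device block size: if the file size is not a multiple of the sector size, the tail cannot be read through a sector-granular block device. A sketch (512-byte sectors are an assumption; some devices use 4096):

```python
SECTOR_SIZE = 512  # assumption: common logical sector size

def alignment_report(size: int, block: int = SECTOR_SIZE) -> dict:
    """Report how much of a file a sector-granular block device exposes."""
    aligned = (size // block) * block
    return {
        "aligned_size": aligned,
        "trailing_bytes": size - aligned,  # bytes past the last full sector
        "is_aligned": size % block == 0,
    }

# Image end offset from the successful check earlier in this thread
report = alignment_report(2961965056)
```

If trailing_bytes is nonzero for a failing image but zero for a working one, alignment would be a strong suspect.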
We were also able to create a loopdev with losetup -f restore.qcow2
Possible, but that is not a block device you can meaningfully work with. To get a proper block device for your qcow2 you will have to run something along the lines of qemu-nbd --connect=/dev/nbd0 restore.qcow2. Then you should also be able to access partitions, filesystems, etc.
 