qcow2 corruption after snapshot or heavy disk I/O

So, this bug only happens when using the "obsolete" and "unsuggested" mode, right?

That's what I've found - this only impacts virtio, not virtio-scsi. I obviously need to read the forums (or the docs) more often - I didn't realize that virtio's time had passed.
 
There are several benefits to using virtio-scsi instead of virtio:
  • virtio (virtio-blk) development has effectively stopped, and all effort now goes into virtio-scsi
  • performance is more or less on par: virtio is at most 1-2% faster
  • virtio-scsi passes unmap (TRIM/discard) through to the disk, so freed space can be reclaimed
  • virtio-scsi is the default disk bus in Proxmox VE 4
  • virtio vs. virtio-scsi is more or less like comparing SATA with SAS: virtio-scsi is an intelligent controller

Thanks - excellent info. I'll go back and switch the virtio VMs to scsi next time each needs a reboot. A few percent difference in performance won't be noticeable in my environment and the advantages of scsi are clear.
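
For anyone else planning the same switch, this is roughly what it looks like from the CLI; the VMID (100) and volume name below are only placeholders, so check them against your own VM config before running anything:

# Switch the controller to virtio-scsi and move the existing virtio0 disk to scsi0.
# Deleting virtio0 only detaches it (it shows up as unused0); the volume itself is kept.
qm set 100 --scsihw virtio-scsi-pci
qm set 100 --delete virtio0
qm set 100 --scsi0 local:100/vm-100-disk-1.qcow2
qm set 100 --bootdisk scsi0

The guest needs a virtio-scsi driver: recent Linux kernels ship one, while Windows guests need the vioscsi driver installed before the bus is changed.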
 
Hi,
quick question:
After a P2V migration of a Windows Server 2012 machine I wanted to use SCSI for the disks as well, but the VM will only boot with the virtual disk attached as ide0.
Additional SCSI virtual disks are not even shown in Windows Device Manager.
What would be the correct driver to add for SCSI support, and where do I get it?
Thanks in advance,
Sascha
 
Hm, I know that page, but it's still all about virtio drivers. From what I've understood, virtio is not being worked on anymore, so I thought the complete driver set from fedorapeople would no longer be the way to go.
Did I get that wrong?

Thanks,
Sascha
 
Good day guys,

Season's greetings to you.

I can confirm that we experienced the same scary problem after running a snapshot on a VM with qcow2 disk images stored on NFS, presented to the VM as "VirtIO SCSI". Others experiencing a similar problem seem to report it only when using "VirtIO Block", whereas we hit it with "VirtIO SCSI".

I have also posted the same comment at https://forum.proxmox.com/threads/corrupt-filesystem-after-snapshot.32232/, which looks like it may be discussing the same issue.

Would it be safer to convert them to raw for now until this is resolved?
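
If anyone does go down the raw route, a minimal sketch of an offline conversion looks like this (the paths are only examples; do it with the VM shut down and keep the original qcow2 until the raw image has been tested):

# -p shows progress; -f and -O name the source and destination formats explicitly.
qemu-img convert -p -f qcow2 -O raw /mnt/pve/nfs-store/images/100/vm-100-disk-1.qcow2 /mnt/pve/nfs-store/images/100/vm-100-disk-1.raw

The trade-off is that raw images on file-based storage lose qcow2-level snapshots, which may or may not be acceptable while this bug is open.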
 
This is still a problem!

This is something we are unable to reproduce so far. Please provide as many details as possible (versions, configs, logs, specifics about your setup, the NFS server software and its configuration, environment factors like load or outages, ...).
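
The kind of information that helps can be gathered with, for example (standard commands on a PVE 4.x node; adjust the VMID to the affected guest):

pveversion -v              # installed package versions
qm config 101              # the VM configuration, including disk bus and cache settings
cat /etc/pve/storage.cfg   # how the storage (e.g. NFS) is defined
nfsstat -m                 # the NFS mount options actually in use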
 
Upgraded: pve-manager/4.4-13/7ea56165 (running kernel: 4.4.49-1-pve)

This didn't make a difference:

VM 1, CentOS 7, virtio 500 GB qcow2 disk, gets corrupted 100% of the time when I try to take a snapshot. Sample qemu-img check results below.

VM 2, CentOS 7, virtio 500 GB qcow2 disk, does not get corrupted. I can make multiple snapshots, roll back, etc. with no corruption or leaks detected by qemu-img check.

The action that triggers the corruption problem is taking a snapshot. Left alone, I'm seeing no corruption in normal operation.

Now that I know I can take snapshots safely if I move the disk image to lvm-thin or create the VM with a scsi disk, I can work around this, but it's a nasty bug if you aren't expecting it. I'm not sure when this started to happen since we don't take snapshots all that often, but it didn't happen with Proxmox 3.x and is happening now with 4.4. I don't know yet whether the VM's OS matters (we've had the problem with CentOS 6 and 7 VMs, but that could be sampling error since the majority of the VMs we've been working with run CentOS 6 or 7). I'll run some tests with Ubuntu VMs when I have a minute.
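
For reference, the sequence that triggers it for me is nothing more exotic than snapshot, shut down, check; the VMID and image path below are placeholders and will differ on other setups:

qm snapshot 101 testsnap      # take a live snapshot of the VM
qm shutdown 101               # stop the VM so the image is not being written to
qemu-img check /var/lib/vz/images/101/vm-101-disk-1.qcow2   # refcount errors show up here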

Here's the damage that happened after taking a snapshot with a CentOS 7 virtio qcow2 VM:

Image end offset: 537288376320
ERROR cluster 7988422 refcount=1 reference=2
ERROR cluster 7996583 refcount=1 reference=2
ERROR cluster 7996616 refcount=1 reference=2
ERROR cluster 7996653 refcount=1 reference=2
ERROR cluster 7998877 refcount=1 reference=2
ERROR cluster 8000268 refcount=1 reference=2
ERROR cluster 8000269 refcount=1 reference=2
ERROR cluster 8001931 refcount=1 reference=2
ERROR cluster 8001990 refcount=1 reference=2
ERROR cluster 8021195 refcount=1 reference=2
ERROR cluster 8029356 refcount=1 reference=2
ERROR cluster 8029357 refcount=1 reference=2
ERROR cluster 8029358 refcount=1 reference=2
ERROR cluster 8029359 refcount=1 reference=2
ERROR cluster 8029362 refcount=1 reference=2
ERROR cluster 8029878 refcount=1 reference=2
ERROR cluster 8032113 refcount=1 reference=2
ERROR cluster 8032137 refcount=1 reference=2
ERROR cluster 8032141 refcount=1 reference=2
ERROR cluster 8032147 refcount=1 reference=2
ERROR cluster 8032151 refcount=1 reference=2
ERROR cluster 8033532 refcount=1 reference=2
ERROR cluster 8033759 refcount=1 reference=2
ERROR cluster 8034310 refcount=1 reference=2
ERROR cluster 8034311 refcount=1 reference=2
ERROR cluster 8034312 refcount=1 reference=2
ERROR cluster 8034313 refcount=1 reference=2
ERROR cluster 8034459 refcount=1 reference=2
ERROR cluster 8034460 refcount=1 reference=2
ERROR cluster 8037582 refcount=1 reference=2
ERROR cluster 8037619 refcount=1 reference=2
ERROR cluster 8037620 refcount=1 reference=2
ERROR cluster 8037621 refcount=1 reference=2
ERROR cluster 8041788 refcount=1 reference=2
ERROR cluster 8064589 refcount=1 reference=2
ERROR cluster 8064590 refcount=1 reference=2
ERROR OFLAG_COPIED L2 cluster: l1_index=975 l1_entry=79e4c60000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a04a70000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=976 l1_entry=7a04c80000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a04ed0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a0d9d0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a130c0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a130d0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a198b0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a19c60000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=979 l1_entry=7a64cb0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84ac0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84ad0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84ae0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84af0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84b20000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=981 l1_entry=7aa4ce0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f30000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f40000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f50000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f60000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f70000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f80000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7ab53c0000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=1007 l1_entry=7de4ee0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e04cf0000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=1008 l1_entry=7e04f00000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e05120000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e05190000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e05330000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e07940000 refcount=1

66 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.
8388608/8388608 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 549840879616
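
(For completeness: qemu-img check can attempt to repair these refcount errors with -r, but I would only try that with the VM powered off and after taking a copy of the image first.)

# Keep a backup copy before any repair attempt; a failed repair can make things worse.
cp vm-101-disk-1.qcow2 vm-101-disk-1.qcow2.bak
qemu-img check -r all vm-101-disk-1.qcow2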
 
I am getting the same error... What can I do to resolve this?
 
