Upgraded: pve-manager/4.4-13/7ea56165 (running kernel: 4.4.49-1-pve)
This didn't make a difference:
VM 1, CentOS 7, virtio 500 GB qcow2 disk, gets corrupted 100% of the time when I try to take a snapshot. Sample qemu-img check results below.
VM 2, CentOS 7, virtio 500 GB qcow2 disk, does not get corrupted. I can make multiple snapshots, roll back, etc. with no corruption or leaks detected by qemu-img check.
The action that triggers the corruption problem is taking a snapshot. Left alone, I'm seeing no corruption in normal operation.
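To repeat the before/after snapshot comparison in a script, a minimal Python sketch could run `qemu-img check --output=json` on the image and condense the result into one line. The `corruptions` and `leaks` keys and the 0/1/2/3/63 exit statuses come from qemu-img's documented behaviour; the helper names (`check_image`, `summarize_check`) and the image path you pass in are hypothetical. The qemu-img documentation warns that querying an image another process is modifying can hit inconsistent state, so this assumes the VM is shut down during the check.

```python
import json
import subprocess

# Exit statuses documented for `qemu-img check`.
EXIT_MEANINGS = {
    0: "check completed, image is consistent",
    1: "check not completed (internal error)",
    2: "check completed, image is corrupted",
    3: "check completed, image is consistent but has leaked clusters",
    63: "checks are not supported by the image format",
}


def summarize_check(report: dict) -> str:
    """Condense a parsed `qemu-img check --output=json` report into one line.

    "corruptions" and "leaks" may be omitted when they are zero, so
    missing keys default to 0.
    """
    corruptions = report.get("corruptions", 0)
    leaks = report.get("leaks", 0)
    if corruptions:
        return f"CORRUPT: {corruptions} errors, {leaks} leaked clusters"
    if leaks:
        return f"leaks only: {leaks} leaked clusters"
    return "clean"


def check_image(path: str) -> tuple[int, str]:
    """Run `qemu-img check` on a (hypothetical) image path.

    Returns the exit status and a one-line summary. Assumes the VM is
    stopped, since checking an image QEMU is writing to can report
    misleading errors.
    """
    proc = subprocess.run(
        ["qemu-img", "check", "--output=json", path],
        capture_output=True,
        text=True,
    )
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # No JSON on stdout: fall back to the documented exit status.
        return proc.returncode, EXIT_MEANINGS.get(proc.returncode, "unknown status")
    return proc.returncode, summarize_check(report)
```

Calling `check_image()` once before and once after taking a snapshot of a stopped VM gives a quick pass/fail; an image in the state shown for VM 1 would be expected to come back with exit status 2.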
Now that I understand that I can take snapshots if I move the disk image to LVM-thin or if I create the VM with a SCSI disk, I can work around this, but it's a nasty bug if you aren't expecting it. I'm not sure when this started, since we don't take snapshots all that often, but it didn't happen with Proxmox 3.x and is happening now with 4.4. I don't know yet whether the VM's OS matters (we've had the problem with CentOS 6 and 7 VMs, but that could be sampling error, since most of the VMs we've been working with run CentOS 6 or 7). I'll run some tests with Ubuntu VMs when I have a minute.
Here's the damage that happened after taking a snapshot with a CentOS 7 virtio qcow2 VM:
Image end offset: 537288376320
ERROR cluster 7988422 refcount=1 reference=2
ERROR cluster 7996583 refcount=1 reference=2
ERROR cluster 7996616 refcount=1 reference=2
ERROR cluster 7996653 refcount=1 reference=2
ERROR cluster 7998877 refcount=1 reference=2
ERROR cluster 8000268 refcount=1 reference=2
ERROR cluster 8000269 refcount=1 reference=2
ERROR cluster 8001931 refcount=1 reference=2
ERROR cluster 8001990 refcount=1 reference=2
ERROR cluster 8021195 refcount=1 reference=2
ERROR cluster 8029356 refcount=1 reference=2
ERROR cluster 8029357 refcount=1 reference=2
ERROR cluster 8029358 refcount=1 reference=2
ERROR cluster 8029359 refcount=1 reference=2
ERROR cluster 8029362 refcount=1 reference=2
ERROR cluster 8029878 refcount=1 reference=2
ERROR cluster 8032113 refcount=1 reference=2
ERROR cluster 8032137 refcount=1 reference=2
ERROR cluster 8032141 refcount=1 reference=2
ERROR cluster 8032147 refcount=1 reference=2
ERROR cluster 8032151 refcount=1 reference=2
ERROR cluster 8033532 refcount=1 reference=2
ERROR cluster 8033759 refcount=1 reference=2
ERROR cluster 8034310 refcount=1 reference=2
ERROR cluster 8034311 refcount=1 reference=2
ERROR cluster 8034312 refcount=1 reference=2
ERROR cluster 8034313 refcount=1 reference=2
ERROR cluster 8034459 refcount=1 reference=2
ERROR cluster 8034460 refcount=1 reference=2
ERROR cluster 8037582 refcount=1 reference=2
ERROR cluster 8037619 refcount=1 reference=2
ERROR cluster 8037620 refcount=1 reference=2
ERROR cluster 8037621 refcount=1 reference=2
ERROR cluster 8041788 refcount=1 reference=2
ERROR cluster 8064589 refcount=1 reference=2
ERROR cluster 8064590 refcount=1 reference=2
ERROR OFLAG_COPIED L2 cluster: l1_index=975 l1_entry=79e4c60000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a04a70000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=976 l1_entry=7a04c80000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a04ed0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a0d9d0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a130c0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a130d0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a198b0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a19c60000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=979 l1_entry=7a64cb0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84ac0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84ad0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84ae0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84af0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7a84b20000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=981 l1_entry=7aa4ce0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f30000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f40000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f50000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f60000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f70000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7aa4f80000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7ab53c0000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=1007 l1_entry=7de4ee0000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e04cf0000 refcount=1
ERROR OFLAG_COPIED L2 cluster: l1_index=1008 l1_entry=7e04f00000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e05120000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e05190000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e05330000 refcount=1
ERROR OFLAG_COPIED data cluster: l2_entry=7e07940000 refcount=1
66 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.
8388608/8388608 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 549840879616
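The two kinds of errors above can be cross-referenced: the OFLAG_COPIED lines print host byte offsets in hex, while the refcount lines print cluster indices. Assuming qcow2's default 64 KiB cluster size (not read from this image's header), 0x79e4c60000 / 65536 = 7988422, which is exactly the first "ERROR cluster" line. A minimal Python sketch of that mapping, with a hypothetical `parse_check_output` helper that only understands the two line formats shown here:

```python
import re
from collections import Counter

# qcow2's default cluster size; assumed here, not read from the image header.
CLUSTER_SIZE = 64 * 1024
# Bits 9-55 of a qcow2 L1/L2 entry hold the host offset; this drops flag bits.
OFFSET_MASK = 0x00FFFFFFFFFFFE00

REFCOUNT_RE = re.compile(r"ERROR cluster (\d+) refcount=(\d+) reference=(\d+)")
OFLAG_RE = re.compile(
    r"ERROR OFLAG_COPIED (L2|data) cluster: .*?_entry=([0-9a-f]+) refcount=(\d+)"
)


def parse_check_output(text: str):
    """Parse the text form of `qemu-img check` errors.

    Returns (counts per error kind, clusters with a refcount mismatch,
    clusters named by OFLAG_COPIED errors), all as cluster indices.
    """
    kinds = Counter()
    refcount_clusters = set()
    oflag_clusters = set()
    for line in text.splitlines():
        m = REFCOUNT_RE.search(line)
        if m:
            kinds["refcount mismatch"] += 1
            refcount_clusters.add(int(m.group(1)))
            continue
        m = OFLAG_RE.search(line)
        if m:
            kinds[f"OFLAG_COPIED {m.group(1)}"] += 1
            offset = int(m.group(2), 16) & OFFSET_MASK
            # Host byte offset -> cluster index, the unit used by the
            # "ERROR cluster N" lines.
            oflag_clusters.add(offset // CLUSTER_SIZE)
    return kinds, refcount_clusters, oflag_clusters
```

On the full output above this counts 36 refcount mismatches plus 30 OFLAG_COPIED errors (6 for L2 tables, 24 for data clusters), which adds up to the 66 errors qemu-img reports. `oflag_clusters - refcount_clusters` then lists the OFLAG_COPIED entries that have no matching refcount line.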