Proxmox cloning corrupts the target image - extensive analysis included

santa_on_a_sledge

New Member
Dec 11, 2025
Hello,

I believe I have uncovered a bug related to cloning (either via the Web UI or via the `qm clone` command on the node).

Packages installed:

Code:
proxmox-ve: 9.1.0 (running kernel: 6.17.2-1-pve)
pve-manager: 9.1.1 (running version: 9.1.1/42db4a6cf33dac83)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.17.2-1-pve-signed: 6.17.2-1
proxmox-kernel-6.17: 6.17.2-1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown: not correctly installed
ifupdown2: 3.3.0-1+pmx11
intel-microcode: 3.20250812.1~deb13u1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.4
libpve-apiclient-perl: 3.4.2
libpve-cluster-api-perl: 9.0.7
libpve-cluster-perl: 9.0.7
libpve-common-perl: 9.0.15
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.2.3
libpve-rs-perl: 0.11.3
libpve-storage-perl: 9.1.0
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-3
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.20-1
proxmox-backup-file-restore: 4.0.20-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.1
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.3
proxmox-widget-toolkit: 5.1.2
pve-cluster: 9.0.7
pve-container: 6.0.18
pve-docs: 9.1.1
pve-edk2-firmware: not correctly installed
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.4
pve-firmware: 3.17-2
pve-ha-manager: 5.0.8
pve-i18n: 3.6.2
pve-qemu-kvm: 10.1.2-4
pve-xtermjs: 5.5.0-3
qemu-server: 9.1.0
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve3
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

I have configured an LVM-thin storage which is backed by a single 4096-byte-sector drive. fdisk on the node confirms that this sector size is in use.
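
For reference, this is easy to double-check straight from sysfs on the node (the device name below is just an example for my setup; substitute the actual backing NVMe device). Both values should report 4096 here:

Code:
# cat /sys/block/nvme0n1/queue/logical_block_size
# cat /sys/block/nvme0n1/queue/physical_block_size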

I proceeded to create a Linux VM and install it onto this LVM-thin storage. By default, Proxmox exposes the virtual disk to the guest with 512-byte sector emulation, so the installer created a layout that uses 512 B sectors, since that is what was exposed to it.
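
This can be confirmed from inside the guest as well (assuming the disk shows up as /dev/sda, as it does in the fdisk output further down):

Code:
$ lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda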

This is what the VM config looks like:
Code:
agent: 1
bios: ovmf
boot: order=scsi0;net0
cores: 4
cpu: host
efidisk0: nvme-lvm-thin:vm-9000-disk-0,efitype=4m,ms-cert=2023,pre-enrolled-keys=1,size=4M
kvm: 1
machine: q35
memory: 4096
meta: creation-qemu=10.1.2,ctime=1765414133
name: ubuntu-golden-22.04
net0: virtio=42:A1:B3:9C:4A:0A,bridge=vmbr0
numa: 0
onboot: 0
ostype: l26
scsi0: nvme-lvm-thin:vm-9000-disk-1,backup=0,cache=none,discard=on,iothread=1,replicate=0,size=15G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=9998fcdc-2470-43b4-afb1-2579722d5538
sockets: 1
vmgenid: 2ad01187-0996-4e96-b83a-e63c83ca510b

Everything works well on this VM. But when I shut it down completely, clone it, and start the new cloned guest without changing anything, it fails to boot.
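
For completeness, the clone I ran was a full clone of the stopped VM, roughly along these lines (the target VMID 1001 matches the LV names further down; the clone name here is just a placeholder):

Code:
# qm clone 9000 1001 --full 1 --name ubuntu-clone-test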

The reason is that the cloned image gets corrupted in the process. I suspect the cause is the sector size discrepancy: the Proxmox node sees the disk's LV with a 4096-byte sector size, while the actual data on that LV is laid out according to what the guest VM saw - 512-byte sectors.
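
The sector size the host reports for the mapped LV can be checked directly (a quick sanity check; I would expect it to report 4096 here, matching the backing drive):

Code:
# blockdev --getss --getpbsz /dev/proxmox_vg0/vm-9000-disk-1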

Indeed, while both VMs were shut down and the cloned VM had not yet been started, this command clearly shows that the images differ:

Code:
# qemu-img compare /dev/proxmox_vg0/vm-9000-disk-1 /dev/proxmox_vg0/vm-1001-disk-1
Content mismatch at offset 4096!

When I started the freshly cloned VM, it couldn't boot. I suspect that something during the cloning touched the partition table and rewrote it according to the Proxmox host geometry - with a 4096-byte sector size in mind. But let's not forget that the cloned guest VM instead sees a 512-byte sector size.
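
This would also fit the mismatch at offset 4096: with a 512-byte-sector layout the GPT header sits at LBA 1 = byte offset 512, while a tool assuming 4096-byte sectors puts LBA 1 at byte offset 4096. A rough way to check where the "EFI PART" signature actually sits on each LV (the offsets below are just where I would expect the headers to be, not something I have verified on the corrupted clone):

Code:
# dd if=/dev/proxmox_vg0/vm-9000-disk-1 bs=1 skip=512 count=8 2>/dev/null; echo
# dd if=/dev/proxmox_vg0/vm-1001-disk-1 bs=1 skip=512 count=8 2>/dev/null; echo
# dd if=/dev/proxmox_vg0/vm-1001-disk-1 bs=1 skip=4096 count=8 2>/dev/null; echo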


So I shut down all VMs and performed a manual bit-for-bit copy of the drive:
Code:
dd if=/dev/proxmox_vg0/vm-9000-disk-1 of=/dev/proxmox_vg0/vm-1001-disk-1 bs=64K conv=noerror,sync status=progress

And then verified with:

Code:
# qemu-img compare /dev/proxmox_vg0/vm-9000-disk-1 /dev/proxmox_vg0/vm-1001-disk-1
Images are identical.

After that, the cloned VM booted successfully and everything worked; however, it required this manual step.
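
As an extra cross-check (optional; just another way to confirm the copies are identical while both VMs are shut down), hashing the raw LVs should produce the same digest:

Code:
# sha256sum /dev/proxmox_vg0/vm-9000-disk-1 /dev/proxmox_vg0/vm-1001-disk-1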

______

Note: this is how the source LV looks from the *Proxmox host* side:

Code:
# fdisk -l /dev/proxmox_vg0/vm-9000-disk-1
GPT PMBR size mismatch (31457279 != 3932159) will be corrected by write.
Disk /dev/proxmox_vg0/vm-9000-disk-1: 15 GiB, 16106127360 bytes, 3932160 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device                            Boot Start     End Sectors Size Id Type
/dev/proxmox_vg0/vm-9000-disk-1p1          1 3932159 3932159  15G ee GPT

Partition 1 does not start on physical sector boundary.


And this is the mapped device as seen from inside the source VM guest:

Code:
$ sudo fdisk -l /dev/sda
Disk /dev/sda: 15 GiB, 16106127360 bytes, 31457280 sectors
Disk model: QEMU HARDDISK   
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: ED066473-BFF8-4803-8FD5-FB16BE2AD585

Device       Start      End  Sectors  Size Type
/dev/sda1     2048  1466367  1464320  715M EFI System
/dev/sda2  1466368  5136383  3670016  1.8G Linux filesystem
/dev/sda3  5136384 31455231 26318848 12.5G Linux filesystem

This is the target VM's LV right after the clone, from the *Proxmox host* side:

Code:
# fdisk -l /dev/proxmox_vg0/vm-1001-disk-1
GPT PMBR size mismatch (31457279 != 3932159) will be corrected by write.
Disk /dev/proxmox_vg0/vm-1001-disk-1: 15 GiB, 16106127360 bytes, 3932160 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device                            Boot Start     End Sectors Size Id Type
/dev/proxmox_vg0/vm-1001-disk-1p1          1 3932159 3932159  15G ee GPT

Partition 1 does not start on physical sector boundary.

Diff:

Code:
# cmp -l /dev/proxmox_vg0/vm-9000-disk-1 /dev/proxmox_vg0/vm-1001-disk-1 | head -n 20
               4097   0 300
               4098   0 270
               4099   0 367
               4100   0 264
               4101   0 263
               4102   0 367
               4103   0 232
               4104   0  71
               4105   0 227
               4106   0 164
               4107   0 254
               4108   0 177
               4109   0  30
               4110   0 251
               4111   0 130
               4112   0 163
               4113   0 100
               4114   0 160
               4115   0  30
               4116   0 255
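
Note that cmp -l prints the differing byte values in octal, so a hex dump of the same region on the clone is easier to read (the offset just matches the first mismatch reported above):

Code:
# xxd -s 4096 -l 64 /dev/proxmox_vg0/vm-1001-disk-1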


I hope this issue can be resolved.
 
santa_on_a_sledge said:
The reason is that the cloned image gets corrupted in the process. I suspect the cause is the sector size discrepancy: the Proxmox node sees the disk's LV with a 4096-byte sector size, while the actual data on that LV is laid out according to what the guest VM saw - 512-byte sectors.

Cloning the disk does not care about its contents at all.

santa_on_a_sledge said:
I have configured an LVM-thin storage which is backed by a single 4096-byte-sector drive. fdisk on the node confirms that this sector size is in use.

Is this a physical disk? Or iSCSI? Or something else? Please provide more data!

Could you also post the output of "lvmconfig --typeconfig full | grep -e thin -e zero" and "pveversion -v"? Thanks!
 
It's a physical NVMe disk on the Proxmox node, connected via PCIe 4.0.
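
For reference, the namespace's formatted LBA size can be confirmed with nvme-cli (the device path is just an example for this setup; the data size of the LBA format in use should show 4096 here):

Code:
# nvme id-ns -H /dev/nvme0n1 | grep "in use"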

Here's lvmconfig:
Code:
# lvmconfig --typeconfig full | grep -e thin -e zero
        wipe_signatures_when_zeroing_new_lvs=1
        thin_pool_metadata_require_separate_pvs=0
        thin_pool_crop_metadata=0
        thin_pool_zero=1
        thin_pool_discards="passdown"
        thin_pool_chunk_size_policy="generic"
        zero_metadata=1
        sparse_segtype_default="thin"
        thin_check_executable="/usr/sbin/thin_check"
        thin_dump_executable="/usr/sbin/thin_dump"
        thin_repair_executable="/usr/sbin/thin_repair"
        thin_restore_executable="/usr/sbin/thin_restore"
        thin_check_options=["-q","--clear-needs-check-flag"]
        thin_repair_options=[""]
        thin_restore_options=[""]
        thin_pool_autoextend_threshold=100
        thin_pool_autoextend_percent=20
        thin_library="libdevmapper-event-lvm2thin.so"
        thin_command="lvm lvextend --use-policies"

The output of "pveversion -v" is already part of the post - the first code block at the start.