snapshot + lvm

Kosh

The storage is LVM with snapshots, on a volume attached via SAN.
The VM has a snapshot, but for some reason I can't delete it.
Code:
delete qemu external snapshot
delete first snapshot upd
block-commit current to base:upd
commit-drive-scsi1: transferred 63.6 MiB of 962.3 GiB (0.01%) in 10s
commit-drive-scsi1: transferred 772.9 MiB of 962.3 GiB (0.08%) in 11s
commit-drive-scsi1: transferred 1.4 GiB of 962.3 GiB (0.15%) in 12s
commit-drive-scsi1: transferred 2.1 GiB of 962.3 GiB (0.22%) in 13s
.....
.....
commit-drive-scsi1: transferred 845.9 GiB of 962.3 GiB (87.90%) in 20m 11s
commit-drive-scsi1: transferred 846.6 GiB of 962.3 GiB (87.98%) in 20m 12s
commit-drive-scsi1: transferred 847.3 GiB of 962.3 GiB (88.05%) in 20m 13s
commit-drive-scsi1: Cancelling block job
commit-drive-scsi1: Done.
TASK ERROR: Failed to complete block commit: block job (commit) error: commit-drive-scsi1: 'commit' has been cancelled


qm config
Code:
qm config 2090
agent: 1
boot: order=scsi0;scsi1;ide2;net0
cores: 16
cpu: x86-64-v2-AES
description:
ide2: none,media=cdrom
memory: 98304
meta: creation-qemu=7.1.0,ctime=1692033904
name: GitFlex
net0: virtio=00:50:56:b5:29:2a,bridge=vmbr206,firewall=1
numa: 0
ostype: l26
parent: upd
scsi0: Barracuda-VM002p:vm-2090-disk-1.qcow2,aio=native,size=700G
scsi1: Barracuda-VM002p:vm-2090-disk-0.qcow2,aio=native,backup=0,size=1020G
scsihw: virtio-scsi-single
smbios1: uuid=6094ab64-3772-4ee2-957b-bdb26cf66898
sockets: 1
tags: dit
vmgenid: 264df2cf-898c-418a-9c76-672536856b95

log
Code:
pvedaemon[3610003]: VM 2090 qmp command failed - VM 2090 qmp command 'block-job-cancel' failed - Block job 'commit-drive-scsi1' not found
 
Just to make sure: you didn't cancel the block job yourself? Do you see any additional information in the journal?
 
I couldn't delete it even with the VM turned off.
I had to clone it and delete the original.

Code:
vm-2090-disk-0.qcow2: deleting snapshot 'upd' by commiting snapshot 'current'
running 'qemu-img commit /dev/Barracuda-VM002p/vm-2090-disk-0.qcow2'
qemu-img: Block job failed: No space left on device
The state of upd is now invalid. Don't try to clone or rollback it. You can only try to delete it again later
TASK ERROR: error commiting current to upd; command '/usr/bin/qemu-img commit /dev/Barracuda-VM002p/vm-2090-disk-0.qcow2' failed: exit code 1

There was plenty of free space.
 
Hi,
Did you resize the image after the snapshot was taken? If yes, it might be https://bugzilla.proxmox.com/show_bug.cgi?id=7094

Please share the output of
Code:
qemu-img info --output=json --backing-chain /dev/Barracuda-VM002p/vm-2090-disk-0.qcow2
cat /etc/pve/qemu-server/2090.conf
pveversion -v
Make sure that the LV is active for the first command.
 
Thanks for the info, the old VM has already been deleted.
And yes, we expanded the VM's disk after the snapshot was taken.
 
I can see that there is a patch available and that it got implemented a few days ago.
https://git.proxmox.com/?p=qemu-server.git;a=commit;h=30e41d065623380cd4106f0524326c4da75b1cc9
Just waiting for a version bump I assume. This will probably take a while to arrive in the pve-enterprise repo, correct?

A customer of ours is currently running into this exact issue.
He created a snapshot and expanded the volume afterwards. Now he is trying to delete this snapshot.

See attached files for current error, and qm conf.

I assume he only needs to update and the snapshot deletion should work?
Or how does one proceed in this case?

Is there any quick fix he could do now, without waiting for the version bump and release to pve-enterprise repo?

Thanks!

P.s. the customer tried to fetch the qemu-img info, but nothing returned. Inactive LV?
 

Hi @Khensu,
I can see that there is a patch available and that it got implemented a few days ago.
https://git.proxmox.com/?p=qemu-server.git;a=commit;h=30e41d065623380cd4106f0524326c4da75b1cc9
Just waiting for a version bump I assume. This will probably take a while to arrive in the pve-enterprise repo, correct?
Yes, the fix is not packaged yet and will be included in the next bump.

I assume he only needs to update and the snapshot deletion should work?
Or how does one proceed in this case?
I think it won't work after the failure, even with the fix, because the qcow2 header has already been updated with the new size. So a manual resize of the LV is necessary first.

Is there any quick fix he could do now, without waiting for the version bump and release to pve-enterprise repo?
While the VM is shut down, resize the snapshot LV:
Code:
# qemu-img measure --output=json --size 300G -O qcow2
{
    "required": 49414144,
    "fully-allocated": 322171961344
}
Use the fully-allocated size:
Code:
lvresize VG/snap_vm-XYZ-disk-N_SNAPSHOT.qcow2 -L 322171961344b
Then it should be possible to delete the snapshot.
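
The two steps above can be chained so the size does not have to be copied by hand. This is only a sketch: the helper names `fully_allocated_bytes` and `print_lvresize` are made up for illustration, the JSON is the sample output quoted above, and the script only prints the lvresize command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: read `qemu-img measure --output=json` output on stdin
# and print the "fully-allocated" byte count.
fully_allocated_bytes() {
    sed -n 's/.*"fully-allocated": *\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: print (do not run) the lvresize command.
# $1 = VG/LV of the snapshot volume, $2 = target size in bytes
print_lvresize() {
    printf 'lvresize %s -L %sb\n' "$1" "$2"
}

# Sample measure output for a 300G image, copied from the post above.
sample='{
    "required": 49414144,
    "fully-allocated": 322171961344
}'

size=$(printf '%s\n' "$sample" | fully_allocated_bytes)
print_lvresize "VG/snap_vm-XYZ-disk-N_SNAPSHOT.qcow2" "$size"
```

On a real host, the sample would be replaced by piping `qemu-img measure --output=json --size <current virtual size> -O qcow2` into `fully_allocated_bytes`, and the printed command should be reviewed before it is executed with the VM shut down.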

P.s. the customer tried to fetch the qemu-img info, but nothing returned. Inactive LV?
What was the exact command? The customer can check with lvs whether the LV is active. If the VM is not running, the LV needs to be activated manually with lvchange.
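
As a rough illustration of that check: in the lv_attr column printed by lvs, the fifth character is `a` when the LV is active. The helper name `lv_attr_is_active` and the two sample attribute strings below are made up for the example; on a real host the string would come from `lvs --noheadings -o lv_attr VG/LV`, and an inactive LV can be activated with `lvchange -ay VG/LV`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if an lv_attr string marks the LV as active.
# $1 = attribute string as printed by lvs, e.g. "-wi-a-----"
lv_attr_is_active() {
    # lvs pads its columns, so strip spaces before looking at character 5.
    attr=$(printf '%s' "$1" | tr -d ' ')
    [ "$(printf '%s' "$attr" | cut -c5)" = "a" ]
}

# Made-up sample attribute strings: one active LV, one inactive LV.
for attr in "-wi-a-----" "-wi-------"; do
    if lv_attr_is_active "$attr"; then
        echo "$attr: active"
    else
        echo "$attr: inactive, activate with: lvchange -ay VG/LV"
    fi
done
```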
 
Hi @fiona,

thank you for the quick reply.

What was the exact command? The customer can check with lvs whether the LV is active. If the VM is not running, the LV needs to be activated manually with lvchange.
Thanks! Now it worked. The output is attached.

The VM currently cannot be started, as it is locked.

Code:
task started by HA resource agent
TASK ERROR: VM is locked (snapshot-delete)
As /dev/99_Proxmox01/vm-198-disk-1.qcow2 was extended, I assume we need to run the following command:

lvresize /dev/99_Proxmox01/snap_vm-198-disk-1_before_update_02_04_26.qcow2 -L 322171961344b

But I noticed that the virtual-size of the original /dev/99_Proxmox01/vm-198-disk-1.qcow2 (322122547200 or 322172878848? There are two.) does not match the calculated 322171961344 value. Is that still correct?

After resizing, we can simply try to delete the snapshot again?

qm delsnapshot 198 before_update_02_04_26

We might need to unlock the VM before, or not?

qm unlock 198

Thanks!
 

As /dev/99_Proxmox01/vm-198-disk-1.qcow2 was extended, I assume we need to run the following command:

lvresize /dev/99_Proxmox01/snap_vm-198-disk-1_before_update_02_04_26.qcow2 -L 322171961344b

But I noticed that the virtual-size of the original /dev/99_Proxmox01/vm-198-disk-1.qcow2 (322122547200 or 322172878848? There are two.) does not match the calculated 322171961344 value. Is that still correct?
Yes, the LV needs to be larger to fit the qcow2 metadata and LVM rounds up to a full extent when allocating. That's where the mismatch comes in. You need the result from the measure command to account for the metadata.
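
The three numbers can be reproduced with a bit of shell arithmetic. This is only a sketch under assumptions not confirmed in the thread: that the image's virtual size is exactly 300 GiB, that the 49414144 bytes reported as "required" by the measure command are the qcow2 metadata overhead, and that the volume group uses LVM's default 4 MiB extent size. Reading 322172878848 as the extent-rounded LV size is likewise an inference.

```shell
#!/bin/sh
# Assumed virtual disk size: 300 GiB.
virtual=$((300 * 1024 * 1024 * 1024))

# "required" value from the measure output for an empty image,
# taken here as the qcow2 metadata overhead (assumption).
metadata=49414144

# Space needed if every cluster of the image is allocated.
fully=$((virtual + metadata))

# Assumed extent size: 4 MiB, the LVM default.
extent=$((4 * 1024 * 1024))

# LVM rounds the requested size up to a whole number of extents.
rounded=$(( (fully + extent - 1) / extent * extent ))

echo "virtual size:       $virtual"
echo "fully-allocated:    $fully"
echo "rounded to extents: $rounded"
```

Under these assumptions the script prints 322122547200, 322171961344 and 322172878848, which matches the two values seen in the qemu-img info output and the one from the measure command.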
After resizing, we can simply try to delete the snapshot again?

qm delsnapshot 198 before_update_02_04_26

We might need to unlock the VM before, or not?

qm unlock 198
Yes.