Snapshots as Volume-Chain Creates Large Snapshot Volumes

For further testing, we tried again with the saferemove option enabled, and the same problem occurs: the task times out after the discards.

Code:
zero-out data on image vm-102-disk-0.qcow2 (/dev/san_test/del-vm-102-disk-0.qcow2)
reduce stepsize to the maximum supported by the storage: 1048576 bytes
/dev/san_test/del-vm-102-disk-0.qcow2: Zero-filled 1935671296 bytes from the offset 0
---TRUNCATED---
/dev/san_test/del-vm-102-disk-0.qcow2: Zero-filled 743440384 bytes from the offset 1098923376640
TASK ERROR: lvremove 'san_test/del-vm-102-disk-0.qcow2' error: 'storage-VMs'-locked command timed out - aborting

The lvremove command is still running in the background and does complete successfully. While lvremove is still running, 'lvs' fails to return because the VG is still locked. So it seems that the Proxmox lock is set to 60s, but the actual task continues beyond that and completes successfully; however, the follow-up tasks (i.e. lvrename) are not attempted because Proxmox has already 'failed' the task.
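For anyone wanting to confirm this behaviour themselves, the checks below are a rough sketch (VG name as in the log above, the grep pattern is just an example):

Code:
# check that lvremove is still running after the Proxmox task has already "failed"
ps -ef | grep '[l]vremove'

# this blocks until lvremove releases the VG lock, then returns normally
time lvs san_test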

Poking a bit further, we disabled the zeroing feature again and also disabled the 'issue_discards' flag in lvm.conf, then repeated the test: the snapshot is deleted successfully, nearly instantly. So it appears that the discard operation in lvremove significantly increases the time that lvremove takes.
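For reference, this is the setting we toggled; a minimal excerpt of /etc/lvm/lvm.conf (only the issue_discards line is relevant here):

Code:
devices {
    # when set to 1, LVM issues discards to the underlying PV
    # when an LV is removed or reduced; 0 disables this
    issue_discards = 0
}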
 
that likely means that your storage doesn't offer efficient discards. this is a hard problem to solve, as the 60s is a cluster-wide, hard-coded limit, so it's not easy to change. and blocking the whole storage w.r.t. operations modifying LVM (allocation, renames, removal of volumes or snapshots) for long stretches of time would cause issues as soon as you do multiple things in parallel.
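one way to check whether the storage handles discards efficiently, independent of Proxmox, is to time a raw discard against a throwaway test LV (LV name and size here are just examples):

Code:
# create a small test LV, discard it as one raw operation, then clean up
lvcreate -L 10G -n discard-test san_test
time blkdiscard /dev/san_test/discard-test
lvremove -y san_test/discard-test

if the blkdiscard alone takes anywhere near the 60s window, the timeout on lvremove with issue_discards enabled is expected.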