Deleting several VMs may leave behind Disks

tschmidt

I'm working on automatically deploying VMs via the API. To update them, I delete the old VM and clone a new one from an up-to-date template. While doing this I noticed that when deleting several VMs at once, some may leave their disks behind.

I think I had it happen with as few as 5 VMs, but for testing it's easier to just use a hundred or so; then most of them leave their disks behind.

I'm using a 4-node hyper-converged Ceph cluster running up-to-date no-subscription packages, and I call the HTTP API directly from Python.
 
Here's a simplified demonstrator (the real code watches the tasks for completion and makes sure that at most 5 are running at once, most likely of mixed types like clones, etc.).
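For reference, a minimal sketch of what the demonstrator does (host, node, token secret and VMID range are placeholders; the attached pve-del-race.py.gz is the actual reproducer):

Python:
# Sketch only: start many qmdestroy tasks back to back via the HTTP API.
# Host, node, token secret and VMID range are placeholders.
import requests

PVE_HOST = "https://pve1.example.com:8006"
NODE = "pve1"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!apitest=<secret>"}

def destroy_vm(vmid):
    """Start a qmdestroy task for the given VMID and return its UPID."""
    r = requests.delete(
        f"{PVE_HOST}/api2/json/nodes/{NODE}/qemu/{vmid}",
        headers=HEADERS,
        verify=False,  # lab cluster with a self-signed certificate
    )
    r.raise_for_status()
    return r.json()["data"]  # UPID of the destroy task

# The real code throttles to at most 5 concurrent tasks and waits for each
# UPID to finish; for the demonstrator, starting them in a tight loop is
# enough to trigger the leftover disks.
upids = [destroy_vm(vmid) for vmid in range(100, 200)]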
 

Attachments

  • pve-del-race.py.gz
have you checked the task log of one of the VMs that failed to get all its disks removed?
 
No, I had only ever checked the status (which said OK); the task log output indeed reveals that there is a problem:

Code:
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
Could not remove disk 'pool1:base-101-disk-0/vm-199-disk-0', check manually: cfs-lock 'storage-pool1' error: got lock request timeout
purging VM 199 from related configurations..
TASK OK

I'll see if I can post the status too; Proxmox disallows copy-and-pasting even the task ID.

EDIT:
JSON:
{
    "data": {
        "type": "qmdestroy",
        "user": "root@pam",
        "pstart": 24283591,
        "exitstatus": "OK",
        "status": "stopped",
        "pid": 1383677,
        "node": "pve1",
        "tokenid": "apitest",
        "id": "199",
        "starttime": 1721635829,
        "upid": "UPID:pve1:00151CFD:017289C7:669E13F5:qmdestroy:199:root@pam!apitest:"
    }
}
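For anyone who wants to check this themselves: both the status and the full log of a task are available via the standard task endpoints. A small sketch (host and token secret are placeholders; the UPID is the one from the status JSON above):

Python:
# Fetch the status and log of the qmdestroy task.
# Host and token secret are placeholders; the UPID is taken from the post.
import requests

PVE_HOST = "https://pve1.example.com:8006"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!apitest=<secret>"}
UPID = "UPID:pve1:00151CFD:017289C7:669E13F5:qmdestroy:199:root@pam!apitest:"
BASE = f"{PVE_HOST}/api2/json/nodes/pve1/tasks/{UPID}"

status = requests.get(f"{BASE}/status", headers=HEADERS, verify=False).json()["data"]
log = requests.get(f"{BASE}/log", headers=HEADERS, verify=False).json()["data"]

print(status["exitstatus"])  # "OK", even though a disk was left behind
for entry in log:
    print(entry["t"])        # the "Could not remove disk ..." line only shows up here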
 
Yeah, this is caused by contention on the storage layer. Those failures should probably be more visible in the task status by being properly logged as warnings.
 
Definitely, but even better would be higher timeouts and/or retries.

But thanks for your help; now at least I have an easier time trying to work around it.
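In case it helps others, one possible client-side workaround (a sketch under the assumption that the storage is called pool1, not my actual code): after a qmdestroy task reports OK, list the storage content for that VMID and retry removal of anything left behind:

Python:
# Sketch of a workaround: re-check and retry removal of leftover disks after
# a destroy. Host, node, storage name and token secret are placeholders.
import time
import requests

PVE_HOST = "https://pve1.example.com:8006"
NODE = "pve1"
STORAGE = "pool1"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!apitest=<secret>"}

def leftover_volumes(vmid):
    """List volumes on the storage that still belong to the destroyed VMID."""
    r = requests.get(
        f"{PVE_HOST}/api2/json/nodes/{NODE}/storage/{STORAGE}/content",
        params={"vmid": vmid}, headers=HEADERS, verify=False,
    )
    r.raise_for_status()
    return [v["volid"] for v in r.json()["data"]]

def cleanup_leftovers(vmid, retries=5, delay=10):
    """Retry removal a few times, since each attempt can hit the same cfs lock."""
    for _ in range(retries):
        volumes = leftover_volumes(vmid)
        if not volumes:
            return True
        for volid in volumes:
            requests.delete(
                f"{PVE_HOST}/api2/json/nodes/{NODE}/storage/{STORAGE}/content/{volid}",
                headers=HEADERS, verify=False,
            )
        time.sleep(delay)
    return False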
 
