Deleting several VMs may leave behind Disks

tschmidt

I'm working on automatically deploying VMs via the API. To update them, I delete the old VM and clone a new one from an up-to-date template. While doing this I noticed that when deleting several VMs at once, some may leave their disks behind.

I think I had it happen with as few as 5 VMs, but for testing it's easier to just use a hundred or so; then most of them leave their disks behind.

I'm using a 4-node hyper-converged Ceph cluster running up-to-date no-subscription packages, and I call the HTTP API directly from Python.
 
Here's a simplified demonstrator (the real code watches the tasks for completion and makes sure that at most 5 are running at once, most likely of mixed types like clones, etc.).
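For reference, a minimal sketch of what the demonstrator does (host, node, token secret and VMID range are placeholders; the attached pve-del-race.py.gz is the actual reproducer):

Python:
# Sketch only: start many qmdestroy tasks back to back via the HTTP API.
# Host, node, token secret and VMID range are placeholders.
import requests

PVE_HOST = "https://pve1.example.com:8006"
NODE = "pve1"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!apitest=<secret>"}

def destroy_vm(vmid):
    """Start a qmdestroy task for the given VMID and return its UPID."""
    r = requests.delete(
        f"{PVE_HOST}/api2/json/nodes/{NODE}/qemu/{vmid}",
        headers=HEADERS,
        verify=False,  # lab cluster with a self-signed certificate
    )
    r.raise_for_status()
    return r.json()["data"]  # UPID of the destroy task

# The real code throttles to at most 5 concurrent tasks and waits for each
# UPID to finish; for the demonstrator, starting them in a tight loop is
# enough to trigger the leftover disks.
upids = [destroy_vm(vmid) for vmid in range(100, 200)]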
 

Attachments

  • pve-del-race.py.gz
have you checked the task log of one of the VMs that failed to get all its disks removed?
 
No, I had only ever checked the status (which said OK); the task log output indeed reveals that there is a problem:

Code:
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
trying to acquire cfs lock 'storage-pool1' ...
Could not remove disk 'pool1:base-101-disk-0/vm-199-disk-0', check manually: cfs-lock 'storage-pool1' error: got lock request timeout
purging VM 199 from related configurations..
TASK OK

I'll see if I can post the status too; Proxmox disallows copy-and-pasting even the task ID.

EDIT:
JSON:
{
    "data": {
        "type": "qmdestroy",
        "user": "root@pam",
        "pstart": 24283591,
        "exitstatus": "OK",
        "status": "stopped",
        "pid": 1383677,
        "node": "pve1",
        "tokenid": "apitest",
        "id": "199",
        "starttime": 1721635829,
        "upid": "UPID:pve1:00151CFD:017289C7:669E13F5:qmdestroy:199:root@pam!apitest:"
    }
}
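For anyone who wants to check this themselves: both the status and the full log of a task are available via the standard task endpoints. A small sketch (host and token secret are placeholders; the UPID is the one from the status JSON above):

Python:
# Fetch the status and log of the qmdestroy task.
# Host and token secret are placeholders; the UPID is taken from the post.
import requests

PVE_HOST = "https://pve1.example.com:8006"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!apitest=<secret>"}
UPID = "UPID:pve1:00151CFD:017289C7:669E13F5:qmdestroy:199:root@pam!apitest:"
BASE = f"{PVE_HOST}/api2/json/nodes/pve1/tasks/{UPID}"

status = requests.get(f"{BASE}/status", headers=HEADERS, verify=False).json()["data"]
log = requests.get(f"{BASE}/log", headers=HEADERS, verify=False).json()["data"]

print(status["exitstatus"])  # "OK", even though a disk was left behind
for entry in log:
    print(entry["t"])        # the "Could not remove disk ..." line only shows up here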
 
Yeah, this is caused by contention on the storage layer. Those failures should probably be more visible in the task status by being properly logged as warnings.
 
Definitely, but even better would be higher timeouts and/or retries.

But thanks for your help; now at least I have an easier time trying to work around it.
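In case it helps others, one possible client-side workaround (a sketch under the assumption that the storage is called pool1, not my actual code): after a qmdestroy task reports OK, list the storage content for that VMID and retry removal of anything left behind:

Python:
# Sketch of a workaround: re-check and retry removal of leftover disks after
# a destroy. Host, node, storage name and token secret are placeholders.
import time
import requests

PVE_HOST = "https://pve1.example.com:8006"
NODE = "pve1"
STORAGE = "pool1"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!apitest=<secret>"}

def leftover_volumes(vmid):
    """List volumes on the storage that still belong to the destroyed VMID."""
    r = requests.get(
        f"{PVE_HOST}/api2/json/nodes/{NODE}/storage/{STORAGE}/content",
        params={"vmid": vmid}, headers=HEADERS, verify=False,
    )
    r.raise_for_status()
    return [v["volid"] for v in r.json()["data"]]

def cleanup_leftovers(vmid, retries=5, delay=10):
    """Retry removal a few times, since each attempt can hit the same cfs lock."""
    for _ in range(retries):
        volumes = leftover_volumes(vmid)
        if not volumes:
            return True
        for volid in volumes:
            requests.delete(
                f"{PVE_HOST}/api2/json/nodes/{NODE}/storage/{STORAGE}/content/{volid}",
                headers=HEADERS, verify=False,
            )
        time.sleep(delay)
    return False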
 
