TASK ERROR: clone failed: disk image '/var/lib/vz/images/10088/vm-10088-cloudinit.qcow2' already exists

rootnetworks

Hello,



Today, after destroying a VM and then trying to clone a new VM with the same VMID, I got this error. It looks like destroying the VM did not clean up the old VM directory.

create full clone of drive ide0 (local:126/vm-126-cloudinit.qcow2)
TASK ERROR: clone failed: disk image '/var/lib/vz/images/10088/vm-10088-cloudinit.qcow2' already exists

/var/lib/vz/images# ls 10088
vm-10088-cloudinit.qcow2

Recently I migrated this VM from node 1 to node 2 in the cluster. Does that have anything to do with it?

Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-9
pve-kernel-5.13.19-6-pve: 5.13.19-14
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

Thank you
 
I ran into the same error myself in the UI:
Code:
update VM 9998: -description   -ide0 local:cloudinit -name packer-img-rhel8
TASK ERROR: disk image '/var/lib/vz/images/9998/vm-9998-cloudinit.qcow2' already exists

Simply removing the cloudinit.qcow2 files allowed me to attach a cloud-init drive to the templates again.
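For reference, instead of deleting the qcow2 file directly, the leftover volume could presumably also be freed through the storage layer, e.g. something like this (just a sketch, using the storage and volume ID from the error above):
Code:
# free the orphaned cloud-init volume via the storage layer
# (adjust the storage name and VMID to match the error message)
pvesm free local:9998/vm-9998-cloudinit.qcow2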

I remove the templates with the following loop. I'm not sure if there needs to be a delay between calls to make this happy? After reviewing my node, this appears to be the first time I've run into this issue:
Code:
for i in $(seq 9996 1 9999); do
  echo "Deleting Template ID: ${i}"
  curl -k -X DELETE -H 'Authorization: PVEAPIToken=packer-api@pve!pkr=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' https://<host>/api2/json/nodes/proxmox/qemu/${i}
  echo -e "\n"
done

It only seems to have affected 2/4 templates that I tried to delete:

/var/log/syslog:
Code:
Jul 31 02:00:02 proxmox pvedaemon[2555226]: <packer-api@pve!pkr> starting task UPID:proxmox:001FAD18:00EB77D5:62E62872:qmdestroy:9996:packer-api@pve!pkr:
Jul 31 02:00:02 proxmox pvedaemon[2075928]: destroy VM 9996: UPID:proxmox:001FAD18:00EB77D5:62E62872:qmdestroy:9996:packer-api@pve!pkr:
Jul 31 02:00:02 proxmox pvedaemon[1562523]: <packer-api@pve!pkr> starting task UPID:proxmox:001FAD22:00EB77DF:62E62872:qmdestroy:9997:packer-api@pve!pkr:
Jul 31 02:00:02 proxmox pvedaemon[2075938]: destroy VM 9997: UPID:proxmox:001FAD22:00EB77DF:62E62872:qmdestroy:9997:packer-api@pve!pkr:
Jul 31 02:00:02 proxmox pvedaemon[1562523]: <packer-api@pve!pkr> starting task UPID:proxmox:001FAD68:00EB77E6:62E62872:qmdestroy:9998:packer-api@pve!pkr:
Jul 31 02:00:02 proxmox pvedaemon[2076008]: destroy VM 9998: UPID:proxmox:001FAD68:00EB77E6:62E62872:qmdestroy:9998:packer-api@pve!pkr:
Jul 31 02:00:02 proxmox pvedaemon[2555226]: <packer-api@pve!pkr> starting task UPID:proxmox:001FAD72:00EB77ED:62E62872:qmdestroy:9999:packer-api@pve!pkr:
Jul 31 02:00:02 proxmox pvedaemon[2076018]: destroy VM 9999: UPID:proxmox:001FAD72:00EB77ED:62E62872:qmdestroy:9999:packer-api@pve!pkr:
Jul 31 02:00:15 proxmox pvedaemon[2555226]: <packer-api@pve!pkr> end task UPID:proxmox:001FAD18:00EB77D5:62E62872:qmdestroy:9996:packer-api@pve!pkr: OK
Jul 31 02:00:19 proxmox pvedaemon[2076008]: Could not remove disk 'local:9998/vm-9998-cloudinit.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
Jul 31 02:00:19 proxmox pvedaemon[2076018]: Could not remove disk 'local:9999/vm-9999-cloudinit.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
Jul 31 02:00:22 proxmox pvedaemon[1562523]: <packer-api@pve!pkr> end task UPID:proxmox:001FAD22:00EB77DF:62E62872:qmdestroy:9997:packer-api@pve!pkr: OK
Jul 31 02:00:28 proxmox pvedaemon[1562523]: <packer-api@pve!pkr> end task UPID:proxmox:001FAD68:00EB77E6:62E62872:qmdestroy:9998:packer-api@pve!pkr: OK
Jul 31 02:00:34 proxmox pvedaemon[2555226]: <packer-api@pve!pkr> end task UPID:proxmox:001FAD72:00EB77ED:62E62872:qmdestroy:9999:packer-api@pve!pkr: OK

And it appears this is the first time that I've run into this error on cloud-init disks, but not the first time for any disk:

/var/log/syslog*:
Code:
# grep -i "Could not remove disk" syslog*
syslog:Jul 31 02:00:19 proxmox pvedaemon[2076008]: Could not remove disk 'local:9998/vm-9998-cloudinit.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog:Jul 31 02:00:19 proxmox pvedaemon[2076018]: Could not remove disk 'local:9999/vm-9999-cloudinit.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.1:Jul 24 02:00:25 proxmox pvedaemon[2308088]: Could not remove disk 'local:9999/base-9999-disk-1.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.2:Jul 17 02:00:25 proxmox pvedaemon[2089768]: Could not remove disk 'local:9999/base-9999-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 10 16:43:16 proxmox pvedaemon[2570905]: Could not remove disk 'local:9877/base-9877-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 10 16:43:16 proxmox pvedaemon[2570919]: Could not remove disk 'local:9878/base-9878-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 10 16:43:16 proxmox pvedaemon[2570945]: Could not remove disk 'local:9879/base-9879-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 00:14:16 proxmox pvedaemon[3806297]: Could not remove disk 'local:9783/base-9783-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 00:14:16 proxmox pvedaemon[3806311]: Could not remove disk 'local:9784/base-9784-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 00:14:16 proxmox pvedaemon[3806332]: Could not remove disk 'local:9785/base-9785-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 12:42:35 proxmox pvedaemon[2044654]: Could not remove disk 'local:9876/base-9876-disk-0.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 12:42:35 proxmox pvedaemon[2044671]: Could not remove disk 'local:9877/base-9877-disk-1.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 12:42:35 proxmox pvedaemon[2044685]: Could not remove disk 'local:9878/base-9878-disk-1.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 12:42:35 proxmox pvedaemon[2044704]: Could not remove disk 'local:9879/base-9879-disk-1.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 21:07:07 proxmox pvedaemon[3358590]: Could not remove disk 'local:9879/base-9879-disk-2.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 21:07:07 proxmox pvedaemon[3358532]: Could not remove disk 'local:9877/base-9877-disk-2.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 21:07:07 proxmox pvedaemon[3358560]: Could not remove disk 'local:9878/base-9878-disk-2.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout
syslog.3:Jul 11 21:09:32 proxmox pvedaemon[3369320]: Could not remove disk 'local:9785/base-9785-disk-1.qcow2', check manually: can't lock file '/var/lock/pve-manager/pve-storage-local' - got timeout

Manual verification:
Code:
root@proxmox:/var/log# ll /var/lib/vz/images/9999
total 6730634
-r--r--r-- 1 root root 53695545344 Jul 10 14:55 base-9999-disk-0.qcow2
-r--r--r-- 1 root root 53695545344 Jul 17 02:19 base-9999-disk-1.qcow2
-r--r--r-- 1 root root 53695545344 Jul 31 02:19 base-9999-disk-2.qcow2
-rw-r----- 1 root root     4521984 Jul 24 02:19 vm-9999-cloudinit.qcow2
root@proxmox:/var/log# ll /var/lib/vz/images/9998
total 1960597
-r--r--r-- 1 root root 53695545344 Jul 31 02:23 base-9998-disk-0.qcow2
-rw-r----- 1 root root     4521984 Jul 24 02:21 vm-9998-cloudinit.qcow2
root@proxmox:/var/log# ll /var/lock/pve-manager/pve-storage-local
-rw-r--r-- 1 root root 0 Jul 29 11:21 /var/lock/pve-manager/pve-storage-local

I run updates on a weekly basis, so here's what was updated around those times.

/var/log/apt/history.log:
Code:
Start-Date: 2022-07-01  07:00:11
Commandline: apt-get dist-upgrade -y
Upgrade: libssl1.1:amd64 (1.1.1n-0+deb11u2, 1.1.1n-0+deb11u3), openssl:amd64 (1.1.1n-0+deb11u2, 1.1.1n-0+deb11u3)
End-Date: 2022-07-01  07:00:16

Start-Date: 2022-07-03  01:08:32
Commandline: apt install zabbix-agent
Install: zabbix-agent:amd64 (1:5.0.8+dfsg-1)
End-Date: 2022-07-03  01:08:39

Start-Date: 2022-07-08  07:00:13
Commandline: apt-get dist-upgrade -y
Install: pve-kernel-5.15.39-1-pve:amd64 (5.15.39-1, automatic)
Upgrade: gpg:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), pve-qemu-kvm:amd64 (6.2.0-10, 6.2.0-11), gnupg:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), gpg-wks-server:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), gpg-agent:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb
11u2), lxc-pve:amd64 (4.0.12-1, 5.0.0-3), gpgv:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), libpve-access-control:amd64 (7.2-2, 7.2-3), gpgsm:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), libpve-http-server-perl:amd64 (4.1-2, 4.1-3), pve-manager:amd64 (7.2-
5, 7.2-7), pve-kernel-5.15:amd64 (7.2-5, 7.2-6), dirmngr:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), gnupg-utils:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), gnupg-l10n:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), gpg-wks-client:amd64 (2.2.27-2+deb11u1, 2.
2.27-2+deb11u2), gpgconf:amd64 (2.2.27-2+deb11u1, 2.2.27-2+deb11u2), pve-kernel-helper:amd64 (7.2-5, 7.2-6), intel-microcode:amd64 (3.20220207.1~deb11u1, 3.20220510.1~deb11u1)
End-Date: 2022-07-08  07:03:57

Start-Date: 2022-07-15  07:00:11
Commandline: apt-get dist-upgrade -y
Install: linux-headers-5.10.0-16-common:amd64 (5.10.127-1, automatic), linux-headers-5.10.0-16-amd64:amd64 (5.10.127-1, automatic)
Upgrade: dpkg:amd64 (1.20.10, 1.20.11), tcpdump:amd64 (4.99.0-2, 4.99.0-2+deb11u1), linux-kbuild-5.10:amd64 (5.10.120-1, 5.10.127-1), openssh-client:amd64 (1:8.4p1-5, 1:8.4p1-5+deb11u1), usb.ids:amd64 (2022.02.15-0+deb11u1, 2022.05.20-0+deb11u1), ssh:amd
64 (1:8.4p1-5, 1:8.4p1-5+deb11u1), linux-compiler-gcc-10-x86:amd64 (5.10.120-1, 5.10.127-1), procmail:amd64 (3.22-26, 3.22-26+deb11u1), libfribidi0:amd64 (1.0.8-2, 1.0.8-2+deb11u1), libfreetype6:amd64 (2.10.4+dfsg-1, 2.10.4+dfsg-1+deb11u1), logrotate:amd
64 (3.18.0-2, 3.18.0-2+deb11u1), libwbclient0:amd64 (2:4.13.13+dfsg-1~deb11u3, 2:4.13.13+dfsg-1~deb11u4), linux-headers-amd64:amd64 (5.10.120-1, 5.10.127-1), openssh-server:amd64 (1:8.4p1-5, 1:8.4p1-5+deb11u1), libsmbclient:amd64 (2:4.13.13+dfsg-1~deb11u
3, 2:4.13.13+dfsg-1~deb11u4), dpkg-dev:amd64 (1.20.10, 1.20.11), bash:amd64 (5.1-2+b3, 5.1-2+deb11u1), base-files:amd64 (11.1+deb11u3, 11.1+deb11u4), gnutls-bin:amd64 (3.7.1-5, 3.7.1-5+deb11u1), distro-info-data:amd64 (0.51+deb11u1, 0.51+deb11u2), smbcli
ent:amd64 (2:4.13.13+dfsg-1~deb11u3, 2:4.13.13+dfsg-1~deb11u4), libdpkg-perl:amd64 (1.20.10, 1.20.11), openssh-sftp-server:amd64 (1:8.4p1-5, 1:8.4p1-5+deb11u1), nano:amd64 (5.4-2, 5.4-2+deb11u1), samba-libs:amd64 (2:4.13.13+dfsg-1~deb11u3, 2:4.13.13+dfsg
-1~deb11u4), samba-common:amd64 (2:4.13.13+dfsg-1~deb11u3, 2:4.13.13+dfsg-1~deb11u4), libgnutls30:amd64 (3.7.1-5, 3.7.1-5+deb11u1), libgnutlsxx28:amd64 (3.7.1-5, 3.7.1-5+deb11u1), libgnutls-dane0:amd64 (3.7.1-5, 3.7.1-5+deb11u1), postfix:amd64 (3.5.6-1+b
1, 3.5.13-0+deb11u1), linux-libc-dev:amd64 (5.10.120-1, 5.10.127-1)
End-Date: 2022-07-15  07:01:13

Start-Date: 2022-07-22  07:00:17
Commandline: apt-get dist-upgrade -y
Upgrade: libpve-rs-perl:amd64 (0.6.1, 0.6.2), pve-firmware:amd64 (3.4-2, 3.5-1), libpve-cluster-api-perl:amd64 (7.2-1, 7.2-2), pve-lxc-syscalld:amd64 (1.1.1-1, 1.2.2-1), libpve-storage-perl:amd64 (7.2-5, 7.2-7), pve-cluster:amd64 (7.2-1, 7.2-2), libproxm
ox-rs-perl:amd64 (0.1.1, 0.1.2), proxmox-backup-file-restore:amd64 (2.2.3-1, 2.2.4-1), libpve-access-control:amd64 (7.2-3, 7.2-4), pve-container:amd64 (4.2-1, 4.2-2), proxmox-backup-client:amd64 (2.2.3-1, 2.2.4-1), libpve-cluster-perl:amd64 (7.2-1, 7.2-2
)
End-Date: 2022-07-22  07:01:01

Start-Date: 2022-07-29  07:00:11
Commandline: apt-get dist-upgrade -y
Install: pve-kernel-5.15.39-2-pve:amd64 (5.15.39-2, automatic)
Upgrade: linux-kbuild-5.10:amd64 (5.10.127-1, 5.10.127-2), zfs-zed:amd64 (2.1.4-pve1, 2.1.5-pve1), zfs-initramfs:amd64 (2.1.4-pve1, 2.1.5-pve1), spl:amd64 (2.1.4-pve1, 2.1.5-pve1), libnvpair3linux:amd64 (2.1.4-pve1, 2.1.5-pve1), pve-ha-manager:amd64 (3.3
-4, 3.4.0), linux-compiler-gcc-10-x86:amd64 (5.10.127-1, 5.10.127-2), libuutil3linux:amd64 (2.1.4-pve1, 2.1.5-pve1), linux-headers-5.10.0-16-common:amd64 (5.10.127-1, 5.10.127-2), libzpool5linux:amd64 (2.1.4-pve1, 2.1.5-pve1), linux-headers-amd64:amd64 (
5.10.127-1, 5.10.127-2), proxmox-backup-file-restore:amd64 (2.2.4-1, 2.2.5-1), linux-headers-5.10.0-16-amd64:amd64 (5.10.127-1, 5.10.127-2), proxmox-backup-client:amd64 (2.2.4-1, 2.2.5-1), pve-kernel-5.15:amd64 (7.2-6, 7.2-7), libzfs4linux:amd64 (2.1.4-p
ve1, 2.1.5-pve1), pve-kernel-helper:amd64 (7.2-6, 7.2-7), zfsutils-linux:amd64 (2.1.4-pve1, 2.1.5-pve1), linux-libc-dev:amd64 (5.10.127-1, 5.10.127-2)
End-Date: 2022-07-29  07:04:26


Current versions:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.39-2-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-7
pve-kernel-helper: 7.2-7
pve-kernel-5.4: 6.4-15
pve-kernel-5.3: 6.1-6
pve-kernel-5.15.39-2-pve: 5.15.39-2
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-7
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-11
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1
 
Hi,
the issue is probably that you start all those DELETE requests right after one another. Each of them needs to acquire the lock on the storage when removing a volume. This is done to ensure that there are no naming conflicts, that only one operation accesses the volume, etc. But some requests run into the timeout while waiting for their turn. Please try adding a little bit of sleep between the requests (or even wait for each task to finish by checking https://pve.proxmox.com/pve-docs/api-viewer/#/nodes/{node}/tasks/{upid}/status with the returned UPID).
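For example, a rough sketch of the polling approach (assuming jq is available; the token, node name and host are the same placeholders as in the original loop) could look like this:
Code:
for i in $(seq 9996 1 9999); do
  echo "Deleting Template ID: ${i}"
  # the DELETE call returns the task's UPID in the "data" field
  upid=$(curl -sk -X DELETE \
    -H 'Authorization: PVEAPIToken=packer-api@pve!pkr=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' \
    "https://<host>/api2/json/nodes/proxmox/qemu/${i}" | jq -r '.data')
  # poll the task status until it is no longer "running" before starting the next destroy
  while [ "$(curl -sk \
    -H 'Authorization: PVEAPIToken=packer-api@pve!pkr=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX' \
    "https://<host>/api2/json/nodes/proxmox/tasks/${upid}/status" | jq -r '.data.status')" = "running" ]; do
    sleep 1
  done
done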
 
Do you know if this is maybe a new (intended) behavior?

I haven't been running these scripts for THAT long, but I can't say I've run into issues with it not removing the cloud-init disks before. The other (normal) disk files I found mentioned in syslog are likely from the same type of DELETE request loop trying to remove templates in bulk.

Just wondering if there might be something funky going on with my setup, as I've been running into odd things here and there, but nothing major.
 
Do you know if this is maybe a new (intended) behavior?
I don't think this recently changed.
I haven't been running these scripts for THAT long, but I can't say I've run into issues with it not removing the cloud-init disks before. The other (normal) disk files I found mentioned in syslog are likely from the same type of DELETE request loop trying to remove templates in bulk.
Since cloud-init disks have a unique name, you will notice the left-over disks more quickly. Maybe the operations for removing cloud-init disks just got lucky in the past. Or, until now, you didn't create a cloud-init disk with a VM ID that still had a left-over one and so didn't notice.
Just wondering if there might be something funky going on with something I'm doing as I've been running into odd things here and there, but nothing major.
 
I've been experiencing what I believe to be the same issue, except I can replicate it reliably. I'm creating a VM via the API, configuring it, then starting it, and my start tasks fail semi-regularly. I can reduce the chances of it happening by using a 5 second delay, but this is suboptimal as I'm shooting for the lowest possible delay.

The issue seems to be a race condition, where something tries to regenerate a cloudinit drive while there is already another process generating it.

Using the same method as the UI, deleting the drive via PUT and then recreating it, also fails occasionally with
Code:
disk image '/mnt/pve/optanestor/images/9001/vm-9001-cloudinit.qcow2' does not exist
logged to journalctl.

Seems the PUT request being "synchronous" is not actually synchronous, as it definitely appears to return early, before the image regeneration has finished.
 
Hi,
do the failing start tasks get the same error as above? Stupid question, but do you wait for the response of the server after issuing the PUT requests? Can you share (the parts of) the script exhibiting the issues?

FYI, currently, the cloudinit image is only regenerated when the VM starts or when pending changes are applied. When you configure a cloudinit drive, an empty disk is allocated, but no image is generated yet.
 
I think I might be in the wrong here. For context, this is my process:
- Create VM via API, wait via polling UPID
- Call custom API connector on node which uses `qm set` to apply the instance metadata
- Start VM via API, wait via polling UPID (this is where things fail, with the `already exists` error)

What I completely didn't realize is that `qm set` uses the synchronous API and the synchronous API is returning basically immediately. I've just now replaced the `qm set` command with an equivalent `pvesh create` command and it appears to finally be working correctly. So what really must have been happening is that `qm set` was shooting off a generation, and then starting the VM triggered a conflicting generation. Seems like maybe the cloudinit drive regeneration process needs to be protected by a lock.

Just for confirmation since I was unable to find documentation or source code for this, does the `pvesh create` command perform a wait for the task to complete before returning?

EDIT: Looks like it failed again, same `already exists` error on VM start. So `pvesh create` must not be waiting or there's something else going on.
 
I think I might be in the wrong here. For context, this is my process:
- Create VM via API, wait via polling UPID
- Call custom API connector on node which uses `qm set` to apply the instance metadata
Did you wait for the qm set command to finish here?
- Start VM via API, wait via polling UPID (this is where things fail, with the `already exists` error)
So it's different from the error upon recreating (which is a does not exist error)? Does the VM <ID> - Start task log include more information? Can you post an excerpt of the syslog surrounding the error?

What I completely didn't realize is that `qm set` uses the synchronous API and the synchronous API is returning basically immediately.
Yes, but the synchronous API only returns after allocating the image. What kind of storage does the cloudinit disk reside on?

I've just now replaced the `qm set` command with an equivalent `pvesh create` command and it appears to finally be working correctly. So what really must have been happening is that `qm set` was shooting off a generation, and then starting the VM triggered a conflicting generation. Seems like maybe the cloudinit drive regeneration process needs to be protected by a lock.

Just for confirmation since I was unable to find documentation or source code for this, does the `pvesh create` command perform a wait for the task to complete before returning?
No, pvesh create uses POST (see man pvesh) and thus the asynchronous API for /nodes/{node}/qemu/{vmid}/config. The asynchronous API will spawn a worker and return the UPID immediately (well, with pvesh, we actually run it in foreground mode when issued from the CLI).

The synchronous API for /nodes/{node}/qemu/{vmid}/config is with PUT. But using the asynchronous endpoint and checking on the task via UPID until it's finished is actually recommended, because there is a timeout for API calls and disk allocations can take a while (but shouldn't be relevant for the small cloudinit disk).
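As an illustration of the difference (a minimal curl sketch, not taken from the thread; host, node, VM ID and token are placeholders, and the `ide0=local:cloudinit` parameter is just the example from the earlier post):
Code:
# synchronous: PUT returns only after the config change (and disk allocation) is done
curl -sk -X PUT -H 'Authorization: PVEAPIToken=<user>@<realm>!<tokenid>=<secret>' \
  --data-urlencode 'ide0=local:cloudinit' \
  "https://<host>:8006/api2/json/nodes/<node>/qemu/<vmid>/config"

# asynchronous: POST spawns a worker and immediately returns a UPID in "data",
# which can then be polled via /nodes/<node>/tasks/<upid>/status until it has finished
curl -sk -X POST -H 'Authorization: PVEAPIToken=<user>@<realm>!<tokenid>=<secret>' \
  --data-urlencode 'ide0=local:cloudinit' \
  "https://<host>:8006/api2/json/nodes/<node>/qemu/<vmid>/config"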
EDIT: Looks like it failed again, same `already exists` error on VM start. So `pvesh create` must not be waiting or there's something else going on.
 
Did you wait for the qm set command to finish here?
Yes, all of my code waits for the command to finish.

So it's different from the error upon recreating (which is a does not exist error)? Does the VM <ID> - Start task log include more information? Can you post an excerpt of the syslog surrounding the error?
Here's the full error it gives me:
Code:
failed to stat '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2'
TASK ERROR: disk image '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2' already exists

No, pvesh create uses POST (see man pvesh) and thus the asynchronous API for /nodes/{node}/qemu/{vmid}/config. The asynchronous API will spawn a worker and return the UPID immediately (well, with pvesh, we actually run it in foreground mode when issued from the CLI).
Hmm, alright. How can I get that UPID using pvesh? I tried in JSON mode and it just returned the "update vm ..." string in quotes instead of raw. I have to use pvesh because I cannot configure the `cicustom` field via the token API.
 
Here's the full error it gives me:
Code:
failed to stat '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2'
TASK ERROR: disk image '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2' already exists
Can you post part of /var/log/syslog surrounding the issue?

What kind of storage is mounted at /mnt/pve/optanestor?

Does the following script throw any error? Please use a VM ID that's not currently in use and the storage ID belonging to the mount:
Code:
perl disk-test.pl <storage ID> <VM ID>
Code:
root@pve701 ~ # cat disk-test.pl  
#!/usr/bin/perl

use strict;
use warnings;

use PVE::Storage;

my $storeid = $ARGV[0] or die "usage: perl $0 <storeid> <vmid>\n";
my $vmid = $ARGV[1] or die "usage: perl $0 <storeid> <vmid>\n";

my $storecfg = PVE::Storage::config();
my $scfg = PVE::Storage::storage_config($storecfg, $storeid);
die "expected file-based storage\n" if !$scfg->{path};

my $name = "vm-$vmid-cloudinit.qcow2";
my $volid = "$storeid:$vmid/vm-$vmid-cloudinit.qcow2";

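# allocate a 4 MiB qcow2 volume (the size of a cloud-init image) ten times in a row;
# after each allocation, stat it via volume_size_info() and then free it again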
for (my $i = 0; $i < 10; $i++) {
    PVE::Storage::vdisk_alloc($storecfg, $storeid, $vmid, 'qcow2', $name, 4 * 1024);
    eval {
        my $size = eval { PVE::Storage::volume_size_info($storecfg, $volid) };
        die $@ if $@;
        die "didn't get a size\n" if !defined($size);
        die "unexpected size - $size\n" if $size <= 0;
        print "got size $size\n";
    };
    my $err = $@;
    PVE::Storage::vdisk_free($storecfg, $volid);
    die $err if $err;
}
 
Can you post part of /var/log/syslog surrounding the issue?
Code:
Aug 25 01:55:18 prox-gpu2 pvedaemon[393777]: <root@pam!dispatcher_worker> starting task UPID:prox-gpu2:0006D3DD:0093B3F0:630738F6:qmstart:9005:root@pam!dispatcher_worker:
Aug 25 01:55:18 prox-gpu2 pvedaemon[447453]: start VM 9005: UPID:prox-gpu2:0006D3DD:0093B3F0:630738F6:qmstart:9005:root@pam!dispatcher_worker:
Aug 25 01:55:18 prox-gpu2 pvedaemon[447453]: failed to stat '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2'
Aug 25 01:55:18 prox-gpu2 pvedaemon[447453]: disk image '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2' already exists
Aug 25 01:55:18 prox-gpu2 pvedaemon[393777]: <root@pam!dispatcher_worker> end task UPID:prox-gpu2:0006D3DD:0093B3F0:630738F6:qmstart:9005:root@pam!dispatcher_worker: disk image '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2' already exists
Aug 25 01:55:30 prox-gpu2 pvedaemon[447514]: start VM 9005: UPID:prox-gpu2:0006D41A:0093B8A4:63073902:qmstart:9005:root@pam!dispatcher_worker:
Aug 25 01:55:30 prox-gpu2 pvedaemon[444788]: <root@pam!dispatcher_worker> starting task UPID:prox-gpu2:0006D41A:0093B8A4:63073902:qmstart:9005:root@pam!dispatcher_worker:
Here's the log from the failed start, and a successful retry 10 seconds later (our current workaround)

What kind of storage is mounted at /mnt/pve/optanestor?
It's NFS storage over a 10g network.

Does the following script throw any error?
Doesn't seem to:
Code:
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
Formatting '/mnt/pve/optanestor/images/105/vm-105-cloudinit.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=metadata compression_type=zlib size=4194304 lazy_refcounts=off refcount_bits=16
got size 4194304
 
Code:
Aug 25 01:55:18 prox-gpu2 pvedaemon[393777]: <root@pam!dispatcher_worker> starting task UPID:prox-gpu2:0006D3DD:0093B3F0:630738F6:qmstart:9005:root@pam!dispatcher_worker:
Aug 25 01:55:18 prox-gpu2 pvedaemon[447453]: start VM 9005: UPID:prox-gpu2:0006D3DD:0093B3F0:630738F6:qmstart:9005:root@pam!dispatcher_worker:
Aug 25 01:55:18 prox-gpu2 pvedaemon[447453]: failed to stat '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2'
Aug 25 01:55:18 prox-gpu2 pvedaemon[447453]: disk image '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2' already exists
Aug 25 01:55:18 prox-gpu2 pvedaemon[393777]: <root@pam!dispatcher_worker> end task UPID:prox-gpu2:0006D3DD:0093B3F0:630738F6:qmstart:9005:root@pam!dispatcher_worker: disk image '/mnt/pve/optanestor/images/9005/vm-9005-cloudinit.qcow2' already exists
Aug 25 01:55:30 prox-gpu2 pvedaemon[447514]: start VM 9005: UPID:prox-gpu2:0006D41A:0093B8A4:63073902:qmstart:9005:root@pam!dispatcher_worker:
Aug 25 01:55:30 prox-gpu2 pvedaemon[444788]: <root@pam!dispatcher_worker> starting task UPID:prox-gpu2:0006D41A:0093B8A4:63073902:qmstart:9005:root@pam!dispatcher_worker:
Here's the log from the failed start, and a successful retry 10 seconds later (our current workaround)
I was hoping for a bit more, with VM creation and modification ;)
 
