Hi,
We have a 3-node cluster running the latest up-to-date Proxmox VE 7.2-7. This is a fresh installation; we previously ran PVE 5.x.
Technical Information:
Storage back-end configuration : SAN (iSCSI+LVM)
lvm.conf global_filter value:
global_filter = [ "r|/dev/zd.*|", "r|/dev/mapper/pve-.*|", "r|/dev/mapper/small--pool.*|", "r|/dev/mapper/medium--pool.*|", "r|/dev/mapper/large--pool.*|", "r|/dev/mapper/template--pool.*|" ]
This filter bypasses LVM scanning of the SAN-based volumes.
We have an OS template from which we clone VMs (full clone mode). This is done automatically via API calls.
For some reason the clone creation fails. Here is an example of the error from the task log:
"create full clone of drive scsi0 (pve-template-pool-datastore:vm-106-disk-0)
device-mapper: create ioctl on small--pool2-vm--159--disk--0 LVM-bTzHDpropJH4Td1wQwNT5oskPWvcNATTsn1InnBnH1qZv1edqGzlxnj1qMWW0BB3 failed: Device or resource busy
TASK ERROR: clone failed: lvcreate 'small-pool2/vm-159-disk-0' error: Failed to activate new LV small-pool2/vm-159-disk-0."
In this case the affected VM ID is 159.
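As a side note, the device-mapper name in the kernel error line maps back to the LVM path by doubling every dash inside the VG and LV names. A quick sketch of that translation (my own helper, not a Proxmox tool):

```shell
# Translate a VG/LV pair into the device-mapper name that shows up in
# kernel errors: LVM escapes dashes inside names by doubling them,
# then joins VG and LV with a single dash.
dm_name() {
    local vg="${1//-/--}"
    local lv="${2//-/--}"
    printf '%s-%s\n' "$vg" "$lv"
}

dm_name small-pool2 vm-159-disk-0   # → small--pool2-vm--159--disk--0
```

That is how `small-pool2/vm-159-disk-0` in the lvcreate error and `small--pool2-vm--159--disk--0` in the device-mapper error refer to the same volume.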
========================================================================================================
Here are the SYSLOGs:
Aug 02 07:55:07 pve-002 pvedaemon[544590]: <root@pam> successful auth for user 'jenkins-prox@stratinfotech.com'
Aug 02 07:55:07 pve-002 pveproxy[1253665]: Clearing outdated entries from certificate cache
Aug 02 07:55:17 pve-002 pvedaemon[616446]: jenkins-prox@stratinfotech.com starting task UPIDve-002:0014317B:0333B024:62E910A5:qmclone:106:jenkins-prox@stratinfotech.com:
Aug 02 07:55:19 pve-002 pvedaemon[1323387]: VM 106 qmp command failed - VM 106 not running
Aug 02 07:55:20 pve-002 pvedaemon[1323387]: clone failed: lvcreate 'small-pool1/vm-115-disk-0' error: Failed to activate new LV small-pool1/vm-115-disk-0.
Aug 02 07:55:20 pve-002 pvedaemon[616446]: jenkins-prox@stratinfotech.com end task UPIDve-002:0014317B:0333B024:62E910A5:qmclone:106:jenkins-prox@stratinfotech.com: clone failed: lvcreate 'small-pool1/vm-115-disk-0' error: Failed to activate new LV small-pool1/vm-115-disk-0.
Aug 02 07:57:25 pve-002 pvedaemon[616446]: <root@pam> successful auth for user 'jenkins-prox@stratinfotech.com'
Aug 02 07:57:28 pve-002 pveproxy[1253666]: Clearing outdated entries from certificate cache
Aug 02 07:57:34 pve-002 pvedaemon[556778]: jenkins-prox@stratinfotech.com starting task UPIDve-002:00143302:0333E5D0:62E9112E:qmclone:106:jenkins-prox@stratinfotech.com:
Aug 02 07:57:36 pve-002 pvedaemon[1323778]: VM 106 qmp command failed - VM 106 not running
Aug 02 07:57:37 pve-002 pvedaemon[1323778]: clone failed: lvcreate 'small-pool1/vm-115-disk-0' error: Failed to activate new LV small-pool1/vm-115-disk-0.
Aug 02 07:57:37 pve-002 pvedaemon[556778]: jenkins-prox@stratinfotech.com end task UPIDve-002:00143302:0333E5D0:62E9112E:qmclone:106:jenkins-prox@stratinfotech.com: clone failed: lvcreate 'small-pool1/vm-115-disk-0' error: Failed to activate new LV small-pool1/vm-115-disk-0.
Aug 02 08:17:01 pve-002 CRON[1326621]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 02 08:17:01 pve-002 CRON[1326622]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 02 08:17:01 pve-002 CRON[1326621]: pam_unix(cron:session): session closed for user root
Aug 02 08:42:40 pve-002 pmxcfs[1504]: [dcdb] notice: data verification successful
Aug 02 09:17:01 pve-002 CRON[1335351]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Aug 02 09:17:01 pve-002 CRON[1335352]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 02 09:17:01 pve-002 CRON[1335351]: pam_unix(cron:session): session closed for user root
Aug 02 09:28:57 pve-002 pvedaemon[616446]: <root@pam> successful auth for user 'andyb@stratinfotech.com'
Aug 02 09:30:25 pve-002 pvedaemon[544590]: andyb@stratinfotech.com starting task UPIDve-002:00146800:033C65D1:62E926F1:qmclone:106:andyb@stratinfotech.com:
Aug 02 09:30:27 pve-002 pvedaemon[1337344]: VM 106 qmp command failed - VM 106 not running
Aug 02 09:30:28 pve-002 pvedaemon[1337344]: clone failed: lvcreate 'small-pool1/vm-115-disk-0' error: Failed to activate new LV small-pool1/vm-115-disk-0.
Aug 02 09:30:28 pve-002 pvedaemon[544590]: andyb@stratinfotech.com end task UPIDve-002:00146800:033C65D1:62E926F1:qmclone:106:andyb@stratinfotech.com: clone failed: lvcreate 'small-pool1/vm-115-disk-0' error: Failed to activate new LV small-pool1/vm-115-disk-0.
Aug 02 09:33:24 pve-002 pvedaemon[616446]: andyb@stratinfotech.com starting task UPIDve-002:001469D7:033CABC7:62E927A4:qmclone:106:andyb@stratinfotech.com:
========================================================================================================
As you can see in this second example, the affected VM ID is 115.
Possible reasons:
1. The global_filter pasted above worked fine with our Proxmox 5.x cluster. Is something missing for PVE 7?
2. Our automated (Jenkins) system that creates and deletes VMs has not changed (pvesh delete /nodes/{node}/qemu/{vmid}). I know that PVE 7.2.x has new options for VM deletion:
"Purge from job configurations"
"Destroy unreferenced disks owned by guest"
Is it necessary to add these options to the API call Jenkins makes when deleting a VM? Could their absence be the root cause?
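For reference, those two GUI checkboxes correspond (as far as I can tell) to the `purge` and `destroy-unreferenced-disks` parameters of the delete call. A dry-run sketch of how the Jenkins call could include them (pve-002 and 115 are just placeholders):

```shell
# Build the delete call with the PVE 7 cleanup options enabled.
# --purge removes the VMID from backup, replication and HA job
# configurations; --destroy-unreferenced-disks also deletes disks
# owned by the VMID that are no longer referenced in its config.
NODE=pve-002   # placeholder node name
VMID=115       # placeholder VM ID
cmd="pvesh delete /nodes/${NODE}/qemu/${VMID} --purge 1 --destroy-unreferenced-disks 1"
echo "$cmd"    # echoed here as a dry run; Jenkins would execute it
```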
My temporary workaround: I clone the VM manually through the GUI to reserve the faulty VM ID (after several failures, the manual clone from the template eventually succeeds), and then the Jenkins users can spin up other VMs with the next free ID.
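Since "Device or resource busy" on the create ioctl usually points at a leftover device-mapper entry from an earlier VM that used the same ID, it may be worth checking each node for a stale mapping before cloning. A diagnostic sketch (assumption on my part; `dmsetup remove` is destructive, so verify the VM really does not run on that node first):

```shell
# Check this node for a stale device-mapper entry of the failing LV.
# A node on which VM 115 is not running should not hold this mapping.
LV_DM=small--pool1-vm--115--disk--0
dmsetup ls 2>/dev/null | grep -F "$LV_DM" || echo "no stale mapping for $LV_DM"
# If a stale mapping shows up on a node where VM 115 does not run:
#   dmsetup remove "$LV_DM"
```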
I would appreciate your prompt response.
Regards,
Andy