[Bug] A bug in the disk resize API in version 7.3-3

keepthebeats

Version: 7.3-3 (screenshot)

Following the reply from @fabian, I added step 2 below to poll the task status of the clone, but after testing, the problem is the same as before.

My operations that trigger the bug:
1. I use this API to create a VM "POST /api2/json/nodes/{node}/qemu/{vmid}/clone".

2. I poll the API "GET /api2/json/nodes/{node}/tasks/{upid}/status" to wait until the clone task status is "stopped" and the exitstatus is "OK".

3. I poll this API to wait for the VM config to be unlocked "GET /api2/json/nodes/{node}/qemu/{vmid}/status/current".

4. I use this API to set the CPU cores and RAM size of the VM "PUT /api2/json/nodes/{node}/qemu/{vmid}/config".

5. I use this API to get the disk name of this VM "GET /api2/json/nodes/{node}/qemu/{vmid}/config".

6. I call this API with retry to resize the disk of this VM to an absolute value: "PUT /api2/json/nodes/{node}/qemu/{vmid}/resize". It seems that after a VM is created by clone, its disk cannot be resized for a short time (about 1-2 minutes), so I retry the resize in this step (a Go sketch of the retry is below this list). In my tests the first resize attempt failed and a later attempt succeeded. The bug: in this condition, although the disk is visibly resized inside the VM, Proxmox still shows the old size instead of the resized value in the API "GET /api2/json/nodes/{node}/qemu/{vmid}/status/current".
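
Roughly, the retry in step 6 looks like the following Go sketch. This is a minimal illustration, not my exact code: resizeDisk and resizeWithRetry are hypothetical helper names, and the host, token, and timings are placeholders.

Code:
package pve

import (
	"bytes"
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// Insecure HTTP client, equivalent to curl -k for the self-signed
// certificate on the PVE node.
var client = &http.Client{Transport: &http.Transport{
	TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}}

// resizeDisk wraps PUT /api2/json/nodes/{node}/qemu/{vmid}/resize and
// returns an error on any non-200 response.
func resizeDisk(host, token, node string, vmid int, disk, size string) error {
	url := fmt.Sprintf("%s/api2/json/nodes/%s/qemu/%d/resize", host, node, vmid)
	body := fmt.Sprintf(`{"disk": %q, "size": %q}`, disk, size)
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewBufferString(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", token)
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("resize returned HTTP %d", resp.StatusCode)
	}
	return nil
}

// resizeWithRetry keeps retrying until the resize succeeds or a
// deadline passes; the timings are illustrative and cover the
// 1-2 minute window I observed.
func resizeWithRetry(host, token, node string, vmid int, disk, size string) error {
	deadline := time.Now().Add(3 * time.Minute)
	for {
		err := resizeDisk(host, token, node, vmid, disk, size)
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("resize kept failing: %w", err)
		}
		time.Sleep(10 * time.Second)
	}
}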

I think the key steps to trigger the bug are steps 1, 2, and 6, but to be safe I have listed all of my operations here.

Now I have found a workaround:
If I need to resize the disk to 150G, I first set the absolute value "149G" using the API "PUT /api2/json/nodes/{node}/qemu/{vmid}/resize", and then set the relative value "+1G" using the same API (see the sketch below).
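
Using the hypothetical resizeDisk helper sketched above, the workaround amounts to two calls (the node name, VM id, and disk name here are illustrative placeholders):

Code:
// Workaround sketch: first resize to 1G below the target absolute
// size, then grow by a relative 1G. After the relative resize,
// Proxmox reports the correct size again.
if err := resizeDisk(host, token, "node1", 102, "scsi0", "149G"); err != nil {
	return err
}
if err := resizeDisk(host, token, "node1", 102, "scsi0", "+1G"); err != nil {
	return err
}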


Thanks to the reply from @fabian, I think this may be the best way to avoid the problem:
After cloning a VM and resizing the VM disk, SSH to the Proxmox server and execute the command "qm rescan --vmid <vmid>" (a Go sketch of this step is below).
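
A sketch of that step from Go, shelling out to the ssh client. rescanVM is a hypothetical helper name, and it assumes key-based root SSH access to the PVE node:

Code:
package pve

import (
	"fmt"
	"os/exec"
	"strconv"
)

// rescanVM runs "qm rescan --vmid <vmid>" on the PVE host over SSH so
// that the stored config reflects the actual disk size again. It
// assumes key-based SSH access for root; host is a placeholder.
func rescanVM(host string, vmid int) error {
	out, err := exec.Command("ssh", "root@"+host,
		"qm", "rescan", "--vmid", strconv.Itoa(vmid)).CombinedOutput()
	if err != nil {
		return fmt.Errorf("qm rescan failed: %v (%s)", err, out)
	}
	return nil
}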

This post is just to share information about this bug.
 
you need to provide a bit more information - e.g., what exactly is the error you get when you attempt step 5? is the clone actually finished at this point?

the proper way to wait for the clone to be done is to poll the task status (the clone POST request should return the task identifier, called a UPID, which can then be passed to nodes/{node}/tasks/{upid} to get the status), steps 2-5 should then only happen after the clone task is actually finished.
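
A minimal Go sketch of that polling loop. waitForTask is a hypothetical helper name; the "status"/"exitstatus" values match the ones mentioned in this thread, and the 2-second interval is arbitrary:

Code:
package pve

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// waitForTask polls GET /nodes/{node}/tasks/{upid}/status until the
// task status is "stopped", then checks that exitstatus is "OK".
// The client must be configured for the PVE host (e.g. with TLS
// verification disabled for a self-signed certificate).
func waitForTask(client *http.Client, host, token, node, upid string) error {
	url := fmt.Sprintf("%s/api2/json/nodes/%s/tasks/%s/status", host, node, upid)
	for {
		req, err := http.NewRequest(http.MethodGet, url, nil)
		if err != nil {
			return err
		}
		req.Header.Set("Authorization", token)
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		var st struct {
			Data struct {
				Status     string `json:"status"`     // "running" or "stopped"
				ExitStatus string `json:"exitstatus"` // "OK" on success
			} `json:"data"`
		}
		err = json.NewDecoder(resp.Body).Decode(&st)
		resp.Body.Close()
		if err != nil {
			return err
		}
		if st.Data.Status == "stopped" {
			if st.Data.ExitStatus != "OK" {
				return fmt.Errorf("task %s failed: %s", upid, st.Data.ExitStatus)
			}
			return nil
		}
		time.Sleep(2 * time.Second)
	}
}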
 
Thank you, I will try with the UPID.
There is no error in the attempts of step 5; the response is completely empty.
 
I added a step to check the task status before the other steps: I poll the API "GET /api2/json/nodes/{node}/tasks/{upid}/status" until the task status is "stopped" and the exitstatus is "OK".

But the problem does not change.

Still, thank you for telling me the right way to do it.
 
you need to provide more information; if a request fails it should return some sort of error. how are you accessing the API? is there anything at all visible in the logs of the PVE system?
 
I use Go to send a request equivalent to:

curl -X PUT -k -i -H 'Authorization: PVEAPIToken=XXXXXXXXXXXXXXXXXX' -H "Content-Type: application/json" -d '{ "disk": "scsi0", "size": "150G"}' https://192.168.160.33:8006/api2/json/nodes/XXXX/qemu/102/resize
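
For reference, a minimal standalone Go version of that request (the token and node name are redacted placeholders exactly as in the curl line; InsecureSkipVerify mirrors curl's -k):

Code:
package main

import (
	"bytes"
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// PUT .../qemu/102/resize, equivalent to the curl command above.
	body := bytes.NewBufferString(`{"disk": "scsi0", "size": "150G"}`)
	req, err := http.NewRequest(http.MethodPut,
		"https://192.168.160.33:8006/api2/json/nodes/XXXX/qemu/102/resize", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "PVEAPIToken=XXXXXXXXXXXXXXXXXX")
	req.Header.Set("Content-Type", "application/json")

	// Skip certificate verification, like curl -k.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	respBody, _ := io.ReadAll(resp.Body)
	// Printing the status is what reveals the 596 vs. 200 behaviour.
	fmt.Println(resp.Status, string(respBody))
}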

Previously I did not print the HTTP status, so I did not see any error. I have now added that print, and I find:

(1) If I send the resize request within a short time (about 1 minute) after the clone task finishes, the response is HTTP status 596 Connection timed out with an empty body. This is the abnormal response.
(2) If I wait long enough (normally more than 1 minute) before sending the resize request, the response is HTTP status 200 OK with the body {"data":null}. This is the normal response.

I sent the request twice, at 15:04:47 and 15:05:23. The 15:04:47 request got response (1), and the 15:05:23 request got response (2).
This is the /var/log/syslog on the Proxmox host at that time (screenshot).

After I sent the two requests (15:04:47 and 15:05:23), I can see inside the VM that the disk is resized, but Proxmox still shows the old size instead of the resized value in the API "GET /api2/json/nodes/{node}/qemu/{vmid}/status/current".
 
the size in the config is only informational, a "rescan" should fix it. the question is why the resize either takes so long or overloads the system enough that you run into a timeout.
 
I did not find an API for the rescan, so I chose to SSH to the Proxmox server and execute the command "qm rescan --vmid <vmid>" after I resize a VM disk. I think this is probably the best way to avoid the problem. Thank you for the information.
 
