[SOLVED] Proxmox 7+ unable to parse worker upid

dmitrynovice · Feb 15, 2023

Hello.

I am testing a script that uses the Proxmox API (7.3-3) to create and remove network interfaces (and other entries in the VM config).

During testing, it was common to have 20+ network interfaces on a VM.

So I wrote something like:

JavaScript:

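const hv = require('../hv') // local helper module wrapping the Proxmox API (not shown)

// 177 is the VMID
// the calls are not awaited, so all 28 requests run concurrently
for (let i = 1; i <= 28; i++) {
  hv.deleteFromConfig(177, 'net' + i)
}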

Each call asynchronously sends a config-change request (POST) and gets the UPID of the resulting task. It then polls the task status by UPID at somewhat random intervals (roughly every 2 seconds) and waits for the task to complete.
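A minimal sketch of the polling step described above, assuming the task status endpoint `GET /api2/json/nodes/{node}/tasks/{upid}/status`, which reports `status` as `running` or `stopped` and, once stopped, an `exitstatus`. The `getJson` argument is a hypothetical caller-supplied helper that performs an authenticated GET and resolves to the parsed body; the fixed interval stands in for the script's random ~2 s delay.

```javascript
// Sketch: poll a Proxmox task until it stops.
// Assumption: getJson(path) is a hypothetical, caller-supplied function that
// sends an authenticated GET to /api2/json + path and resolves to parsed JSON.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function waitForTask(getJson, node, upid, { intervalMs = 2000, timeoutMs = 60000 } = {}) {
  const path = `/nodes/${encodeURIComponent(node)}/tasks/${encodeURIComponent(upid)}/status`
  const deadline = Date.now() + timeoutMs
  while (Date.now() < deadline) {
    const { data } = await getJson(path)
    if (data && data.status === 'stopped') {
      // Simplification: anything other than 'OK' is treated as a failure.
      if (data.exitstatus !== 'OK') {
        throw new Error(`task ${upid} failed: ${data.exitstatus}`)
      }
      return data
    }
    // Task still running (or no data yet): wait before asking again.
    await sleep(intervalMs)
  }
  throw new Error(`timed out waiting for task ${upid}`)
}
```

Because the UPID itself contains the node name, the caller can pass the node taken from the UPID rather than tracking it separately.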

The script fails quite often (but not always) with the following response from the Proxmox API:

JSON:

{ data: null, errors: { upid: 'unable to parse worker upid' } }

The question is: am I hitting some kind of "task worker limit", or what else could be the reason?

I understand that this config change could be done in a single request, like "-delete net1,net2,netN", but I want to understand better what is going on.
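The single-request alternative works because the `delete` parameter of `POST /api2/json/nodes/{node}/qemu/{vmid}/config` accepts a comma-separated list of config keys. A minimal sketch of building that form body; `buildDeleteBody` is a hypothetical helper name. One request produces one config-change task instead of 28 concurrent ones.

```javascript
// Sketch: build the form body for deleting several config keys at once.
// buildDeleteBody is a hypothetical helper, not part of any Proxmox library.
function buildDeleteBody(keys) {
  // URLSearchParams percent-encodes the commas as %2C; form decoding on the
  // server side turns them back into commas.
  return new URLSearchParams({ delete: keys.join(',') }).toString()
}

const keys = Array.from({ length: 28 }, (_, i) => 'net' + (i + 1))
const body = buildDeleteBody(keys)
// POST `body` once to /api2/json/nodes/ilja/qemu/177/config
// with Content-Type: application/x-www-form-urlencoded
```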

dcsapak · Feb 15, 2023

can you post the UPID in question when it fails?

dmitrynovice · Feb 15, 2023

Okay, it seems I found what it was. UPID parsing works just fine.

The root cause is that Proxmox sometimes answers the config-change request (POST) with '{"data":null}', so my script cannot find a UPID.

Example requests and responses from my script's logs:

Code:

request was: curl -X POST -k -d "delete=net2" https://192.168.1.222:8006/api2/json/nodes/ilja/qemu/177/config -H 'Authorization: PVEAPIToken=root@pam!api_token=9d6824c0-a1c9-453e-8524-9ce179120c3e'
response was: {
  stdout: '{"data":"UPID:ilja:000038A3:00072945:63ECD7E2:qmconfig:177:root@pam!api_token:"}',
  stderr: '  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n' +
    '                                 Dload  Upload   Total   Spent    Left  Speed\n' +
    '\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100    91  100    80  100    11   1159    159 --:--:-- --:--:-- --:--:--  1318\n'
}
request was: curl -X POST -k -d "delete=net4" https://192.168.1.222:8006/api2/json/nodes/ilja/qemu/177/config -H 'Authorization: PVEAPIToken=root@pam!api_token=9d6824c0-a1c9-453e-8524-9ce179120c3e'
response was: {
  stdout: '{"data":"UPID:ilja:000038A6:00072949:63ECD7E2:qmconfig:177:root@pam!api_token:"}',
  stderr: '  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n' +
    '                                 Dload  Upload   Total   Spent    Left  Speed\n' +
    '\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100    91  100    80  100    11   1000    137 --:--:-- --:--:-- --:--:--  1137\n'
}
request was: curl -X POST -k -d "delete=net7" https://192.168.1.222:8006/api2/json/nodes/ilja/qemu/177/config -H 'Authorization: PVEAPIToken=root@pam!api_token=9d6824c0-a1c9-453e-8524-9ce179120c3e'
response was: {
  stdout: '{"data":null}',
  stderr: '  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n' +
    '                                 Dload  Upload   Total   Spent    Left  Speed\n' +
    '\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100    24  100    13  100    11    164    139 --:--:-- --:--:-- --:--:--   303\n'
}
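A minimal sketch of extracting and sanity-checking the UPID instead of assuming it is present. The field layout (`UPID:node:pid:pstart:starttime:type:id:user:`, numeric fields in hex) is read off the UPIDs in the logs above; the regular expression is a simplified assumption, not the exact pattern Proxmox uses to validate UPIDs. `parseUpid` and `extractUpid` are hypothetical names.

```javascript
// Sketch: pull the UPID out of a config-change response body and parse it.
// Assumption: simplified UPID layout inferred from the logged examples:
//   UPID:<node>:<pid>:<pstart>:<starttime>:<type>:<id>:<user>:
const UPID_RE = /^UPID:([^:]+):([0-9A-Fa-f]{8}):([0-9A-Fa-f]{8,9}):([0-9A-Fa-f]{8}):([^:]+):([^:]*):([^:]+):$/

function parseUpid(upid) {
  const m = UPID_RE.exec(upid)
  if (!m) throw new Error(`not a UPID: ${upid}`)
  const [, node, pid, pstart, starttime, type, id, user] = m
  return {
    node,
    pid: parseInt(pid, 16),
    pstart: parseInt(pstart, 16),
    starttime: parseInt(starttime, 16), // seconds since the Unix epoch
    type,
    id,
    user
  }
}

// Throws if `data` is null or not a string, as in the '{"data":null}' case.
function extractUpid(bodyText) {
  const { data } = JSON.parse(bodyText)
  if (typeof data !== 'string') {
    throw new Error(`no UPID in response: ${bodyText}`)
  }
  return parseUpid(data)
}
```

For the first logged UPID this gives node `ilja`, type `qmconfig`, id `177` and user `root@pam!api_token`; the start time `0x63ECD7E2` is 1676466146, i.e. 2023-02-15 13:02:26 UTC.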

Can you please tell me whether I am hitting some kind of task limit, whether it is caused by some kind of temporary VM config lock, or something else?

I ran the test several times, and it fails randomly; I have not found any pattern.

dcsapak · Feb 15, 2023

can you also show the response headers? there is no built-in limit for tasks, but it seems like you run into an error or something like that (which should be shown...). or did you maybe use the parameter 'background_delay'? (in that case the api call returns null if the worker finishes within the specified time; it's mostly a parameter used in the gui)

dmitrynovice · Feb 15, 2023

dcsapak said:
can you also show the response headers? there is no built-in limit for tasks, but it seems like you run into an error or something like that (which should be shown...). or did you maybe use the parameter 'background_delay'? (in that case the api call returns null if the worker finishes within the specified time; it's mostly a parameter used in the gui)

Done.

HTTP/1.1 500 close (rename) atomic file '/var/log/pve/tasks/active' failed: No such file or directory

Code:

request was: curl -i -X POST -k -d "delete=net14" https://192.168.1.222:8006/api2/json/nodes/ilja/qemu/177/config -H 'Authorization: PVEAPIToken=root@pam!api_token=9d6824c0-a1c9-453e-8524-9ce179120c3e'
response was: {
  stdout: "HTTP/1.1 500 close (rename) atomic file '/var/log/pve/tasks/active' failed: No such file or directory\r\n" +
    'Cache-Control: max-age=0\r\n' +
    'Connection: close\r\n' +
    'Date: Wed, 15 Feb 2023 14:37:39 GMT\r\n' +
    'Pragma: no-cache\r\n' +
    'Server: pve-api-daemon/3.0\r\n' +
    'Content-Length: 13\r\n' +
    'Content-Type: application/json;charset=UTF-8\r\n' +
    'Expires: Wed, 15 Feb 2023 14:37:39 GMT\r\n' +
    '\r\n' +
    '{"data":null}',
  stderr: '  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n' +
    '                                 Dload  Upload   Total   Spent    Left  Speed\n' +
    '\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\r100    25  100    13  100    12    101     93 --:--:-- --:--:-- --:--:--   195\n'
}

dcsapak · Feb 17, 2023

hi, i can reproduce it, and i am not sure why exactly it happens, i'm still investigating

for now, i'd rewrite your client to retry requests when they receive an error (that can happen anyway, regardless of this behaviour)
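A minimal retry wrapper along those lines, sketched under assumptions: `request` is a hypothetical caller-supplied async function that throws on failure, and the attempt count and exponential backoff delays are arbitrary choices. Only requests that are safe to repeat should be retried this way.

```javascript
// Sketch: retry an async request with exponential backoff.
// `request` is a hypothetical caller-supplied function returning a Promise.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function withRetry(request, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await request()
    } catch (err) {
      lastError = err
      if (attempt < attempts - 1) {
        // waits baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
        await sleep(baseDelayMs * 2 ** attempt)
      }
    }
  }
  // All attempts failed: surface the last error to the caller.
  throw lastError
}
```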

dmitrynovice · Feb 17, 2023

dcsapak said:
hi, i can reproduce it, and i am not sure why exactly it happens, i'm still investigating

for now, i'd rewrite your client to retry requests when they receive an error (that can happen anyway, regardless of this behaviour)

Thanks a lot! My main goal was to get the attention of the developers or the QA team. I am currently in the middle of development, so this problem is not urgent for me. The project is quite ambitious, so I want to be sure how the Proxmox API behaves under load. I was also testing "GET vm config" and found that under load it sometimes answers with an empty response. If I find something interesting, I will let you know. Proxmox is already a great product, and it will soon become even better. Thanks again.

dcsapak · Feb 20, 2023

ok, found the issue (was not very easy to track down) and sent an rfc patch: https://lists.proxmox.com/pipermail/pve-devel/2023-February/055822.html

edit:

dmitrynovice said:
Thanks a lot! My main goal was to get the attention of the developers or the QA team. I am currently in the middle of development, so this problem is not urgent for me. The project is quite ambitious, so I want to be sure how the Proxmox API behaves under load. I was also testing "GET vm config" and found that under load it sometimes answers with an empty response. If I find something interesting, I will let you know. Proxmox is already a great product, and it will soon become even better. Thanks again.

btw, you should always check the http response code and error message; any api call can fail under certain circumstances (depending on what you're doing, you might want to retry or return the error to the user)
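A minimal sketch of such a check using Node's built-in `https` module. In the `curl -i` output earlier in the thread, the failure was reported in the HTTP status line (`500` with the error text as the reason phrase) while the body was only `{"data":null}`, so the check looks at `statusCode` and `statusMessage` before trusting `data`. `checkResponse` and `pveRequest` are hypothetical helpers, and `rejectUnauthorized: false` mirrors `curl -k` for a self-signed test setup only.

```javascript
const https = require('https')

// Hypothetical helper: reject non-2xx responses, otherwise return `data`.
// The error text from Proxmox arrives in the reason phrase (statusMessage).
function checkResponse(statusCode, statusMessage, bodyText) {
  if (statusCode < 200 || statusCode >= 300) {
    const err = new Error(`HTTP ${statusCode} ${statusMessage}`)
    err.statusCode = statusCode
    throw err
  }
  // An empty 2xx body makes JSON.parse throw, which is also reported as an error.
  return JSON.parse(bodyText).data
}

// Hypothetical helper: send one API request authenticated with an API token.
// `token` is 'USER@REALM!TOKENID=SECRET', as in the curl commands in this thread.
function pveRequest({ hostname, token, method, path, form }) {
  const body = form ? new URLSearchParams(form).toString() : ''
  const headers = { Authorization: `PVEAPIToken=${token}` }
  if (body) {
    headers['Content-Type'] = 'application/x-www-form-urlencoded'
    headers['Content-Length'] = Buffer.byteLength(body)
  }
  return new Promise((resolve, reject) => {
    const req = https.request({
      hostname,
      port: 8006,
      method,
      path: '/api2/json' + path,
      headers,
      rejectUnauthorized: false // like curl -k; test setups only
    }, (res) => {
      let text = ''
      res.setEncoding('utf8')
      res.on('data', (chunk) => { text += chunk })
      res.on('end', () => {
        try {
          resolve(checkResponse(res.statusCode, res.statusMessage, text))
        } catch (err) {
          reject(err)
        }
      })
    })
    req.on('error', reject)
    req.end(body)
  })
}
```

A caller can then inspect `err.statusCode` to decide whether to retry (for example on a 5xx) or return the error to the user.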

dmitrynovice · Feb 20, 2023

dcsapak said:
ok, found the issue (was not very easy to track down) and sent an rfc patch: https://lists.proxmox.com/pipermail/pve-devel/2023-February/055822.html

edit:

btw, you should always check the http response code and error message; any api call can fail under certain circumstances (depending on what you're doing, you might want to retry or return the error to the user)

Good news that you found what is causing this. Thank you.
I also completely agree with your recommendation about more reliable code.

But while I am developing the first prototype, I am not using complex checks with re-sent requests, or even try-catch.
I am not considering edge cases or failures of any kind, and I am not producing any extremely high load.
Under normal conditions, everything should just work smoothly most of the time.

Only this approach gives me hope that the prototype can be finished within a reasonable time.
And, as we see, it is also useful for finding strange behavior or bugs. So it's a win-win situation.

Participants: dmitrynovice (Active Member) · dcsapak (Proxmox Staff Member)