Proxmox VE API returns HTTP error 500 "user name not set" on PUT to qemu config

tjying95

Mar 11, 2020
Hi all,

I have a cluster of 4 Proxmox servers, all running version 6.2-4, which I would prefer not to upgrade unless I really have to.
I create and remove many QEMU VMs a few times a day, with up to 20 VMs being created concurrently. I do this programmatically through the PVE API.

I occasionally encounter this error when sending a PUT request to https://PROXMOX_SERVER:8006/api2/json/nodes/PROXMOX_SERVER/qemu/606/config:
Code:
HTTPError: 500 Server Error: user name not set
This happens about 5% of the time. I authenticate to the API with a username and password, and the same request chain is executed every time.
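For reference, a minimal sketch of the username/password (ticket) flow using only the Python standard library. It is not the poster's code: BASE, the node name and the function names are placeholders. POST /access/ticket returns a ticket and a CSRFPreventionToken; the ticket is sent as the PVEAuthCookie cookie, and write requests such as the PUT above also need the CSRFPreventionToken header. The optional ssl context is an assumption for clusters with self-signed certificates.

```python
import json
import ssl
import urllib.parse
import urllib.request
from typing import Dict, Optional

# Placeholder base URL (assumption): point it at one of your nodes.
BASE = "https://PROXMOX_SERVER:8006/api2/json"


def login(username: str, password: str,
          ctx: Optional[ssl.SSLContext] = None) -> dict:
    """POST /access/ticket with username/password; return the decoded JSON body."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    req = urllib.request.Request(f"{BASE}/access/ticket", data=body, method="POST")
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)


def auth_headers(ticket_body: dict) -> Dict[str, str]:
    """Turn the /access/ticket reply into request headers.

    The ticket travels in the PVEAuthCookie cookie; write requests
    (POST/PUT/DELETE) must also carry the CSRFPreventionToken header.
    """
    data = ticket_body["data"]
    return {
        "Cookie": f"PVEAuthCookie={data['ticket']}",
        "CSRFPreventionToken": data["CSRFPreventionToken"],
    }


def put_config_request(node: str, vmid: int, params: Dict[str, str],
                       headers: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) PUT /nodes/{node}/qemu/{vmid}/config."""
    body = urllib.parse.urlencode(params).encode()
    return urllib.request.Request(
        f"{BASE}/nodes/{node}/qemu/{vmid}/config",
        data=body,
        headers={**headers, "Content-Type": "application/x-www-form-urlencoded"},
        method="PUT",
    )
```

Sending it would look like urllib.request.urlopen(put_config_request("PROXMOX_SERVER", 606, {"description": "..."}, auth_headers(login("root@pam", "secret", ctx))), context=ctx). A 500 reply like the one above surfaces as urllib.error.HTTPError, with the message text in its reason attribute.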

Here is the request chain I run:
  1. GET /nodes
  2. GET /nodes/PROXMOX_SERVER/qemu
  3. GET /nodes/PROXMOX_SERVER/qemu/100/config/
  4. Continues to query for all the other VMs.
  5. GET /nodes/PROXMOX_SERVER/qemu/1543/status/current/
  6. POST /nodes/PROXMOX_SERVER/qemu/1543/clone <-- Cloning as a linked clone.
  7. Wait for completion.
  8. GET /nodes/PROXMOX_SERVER/qemu/606/config/
  9. PUT /nodes/PROXMOX_SERVER/qemu/606/config <-- This sets the config description to empty.
  10. GET /nodes/PROXMOX_SERVER/qemu/606/config/
  11. PUT /nodes/PROXMOX_SERVER/qemu/606/config <-- This sets the config description to something.
The error is returned by one of the two PUT requests above.
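Steps 5 to 11 of that chain can be sketched as below. api(method, path, params) and wait_for_task(node, upid) are hypothetical callables standing in for whatever HTTP client and task-polling code is actually used; the clone call is a POST that returns a task UPID, and newid plus full=0 (linked clone from a template) are parameters of the clone endpoint. The VM IDs are the ones from the post.

```python
from typing import Any, Callable, Optional

# Hypothetical client signature: api(method, path, params) -> "data" of the reply.
Api = Callable[[str, str, Optional[dict]], Any]


def clone_and_describe(api: Api, wait_for_task: Callable[[str, str], None],
                       node: str, source_vmid: int, new_vmid: int,
                       description: str) -> None:
    """Replay steps 5-11 of the request chain for one linked clone."""
    api("GET", f"/nodes/{node}/qemu/{source_vmid}/status/current", None)
    # full=0 asks for a linked clone; the call returns a task UPID.
    upid = api("POST", f"/nodes/{node}/qemu/{source_vmid}/clone",
               {"newid": new_vmid, "full": 0})
    wait_for_task(node, upid)  # step 7: block until the clone task is done
    config = f"/nodes/{node}/qemu/{new_vmid}/config"
    api("GET", config, None)
    api("PUT", config, {"description": ""})           # step 9
    api("GET", config, None)
    api("PUT", config, {"description": description})  # step 11
```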

I searched around, but it doesn't seem like anyone has encountered this before.
Is this a known issue? Anything I can do to solve this?

Code:
~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Is there anything in the logs ("journalctl --since XXX --until XXX") around the time of the last request? Roughly how many requests are happening in that timeframe (check /var/log/pveproxy/access.log)?
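One rough way to answer the "how many requests" question is to bucket the access log per minute and count HTTP 500 replies alongside. This is a sketch: the line layout in the regex is an assumption (an Apache-style line with the timestamp in square brackets, then the quoted request, then the status code), and non-matching lines are silently skipped, so check it against a few lines of your own log first.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Assumed line shape, e.g.:
# ::ffff:10.0.0.5 - root@pam [11/03/2020:10:15:42 +0100] "PUT /api2/json/... HTTP/1.1" 500 13
LINE_RE = re.compile(r'\[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})')


def requests_per_minute(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (all requests, HTTP 500 replies) counted per minute bucket."""
    total, errors = Counter(), Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # line does not match the assumed format
        stamp, _method, _path, status = m.groups()
        minute = stamp.rsplit(":", 1)[0]  # drop the ":SS +ZZZZ" seconds/zone part
        total[minute] += 1
        if status == "500":
            errors[minute] += 1
    return total, errors
```

Usage: open /var/log/pveproxy/access.log, pass the file object in, and print each minute with total[minute] and errors[minute]. Iterating total.items() keeps log order, which avoids sorting day/month strings lexically.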
 
There doesn't seem to be anything useful in the logs.
Please ignore the "qmp command 'guest-ping' failed - got timeout" messages. Those are expected, as I use guest-ping to check whether the VM is up.

There are quite a lot of requests during that timeframe.
 


The amount of requests doesn't look too bad, unless this is a very overloaded system in general. I would still recommend upgrading to 7.x, since 6.x is EOL and no longer supported in any fashion.
 
I would prefer not to upgrade at the moment. We would need to do a bunch of validations to make sure our tools work with the newer versions etc. :(

Do you know in which component this error message "user name not set" is produced? Maybe I can try to trace it from there.
 
You can grep for it in /usr/share/perl5/PVE ;)
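If grep is not at hand, a small Python helper does roughly what "grep -rn" does. It is a sketch: grep_tree is a hypothetical name, and by default it only looks at Perl modules (*.pm); pass suffix="" to search every file.

```python
import os
from typing import Iterator, Tuple


def grep_tree(root: str, needle: str,
              suffix: str = ".pm") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for each line under root containing needle."""
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            if not name.endswith(suffix):
                continue  # only files with the requested suffix
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        yield path, lineno, line.rstrip("\n")
```

Usage: for path, lineno, line in grep_tree("/usr/share/perl5/PVE", "user name not set"): print(f"{path}:{lineno}: {line}").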
 
Thanks. I looked around the source, but nothing there seems to point to what might cause this.

Just wondering, is there any code path difference between using PUT and POST for /nodes/PROXMOX_SERVER/qemu/606/config? My understanding is that it shouldn't affect much, other than long-running config changes being run as a task so that the call returns early. Also, if the config is locked, will using POST help? Or will it just error out and complain that the config is locked, as usual?

I am using a workaround that retries when it encounters this error. Not the best solution, but I think it should work.
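The retry workaround can be kept narrow so it does not hide other failures. A minimal sketch: it retries only when the call fails with HTTP 500 and the "user name not set" reason, and re-raises anything else (for example a locked config) immediately. It uses the standard library's urllib.error.HTTPError; with the requests library the equivalent check would be on err.response.status_code and err.response.reason. The function name, attempt count and backoff are illustrative assumptions.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Reason text taken from the error reported in this thread.
TRANSIENT_REASON = "user name not set"


def retry_on_user_name_not_set(call: Callable[[], T], attempts: int = 3,
                               delay: float = 1.0) -> T:
    """Run call(); retry only on HTTP 500 with the 'user name not set' reason.

    Any other HTTP error is re-raised immediately, and the last error is
    re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            transient = err.code == 500 and TRANSIENT_REASON in str(err.reason)
            if not transient or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear backoff: delay, 2*delay, ...
    raise AssertionError("unreachable")
```

Retrying is reasonable here because setting the description is idempotent: sending the same PUT twice leaves the same config. A non-idempotent call would need more care before being retried blindly.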
 
The only difference between PUT and POST on the config call is, as you say, whether the handling happens synchronously (by the API server worker itself, with a 30s timeout) or asynchronously (in a newly forked task worker, without a timeout, with the API call returning the task UPID for further queries). Locking and everything else are identical.
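With POST, the client gets a task UPID back and has to poll for the result itself (the same applies to the clone call). A minimal sketch, assuming get_status is a caller-supplied, hypothetical function that performs GET /nodes/{node}/tasks/{upid}/status and returns its "data" object, and assuming an exitstatus of "OK" means success. The node to query is the second ':'-separated field of the UPID.

```python
import time
from typing import Callable


def wait_for_task(get_status: Callable[[], dict], interval: float = 1.0,
                  timeout: float = 600.0) -> dict:
    """Poll a task until its status is 'stopped'; return the final status.

    Raises RuntimeError if the task stopped with an exitstatus other than
    "OK", and TimeoutError if it is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status.get("status") == "stopped":
            if status.get("exitstatus") != "OK":
                raise RuntimeError(f"task failed: {status.get('exitstatus')}")
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task still running")
        time.sleep(interval)


def node_of(upid: str) -> str:
    """Return the node name, the second ':'-separated field of a UPID."""
    return upid.split(":")[1]
```

A side effect of this design is that the client, not the 30s API worker timeout, decides how long a slow config change may take.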