API returns 500 but no logs

Don Daniello

Hello,

I seem to have an unusual problem which happened shortly after upgrading to PVE 7.

I have an application which creates LXC containers on a PVE cluster over the API. Recently, it started getting a 500 status code in response to POST /api2/json/nodes/node1/lxc. The response body does not have any content, so no error message is sent. The authentication call to node1 works, but LXC creation does not.
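
For reference, the call is roughly equivalent to something like this with curl (simplified; the ticket, CSRF token and most parameters are just placeholders here):
Code:
curl -k -X POST "https://node1:8006/api2/json/nodes/node1/lxc" \
  -H "Cookie: PVEAuthCookie=<ticket>" \
  -H "CSRFPreventionToken: <csrf token>" \
  -d "vmid=<id>" -d "ostemplate=<storage>:vztmpl/<template>" -d "storage=<storage>" ...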

Using exactly the same input parameters (same ID, network settings, image, limits, etc.) over the web UI causes no problem; the container is created.

I doubt it is a problem with the application, because there are more nodes in the cluster and LXC creation works on the other nodes without any issue; for the same reason I also don't think PVE 7 changed any API in a way that would cause this.

What I specifically have an issue with is that pveproxy logs the 500 response in the access log, but that's really the only trace of that request. I cannot find a way to make pvedaemon log what it does not like about that request, or really anything at all. Running pvedaemon in debug mode/foreground also does not generate any logs related to it.

Any pointers on what I can do to further troubleshoot?
 
hi,

that sounds weird, could you post the pveversion -v output and check that it matches on all the nodes you have?
Recently, it started getting a 500 status code in response to POST /api2/json/nodes/node1/lxc
how is the request being sent from your app?

Any pointers on what I can do to further troubleshoot?
you could check for any errors in journalctl when the request is made
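for example, while you trigger the request from your app:
Code:
journalctl -f -u pvedaemon -u pveproxy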
 
The request is sent remotely from a Node.js app via the HTTPS API.

There is nothing in the journal related to that request. However, I am certain it is reaching that server, since pveproxy on it shows the request in the access log.
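
(I double-checked by grepping the pveproxy access log on that node, along these lines:)
Code:
grep "POST /api2/json/nodes" /var/log/pveproxy/access.log | tail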

The versions are all the same, all latest:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-1-pve)
pve-manager: 7.0-10 (running version: 7.0-10/d2f465d3)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.4: 6.4-4
pve-kernel-5.11.22-3-pve: 5.11.22-5
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.11.22-1-pve: 5.11.22-2
pve-kernel-5.4.124-1-pve: 5.4.124-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.2.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-5
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-9
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.7-1
proxmox-backup-file-restore: 2.0.7-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-5
pve-cluster: 7.0-3
pve-container: 4.0-8
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-11
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 
have you done reboots across all nodes after finishing the upgrades? after making a kernel upgrade you should reboot the node.

There is nothing in the journal related to that request. However, I am certain it is reaching that server, since pveproxy on it shows the request in the access log.

you can apply the following diff for pve-container:
Code:
diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm
index b929481..ea93eb4 100644
--- a/src/PVE/API2/LXC.pm
+++ b/src/PVE/API2/LXC.pm
@@ -192,6 +192,10 @@ __PACKAGE__->register_method({
 
        PVE::Cluster::check_cfs_quorum();
 
+       use Data::Dumper;
+       use Sys::Syslog;
+       syslog('info', Dumper($param));
+
        my $rpcenv = PVE::RPCEnvironment::get();
        my $authuser = $rpcenv->get_user();

you can add these 3 lines to /usr/share/perl5/PVE/API2/LXC.pm directly and restart the pvedaemon service, or you can pull our git repo, apply the patch and build a deb file to install on that node.
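
i.e. on that node, something like:
Code:
nano /usr/share/perl5/PVE/API2/LXC.pm    # add the lines right after the PVE::Cluster::check_cfs_quorum() call
systemctl restart pvedaemon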

after that on a different terminal run journalctl -f and run your application to create the container. you should get an output like this:
Code:
Jul 28 11:23:16 pla7ground pct[3437]: $VAR1 = {
                                                'unprivileged' => '1',
                                                'node' => 'pla7ground',
                                                'ostemplate' => 'local:vztmpl/debian-10-standard_10.7-1_amd64.tar.gz',
                                                'net0' => 'name=eth0,ip=dhcp,firewall=1,bridge=vmbr0',
                                                'storage' => 'local-lvm',
                                                'vmid' => '12345'
                                              };
Jul 28 11:23:16 pla7ground pct[3437]: <root@pam> starting task UPID:pla7ground:00000D6F:00013B91:61012204:vzcreate:12345:root@pam:
Jul 28 11:23:18 pla7ground kernel: EXT4-fs (dm-12): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
Jul 28 11:23:40 pla7ground systemd[1]: var-lib-lxc-12345-rootfs.mount: Succeeded.
Jul 28 11:23:40 pla7ground systemd[3011]: var-lib-lxc-12345-rootfs.mount: Succeeded.
Jul 28 11:23:43 pla7ground pct[3437]: <root@pam> end task UPID:pla7ground:00000D6F:00013B91:61012204:vzcreate:12345:root@pam: OK

please post that here
 
Yes, all nodes were rebooted after the upgrade to PVE 7. Not all nodes were rebooted after the latest minor kernel update, but I doubt that matters.

I've also tried rebooting the node in question, which did not help.

What's new is that, prior to installing the latest set of updates yesterday and prior to rebooting it, a second node which previously worked (the LXC container was created correctly over the API) also stopped working, just like the first node in the cluster, which has been doing this for over a week. Installing the latest updates (within PVE 7) and rebooting did not change it in any way.

Here's the log for pvedaemon. There's nothing else in the journal relevant to this process, and pvedaemon doesn't crash either.

Code:
Jul 28 12:05:07 otb systemd[1]: Starting PVE API Daemon...
Jul 28 12:05:08 otb pvedaemon[928878]: starting server
Jul 28 12:05:08 otb pvedaemon[928878]: starting 3 worker(s)
Jul 28 12:05:08 otb pvedaemon[928878]: worker 928879 started
Jul 28 12:05:08 otb pvedaemon[928878]: worker 928880 started
Jul 28 12:05:08 otb pvedaemon[928878]: worker 928881 started
Jul 28 12:05:08 otb systemd[1]: Started PVE API Daemon.
Jul 28 12:05:23 otb pvedaemon[928881]: <root@pam> successful auth for user 'apiuser@pve'
Jul 28 12:05:24 otb pvedaemon[928879]: $VAR1 = {
                                                 'ostemplate' => 'local:vztmpl/ubuntu-20.04-custom.tar.gz',
                                                 'password' => 'oSNy4TWevCxHkbEQK',
                                                 'swap' => '0',
                                                 'description' => '7W68uma5Zxhe6JkLA',
                                                 'cpulimit' => '2',
                                                 'unprivileged' => '1',
                                                 'cpuunits' => '4000',
                                                 'storage' => 'local-zfs',
                                                 'node' => 'otb',
                                                 'hostname' => '167-X-X-X',
                                                 'onboot' => '1',
                                                 'memory' => '4096',
                                                 'nameserver' => '213.186.33.99 8.8.8.8 1.1.1.1',
                                                 'cores' => '4',
                                                 'rootfs' => '5',
                                                 'vmid' => '123832',
                                                 'net0' => 'name=eth0,bridge=vmbr0,type=veth,ip=167.X.X.X/32,gw=192.X.X.254,rate=10,firewall=1,hwaddr=02:00:00:X:X:X'
                                               };
 
@oguz I've made a few important discoveries.

First, I realized that the PVE API returns the error message in the "status line" of the HTTP response. That was my mistake not to notice earlier, and to be honest, I've never seen an API server actually do that before. So I kept being confused by the empty response body, not realizing the error message was in the status line instead.

Secondly, for the request parameters as shown above, the error message was:
Code:
Only root can pass arbitrary filesystem paths. at /usr/share/perl5/PVE/Storage.pm line 492.
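
For anyone else hitting this: with curl -i or --verbose the message shows up directly in the HTTP status line, roughly like this:
Code:
HTTP/1.1 500 Only root can pass arbitrary filesystem paths.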

That's an interesting error. It explains why I couldn't reproduce it from the UI or with my curl-like requests: there I've been using the root@pam account, where this error cannot come up. In the params above, you can see I'm using apiuser@pve, which is obviously not root. However, I am not passing arbitrary filesystem paths.

I looked at the check_volume_access function and double-checked the permissions on apiuser@pve. It has the default Administrator role, which includes all permissions, propagated everywhere. I found two invocations of check_volume_access that could stand in the way. Interestingly, the one that triggers is in /usr/share/perl5/PVE/LXC.pm and not the one in /usr/share/perl5/PVE/API2/LXC.pm. Commenting it out avoids the issue and the container is created correctly.
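
(For the record, the ACL for that user looks fine to me; I checked it with something along the lines of:)
Code:
pveum acl list | grep -i apiuser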

I believe there is a bug in the permission check, or at least that something has changed across versions. As you can see, I don't seem to be using a mountpoint/filesystem path that should be inaccessible to apiuser@pve, especially since it has the Administrator role and therefore should be able to access all storage. I haven't thoroughly traced where the mix-up could be yet.
 
However, I am not passing arbitrary filesystem paths.
can you post the exact parameters you give?
because from the log it looks like you send:

{
'ostemplate' => 'local:vztmpl/ubuntu-20.04-custom.tar.gz',
'password' => 'oSNy4TWevCxHkbEQK',
'swap' => '0',
'description' => '7W68uma5Zxhe6JkLA',
'cpulimit' => '2',
'unprivileged' => '1',
'cpuunits' => '4000',
'storage' => 'local-zfs',
'node' => 'otb',
'hostname' => '167-X-X-X',
'onboot' => '1',
'memory' => '4096',
'nameserver' => '213.186.33.99 8.8.8.8 1.1.1.1',
'cores' => '4',
'rootfs' => '5',
'vmid' => '123832',
'net0' => 'name=eth0,bridge=vmbr0,type=veth,ip=167.X.X.X/32,gw=192.X.X.254,rate=10,firewall=1,hwaddr=02:00:00:X:X:X'
};

which contains 'rootfs: 5', which is not a valid volid, so it gets interpreted as an arbitrary filesystem path. my guess is that you wanted to give something like 'rootfs: local:5', which would allocate 5 GiB on the storage 'local'?
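
in request terms (the storage name is just an example):
Code:
# not a valid volid -> treated as an arbitrary filesystem path
rootfs=5
# allocates a 5 GiB volume on the storage "local-zfs"
rootfs=local-zfs:5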
 
@dcsapak So that looks like the crux of it. Passing rootfs: local-zfs:5 seems to be doing the trick.

The interesting part is that apparently I've been using it wrong for 7+ years, but it worked all the way until now as non-root. I have to say, the documentation for the rootfs parameter (https://pve.proxmox.com/pve-docs/api-viewer/#/nodes/{node}/lxc) unfortunately does not explain that. I just assumed that providing the separate parameter storage: local-zfs was enough to tell it which storage pool to use.

I appreciate your guidance in helping me troubleshoot this. I do believe there is some breaking change on the server side here, but I guess that depends on your definition of what the right behavior is anyway.
 
Hi there, I stumbled upon the same issue as Don w.r.t. the error message (in my case, I could see it by debugging via curl --verbose).

W.r.t. the API documentation, would you accept patches against it? I also think having more information about the rootfs argument would help here, w/ maybe a mention of the root user.

Thanks!
 
just wanted to add that I got into this problem with the ansible module community.general.proxmox

so, instead of:
YAML:
- name: create new container
  community.general.proxmox:
    storage: local
    disk: 5


the correct definition would be:
YAML:
- name: create new container
  community.general.proxmox:
    disk: local:5
 
just wanted to add that I got into this problem with the ansible module community.general.proxmox

so, instead of:
YAML:
- name: create new container
  community.general.proxmox:
    storage: local
    disk: 5


the correct definition would be:
YAML:
- name: create new container
  community.general.proxmox:
    disk: local:5
This solved my issue. I spent about 2 days working on this and THIS solved it! Please ensure this thread is marked as solved for the search engines. Thank you!!

PS: Note that "local" can be the name of any one of your storage pools, e.g. "zfspool".
 
YAML:
tasks:
  - name: Create LXC Containers with 512mb 4gb of disk 2 vCPUs
    community.general.proxmox:
      api_user: "{{ proxmox_api_user }}"
      api_token_id: "full"
      api_token_secret: "{{ proxmox_api_token_secret }}"
      api_host: "{{ proxmox_api_host }}"
      ostemplate: "local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst"
      node: phoenix
      hostname: garbage
      disk: local-lvm:5

This was driving me crazy, thanks so much @nxet, seriously, I registered just to say thanks. Was going crazy. Wife was asking if I'd ever quit labbing.
 
YAML:
tasks:
  - name: Create LXC Containers with 512mb 4gb of disk 2 vCPUs
    community.general.proxmox:
      api_user: "{{ proxmox_api_user }}"
      api_token_id: "full"
      api_token_secret: "{{ proxmox_api_token_secret }}"
      api_host: "{{ proxmox_api_host }}"
      ostemplate: "local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst"
      node: phoenix
      hostname: garbage
      disk: local-lvm:5

This was driving me crazy, thanks so much @nxet, seriously, I registered just to say thanks. Was going crazy. Wife was asking if I'd ever quit labbing.
Thank you so much! For me it was the "disk: local-lvm:5" field.
 
