HTTP 596 error when trying to use the API

Benjamin Hodgens

Active Member
Mar 1, 2016
I'm running Proxmox VE 6, fully up to date (pve-manager/6.4-13/9f411e79, running kernel: 5.4.157-1-pve).

When I use the API with Terraform and an API key to deploy VMs, I get this error from Terraform:
Error: error creating LXC container: 596 tls_process_server_certificate: certificate verify failed, error status: (params: {"arch":"amd64","cmode":"tty","console":true,"cores":4,"cpulimit":0,"cpuunits":1024,"features":"","hostname":"tf-hs-1","memory":4098,"net0":"bridge=vmbr0,name=eth0,ip6=manual,gw=10.9.8.1,ip=10.9.8.201/24","onboot":false,"ostemplate":"local:packages/centos-7-default_20160205_amd64.tar.xz","password":"password123","protection":false,"rootfs":"valhalla-vms:8","start":true,"storage":"local","swap":512,"tags":"","tty":2,"unique":true,"unprivileged":true,"vmid":143})
│
│   with proxmox_lxc.cluster[0],
│   on main.tf line 23, in resource "proxmox_lxc" "cluster":
│   23: resource "proxmox_lxc" "cluster" {
│

Verbose logs add nothing beyond "HTTP/1.1 596 tls_process_server_certificate: certificate verify failed". Proxmox returns this error regardless of whether I enable certificate verification, and regardless of which kind of certificate is in use.

Checking pveproxy/access.log, I see these entries associated with the request:

::ffff:10.9.8.20 - terraform@pam!terraform [29/12/2021:21:55:35 -0700] "GET /api2/json/cluster/nextid HTTP/1.1" 200 14
::ffff:10.9.8.20 - terraform@pam!terraform [29/12/2021:21:55:35 -0700] "POST /api2/json/nodes/bastion.my.domain.net/lxc HTTP/1.1" 596 -


The correlated syslog entries show:

Dec 29 21:55:35 bastion pveproxy[3576]: '/etc/pve/nodes/bastion.my.domain.net/pve-ssl.pem' does not exist!
Dec 29 21:55:35 bastion pveproxy[3576]: Could not verify remote node certificate 'E4:8A:15:16:B4:15:62:F6:6C:CC:DF:43:2E:6E:9F:E5:11:D6:9F:F3:37:50:B1:F5:17:9A:B1:A3:07:CB:06:36' with list of pinned certificates, refreshing cache

That directory doesn't exist. However, one named after the short hostname does:

# ls /etc/pve/nodes/
bastion  kismet  hercules  mora
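The per-node directories listed above carry the cluster's node names. A minimal Python sketch, assuming that /etc/pve/nodes layout, that checks a candidate node name against those directories before it goes into an API path (cluster_node_names and check_node are hypothetical helpers):

```python
from __future__ import annotations

from pathlib import Path


def cluster_node_names(nodes_dir: str = "/etc/pve/nodes") -> set[str]:
    """Return the names of the per-node directories under nodes_dir."""
    return {p.name for p in Path(nodes_dir).iterdir() if p.is_dir()}


def check_node(name: str, nodes_dir: str = "/etc/pve/nodes") -> None:
    """Raise LookupError if `name` is not one of the cluster's node directories."""
    known = cluster_node_names(nodes_dir)
    if name not in known:
        raise LookupError(f"{name!r} is not a cluster node; known nodes: {sorted(known)}")
```

With the listing above, check_node("bastion") passes while check_node("bastion.my.domain.net") raises, which matches the missing-directory message in the syslog.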

Thinking I was clever, I tried a symlink to see whether that would fix it, but because /etc/pve is a FUSE filesystem, it fails:

$ sudo ln -s bastion.my.domain.net bastion
ln: failed to create symbolic link 'bastion/bastion.my.domain.net': Function not implemented

I can't find anywhere in the UI to adjust whatever needs adjusting to remedy this problem.
 
I created a `bastion.my.domain.net` directory and copied the pve-ssl.pem file into it.

However, while the complaint about the missing pve-ssl.pem file went away, the following log entry persisted (as did the HTTP 596 response code):
Dec 29 21:55:35 bastion pveproxy[3576]: Could not verify remote node certificate 'E4:8A:15:16:B4:15:62:F6:6C:CC:DF:43:2E:6E:9F:E5:11:D6:9F:F3:37:50:B1:F5:17:9A:B1:A3:07:CB:06:36' with list of pinned certificates, refreshing cache
 
I also ran:

pvecm updatecerts -f

This didn't help either: the error about remote node certificate verification stopped appearing in the logs, but requests still result in a 596 response.
 
Sorry, yes: this certificate error is a misdirection. It's not the underlying cause (or wasn't in my case).

I ended up adding a role with the permissions shown in the attached screenshot and assigning it to the user. This fixed it for me.
 
No, the node parameter in the path should contain the short name, not the FQDN.
 
@fabian, if you're referring to the symlink escapade, that had nothing to do with the underlying problem; it was a misdirection. (And if the short name is what should be used in the path, isn't it curious that this error message appears in the logs while using it?)

The bug I was referring to was the 596 error. I couldn't find where that error code is documented specifically for Proxmox, but my understanding of other HTTP REST APIs suggests that 596 is both the wrong code for account/role permission issues and the wrong class of error. I can't think of a single valid reason why Proxmox would return a 5xx certificate error in this case: it just misdirects any and all diagnosis toward an arguably much more complicated domain (SSL certificates).

HTTP 403, 405, 406, 409, 412; hell, even 418 "I'm a teapot" would be both more accurate and more helpful than any 5xx-class error. It's a massive misdirection.
 
you connect to one node and tell it to proxy the request to another node, but use a node name as the target that doesn't exist (by accidentally using a wrong variant of an existing node name, but that is hard for a computer to determine ;)). that means the proxied connection fails to find (and validate) the certificate (logged on the server side), and the HTTP lib gives us a 'can't find service' (which actually means: can't find the node you requested), which we return to the client. whether you had some permission issue on top of that is hard to tell, but permission checks normally return 401/403 (and in rare cases 500), not 596.

you don't need to create any extra dirs and copy certificate files around; you just need to make the request with a correct node name (which is always just the hostname, not the FQDN).
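The name you connect to and the node name in the path are two separate values. A minimal Python sketch, assuming the default pveproxy port 8006 and that node names never contain dots (per the reply above); short_node_name and lxc_create_url are hypothetical helpers, not part of any Proxmox client library:

```python
from urllib.parse import quote

PVE_PORT = 8006  # default pveproxy port


def short_node_name(host: str) -> str:
    """Return the short hostname, i.e. everything before the first dot."""
    return host.split(".", 1)[0]


def lxc_create_url(api_host: str, node: str, port: int = PVE_PORT) -> str:
    """Build the URL for POST /api2/json/nodes/{node}/lxc.

    api_host is the name you connect to (an FQDN is fine here);
    node must be the cluster node name, i.e. the short hostname.
    """
    if "." in node:
        # An FQDN in the node segment is what produced the 596 in this thread.
        raise ValueError(f"node {node!r} looks like an FQDN; use {short_node_name(node)!r}")
    return f"https://{api_host}:{port}/api2/json/nodes/{quote(node, safe='')}/lxc"


print(lxc_create_url("bastion.my.domain.net", "bastion"))
# -> https://bastion.my.domain.net:8006/api2/json/nodes/bastion/lxc
```

The guard turns the mistake visible in the access log (an FQDN in the node segment) into a local error instead of a 596 from the server.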
 
@fabian I think you may have misread this thread or otherwise misunderstood the problem.

The only time I referenced the host by its full DNS name was when I connected to it.

The symlink/directory copy was merely a diagnostic step to see whether it changed the outcome at all. It did not, and was reverted. The log message still appears with each connection attempt, even after I've remedied the problem.

I was always making a correctly formatted request; Proxmox was simply returning the 596 for a permission problem. When I changed the role permissions and assigned the role to the user, it worked.

5xx errors = server error class
4xx errors = client error class (which includes permission problems)

Proxmox returns 596 (a non-standard status code not defined in any RFC) when it should be returning a 4xx error, because the failure was due to permissions, not a certificate.
 
no, you are missing the point :)

this request:

::ffff:10.9.8.20 - terraform@pam!terraform [29/12/2021:21:55:35 -0700] "POST /api2/json/nodes/bastion.my.domain.net/lxc HTTP/1.1" 596 -

is malformed. the 'node' path parameter has the wrong format, so the request cannot be proxied, which is a server error; in this case it gets mapped to 596 "service not found" because the node you connect to fails to find the other node you requested (it could just as well be 500, but that would be less specific). if you do POST /api2/json/nodes/bastion/lxc HTTP/1.1 it should work (or return some other status code). if that still gives you the same error, most likely something is wrong with hostname resolution on your node. I have no idea what you did to remedy the issue; possibly you found a workaround that doesn't involve making a proper request, but that might have other side effects as well.
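The malformed node name can be spotted directly in the access log. A minimal Python sketch that scans pveproxy access.log lines, assuming the line format quoted in this thread, and reports requests whose node path segment contains a dot (suspicious_node_requests is a hypothetical helper):

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed format, taken from the access.log lines quoted in this thread:
# <client> - <user> [<time>] "<METHOD> <path> HTTP/x.y" <status> <size>
LINE_RE = re.compile(
    r'^(?P<client>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) (?P<size>\S+)$'
)
NODE_RE = re.compile(r"^/api2/json/nodes/(?P<node>[^/]+)")


def suspicious_node_requests(lines: Iterable[str]) -> Iterator[Tuple[str, int, str]]:
    """Yield (node, status, path) for requests whose node segment contains a dot."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an access-log line in the assumed format
        n = NODE_RE.match(m["path"])
        if n and "." in n["node"]:
            yield n["node"], int(m["status"]), m["path"]
```

Fed the two access-log lines quoted at the start of the thread, it reports only the POST to /api2/json/nodes/bastion.my.domain.net/lxc with status 596.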

missing permissions will give you 403, or rarely 400 if error handling doesn't differentiate between a wrong combination of parameters and missing permissions. you'll get 401 if you are not logged in, and 500 for generic errors, failed actions, etc. there is no way missing permissions alone cause a 596. it is possible for something causing a 596 to be combined with missing permissions, though, in which case you'd need to fix both for obvious reasons ;)
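The status codes described in this reply can be condensed into a lookup for triage. A minimal Python sketch; the table paraphrases this thread's explanation rather than any official Proxmox reference, and explain_status is a hypothetical helper that falls back to the standard reason phrase for other codes:

```python
from http import HTTPStatus

# Likely causes as explained in this thread; not an official Proxmox table.
LIKELY_CAUSE = {
    400: "bad parameters (rarely, missing permissions not reported separately)",
    401: "not logged in / not authenticated",
    403: "authenticated, but missing permissions",
    500: "generic error or failed action",
    596: "the node you connected to could not proxy the request to the target node "
         "(e.g. an FQDN instead of the short hostname in the node path parameter)",
}


def explain_status(code: int) -> str:
    """Return a likely cause for a Proxmox API status code."""
    if code in LIKELY_CAUSE:
        return LIKELY_CAUSE[code]
    try:
        return HTTPStatus(code).phrase  # standard reason phrase, e.g. "Not Found"
    except ValueError:
        return "unknown status"  # not a registered HTTP status code
```

Keeping 596 separate from 403 reflects the point made above: the two can occur together, but fixing one does not explain the other.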
 
Thank you. I understand your point; you just didn't understand the conditions well enough to comment.

I don't know what else to tell you to help you understand. I did not change how I referred to the hosts: I was always referring to the hosts by their correct full DNS name, not the short name. I have not changed how that works.

variable "pvehost" { default = "bastion.my.domain.net" }
variable "pvedest" { default = "bastion" }

(pvehost is where connections are made, and pvedest is the host in the cluster that receives the specified instances, so pvedest is defined precisely as you say it should be.)

I'm telling you that I was getting 596 responses, then changed exactly one thing (adding a role and assigning it to the user), and that remedied the problem.
 
