Can't join cluster through GUI

altano

I wasn't able to join my new cluster through the GUI and I couldn't figure out why. I made sure all the prerequisites from the wiki were met (same version of PVE, same system clock, etc.), but every time I tried to join I got a "401 Authentication Failure" message.

Fortunately, it instantly worked from the command line. I SSH'd into the machine I was adding to the cluster and ran

Code:
<HOSTNAME># pvecm add <IP>
Please enter superuser (root) password for '<IP>': ************
Establishing API connection with host '<IP>'
The authenticity of host '<IP>' can't be established.
X509 SHA256 key fingerprint is <FINGERPRINT>.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1586315242.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node '<HOSTNAME>' to cluster.

I'm currently on PVE 6.1-8.
 
Hmm, that's pretty weird, as the CLI and the API share most of their code. If one did not work, the other shouldn't either, especially with errors like "401 Authentication Failure".

Any additional information in the syslog, or /var/log/pveproxy/access.log*?
 
I see this in access.log on the host I created the cluster on:

Code:
root@<CLUSTER-CREATING-NODE>:/var/log# less pveproxy/access.log | grep <JOINING-NODE-IP>
<JOINING-NODE-IP> - - [07/04/2020:17:40:27 -0700] "POST /api2/json/access/ticket HTTP/1.1" 401 13
<JOINING-NODE-IP> - - [07/04/2020:18:00:29 -0700] "POST /api2/json/access/ticket HTTP/1.1" 401 13
<JOINING-NODE-IP> - - [07/04/2020:18:48:13 -0700] "POST /api2/json/access/ticket HTTP/1.1" 401 13
<JOINING-NODE-IP> - - [07/04/2020:20:07:21 -0700] "POST /api2/json/access/ticket HTTP/1.1" 200 685
<JOINING-NODE-IP> - root@pam [07/04/2020:20:07:21 -0700] "POST /api2/json/cluster/config/nodes/<JOINING-NODE-NAME> HTTP/1.1" 200 677

The first three 401s are the failed cluster joins through the GUI. The last one is the successful join through the CLI. I can clearly see corresponding auth failures in auth.log:

Code:
root@<CLUSTER-CREATING-NODE>:/var/log# less auth.log | grep "authentication failure"
Apr 7 17:40:24 <CLUSTER-CREATING-NODE> IPCC.xs[20747]: pam_unix(common-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=root
Apr 7 18:00:26 <CLUSTER-CREATING-NODE> IPCC.xs[93288]: pam_unix(common-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=root
Apr  7 18:48:10 <CLUSTER-CREATING-NODE> IPCC.xs[93288]: pam_unix(common-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=  user=root
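
For anyone who wants to pull the failed ticket requests out of access.log without eyeballing it, here is a minimal sketch. It assumes the line layout shown in the excerpt above; the function names, the `192.0.2.10` address and the `node2` name are made up for illustration and are not part of any Proxmox tooling.

```javascript
// Sketch: pick failed ticket requests out of pveproxy access-log lines.
// The regex assumes the layout seen in the excerpt above:
//   <client> - <user> [<time>] "<method> <path> <proto>" <status> <bytes>
const ACCESS_LINE = /^(\S+) - (\S+) \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\d+)$/;

function parseAccessLine(line) {
  const m = ACCESS_LINE.exec(line.trim());
  if (!m) return null; // line is not in the expected layout
  const [, client, user, time, method, path, status, bytes] = m;
  return { client, user, time, method, path, status: Number(status), bytes: Number(bytes) };
}

function failedTicketRequests(lines) {
  return lines
    .map(parseAccessLine)
    .filter((e) => e !== null && e.path === '/api2/json/access/ticket' && e.status === 401);
}

// Hypothetical sample lines modelled on the excerpt above.
const sample = [
  '192.0.2.10 - - [07/04/2020:17:40:27 -0700] "POST /api2/json/access/ticket HTTP/1.1" 401 13',
  '192.0.2.10 - - [07/04/2020:20:07:21 -0700] "POST /api2/json/access/ticket HTTP/1.1" 200 685',
  '192.0.2.10 - root@pam [07/04/2020:20:07:21 -0700] "POST /api2/json/cluster/config/nodes/node2 HTTP/1.1" 200 677',
];

console.log(failedTicketRequests(sample).map((e) => e.time));
```

The timestamps it returns can then be compared by hand against the "authentication failure" entries in auth.log.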

Does the GUI handle the prompt I got and accepted on the CLI?:

Code:
The authenticity of host '<IP>' can't be established.
X509 SHA256 key fingerprint is <FINGERPRINT>.
Are you sure you want to continue connecting (yes/no)? yes
 
Does the GUI handle the prompt I got and accepted on the CLI?:

Yes, sure, you have to supply the fingerprint in the join dialogue (it gets automatically copied if you use the cluster join info).

Apr 7 17:40:24 <CLUSTER-CREATING-NODE> IPCC.xs[20747]: pam_unix(common-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost= user=root

So an auth error on the PAM level; it seems more like you did not have the correct credentials? The fingerprint is correct for sure, as otherwise you wouldn't even have come that far.
 
Or possibly some really weird characters in the password. We should cope with that, but I haven't tried things like a cloud emoji in a while :)
 
So an auth error on the PAM level; it seems more like you did not have the correct credentials? The fingerprint is correct for sure, as otherwise you wouldn't even have come that far.
Not sure where I had the chance to have an incorrect credential. I was never prompted for credentials. I copied the cluster info from the machine in the cluster and pasted it into the node that was trying to join.

I just decoded the copied cluster information (JSON.parse(atob('.....'))) and I don't see credentials in there. Did the cluster join fail before I was able to be prompted for credentials?
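
A minimal sketch of that check, for anyone who wants to repeat it: it decodes a base64-encoded JSON blob and looks for anything that resembles a credential. The payload and its field names below are invented for the example and are not the real join-information format; the function names are also just illustrative.

```javascript
// Sketch: decode a base64-encoded JSON blob and check its keys for credentials.
function decodeBase64(encoded) {
  // atob exists in browsers and recent Node.js; fall back to Buffer otherwise.
  return typeof atob === 'function'
    ? atob(encoded)
    : Buffer.from(encoded, 'base64').toString('binary');
}

function encodeBase64(text) {
  return typeof btoa === 'function'
    ? btoa(text)
    : Buffer.from(text, 'binary').toString('base64');
}

function decodeJoinInfo(encoded) {
  return JSON.parse(decodeBase64(encoded));
}

// True if any top-level key name looks like it could hold a credential.
function hasCredentialLikeKey(info) {
  return Object.keys(info).some((key) => /pass|secret|token/i.test(key));
}

// Hypothetical payload; the real join information has its own field names.
const examplePayload = { address: '192.0.2.1', fingerprint: 'AA:BB:CC' };
const encodedExample = encodeBase64(JSON.stringify(examplePayload));

const decoded = decodeJoinInfo(encodedExample);
console.log(Object.keys(decoded), hasCredentialLikeKey(decoded));
```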

Or possibly some really weird characters in the password. We should cope with that, but I haven't tried things like a cloud emoji in a while :)

The only special characters in my password are ASCII special characters (shift + number keys) which I never had the opportunity to even enter.

 
I was never prompted for credentials.
There is a password field on the cluster-join dialog. You cannot even click Join without entering something there, so what did you enter?
 
There is a password field on the cluster-join dialog. You cannot even click Join without entering something there, so what did you enter?
Oh interesting. I just pasted the info, many fields got filled out, and I clicked "Join". Perhaps it didn't fill out the password field, I didn't notice, and the field wasn't actually required so it let me attempt to join with no password?

I can't check anymore because I'm joined and can't bring that dialog up anymore.
 
Dominik is right: you cannot click Join without filling in the password field. You can check this on another node or in the source code:
https://git.proxmox.com/?p=pve-mana...8f9df68553c8b44dd33b8f418b288d4f;hb=HEAD#l316
Password field starts at line 316 and has the "allowBlank" property set to false, which tells the ExtJS framework that it is in fact required: https://docs.sencha.com/extjs/6.0.1/classic/Ext.form.field.Text.html#cfg-allowBlank
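
For illustration only, a field definition of that kind might look roughly like the sketch below. This is not copied from the linked source; `xtype`, `inputType`, `fieldLabel`, `name` and `allowBlank` are standard ExtJS text-field configs, while the concrete values and the `passesBlankCheck` helper are assumptions that merely mimic the "length must be greater than zero" rule in plain JavaScript.

```javascript
// Sketch of an ExtJS-style password field config (values are assumed, not the real source).
const passwordField = {
  xtype: 'textfield',
  inputType: 'password',
  fieldLabel: 'Password',
  name: 'password',
  allowBlank: false, // the form cannot be submitted while this field is empty
};

// Hypothetical helper mimicking what allowBlank: false checks: a non-empty value.
function passesBlankCheck(config, value) {
  return config.allowBlank !== false || value.length > 0;
}

console.log(passesBlankCheck(passwordField, ''));                 // false
console.log(passesBlankCheck(passwordField, 'autofilled-secret')); // true
```

Note that a value filled in by a password manager satisfies this check just as well as a typed one, which would fit the symptoms in this thread.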

That said, we may want to see if we can catch the "401 Authentication Failure" which the backend gets from the other node here, and then clear and refocus the password field to put a bigger spotlight on it, since in that case the password is most likely the issue.
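
A rough sketch of that idea, written against a plain field-like object rather than the actual ExtJS component: `handleJoinFailure` and `fakeField` are hypothetical names, and the only assumption about the field is that it has a `value` property and a `focus()` method (as an HTML input element does).

```javascript
// Sketch: on a 401 from the join request, empty the password field and focus it.
// `field` is any object with `value` and `focus()`, e.g. an HTMLInputElement.
function handleJoinFailure(status, field) {
  if (status !== 401) return false; // other errors are left to the caller
  field.value = '';  // drop whatever was typed or autofilled
  field.focus();     // draw attention back to the password field
  return true;
}

// Stand-in for a real input element so the sketch runs outside a browser.
const fakeField = {
  value: 'autofilled-secret',
  focused: false,
  focus() { this.focused = true; },
};

handleJoinFailure(401, fakeField);
console.log(fakeField.value === '', fakeField.focused); // true true
```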
 
OH I just had an epiphany. I bet LastPass auto-filled the password field with my Proxmox management interface password. I will verify this the next time I set up a new node but that might be a while.

If you'd like to prevent this from happening (because you can't fix people not noticing things), might I suggest you change this input field to have the autocomplete="new-password" attribute which will prevent password managers from filling the password?
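
As a sketch of what that suggestion amounts to in plain DOM terms (not in the ExtJS code the GUI actually uses): `markAsNewPassword` and `fakeInput` are hypothetical names, and browsers and password managers treat the attribute as a hint, so how strictly it is honored depends on the tool.

```javascript
// Sketch: set autocomplete="new-password" on an input-like element.
// `input` only needs setAttribute(), as an HTMLInputElement has.
function markAsNewPassword(input) {
  input.setAttribute('autocomplete', 'new-password');
  return input;
}

// Stand-in for a DOM element so the sketch runs outside a browser.
const fakeInput = {
  attributes: {},
  setAttribute(name, value) { this.attributes[name] = String(value); },
};

markAsNewPassword(fakeInput);
console.log(fakeInput.attributes.autocomplete); // new-password
```

In a browser, the same function could be called with the result of `document.querySelector('input[type="password"]')`.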
 
That seems like a pretty reasonable explanation. :)

If you'd like to prevent this from happening (because you can't fix people not noticing things), might I suggest you change this input field to have the autocomplete="new-password" attribute which will prevent password managers from filling the password?

Hmm, if people use similar root passwords for a set of soon-to-be cluster nodes, they could argue that they want it to get autofilled.
In a Proxmox VE cluster, which is multi-master, all nodes are on the same trust basis after all, so this would be valid.

But I'll keep it in mind for the case where it isn't easy to do the "clear password field if the joining node gets an auth error" approach, as then an unnoticed autofill should be cleared (and not reset) after the first wrong try. From my experience, lots of GUIs clear or select the contents of the password field on an authentication error.

Thanks for the suggestion.
 
We're super in the weeds here and I definitely don't think this is important, and I appreciate you indulging me, so if you'd like to keep going:

That sounds like a valid case you might want to support, but I don't think that's how (good?) password managers work. LastPass at least treats the current origin as a security boundary to prevent phishing attacks, e.g. I can't autofill the password for google.com while I'm on evil.com.

So let's say my Proxmox management interfaces are some-node-NOT-in-cluster and some-node-already-in-cluster.
When trying to join the cluster I'd be on https://some-node-NOT-in-cluster:8006 and I'd want to fill the password for some-node-already-in-cluster. If I search for that in LastPass I can pull up the entry, but there is absolutely no "fill out forms with this password" option because the origin doesn't match (and it doesn't want to make phishing too easy). Instead, I get a dropdown that lets me copy the password and manually paste it into the form if I insist. That autofill button only appears for an entry that matches my current origin, e.g. I can only autofill while already on some-node-already-in-cluster.

What I think happened in my case is that it erroneously autofilled the password for some-node-NOT-in-cluster, it would ONLY ever autofill this, and that is wrong. I think.
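
The origin check described above can be sketched with the standard `URL` API. This only illustrates the same-origin idea; it is not how LastPass is actually implemented, `sameOrigin` is a made-up helper, and the `:8006` port on the second host is an assumption carried over from the first.

```javascript
// Sketch: a saved entry may only be autofilled on a page with the same origin
// (scheme + host + port).
function sameOrigin(pageUrl, entryUrl) {
  return new URL(pageUrl).origin === new URL(entryUrl).origin;
}

const joiningNodePage = 'https://some-node-NOT-in-cluster:8006/';
const clusterNodeEntry = 'https://some-node-already-in-cluster:8006/'; // port assumed

// Entry for the cluster node, viewed from the joining node's GUI: no autofill.
console.log(sameOrigin(joiningNodePage, clusterNodeEntry)); // false

// Entry for the joining node itself: autofill allowed, which is the wrong password here.
console.log(sameOrigin(joiningNodePage, 'https://some-node-NOT-in-cluster:8006/#v1')); // true
```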
 
That's not what I meant; I did not want to propose cross-site leaking of passwords :)
What I meant is that you have two nodes, both with different origins, but you use the same password for both.
So the password manager fills it in here for node A and, because of one's policy of using the same root password for all nodes of the same (future) cluster, it also works even though it is meant for node B. So it can be seen as a feature by some.

Also, thinking more about the "new-password" hint, I think it could still lead to the same confusion with other password managers, as they could prefill the field with a newly generated password and add that to their password database when the Join form is submitted. This would then result in the exact same confusion you ran into here.

We're super in the weeds here and I definitely don't think this is important, and I appreciate you indulging me, so if you'd like to keep going:
For sure we are, but after all I'm just happy this isn't some grave bug or the like, just a simple UX thing which could be made friendlier :)
 
OH I just had an epiphany. I bet LastPass auto-filled the password field with my Proxmox management interface password. I will verify this the next time I set up a new node but that might be a while.

If you'd like to prevent this from happening (because you can't fix people not noticing things), might I suggest you change this input field to have the autocomplete="new-password" attribute which will prevent password managers from filling the password?

Thanks! That hint helped a lot.
I also assumed that the password field was decoded from the join string, but actually LastPass filled in the local machine's root password instead.
 
