Adding new nodes fails, GUI shows connection refused

Chris Rivera

Guest
The last 3 nodes we put up have all given us issues. It takes a few installs to actually get a node running, and I'm not sure what's wrong.

When we run pvecm add ******* it goes through the process, then fails at:

Starting pve cluster filesystem : pve-clustercan't create shared ssh key database '/etc/pve/priv/authorized_keys'

It then starts cman and the cluster services. The node shows up in the web GUI, but if I click on the node or any of its storage I get "connection refused".


/etc/hosts points to the IP and hostname:


***.**.**.*** proxmox14.fortatrust.com proxmox14 pvelocalhost
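
A quick way to double-check an entry like the one above is to confirm that the node name maps to a real address and not to loopback. This is only a minimal sketch: the function names hosts_ip_for and hosts_entry_ok are made up for illustration, and treating a loopback mapping as a problem is my assumption, not something confirmed in this thread.

```shell
# Hypothetical helpers, not part of Proxmox.
# Print the first address that a hosts-format file maps NAME to.
hosts_ip_for() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v n="$name" '
        NF == 0 || $1 ~ /^#/ { next }       # skip blank and comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == n) { print $1; exit }
            }
        }' "$hosts_file"
}

# Succeed only if NAME maps to a non-loopback address.
hosts_entry_ok() {
    ip=$(hosts_ip_for "$1" "${2:-/etc/hosts}")
    case $ip in
        ''|127.*|::1) return 1 ;;           # no entry, or loopback
        *) return 0 ;;
    esac
}

# Usage on the node:
#   hosts_entry_ok "$(hostname)" && echo "hosts entry looks sane"
```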



In /etc/network/interfaces, lo is configured:


root@proxmox14:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback


/etc/pve/priv/authorized_keys is an empty file.


  • Can I just copy this file from another node, or do I need to run a command on this node to generate the ssh keys?
  • How can I find which node is causing this issue and fix it?

The last three nodes we added all showed this problem. I would like to solve it so that when we add another node, all we need to do is run pvecm add {cluster-nodeip}


Tried to restart apache:

root@proxmox14:~# /etc/init.d/apache2 restart
Syntax error on line 13 of /etc/apache2/sites-enabled/pve-redirect.conf:
SSLCertificateFile: file '/etc/pve/local/pve-ssl.pem' does not exist or is empty
Action 'configtest' failed.
The Apache error log may have more information.
failed!
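
Since both the Apache error and the pvecm error point at files under /etc/pve that are missing or empty, it can help to check them all in one go before restarting anything. A minimal sketch, assuming only the paths already mentioned in this thread; report_empty is a made-up helper name.

```shell
# Hypothetical helper, not part of Proxmox.
# Print every given path that is missing or zero-length.
# Returns 1 if at least one such path was found, 0 otherwise.
report_empty() {
    rc=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            rc=1
        fi
    done
    return $rc
}

# Usage on the node (paths taken from the errors in this thread):
#   report_empty /etc/pve/local/pve-ssl.pem \
#                /etc/pve/local/pve-ssl.key \
#                /etc/pve/priv/authorized_keys
```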




I also removed the node from the cluster and added it back.
I restarted pve-cluster and cman on all cluster nodes to remove proxmox14, then added node 14 back to the cluster using the -f (force) flag:

root@proxmox14:~# pvecm add 63.217.249.159 -f
copy corosync auth key
stopping pve-cluster service
Stopping pve cluster filesystem: pve-cluster.
backup old database
Starting pve cluster filesystem : pve-clustercan't create shared ssh key database '/etc/pve/priv/authorized_keys'
.
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Tuning DLM kernel config... [ OK ]
Unfencing self... [ OK ]
waiting for quorum...OK
generating node certificates
unable to create directory '/etc/pve/priv' - File exists


The same problem still exists.

All nodes are online and in quorum.
 
Apache was the problem:


root@proxmox14:~# /etc/init.d/apache2 restart
Syntax error on line 13 of /etc/apache2/sites-enabled/pve-redirect.conf:
SSLCertificateFile: file '/etc/pve/local/pve-ssl.pem' does not exist or is empty
Action 'configtest' failed.
The Apache error log may have more information.
failed!

SOLUTION: copy /etc/pve/local/pve-ssl.pem from another node to this node

root@proxmox14:~# /etc/init.d/apache2 restart
Syntax error on line 14 of /etc/apache2/sites-enabled/pve-redirect.conf:
SSLCertificateKeyFile: file '/etc/pve/local/pve-ssl.key' does not exist or is empty
Action 'configtest' failed.
The Apache error log may have more information.
failed!



SOLUTION: copy /etc/pve/local/pve-ssl.key from another node to this node

Then restart Apache:

root@proxmox14:~# /etc/init.d/apache2 restart
Restarting web server: apache2


My remaining problem: migrations do not work, and I am guessing it has to do with

'/etc/pve/priv/authorized_keys'

Can I just copy that file from another node, or do these keys get generated per node?
 
/etc/pve/priv/authorized_keys was blank, so I went looking for my original working file.

I found /etc/pve/priv/authorized_keys.bak and copied it over authorized_keys, removed node14 from the cluster, then added it back again using the -f (force) flag.

Everything is good to go.


- - - Updated - - -

Migrations to this node still do not work:


May 17 10:26:36 # /usr/bin/ssh -o 'BatchMode=yes' root@************ /bin/true
May 17 10:26:36 Host key verification failed.
May 17 10:26:36 ERROR: migration aborted (duration 00:00:14): Can't connect to destination address using public key
TASK ERROR: migration aborted
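
"Host key verification failed" means the source node has no matching host key on record for the destination, which is a separate thing from authorized_keys. A rough way to see whether an address is listed at all is sketched below. Assumptions on my part: the cluster-wide file is /etc/ssh/ssh_known_hosts (passed in as a default here), its entries are plain and not hashed, and known_hosts_has is a made-up helper name. For hashed files, ssh-keygen -F is the right tool.

```shell
# Hypothetical helper, not part of Proxmox.
# Succeed if HOST appears as a plain (unhashed) name in a known_hosts file.
# Returns 2 if the file cannot be read.
known_hosts_has() {
    host=$1
    kh_file=${2:-/etc/ssh/ssh_known_hosts}   # assumed default location
    [ -r "$kh_file" ] || return 2
    awk -v h="$host" '
        NF == 0 || $1 ~ /^#/ { next }        # skip blank and comment lines
        {
            field = $1
            if (field ~ /^@/) field = $2     # skip a marker such as @cert-authority
            n = split(field, names, ",")     # one entry may list several names
            for (i = 1; i <= n; i++) if (names[i] == h) found = 1
        }
        END { exit (found ? 0 : 1) }' "$kh_file"
}

# Usage on the source node, with the destination address from the task log:
#   known_hosts_has DEST_IP || echo "no host key on record for DEST_IP"
# If the entry is missing, running "pvecm updatecerts" on the nodes may
# regenerate the shared SSH data (assumption: available on this PVE version).
```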
 
The node is online but migrations still don't work.

I was able to provision an OpenVZ container, which is running and online. If I type vzlist I see that it is running, along with its IP address.

In the Proxmox GUI there are no CTs listed on the node. If I provision another VM, it gets the same VM ID as the VM already on the node.

How can I solve this? Why would the GUI not show a CT that was created and is running on the node?
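
One thing that can be compared here is the list of containers vzlist reports against the config files the cluster filesystem actually holds for this node. A minimal sketch; the directory /etc/pve/nodes/proxmox14/openvz in the usage line is my assumption about where this PVE version keeps CT configs, and ctids_without_conf is a made-up helper name.

```shell
# Hypothetical helper, not part of Proxmox.
# Read container IDs from stdin (one per line) and print each ID
# that has no <ID>.conf file in CONF_DIR.
ctids_without_conf() {
    conf_dir=$1
    while read -r ctid; do
        [ -n "$ctid" ] || continue                      # ignore blank lines
        [ -e "$conf_dir/$ctid.conf" ] || echo "$ctid"   # no config for this ID
    done
}

# Usage on the node (config directory is an assumption):
#   vzlist -a -H -o ctid | ctids_without_conf /etc/pve/nodes/proxmox14/openvz
```

Any ID this prints is a container the local tools know about but for which no config is visible in that directory.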
 
