Node in quorum but no ssh access (between nodes)

sassriverrat

New Member
Jan 23, 2023
Good Afternoon,

I've tried clearing ssh keys, public keys, etc. I have a server that was working in the cluster and has since stopped. I originally had three servers in the cluster (let's call them server1, 2, and 3).

I recently added server4. All four were reporting fine in the cluster, but server3 wasn't communicating properly (no summary info in the GUI from the other nodes, couldn't migrate VMs, etc.). I reinstalled Proxmox on server3 and re-added it to the cluster, and it now works, but server1 has started acting up in the same way. I can migrate and do everything between servers 2, 3, and 4, but server1 remains its own entity for the purposes of this discussion.
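For reference, these are the standard checks I've been comparing across the nodes (nothing exotic, just the usual status commands; node names are mine):
Code:
# quorum / membership as seen from each node
pvecm status
pvecm nodes

# services that feed the GUI summary and handle migration
systemctl status pve-cluster corosync pvestatd pveproxy pvedaemon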

Ideas?
 
You can give "pvecm updatecerts" a try:
https://pve.proxmox.com/pve-docs/pvecm.1.html
Code:
pvecm updatecerts [OPTIONS]

Update node certificates (and generate all needed files/directories).

--force <boolean>
Force generation of new SSL certificate.

--silent <boolean>
Ignore errors (i.e. when cluster has no quorum).
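In practice that usually means running it on the node that is out of sync, along these lines (a sketch only, assuming server1 is the problem node):
Code:
# on server1, while the cluster still has quorum
pvecm updatecerts --force

# restart the services that use the regenerated certificates/keys
systemctl restart pveproxy pvedaemon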



 
Unfortunately that didn't work. I also found that in both /etc/pve/priv/known_hosts and /etc/ssh/ssh_known_hosts, server1 and its IP address are the only ssh-rsa entries, yet the other servers can all ssh back and forth between each other. This is true on all 4 servers, and if I edit the file on any one server, the change shows up on the others (yes, I have it open in multiple places).
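For reference, this is roughly how I checked the entries on each node (hostnames are mine):
Code:
# list which hosts/IPs have entries and the key types recorded
awk '{print $1, $2}' /etc/pve/priv/known_hosts
awk '{print $1, $2}' /etc/ssh/ssh_known_hosts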
 
/etc/pve/ is a special clustered filesystem that is shared across all nodes. A change from one node will be reflected on all others.

And as you can see here, ssh_known_hosts is a symlink that points into the clustered filesystem:
Code:
ls -al /etc/ssh/ssh_known_hosts
lrwxrwxrwx 1 root root 25 Jan 23 16:28 /etc/ssh/ssh_known_hosts -> /etc/pve/priv/known_hosts
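If you want to convince yourself it really is one shared file, comparing checksums from each node is a quick check (node names are placeholders, and it naturally needs working ssh to the nodes you query):
Code:
# run from any node that can still reach the others
for n in server2 server3 server4; do
    echo -n "$n: "; ssh root@$n md5sum /etc/pve/priv/known_hosts
done
md5sum /etc/pve/priv/known_hosts    # local copy for comparison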

server1 and its IP address are the only ssh-rsa entries
I can't quite parse what you are saying here, but I suspect this might have something to do with it:
I've tried clearing ssh keys, public keys, etc.

Your system is currently in an unpredictable state. This thread https://forum.proxmox.com/threads/migration-issue-after-replacing-a-node.13968/ might be helpful. Review it carefully; there are some steps in there that might help you recover proper cluster operations.
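For what it's worth, a rough sequence that often restores node-to-node ssh looks something like this; treat it as a sketch rather than an exact recipe, and adjust the hostnames to your setup:
Code:
# run on every node

# 1) make sure root still has an ssh keypair on this node
[ -f /root/.ssh/id_rsa ] || ssh-keygen -t rsa -b 4096 -N '' -f /root/.ssh/id_rsa

# 2) let PVE regenerate its certificates and the shared ssh files
pvecm updatecerts --force

# 3) ssh once to each node (including itself is harmless) so host keys get re-recorded
for n in server1 server2 server3 server4; do ssh root@$n true; done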


 
OK, so I went into /root/.ssh/known_hosts on all 4 nodes and deleted ALL lines just to be sure. I then ssh'd between all nodes, and it worked everywhere except server1. Server1 still won't let me ssh out to the other nodes, but it still shows up in the cluster. Yes, I can ssh into server1 from my computer directly, just not between Proxmox nodes.
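If it helps narrow things down, a verbose ssh attempt from server1 to any working node should show whether it fails on the host key or on the root key itself (server2 here stands in for any reachable node):
Code:
# on server1: see where the connection fails
ssh -vvv root@server2 true 2>&1 | grep -iE 'offering|denied|host key|identit'

# does server1 still have its keypair, and is the public key in the shared authorized_keys?
ls -l /root/.ssh/id_rsa /root/.ssh/id_rsa.pub
grep -qF "$(cut -d' ' -f2 /root/.ssh/id_rsa.pub)" /etc/pve/priv/authorized_keys \
    && echo "public key present" || echo "public key missing"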