lost ssh access - but ssh from node to node works?! - SOLVED

hachiman

Hi everyone,
I just realized I somehow lost ssh access to my two nodes in the cluster.

HTTPS - works.

SSH from node 1 to node 2 - works.

SSH to other servers in the same subnet - works.

Did I do something stupid with the firewall settings in the DC? Right now I don't plan to use the firewall there.
These are the DC settings (no Security Groups either):
Screenshot 2025-04-02 at 18.57.52.png

This is the node:
Screenshot 2025-04-02 at 18.58.35.png

Screenshot 2025-04-02 at 18.58.51.png

Ping is also not possible. With ICMP blocked as well, I think it might still be FW-related?
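
In case it helps others checking the same thing: as far as I know, the Proxmox firewall state can also be inspected from the node shell, something like this (commands from the pve-firewall man page, not output from my nodes):
Code:
# Is the pve-firewall service actually enforcing rules right now?
pve-firewall status

# Dump the ruleset that would be generated from the GUI settings
pve-firewall compile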

Any idea is highly appreciated!

Thx!
 
Hello there,

Is there any other firewall (iptables, nftables) installed on the node?
Not that I can see.
Actually it is two nodes. No ufw, no iptables on either of them.
This is why I thought of a mishap on the Datacenter level, but I can't figure out where.
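
For completeness, this is roughly how I would rule out local filtering (generic commands, nothing node-specific):
Code:
# Any rules loaded via the legacy iptables interface?
iptables-save

# Any nftables ruleset, if nftables is in use?
nft list ruleset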

I see that my other node 10.100.1.111 can SSH into this node - so sshd is running and SSH access inside the cluster is possible:

Code:
systemctl status sshd
● ssh.service - OpenBSD Secure Shell server
     Loaded: loaded (/lib/systemd/system/ssh.service; enabled; preset: enabled)
     Active: active (running) since Tue 2025-03-25 21:55:11 CET; 1 week 1 day ago
       Docs: man:sshd(8)
             man:sshd_config(5)
    Process: 1445 ExecStartPre=/usr/sbin/sshd -t (code=exited, status=0/SUCCESS)
   Main PID: 1466 (sshd)
      Tasks: 1 (limit: 112895)
     Memory: 4.8M
        CPU: 127ms
     CGroup: /system.slice/ssh.service
             └─1466 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"

Apr 02 17:18:12 pve2 sshd[3685153]: pam_env(sshd:session): deprecated reading of user environment enabled
Apr 02 17:59:46 pve2 sshd[3698797]: Accepted publickey for root from 10.100.1.111 port 52640 ssh2: RSA SHA256:NbMuMP4rOOuSgNuB>
Apr 02 17:59:46 pve2 sshd[3698797]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Apr 02 17:59:46 pve2 sshd[3698797]: pam_env(sshd:session): deprecated reading of user environment enabled
Apr 02 18:56:22 pve2 sshd[3717254]: Accepted publickey for root from 10.100.1.111 port 42028 ssh2: RSA SHA256:NbMuMP4rOOuSAMWuB>
Apr 02 18:56:22 pve2 sshd[3717254]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Apr 02 18:56:22 pve2 sshd[3717254]: pam_env(sshd:session): deprecated reading of user environment enabled
Apr 03 08:11:05 pve2 sshd[3978833]: Accepted publickey for root from 10.100.1.111 port 52916 ssh2: RSA SHA256:NbMuMP4rOOuSAMuMtsgNuB>
Apr 03 08:11:05 pve2 sshd[3978833]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Apr 03 08:11:05 pve2 sshd[3978833]: pam_env(sshd:session): deprecated reading of user environment enabled
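
To rule out a binding problem, one could also check what sshd is actually bound to (illustrative check, not pasted output from my nodes):
Code:
# sshd should be listening on 0.0.0.0:22 (all interfaces), not just the cluster IP
ss -tlnp | grep ':22'

# and sshd_config should not restrict ListenAddress
grep -i '^ListenAddress' /etc/ssh/sshd_config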
 
Hi, could you elaborate on what exactly is causing the problem and what you are attempting to do?
 
Sure. The thing is quite simple: I am trying to SSH into my two nodes.

SSH from my laptop or any other device fails - basically a timeout:
Code:
ssh -v root@10.10.1.12
OpenSSH_9.8p1, LibreSSL 3.3.6
debug1: Reading configuration data /Users/me/.ssh/config
debug1: Reading configuration data /Users/me/.colima/ssh_config
debug1: /Users/me/.ssh/config line 3: Applying options for *
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 21: include /etc/ssh/ssh_config.d/* matched no files
debug1: /etc/ssh/ssh_config line 54: Applying options for *
debug1: Authenticator provider $SSH_SK_PROVIDER did not resolve; disabling
debug1: Connecting to 10.10.1.12 [10.10.1.12] port 22.

When I use the web interface to access the shell, I can SSH from one node to the other.
It appears to be something at the DC level, because sshd is running, keys have been authorized, and so on.
Even password login is allowed, but it also times out.

I haven't set any firewall rules. What makes it stranger: I set up a brand new 3rd node, and I can SSH into that one, even though its FW rules look the same as on the other two.
No iptables or ufw on the nodes.
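
Since the connection never gets past the TCP handshake, the same timeout shows up with a plain port test from the laptop, no SSH involved (10.10.1.12 is the node address from the log above):
Code:
# Raw TCP reachability to port 22; a firewall drop and a routing
# problem both show up as a timeout here
nc -vz -w 5 10.10.1.12 22

# Trace which path the packets actually take towards the node
traceroute 10.10.1.12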
 
What do you mean by "when I use the web interface to access the shell"? The web interface of the host, or of the nodes that aren't reachable via SSH?
 
I meant the in-browser shell access. But I solved the issue now: the corosync network had a wrong subnet mask, which overlapped with my client network and caused these problems.
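
In case someone hits the same thing: the overlap shows up in the node's routing table. With a hypothetical /8 mask on the corosync NIC (the interface name ens19 and the addresses below are made up; my real values differ), replies to clients outside the local subnet get routed out through the corosync interface instead of the default gateway, so the handshake times out:
Code:
ip route show
# default via 10.10.1.1 dev vmbr0
# 10.0.0.0/8 dev ens19 proto kernel src 10.100.1.112    <-- too broad; also swallows client subnets
# 10.10.1.0/24 dev vmbr0 proto kernel src 10.10.1.12

# The configured mask lives in the interfaces file
grep -A 3 'ens19' /etc/network/interfaces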
 
Hi hachiman, I'm glad you were able to resolve your issue.
 