I have just set up a three-node cluster, extending my single-host setup.
pve1 = 172.x.x.10
pve2 = 172.x.x.20
pve3 = 172.x.x.30
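For reference, name resolution itself seems fine on all three nodes; this is roughly how I checked it, assuming the short names come from /etc/hosts or DNS:
Code:
# confirm the short names resolve to the expected addresses
getent hosts pve1 pve2 pve3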
Each node can ping the others fine.
From pve2 and pve3 I can SSH to pve1, pve2, and pve3 without issue, and nmap shows the expected ports open.
All nodes use a VLAN ID of 5.
My issue is that, from pve1, SSH and other services cannot reach pve2 or pve3, even though everything works fine in the other direction.
Also, from pve1's web GUI, trying to access info on pve2 or pve3 yields a timeout, while accessing pve1 from pve2 or pve3 works fine.
All three nodes report that quorum is good, so I believe they can communicate enough for that.
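As far as I understand, corosync/knet talks over UDP (5405 by default), so quorum can look healthy even if TCP is blocked somewhere. To double-check the cluster links from pve1 I was going to run something like this (just a sketch; the port may differ if it was changed in corosync.conf):
Code:
# knet link status for each node as seen from pve1
corosync-cfgtool -s
# confirm corosync is using UDP sockets
ss -uanp | grep corosync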
The system time on all three nodes is within a second, so I do not think that is the issue.
After writing the below, I noticed there was a version mismatch between the nodes: pve1 was on 8.1.0, while pve2 and pve3 were on 8.1.3. I updated every node to 8.1.4 and I am still experiencing the issue.
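To be sure the versions really do match now, I compared this on each node after the upgrade (full output omitted):
Code:
# list the versions of the main PVE packages on this node
pveversion -v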
Below I've attached a bunch of info that hopefully has the answer. If I need to attach anything else, please let me know.
nmap works as expected from pve2 and pve3:
pve2:
Code:
root@pve2:~# nmap pve1
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve1 (172.x.x.10)
Host is up (0.00028s latency).
rDNS record for 172.x.x.10: pve1.x.com
Not shown: 997 closed tcp ports (reset)
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
3128/tcp open squid-http
Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds
root@pve2:~# nmap pve3
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve3 (172.x.x.30)
Host is up (0.00047s latency).
rDNS record for 172.x.x.30: pve3.x.com
Not shown: 997 closed tcp ports (reset)
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
3128/tcp open squid-http
Nmap done: 1 IP address (1 host up) scanned in 0.25 seconds
SSH works fine:
Code:
root@pve2:~# ssh pve1
Linux pve1 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb 8 17:08:43 2024 from 172.x.x.10
root@pve1:~#
root@pve2:~# ssh pve3
Linux pve3 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb 8 16:32:23 2024
root@pve3:~#
pve3:
Code:
root@pve3:~# nmap pve1
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve1 (172.x.x.10)
Host is up (0.00029s latency).
rDNS record for 172.x.x.10: pve1.x.com
Not shown: 997 closed tcp ports (reset)
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
3128/tcp open squid-http
Nmap done: 1 IP address (1 host up) scanned in 0.18 seconds
root@pve3:~# nmap pve2
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve2 (172.x.x.20)
Host is up (0.00062s latency).
rDNS record for 172.x.x.20: pve2.x.com
Not shown: 997 closed tcp ports (reset)
PORT STATE SERVICE
22/tcp open ssh
111/tcp open rpcbind
3128/tcp open squid-http
Nmap done: 1 IP address (1 host up) scanned in 0.19 seconds
SSH is also good here:
Code:
root@pve3:~# ssh pve1
Linux pve1 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb 8 17:09:02 2024 from 172.x.x.20
root@pve1:~# ^C
root@pve1:~#
logout
Connection to pve1 closed.
root@pve3:~# ssh pve2
Linux pve2 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb 8 17:12:33 2024
root@pve2:~#
logout
Connection to pve2 closed.
However, from pve1, although I can ping all nodes, nmap cannot see any open ports on either of the other nodes.
Code:
root@pve1:~# ping pve2
PING pve2.x.com (172.x.x.20) 56(84) bytes of data.
64 bytes from pve2.x.com (172.x.x.20): icmp_seq=1 ttl=63 time=0.757 ms
64 bytes from pve2.x.com (172.x.x.20): icmp_seq=2 ttl=63 time=1.13 ms
^C
--- pve2.x.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.757/0.943/1.130/0.186 ms
root@pve1:~# ping pve3
PING pve3.x.com (172.x.x.30) 56(84) bytes of data.
64 bytes from pve3.x.com (172.x.x.30): icmp_seq=1 ttl=63 time=0.675 ms
64 bytes from pve3.x.com (172.x.x.30): icmp_seq=2 ttl=63 time=0.989 ms
^C
--- pve3.x.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 0.675/0.832/0.989/0.157 ms
Nmap fails to see ports on the other two hosts:
Code:
root@pve1:~# nmap pve2
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:03 EST
Nmap scan report for pve2 (172.x.x.20)
Host is up (0.00038s latency).
rDNS record for 172.x.x.20: pve2.x.com
All 1000 scanned ports on pve2 (172.x.x.20) are in ignored states.
Not shown: 1000 filtered tcp ports (no-response)
MAC Address: D8:9E:F3:DD:7F:76 (Dell)
root@pve1:~# nmap pve3
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:09 EST
Nmap scan report for pve3 (172.x.x.30)
Host is up (0.00069s latency).
rDNS record for 172.x.x.30: pve3.x.com
All 1000 scanned ports on pve3 (172.x.x.30) are in ignored states.
Not shown: 1000 filtered tcp ports (no-response)
MAC Address: E4:54:E8:3A:3C:5F (Dell)
Nmap done: 1 IP address (1 host up) scanned in 21.27 seconds
SSH from pve1 to the other nodes also times out:
Code:
root@pve1:~# ssh -vvv pve2
OpenSSH_9.2p1 Debian-2, OpenSSL 3.0.9 30 May 2023
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug2: resolving "pve2" port 22
debug3: resolve_host: lookup pve2:22
debug3: ssh_connect_direct: entering
debug1: Connecting to pve2 [172.x.x.20] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
debug1: connect to address 172.x.x.20 port 22: Connection timed out
ssh: connect to host pve2 port 22: Connection timed out
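Since ping works but every TCP connection from pve1 times out (and the ping replies come back with ttl=63, so the traffic may be taking a routed path instead of staying on the VLAN), my best guess so far is a firewall or routing problem on or around pve1. These are the checks I plan to run next; they are only a sketch and assume the stock Proxmox firewall tooling and that pve2 uses the same interface names:
Code:
# on pve1: is the Proxmox firewall active, and are any DROP/REJECT rules loaded?
pve-firewall status
iptables-save | grep -Ei 'drop|reject'
# on pve1: does traffic to pve2 leave via vmbr0.5 directly or via the gateway?
ip route get 172.x.x.20
# on pve1: quick TCP test against pve2's SSH port
nc -vz -w 3 172.x.x.20 22
# on pve2 (interface name assumed): do the SYNs from pve1 even arrive?
tcpdump -ni vmbr0.5 'tcp port 22 and host 172.x.x.10'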
Below is more config info for pve1.
/etc/network/interfaces:
Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback
iface enp1s0 inet manual
auto vmbr0
iface vmbr0 inet manual
bridge-ports enp1s0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
iface wlo1 inet manual
auto vmbr0.5
iface vmbr0.5 inet static
address 172.x.x.10/24
gateway 172.x.x.254
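In case the bridge/VLAN side matters, I can also dump the live state on pve1 rather than just the config file; roughly:
Code:
# addresses, VLAN membership and routes actually in effect
ip addr show vmbr0.5
bridge vlan show
ip route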
corosync.conf:
Code:
logging {
debug: off
to_syslog: yes
}
nodelist {
node {
name: pve1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.x.10
}
node {
name: pve2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.x.20
}
node {
name: pve3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.x.30
}
}
quorum {
provider: corosync_votequorum
}
totem {
cluster_name: LAB-01
config_version: 3
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
pvecm nodes:
Code:
root@pve1:/etc/pve# pvecm nodes
Membership information
----------------------
Nodeid Votes Name
1 1 pve1 (local)
2 1 pve2
3 1 pve3
pvecm status:
Code:
root@pve1:/etc/pve# pvecm status
Cluster information
-------------------
Name: LAB-01
Config Version: 3
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Thu Feb 8 17:26:02 2024
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.16
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.x.x.10 (local)
0x00000002 1 172.x.x.20
0x00000003 1 172.x.x.30
Here is the info from one of the working nodes (pve3):
/etc/network/interfaces:
Code:
auto lo
iface lo inet loopback
iface enp2s0 inet manual
auto vmbr0.5
iface vmbr0.5 inet static
address 172.x.x.30
gateway 172.x.x.254
auto vmbr0
iface vmbr0 inet static
bridge-ports enp2s0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
source /etc/network/interfaces.d/*
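To make the differences between this file and pve1's easier to spot, I was going to diff them from pve3 (SSH from pve3 to pve1 works); just a sketch:
Code:
# compare pve3's interfaces file with pve1's over SSH
diff /etc/network/interfaces <(ssh pve1 cat /etc/network/interfaces)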
corosync.conf:
Code:
logging {
debug: off
to_syslog: yes
}
nodelist {
node {
name: pve1
nodeid: 1
quorum_votes: 1
ring0_addr: 172.x.x.10
}
node {
name: pve2
nodeid: 2
quorum_votes: 1
ring0_addr: 172.x.x.20
}
node {
name: pve3
nodeid: 3
quorum_votes: 1
ring0_addr: 172.x.x.30
}
}
quorum {
provider: corosync_votequorum
}
totem {
cluster_name: LAB-01
config_version: 3
interface {
linknumber: 0
}
ip_version: ipv4-6
link_mode: passive
secauth: on
version: 2
}
pvecm nodes and status:
Code:
root@pve3:~# pvecm nodes
Membership information
----------------------
Nodeid Votes Name
1 1 pve1
2 1 pve2
3 1 pve3 (local)
root@pve3:~# pvecm status
Cluster information
-------------------
Name: LAB-01
Config Version: 3
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Thu Feb 8 17:32:59 2024
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1.16
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.x.x.10
0x00000002 1 172.x.x.20
0x00000003 1 172.x.x.30 (local)
I hope the issue is something simple and that I do not have to reinstall Proxmox on pve1, as it is the original node and has all of the running VMs and containers on it.
Thanks to anyone who can help.