Proxmox cluster node communication issues

Dregs
Feb 8, 2024
I have just set up a three-node cluster, extending my single-host setup.

pve1 = 172.x.x.10
pve2 = 172.x.x.20
pve3 = 172.x.x.30

Each node can ping the others fine.

From pve2 and pve3 I can SSH to pve1, pve2, and pve3 without issue, and nmap shows the expected ports open.
All nodes use a VLAN ID of 5.

My issue is that from pve1, SSH and other services cannot reach pve2 or pve3, even though the reverse direction works fine.
Likewise, trying to access info on pve2 or pve3 from pve1's web GUI yields a timeout,
but accessing pve1 from pve2 or pve3 works fine.

All three nodes report that quorum is good, so I believe they can communicate enough for that.

The system time on all three nodes is within a second, so I do not think that is the issue.

After writing up the info below I noticed there was a version mismatch between the nodes (pve1 was on 8.1.0, while pve2 and pve3 were on 8.1.3), so I updated each node to 8.1.4, but I am still experiencing the issue.
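
(For reference, the version check and update on each node were basically along these lines; exact commands may differ depending on your repositories.)
Code:
# show the installed Proxmox VE version on this node
pveversion
# fetch and apply the latest packages, then reboot if a new kernel came in
apt update && apt full-upgrade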

Below I've attached a bunch of info that hopefully contains the answer. If I need to attach anything else, please let me know.
From pve2 and pve3, nmap works as expected:

pve2:
Code:
root@pve2:~# nmap pve1
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve1 (172.x.x.10)
Host is up (0.00028s latency).
rDNS record for 172.x.x.10: pve1.x.com
Not shown: 997 closed tcp ports (reset)
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
3128/tcp open  squid-http

Nmap done: 1 IP address (1 host up) scanned in 0.22 seconds
root@pve2:~# nmap pve3
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve3 (172.x.x.30)
Host is up (0.00047s latency).
rDNS record for 172.x.x.30: pve3.x.com
Not shown: 997 closed tcp ports (reset)
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
3128/tcp open  squid-http

Nmap done: 1 IP address (1 host up) scanned in 0.25 seconds


SSH works fine:
Code:
root@pve2:~# ssh pve1
Linux pve1 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb  8 17:08:43 2024 from 172.x.x.10
root@pve1:~#


root@pve2:~# ssh pve3
Linux pve3 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb  8 16:32:23 2024
root@pve3:~#

pve3:
Code:
root@pve3:~# nmap pve1
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve1 (172.x.x.10)
Host is up (0.00029s latency).
rDNS record for 172.x.x.10: pve1.x.com
Not shown: 997 closed tcp ports (reset)
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
3128/tcp open  squid-http

Nmap done: 1 IP address (1 host up) scanned in 0.18 seconds
root@pve3:~# nmap pve2
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:12 EST
Nmap scan report for pve2 (172.x.x.20)
Host is up (0.00062s latency).
rDNS record for 172.x.x.20: pve2.x.com
Not shown: 997 closed tcp ports (reset)
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
3128/tcp open  squid-http

Nmap done: 1 IP address (1 host up) scanned in 0.19 seconds

SSH is also fine here:
Code:
root@pve3:~# ssh pve1
Linux pve1 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb  8 17:09:02 2024 from 172.x.x.20
root@pve1:~# ^C
root@pve1:~#
logout
Connection to pve1 closed.
root@pve3:~# ssh pve2
Linux pve2 6.5.11-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-4 (2023-11-20T10:19Z) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Thu Feb  8 17:12:33 2024
root@pve2:~#
logout
Connection to pve2 closed.


However, from pve1, although I can ping the other nodes, nmap cannot see any open ports on either of them.
Code:
root@pve1:~# ping pve2
PING pve2.x.com (172.x.x.20) 56(84) bytes of data.
64 bytes from pve2.x.com (172.x.x.20): icmp_seq=1 ttl=63 time=0.757 ms
64 bytes from pve2.x.com (172.x.x.20): icmp_seq=2 ttl=63 time=1.13 ms
^C
--- pve2.x.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.757/0.943/1.130/0.186 ms
root@pve1:~# ping pve3
PING pve3.x.com (172.x.x.30) 56(84) bytes of data.
64 bytes from pve3.x.com (172.x.x.30): icmp_seq=1 ttl=63 time=0.675 ms
64 bytes from pve3.x.com (172.x.x.30): icmp_seq=2 ttl=63 time=0.989 ms
^C
--- pve3.x.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1004ms
rtt min/avg/max/mdev = 0.675/0.832/0.989/0.157 ms

Nmap fails to see ports on the other two hosts:
root@pve1:~# nmap pve2
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:03 EST
Nmap scan report for pve2 (172.x.x.20)
Host is up (0.00038s latency).
rDNS record for 172.x.x.20: pve2.x.com
All 1000 scanned ports on pve2 (172.x.x.20) are in ignored states.
Not shown: 1000 filtered tcp ports (no-response)
MAC Address: D8:9E:F3:DD:7F:76 (Dell)
root@pve1:~# nmap pve3
Starting Nmap 7.93 ( https://nmap.org ) at 2024-02-08 17:09 EST
Nmap scan report for pve3 (172.x.x.30)
Host is up (0.00069s latency).
rDNS record for 172.x.x.30: pve3.x.com
All 1000 scanned ports on pve3 (172.x.x.30) are in ignored states.
Not shown: 1000 filtered tcp ports (no-response)
MAC Address: E4:54:E8:3A:3C:5F (Dell)


Nmap done: 1 IP address (1 host up) scanned in 21.27 seconds

SSH from pve1 to them also times out:
Code:
root@pve1:~# ssh -vvv pve2
OpenSSH_9.2p1 Debian-2, OpenSSL 3.0.9 30 May 2023
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug2: resolving "pve2" port 22
debug3: resolve_host: lookup pve2:22
debug3: ssh_connect_direct: entering
debug1: Connecting to pve2 [172.x.x.20] port 22.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
debug1: connect to address 172.x.x.20 port 22: Connection timed out
ssh: connect to host pve2 port 22: Connection timed out
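
(As a quicker single-port check from pve1, assuming nc/netcat is installed, something like this can be used instead of a full nmap scan.)
Code:
# probe the SSH and web GUI/API ports on pve2 with a 3 second timeout
nc -vz -w 3 172.x.x.20 22
nc -vz -w 3 172.x.x.20 8006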


Below is more config info from pve1.

/etc/network/interfaces:
Code:
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet manual
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

iface wlo1 inet manual

auto vmbr0.5
iface vmbr0.5 inet static
        address 172.x.x.10/24
        gateway 172.x.x.254

corosync.conf:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.x.x.10
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.x.x.20
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 172.x.x.30
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: LAB-01
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

pvecm nodes:
Code:
root@pve1:/etc/pve# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve1 (local)
         2          1 pve2
         3          1 pve3

pvecm status:
Code:
root@pve1:/etc/pve# pvecm status
Cluster information
-------------------
Name:             LAB-01
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Feb  8 17:26:02 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          1.16
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.x.x.10 (local)
0x00000002          1 172.x.x.20
0x00000003          1 172.x.x.30

Here is the same info from one of the working nodes (pve3).
/etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface enp2s0 inet manual

auto vmbr0.5
iface vmbr0.5 inet static
        address 172.x.x.30
        gateway 172.x.x.254

auto vmbr0
iface vmbr0 inet static
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094


source /etc/network/interfaces.d/*

corosync.conf:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.x.x.10
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.x.x.20
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 172.x.x.30
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: LAB-01
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
pvecm nodes and status:
Code:
root@pve3:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve1
         2          1 pve2
         3          1 pve3 (local)
root@pve3:~# pvecm status
Cluster information
-------------------
Name:             LAB-01
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Feb  8 17:32:59 2024
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000003
Ring ID:          1.16
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.x.x.10
0x00000002          1 172.x.x.20
0x00000003          1 172.x.x.30 (local)

I hope the issue is something simple and that I do not have to reinstall Proxmox on pve1, as it is the original node and has all of the running VMs and containers on it.

Thanks to anyone who can help.
 
Hi,
since all hosts seem to be able to ping each other, the network layer seems fine to me. So I would guess there is a firewall rule in place on node2 and node3 that drops packets from node1 for a certain port range. Please check the output of iptables-save on node2 and node3.
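
Something along these lines should show whether any such rules are loaded at all (just a sketch, adjust to your setup):
Code:
# dump the full ruleset and look for anything that drops or rejects traffic
iptables-save | grep -iE 'drop|reject'
# check whether the Proxmox VE firewall service is active on this node
pve-firewall status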
 
Here is the iptables-save output for each node.

pve1
Code:
root@pve1:~# iptables-save
# Generated by iptables-save v1.8.9 on Fri Feb  9 09:08:16 2024
*raw
:PREROUTING ACCEPT [10728208:24740100044]
:OUTPUT ACCEPT [5911423:74479063215]
COMMIT
# Completed on Fri Feb  9 09:08:16 2024
# Generated by iptables-save v1.8.9 on Fri Feb  9 09:08:16 2024
*filter
:INPUT ACCEPT [10582760:24722790761]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [5911426:74479063799]
COMMIT
# Completed on Fri Feb  9 09:08:16 2024
pve2
Code:
root@pve2:~# iptables-save
# Generated by iptables-save v1.8.9 on Fri Feb  9 09:06:02 2024
*raw
:PREROUTING ACCEPT [2908687:600234401]
:OUTPUT ACCEPT [2731687:584182248]
COMMIT
# Completed on Fri Feb  9 09:06:02 2024
# Generated by iptables-save v1.8.9 on Fri Feb  9 09:06:02 2024
*filter
:INPUT ACCEPT [2825866:587586090]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2731688:584182496]
COMMIT
# Completed on Fri Feb  9 09:06:02 2024
pve3
Code:
root@pve3:~# iptables-save
# Generated by iptables-save v1.8.9 on Fri Feb  9 09:08:45 2024
*raw
:PREROUTING ACCEPT [2970381:635294704]
:OUTPUT ACCEPT [2749246:598332052]
COMMIT
# Completed on Fri Feb  9 09:08:45 2024
# Generated by iptables-save v1.8.9 on Fri Feb  9 09:08:45 2024
*filter
:INPUT ACCEPT [2887661:622650636]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [2749248:598332352]
COMMIT
# Completed on Fri Feb  9 09:08:45 2024
 
Have you rebooted pve2 and pve3 since they were added to the cluster? And have you rebooted pve1 since you ran all the updates? If not, I would recommend rebooting all 3 nodes just to rule out any updates that have not been fully applied. But I agree with @Chris that, since nmap reports 'filtered' ports, it sounds like a firewall issue.
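
(For example, something like the following on each node would show whether any packages or a newer kernel are still pending; adjust as you see fit.)
Code:
# list packages with pending upgrades
apt list --upgradable
# compare the running kernel against the installed Proxmox kernel packages
uname -r
pveversion -v | grep -i kernel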
 
I have rebooted all nodes as part of the update process, and I've just rebooted them all again; the problem is still occurring. I have not explicitly set any firewall rules on any of the nodes, and since the nodes are currently plugged into the same switch, I do not believe my main firewall's config would affect them.
 
Since the nodes can ping each other, and you have verified that the services are up and running by showing that node2 and node3 can connect and communicate with each other, the only option I see here is something blocking connections from node1. Further, you showed that your nodes do not have the PVE firewall enabled, so it is probably something outside of the Proxmox VE nodes that is not letting the traffic pass.
 
Got it. I am going to go into the main firewall and create a temporary allow-everything rule for the VLAN and see if that fixes the problem.
 
@Chris Adding the allow-all rule did not fix the issue on its own. Looking at the logs, I saw that some very high ports were being blocked by a state-violation rule. Disabling state checking on my allow-everything rule lets the cluster work as expected.

These questions are somewhat out of scope for this forum, but:
Why would a firewall behind two layers of switches affect local connectivity? The local switch should have handled that, I'd think.

I'd also like some advice on tightening the firewall back up, since I do not want to keep an allow-everything rule in place for longer than necessary.
 
Why would a firewall behind two layers of switches affect local connectivity? The local switch should have handled that, I'd think.
My guess is that the network packets are not taking the path you are expecting for some reason, but without knowing the network this is probably out of scope for me to answer. The same goes for the firewall setup; you need at least the ports described in the docs to be open, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_ports_used_by_proxmox_ve
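
If it were an iptables-based firewall, the shape of the rules would be roughly this (a sketch only; the subnet is a placeholder and the port list comes from the linked docs, so double-check it against your version):
Code:
# allow the core Proxmox VE ports between cluster nodes (placeholder subnet)
iptables -A FORWARD -s 172.x.x.0/24 -d 172.x.x.0/24 -p tcp -m multiport --dports 22,3128,8006 -j ACCEPT
# live migration port range
iptables -A FORWARD -s 172.x.x.0/24 -d 172.x.x.0/24 -p tcp --dport 60000:60050 -j ACCEPT
# corosync cluster traffic
iptables -A FORWARD -s 172.x.x.0/24 -d 172.x.x.0/24 -p udp --dport 5405:5412 -j ACCEPT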
 
Why would a firewall behind two layers of switches affect local connectivity? The local switch should have handled that, I'd think.
It may depend on how you have the 2 switches tied together. For instance, if you have spanning tree enabled, or link aggregation incorrectly set up, the packets may be taking a weird path, as @Chris has suggested. And because the traffic is on a VLAN, it may not be traveling via the correct port on the switch. You will need to check all your switch configurations. It is also possible that you have a bad switch port or cable somewhere.
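
A few host-side checks can also help narrow it down (a sketch; the switch-side checks depend on your gear):
Code:
# confirm VLAN 5 is actually carried on the bridge and its uplink port
bridge vlan show
# check that pve1 resolves its peers to the expected MAC addresses
ip neigh show
# see which path the packets actually take (should be a single hop on the VLAN), if traceroute is installed
traceroute -n 172.x.x.20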
 
