Calling on the hive mind: networking issue from hell

proxwolfe

Jun 20, 2020
Hi,

so I have a small home lab cluster of 3 PVE nodes and one additional PBS. And I have the following networks (each on separate NICs/cables/switches):

- Management
- Corosync
- Ceph
- Backup

The PVE nodes are connected to all networks. The PBS is only connected to Management and Backup.

All PVE nodes see each other on all networks and can see the PBS both on the management network as well as on the backup network.

Recently I replaced one of the PVE nodes with newer hardware. This new node can see (ping) all the other nodes on all networks (including on backup) but it cannot see the PBS.

How can that be? If the NIC on the new node connected to the backup network were defective, the new node should not be able to see the other nodes on that network, it should see nothing at all. But it can see the other nodes also on the backup network. And the other nodes also can see the new node on the backup network. So why can the new node not see the PBS???

There is no name resolution involved, I am pinging the nodes' IP addresses.

There are no vlans.

There is no routing across subnets involved. The nodes and the PBS are all connected via a switch only.

There is no separate firewall in between the nodes and/or the PBS. The onboard firewalls of all nodes are inactive. I have not installed any other firewalls.

The error message I am getting is "Destination Host Unreachable".

What is going on?

Is it possible that the new node's NIC is defective after all and PVE somehow routes the pings on the PVE-only networks around the broken NIC (and that only works within the PVE cluster but not outside, and therefore does not work with PBS)? I know, this sounds crazy, but it is the best (only) explanation I have come up with so far.

From what I understand, "Destination Host unreachable" means that the new node does not know where to find the ping target (although it is on the same subnet).

Any ideas?

Thanks

Edit: Added some of the requested info. More to follow...
 
This new node can see (ping) all the other nodes on all networks (including on backup) but it cannot see the PBS.
By "ping ip.ad.dr.es" or by "ping pbsname"? Make sure all hosts have identical "name <--> ipaddress" mapping, be it via identical "/etc/hosts" or by a DNS server.

The PVE nodes are connected to all networks.
So there is no chance that a router forbids that connection. No routing involved.

Edit: what error message did you actually get? Can you post a few lines of an actually executed command?


You did not tell us many details about the network, so my reply goes as usual: please start by giving us some information, like the copy-n-pasted output of some commands, run on a PVE host (either via SSH or via "Datacenter --> <one Node> --> Shell", both ways allow copy-n-paste):

PVE System information:
  • pveversion -v
Basic network information:
  • ip address show # currently active IP addresses on one NODE
  • ip route show # currently active routing table on one NODE
  • ip link show # currently active links on one NODE
  • cat /etc/network/interfaces # configuration of the network
  • cat /etc/resolv.conf # DNS resolver settings

PBS Information
  • proxmox-backup-manager version
PBS Basic network information:
  • ip address show # currently active IP addresses on the PBS
  • ip route show # currently active routing table on the PBS
  • cat /etc/network/interfaces # configuration of the network
  • ss -tlpn # which process is listening on which address/port
Those are examples. You may add/edit commands and options if you can enrich the information given. Oh, and please put each command in a separate [CODE]...[/CODE]-block for better readability.
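If pasting the commands one by one is tedious, a small helper along these lines could gather everything in one run. This is only a sketch: the function name `collect_info` is made up for illustration, and the headers it prints are just a convenience for splitting the output into separate code blocks afterwards.

```shell
#!/bin/sh
# Print each command as a header, then its output (stdout and stderr),
# so the whole report can be copied in one go.
# NOTE: collect_info is a hypothetical helper, not part of PVE or PBS.
collect_info() {
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd"
        sh -c "$cmd" 2>&1
        printf '\n'
    done
}

# Example call on a PVE node (commands taken from the list above):
# collect_info 'pveversion -v' 'ip address show' 'ip route show' \
#     'ip link show' 'cat /etc/network/interfaces' 'cat /etc/resolv.conf'
```

On the PBS the same helper would work with the PBS commands from the list.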
 
By "ping ip.ad.dr.es" or by "ping pbsname"? Make sure all hosts have identical "name <--> ipaddress" mapping, be it via identical "/etc/hosts" or by a DNS server.


So there is no chance that a router forbids that connection. No routing involved.

Edit: what error message did you actually get? Can you post a few lines of an actually executed command?
Good point, I have updated my OP with comments on this.
 
PVE System information:
  • pveversion -v
Code:
pveversion -v
proxmox-ve: 8.3.0 (running kernel: 6.8.12-7-pve)
pve-manager: 8.3.3 (running version: 8.3.3/f157a38b211595d6)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-7
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph: 18.2.4-pve3
ceph-fuse: 18.2.4-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.4
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-3
pve-ha-manager: 4.0.6
pve-i18n: 3.3.3
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve1
PBS Information
  • proxmox-backup-manager version
Code:
proxmox-backup-manager version
proxmox-backup-server 3.3.2-1 running version: 3.2.2
 
So you are still at...
The error message I am getting is "Destination Host Unreachable".
...? For a simple "ping"?

In that case I would really like to see the output of both "Basic network information:" blocks from post #2 and also the actual ping command with the error message.


Just to reiterate the obvious --> if this...
There is no routing across subnets involved. The nodes and the PBS are all connected via a switch only.
...is correct then all systems using an IP address in the same IP-network should be able to reach each other. So my profane suspicion is that this is not the case.
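To rule out a netmask mismatch, one could check whether two addresses really fall into the same IP network for a given prefix length. A minimal sketch in plain shell follows. The function names are made up for illustration, and the /24 prefix in the usage note is an assumption, since the actual netmask has not been posted yet.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if both addresses share the same network
# for the given prefix length, fail (exit status 1) otherwise.
# Usage: same_subnet ADDR1 ADDR2 PREFIXLEN
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    prefix=$3
    # Build the netmask from the prefix length (4294967295 = 0xFFFFFFFF).
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    [ $(( $a & $mask )) -eq $(( $b & $mask )) ]
}
```

For example, `same_subnet 192.168.252.237 192.168.252.230 24 && echo same` would print `same`. The prefix length that matters is the one each host actually has configured, as shown by `ip address show` on both sides.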

Did you copy some configuration onto the new PBS? My usual pitfall is forgetting to readjust all configuration to fit the new system: all hostname occurrences, all IP addresses.

And as far as I can see no VLANs involved, right?
 
So you are still at...

...? For a simple "ping"?
Yes:
Code:
# ping 192.168.252.230
PING 192.168.252.230 (192.168.252.230) 56(84) bytes of data.
From 192.168.252.237 icmp_seq=1 Destination Host Unreachable
From 192.168.252.237 icmp_seq=2 Destination Host Unreachable
From 192.168.252.237 icmp_seq=3 Destination Host Unreachable
From 192.168.252.237 icmp_seq=4 Destination Host Unreachable
^C
--- 192.168.252.230 ping statistics ---
7 packets transmitted, 0 received, +4 errors, 100% packet loss, time 6108ms
pipe 3
Code:
# ping 192.168.252.236
PING 192.168.252.236 (192.168.252.236) 56(84) bytes of data.
64 bytes from 192.168.252.236: icmp_seq=1 ttl=64 time=0.296 ms
64 bytes from 192.168.252.236: icmp_seq=2 ttl=64 time=0.200 ms
64 bytes from 192.168.252.236: icmp_seq=3 ttl=64 time=0.423 ms
^C
--- 192.168.252.236 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.200/0.306/0.423/0.091 ms
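One detail in the failing ping may be worth a closer look: the errors come "From 192.168.252.237". If that is the new node's own address on the backup network (an assumption, it has not been stated explicitly), then the error is generated locally, which typically means the node got no ARP reply for the target. A small sketch to check this from ping output; the function names are made up for illustration:

```shell
#!/bin/sh
# List the addresses that reported "Destination Host Unreachable"
# in ping output read from stdin.
unreachable_reporters() {
    awk '/Destination Host Unreachable/ { print $2 }' | sort -u
}

# Classify ping output read from stdin.
# $1 is the pinging host's own address on that network (an assumption
# the caller has to supply; it is not detected automatically).
classify_unreachable() {
    own_addr=$1
    reporters=$(unreachable_reporters)
    if [ -z "$reporters" ]; then
        echo "no-unreachable-errors"
    elif [ "$reporters" = "$own_addr" ]; then
        # The host itself gave up, most likely after failed ARP resolution.
        echo "local-arp-failure"
    else
        echo "reported-by-other-host"
    fi
}
```

It could be used as `ping -c 4 192.168.252.230 | classify_unreachable 192.168.252.237`. If that prints `local-arp-failure`, then `ip neigh show 192.168.252.230` on the new node would likely list the entry as FAILED or INCOMPLETE.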
Just to reiterate the obvious --> if this...
...is correct then all systems using an IP address in the same IP-network should be able to reach each other. So my profane suspicion is that this is not the case.
Right? That's why I don't understand this. It simply should not happen.

Did you copy some configuration onto the new PBS? My usual pitfall is forgetting to readjust all configuration to fit the new system: all hostname occurrences, all IP addresses.

The PBS has remained the same. I replaced one of the PVE nodes. And apart from the config files that are automatically replicated between the nodes, I have not copied over any config. In particular, I set up the network config manually.

And as far as I can see no VLANs involved, right?
That's right, no vlans. (I have added that info now to my OP above as well).