Proxmox Backup Server datastore status unknown/not active on 1 node out of 3 (cluster)

I have checked the routing on node A and it is set up correctly; there are no firewall rules configured, and the physical connectivity is also fine. I have also verified the IPv6 addresses and they are correct, yet I am still getting the "no route to host" error described above. Any suggestions?

View attachment 67429
So this would suggest that the chosen route is via vmbr1, which according to your comments is a dedicated network for Ceph, so I guess that is not what you intended. Shouldn't the traffic to the PBS go via vmbr0? It would make things easier to use different subnets for the different interfaces. Also, I just noticed that you have an OVS bridge on node A and a Linux bridge on node B. That should, however, not influence routing, I guess.
 
I have a node D configured in a similar manner; the only difference is that node D uses a Linux bridge instead of an OVS bridge. On both nodes the chosen route is via vmbr1. Regarding your comment: that interface is dedicated to backups in general.

The only difference between the two nodes that I can see is the OVS bridge, but from what I have read so far, OVS offers better throughput and lower latency than the Linux bridge, so I think this shouldn't cause the trouble you mentioned.

[Screenshot attachment]
 
The error "BACKUPs_POOL: error fetching datastores - 500 Can't connect to [fd00:dc:ce:192:168:90::15]:8007 (No route to host) (500)" indicates a routing issue, but the route is already configured. Any suggestions would be highly appreciated.
 
Please post the output of ip address, ip route and ip neigh in code tags, so we can see the current state of your network on the node that cannot connect to the PBS host.
 
--> IP neigh

Code:
root@node-A:~# ip neigh
192.168.90.233 dev vmbr0 lladdr 5e:cd:44:8d:cf:ac STALE
192.168.90.1 dev vmbr0 lladdr e4:8d:8c:3a:ec:89 REACHABLE
192.168.90.14 dev vmbr0 lladdr 0c:c4:7a:32:6c:9b REACHABLE
192.168.90.4 dev vmbr0 lladdr 66:db:39:c6:81:8a STALE
192.168.90.15 dev vmbr0 lladdr 9c:5c:8e:50:38:c9 STALE
192.168.90.2 dev vmbr0 lladdr 66:98:96:83:71:2c STALE
192.168.90.12 dev vmbr0 lladdr 0c:c4:7a:32:0e:3e STALE
192.168.90.10 dev vmbr0 lladdr a6:a1:bf:8a:b3:5a STALE
192.168.90.13 dev vmbr0  FAILED
fe80::6498:96ff:fe83:712c dev vmbr0 lladdr 66:98:96:83:71:2c STALE
fd00:dc:ce:192:168:90:0:12 dev vmbr1  INCOMPLETE
fe80::9e5c:8eff:fe50:38c9 dev vmbr0 lladdr 9c:5c:8e:50:38:c9 STALE
fd00:dc:ce:192:168:90:0:15 dev vmbr1  FAILED
fd00:dc:cc:192:168:90:0:22 dev enp1s0f1  INCOMPLETE
fe80::5ccd:44ff:fe8d:cfac dev vmbr0 lladdr 5e:cd:44:8d:cf:ac STALE
fd00:dc:cc:192:168:90:0:12 dev enp1s0f1 lladdr 0c:c4:7a:32:0e:3f REACHABLE
fe80::1a66:daff:fe75:5c8c dev vmbr0 lladdr 18:66:da:75:5c:8c STALE
fe80::ec4:7aff:fe32:6c9a dev enp1s0f1 lladdr 0c:c4:7a:32:6c:9a REACHABLE
fe80::c4df:f1ff:fef8:b6b5 dev vmbr0 lladdr c6:df:f1:f8:b6:b5 STALE
fe80::a4a1:bfff:fe8a:b35a dev vmbr0 lladdr a6:a1:bf:8a:b3:5a STALE
fd00:dc:ce:192:168:90:0:14 dev vmbr1 lladdr b4:96:91:29:bb:96 router STALE
fd00:dc:cc:192:168:90:0:24 dev enp1s0f1  INCOMPLETE
fe80::2cc5:d2ff:fe93:15a8 dev vmbr1 lladdr 2e:c5:d2:93:15:a8 STALE
fd00:dc:cc:192:168:90:0:14 dev enp1s0f1 lladdr 0c:c4:7a:32:6c:9a REACHABLE
fe80::b696:91ff:fe29:bb96 dev vmbr1 lladdr b4:96:91:29:bb:96 router STALE
fe80::ec4:7aff:fe32:6d5d dev enp1s0f1 lladdr 0c:c4:7a:32:6d:5d STALE
fd00:dc:ce:192:168:90:0:13 dev vmbr1  router INCOMPLETE
fd00:dc:cc:192:168:90:0:23 dev enp1s0f1  INCOMPLETE
fe80::ec4:7aff:fe32:e3f dev enp1s0f1 lladdr 0c:c4:7a:32:0e:3f REACHABLE
fd00:dc:cc:192:168:90:0:13 dev enp1s0f1  INCOMPLETE

--> IP address

Code:
root@node-A:~# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether b4:96:91:29:bb:2e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::b696:91ff:fe29:bb2e/64 scope link
       valid_lft forever preferred_lft forever
3: enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 0c:c4:7a:32:6c:9c brd ff:ff:ff:ff:ff:ff
4: enp1s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 0c:c4:7a:32:6c:9d brd ff:ff:ff:ff:ff:ff
    inet6 fd00:dc:cc:192:168:90:0:11/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::ec4:7aff:fe32:6c9d/64 scope link
       valid_lft forever preferred_lft forever
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b2:25:76:19:5e:b8 brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether b4:96:91:29:bb:2e brd ff:ff:ff:ff:ff:ff
    inet6 fd00:dc:ce:192:168:90:0:11/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::5007:a6ff:feec:5e4c/64 scope link
       valid_lft forever preferred_lft forever
7: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 0c:c4:7a:32:6c:9c brd ff:ff:ff:ff:ff:ff
    inet 192.168.90.11/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fd00:dc:cc:192:168:90:0:21/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::ec4:7aff:fe32:6c9c/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr0v704: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f2:bc:30:5a:16:65 brd ff:ff:ff:ff:ff:ff
10: enp1s0f0.704@enp1s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0v704 state UP group default qlen 1000
    link/ether 0c:c4:7a:32:6c:9c brd ff:ff:ff:ff:ff:ff
17: tap101i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0v704 state UNKNOWN group default qlen 1000
    link/ether ae:32:36:93:e0:c1 brd ff:ff:ff:ff:ff:ff

--> IP route

Code:
root@node-A:~# ip route
default via 192.168.90.1 dev vmbr0 proto kernel onlink
192.168.90.0/24 dev vmbr0 proto kernel scope link src 192.168.90.11
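One thing worth noting about the output above: plain ip route prints only the IPv4 table. Since the PBS is addressed via IPv6, the IPv6 table and the kernel's actual route decision for the PBS address are what matter here. A quick way to check (a sketch; the PBS address is taken from the error message in this thread):

```shell
# IPv6 routing table on node A (plain "ip route" shows only IPv4):
ip -6 route || echo "no IPv6 routes shown, or ip not available"

# Ask the kernel which route/interface it would actually use for the PBS
# address. If this prints "dev vmbr1", traffic leaves via the backup bridge.
ip -6 route get fd00:dc:ce:192:168:90::15 || echo "no route to the PBS address"
```

If `ip -6 route get` already fails here, the "No route to host" error is explained before any neighbour discovery even happens.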
 
Can you exclude that this is a bad cable? Can you reach other hosts in the same subnet via IPv6 ping?
 
I didn't get your statement about excluding a bad cable. Yes, I can ping; below is a screenshot of a ping from node A to node D. Moreover, in the "ip neigh" table I see that for the PBS IPv6 address "fd00:dc:ce:192:168:90::15" the neighbour status is FAILED, although the network settings seem correct as of now.

[Screenshot: ping from node A to node D]
 
What I meant is whether you can exclude a bad cable that keeps packets from being sent correctly via that interface. But since you can ping other hosts via that interface, that is not the case. So it seems the issue is rather on the PBS side, then. Try dumping the ICMP packets on the PBS side via tcpdump to see whether the packets arrive at the PBS or the connection issue is somewhere in between.
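To make that concrete, a minimal sketch of the capture/ping pair, assuming the interface and address names from this thread (ens19 on the PBS, the PBS address from the error message); adjust them to your setup:

```shell
# On the PBS host: capture ICMPv6 on the backup-facing interface for a few
# seconds (-n: no name resolution; timeout so the capture does not run forever).
timeout 5 tcpdump -ni ens19 icmp6 || echo "capture ended (timeout, no packets, or tcpdump missing)"

# Meanwhile, on node A: ping the PBS IPv6 address.
ping -6 -c 5 fd00:dc:ce:192:168:90::15 || echo "no reply from PBS"
```

If the echo requests show up in the capture but replies never come back, the problem is on the return path; if nothing arrives at all, it is somewhere between node A and the PBS.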
 
I'm not sure I ran tcpdump the right way, but below is the output of tcpdump on the PBS.

[Screenshot: tcpdump output on the PBS]

Interface settings on PBS

ens19 is configured with IPv6, and I see that there are some pending changes for IPv6 here as well. I haven't pressed "Apply Configuration" yet to avoid any further issues; also, since I'm already able to ping the PBS IPv6 address from node D, that is another reason I didn't apply the configuration.
[Screenshot: interface settings on the PBS]
 
What I meant was to check whether the ping packets from node A reach the PBS host, so run tcpdump -i vmbr0 icmp6 on the PBS while pinging the host from node A. Is there a firewall set up on the PBS host? How are node A and the PBS host connected?
 
No, there isn't any firewall set up on the PBS host. PBS and node A are connected via the ens19 interface, just like node D is connected to the PBS. Below is the output from the PBS for "tcpdump -i ens19 icmp6". I tried the same from node D and was able to capture the packets, but not from node A.
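Since the neighbour entry for the PBS address is FAILED on node A only, one thing worth trying (a sketch using the addresses and interface names from this thread) is to clear the stale NDP entry and watch neighbour discovery retry:

```shell
# On node A: show the current neighbour state on the backup bridge.
ip -6 neigh show dev vmbr1 || echo "ip not available"

# Delete the FAILED entry so neighbour discovery starts from scratch.
ip -6 neigh del fd00:dc:ce:192:168:90::15 dev vmbr1 2>/dev/null || true

# Re-trigger discovery, then re-check the entry state.
ping -6 -c 3 fd00:dc:ce:192:168:90::15 || echo "still no reply"
ip -6 neigh show dev vmbr1 || true
```

If the entry goes straight back to FAILED while node D resolves the same address fine, node A's neighbour solicitations are not being answered, which points at something on the layer-2 path between node A's vmbr1 and the PBS's ens19 (VLAN tagging, switch port, bridge configuration) rather than at routing.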

[Screenshot: tcpdump -i ens19 icmp6 output on the PBS]
 
