Proxmox Backup Server datastore status unknown/not active on 1 node out of 3 (cluster)

noire · Jul 31, 2023

Hi there,

I have successfully added the PBS datastore to the datacenter/cluster, and the backup jobs on 2 of the 3 nodes (DENG-SRV + SRV1) are up and running. The 3rd node (SRV2), however, shows Enabled "yes" and Active "no" under "Status", with a question mark on the datastore name followed by "unknown" when hovering over it.

Therefore, when the backup job is supposed to start on the SRV2 node, I get:
"TASK ERROR: could not activate storage 'PBS-DATASTORE': PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)."
There is no IP address conflict.

I just added the PBS datastore right from Datacenter > Storage. Everything went smoothly and automatically for the DENG-SRV and SRV1 nodes, but not for the SRV2 node. I have already removed the storage from the datacenter and re-added it a couple of times.
I also tried some (random) service restarts:

systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pvestatd

on the SRV2 node but nothing changed.

Running PVE 8.0.3 on all nodes (fresh installs) and a VM'ed PBS 3.0-1 (fresh install).
The datastore is a 30T NFS share hosted on TrueNAS Core 13, mounted on PBS. Only the PBS is allowed to access the resource.
I am assuming it isn't a permission issue, because 2 nodes are already backing up via PBS.

Attached is a picture of the GUI and the fstab on the SRV2 node.

Does anyone have a clue?

Thanks!

Chris · Jul 31, 2023

noire said:
"TASK ERROR: could not activate storage 'PBS-DATASTORE': PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)."

Hi,
since the other two nodes can reach the PBS, this might be a network/firewall configuration issue. Please post the output of the following commands from the node which has the connection issues

Code:

iptables-save
ip addr
ip route
cat /etc/network/interfaces
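
Because the task error is a timeout on TCP port 8007 rather than an authentication failure, a direct port test from the affected node can complement the output requested above. This is a minimal sketch, assuming bash and the coreutils `timeout` command are available on the node. The helper name `check_tcp` is made up for illustration, and the address and port are the ones from the error message:

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT [TIMEOUT_SECONDS]  (hypothetical helper, not a Proxmox tool)
# Exit status 0 if a TCP connection can be opened, non-zero otherwise.
check_tcp() {
    local host="$1" port="$2" wait="${3:-5}"
    # bash's /dev/tcp pseudo-device opens a TCP connection; timeout bounds the wait.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage on the affected node (address and port taken from the task error):
#   check_tcp 192.168.1.30 8007 && echo "port reachable" || echo "port NOT reachable"
```

Running the same check on one of the working nodes gives a baseline to compare against.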

noire · Jul 31, 2023

Hi Chris, thanks a lot for your reply.
I just want to add that from this morning until a couple of hours ago, all 3 nodes had PBS-DATASTORE in "unknown" status. I just logged in again and, magically, the SRV2 node now also has PBS-DATASTORE correctly mounted, but it takes a long time before its status is shown (the GUI wheel spins for approx. 30-40 seconds, while for the other 2 nodes the status is displayed instantly).

Anyway here is the output of SRV2 node as you requested:

Code:

root@SRV2:~# iptables-save
ip addr
ip route
cat /etc/network/interfaces
# Generated by iptables-save v1.8.9 on Mon Jul 31 11:44:40 2023
*raw
:PREROUTING ACCEPT [40909715:9347983927]
:OUTPUT ACCEPT [39632043:8416420367]
COMMIT
# Completed on Mon Jul 31 11:44:40 2023
# Generated by iptables-save v1.8.9 on Mon Jul 31 11:44:40 2023
*filter
:INPUT ACCEPT [40365502:9253155413]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [39632043:8416420367]
COMMIT
# Completed on Mon Jul 31 11:44:40 2023
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp37s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr2 state UP group default qlen 1000
    link/ether 04:7c:16:5b:80:4c brd ff:ff:ff:ff:ff:ff
3: enp42s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr1 state UP group default qlen 1000
    link/ether 04:7c:16:5b:80:4b brd ff:ff:ff:ff:ff:ff
4: enp35s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 6c:b3:11:3d:4d:5e brd ff:ff:ff:ff:ff:ff
5: enp36s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 6c:b3:11:3d:4d:5e brd ff:ff:ff:ff:ff:ff
6: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 04:7c:16:5b:80:4b brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.21/24 scope global vmbr1
       valid_lft forever preferred_lft forever
    inet6 fe80::67c:16ff:fe5b:804b/64 scope link
       valid_lft forever preferred_lft forever
7: vmbr2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 04:7c:16:5b:80:4c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::67c:16ff:fe5b:804c/64 scope link
       valid_lft forever preferred_lft forever
8: tap104i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr104i0 state UNKNOWN group default qlen 1000
    link/ether 92:07:94:fb:18:6f brd ff:ff:ff:ff:ff:ff
9: fwbr104i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether a6:ff:73:90:e4:de brd ff:ff:ff:ff:ff:ff
10: fwpr104p0@fwln104i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
    link/ether 52:25:a0:55:a9:7e brd ff:ff:ff:ff:ff:ff
11: fwln104i0@fwpr104p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr104i0 state UP group default qlen 1000
    link/ether 62:80:66:a3:09:42 brd ff:ff:ff:ff:ff:ff
12: tap105i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr105i0 state UNKNOWN group default qlen 1000
    link/ether 4a:f3:39:db:f7:a9 brd ff:ff:ff:ff:ff:ff
13: fwbr105i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 32:4a:75:51:12:86 brd ff:ff:ff:ff:ff:ff
14: fwpr105p0@fwln105i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether da:89:d0:87:bc:2e brd ff:ff:ff:ff:ff:ff
15: fwln105i0@fwpr105p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr105i0 state UP group default qlen 1000
    link/ether 22:6d:74:fc:43:46 brd ff:ff:ff:ff:ff:ff
16: tap106i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr106i0 state UNKNOWN group default qlen 1000
    link/ether be:76:6e:2c:62:6d brd ff:ff:ff:ff:ff:ff
17: fwbr106i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 8a:b8:15:d1:3a:66 brd ff:ff:ff:ff:ff:ff
18: fwpr106p0@fwln106i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000
    link/ether 9e:85:d4:5f:72:f0 brd ff:ff:ff:ff:ff:ff
19: fwln106i0@fwpr106p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr106i0 state UP group default qlen 1000
    link/ether a2:3c:d3:1a:2b:84 brd ff:ff:ff:ff:ff:ff
20: tap107i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr107i0 state UNKNOWN group default qlen 1000
    link/ether 0a:93:85:e6:a4:46 brd ff:ff:ff:ff:ff:ff
21: fwbr107i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:91:a8:d7:00:27 brd ff:ff:ff:ff:ff:ff
22: fwpr107p0@fwln107i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr1 state UP group default qlen 1000
    link/ether 1a:b5:f1:bd:6f:96 brd ff:ff:ff:ff:ff:ff
23: fwln107i0@fwpr107p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr107i0 state UP group default qlen 1000
    link/ether 1e:61:8e:00:2b:0d brd ff:ff:ff:ff:ff:ff
default via 192.168.1.1 dev vmbr1 proto kernel onlink
192.168.1.0/24 dev vmbr1 proto kernel scope link src 192.168.1.21
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp37s0 inet manual
#ONBOARD NIC 1GBE

iface enp42s0 inet manual
#ONBOARD NIC 2.5GBE

iface enp35s0 inet manual

iface enp36s0 inet manual

auto vmbr1
iface vmbr1 inet static
    address 192.168.1.21/24
    gateway 192.168.1.1
    bridge-ports enp42s0
    bridge-stp off
    bridge-fd 0
#ONBOARD NIC 2.5GBE

auto vmbr2
iface vmbr2 inet manual
    bridge-ports enp37s0
    bridge-stp off
    bridge-fd 0
#ONBOARD NIC 1GBE

Thanks!

Chris · Jul 31, 2023

Anything in the journal around the time the issue shows up? Please post journalctl --since <DATETIME> --until <DATETIME> for the time-span of interest.
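
To keep such an excerpt readable, the journal can be limited to one unit and the recurring storage errors counted. This is only a sketch: it assumes the status daemon pvestatd is the unit logging the storage errors, `count_pbs_errors` is a hypothetical helper, and the time window is a placeholder to fill in:

```shell
#!/usr/bin/env bash
# count_pbs_errors: reads journal lines on stdin and prints how many contain
# the PBS storage error text from the task log ("error fetching datastores").
count_pbs_errors() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the exit status clean.
    grep -c 'error fetching datastores' || true
}

# Usage (placeholder time window; -u restricts the journal to one unit):
#   journalctl -u pvestatd --since "<DATETIME>" --until "<DATETIME>" | count_pbs_errors
```

A count that stays at zero on the working nodes for the same window would support the idea that the problem is local to one node.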

noire · Jul 31, 2023

Hi Chris, here you go (node SRV2). The job was scheduled to start at 02:10.

Code:

root@SRV2:~# journalctl --since "2023-07-31 02:08:00" --until "2023-07-31 02:12:00"
Jul 31 02:08:02 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:08:09 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:08:09 SRV2 pvestatd[1266]: status update time (7.219 seconds)
Jul 31 02:08:19 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:08:19 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:08:19 SRV2 pvestatd[1266]: status update time (7.219 seconds)
Jul 31 02:08:22 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:08:29 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:08:29 SRV2 pvestatd[1266]: status update time (7.195 seconds)
Jul 31 02:08:33 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:08:40 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:08:40 SRV2 pvestatd[1266]: status update time (7.197 seconds)
Jul 31 02:08:42 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:08:49 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:08:49 SRV2 pvestatd[1266]: status update time (7.227 seconds)
Jul 31 02:08:59 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:08:59 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:08:59 SRV2 pvestatd[1266]: status update time (7.255 seconds)
Jul 31 02:09:09 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:09:09 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:09:09 SRV2 pvestatd[1266]: status update time (7.271 seconds)
Jul 31 02:09:19 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:09:20 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:09:20 SRV2 pvestatd[1266]: status update time (7.187 seconds)
Jul 31 02:09:22 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:09:29 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:09:29 SRV2 pvestatd[1266]: status update time (7.247 seconds)
Jul 31 02:09:39 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:09:39 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:09:39 SRV2 pvestatd[1266]: status update time (7.225 seconds)
Jul 31 02:09:42 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:09:49 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:09:49 SRV2 pvestatd[1266]: status update time (7.196 seconds)
Jul 31 02:09:52 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:09:59 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:09:59 SRV2 pvestatd[1266]: status update time (7.231 seconds)
Jul 31 02:10:00 SRV2 pvescheduler[1922678]: <root@pam> starting task UPID:SRV2:001D5677:0451D9CA:64C6FBD8:vzdump::root@pam:
Jul 31 02:10:07 SRV2 pvescheduler[1922679]: could not activate storage 'PBS-DATASTORE': PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:800>
Jul 31 02:10:07 SRV2 postfix/pickup[1911108]: 3982E100ECD: uid=0 from=<root>
Jul 31 02:10:07 SRV2 postfix/cleanup[1922700]: 3982E100ECD: message-id=<20230731001007.3982E100ECD@SRV2.SRV2>
Jul 31 02:10:07 SRV2 postfix/qmgr[1235]: 3982E100ECD: from=<root@SRV2.SRV2>, size=1639, nrcpt=1 (queue active)
Jul 31 02:10:07 SRV2 postfix/smtp[1922702]: 3982E100ECD: to=<soc-2@srl.it>, relay=mx.srl.it[123.123.123.123]:25, delay=0.07, delays=0.01/0/0.04/0.01, dsn=5.1.0>
Jul 31 02:10:07 SRV2 postfix/cleanup[1922700]: 4C589100ECE: message-id=<20230731001007.4C589100ECE@SRV2.SRV2>
Jul 31 02:10:07 SRV2 postfix/bounce[1922703]: 3982E100ECD: sender non-delivery notification: 4C589100ECE
Jul 31 02:10:07 SRV2 postfix/qmgr[1235]: 4C589100ECE: from=<>, size=3636, nrcpt=1 (queue active)
Jul 31 02:10:07 SRV2 postfix/qmgr[1235]: 3982E100ECD: removed
Jul 31 02:10:07 SRV2 proxmox-mail-fo[1922705]: SRV2 proxmox-mail-forward[1922705]: forward mail to <services@srl.it>
Jul 31 02:10:07 SRV2 postfix/pickup[1911108]: 5003D100ECD: uid=65534 from=<root>
Jul 31 02:10:07 SRV2 postfix/cleanup[1922700]: 5003D100ECD: message-id=<20230731001007.4C589100ECE@SRV2.SRV2>
Jul 31 02:10:07 SRV2 postfix/local[1922704]: 4C589100ECE: to=<root@SRV2.SRV2>, relay=local, delay=0.02, delays=0/0/0/0.01, dsn=2.0.0, status=sent (delivered to command:>
Jul 31 02:10:07 SRV2 postfix/qmgr[1235]: 4C589100ECE: removed
Jul 31 02:10:07 SRV2 postfix/qmgr[1235]: 5003D100ECD: from=<root@SRV2.SRV2>, size=3807, nrcpt=1 (queue active)
Jul 31 02:10:07 SRV2 postfix/smtp[1922702]: 5003D100ECD: to=<services@srl.it>, relay=mx.rl.it[123.123.123.123]:25, delay=0.06, delays=0.01/0/0.04/0.01, dsn=5.>
Jul 31 02:10:07 SRV2 postfix/qmgr[1235]: 5003D100ECD: removed
Jul 31 02:10:07 SRV2 postfix/cleanup[1922700]: 60B5F100ECE: message-id=<20230731001007.60B5F100ECE@SRV2.SRV2>
Jul 31 02:10:09 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:10:10 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:10:10 SRV2 pvestatd[1266]: status update time (7.232 seconds)
Jul 31 02:10:12 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:10:19 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:10:19 SRV2 pvestatd[1266]: status update time (7.264 seconds)
Jul 31 02:10:22 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:10:29 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:10:29 SRV2 pvestatd[1266]: status update time (7.249 seconds)
Jul 31 02:10:32 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:10:39 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:10:39 SRV2 pvestatd[1266]: status update time (7.224 seconds)
Jul 31 02:10:43 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:10:50 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:10:50 SRV2 pvestatd[1266]: status update time (7.224 seconds)
Jul 31 02:10:52 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:10:59 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:10:59 SRV2 pvestatd[1266]: status update time (7.243 seconds)
Jul 31 02:11:02 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:11:09 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:11:09 SRV2 pvestatd[1266]: status update time (7.228 seconds)
Jul 31 02:11:19 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:11:19 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:11:19 SRV2 pvestatd[1266]: status update time (7.234 seconds)
Jul 31 02:11:29 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:11:30 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:11:30 SRV2 pvestatd[1266]: status update time (7.203 seconds)
Jul 31 02:11:39 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:11:39 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:11:39 SRV2 pvestatd[1266]: status update time (7.236 seconds)
Jul 31 02:11:42 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:11:49 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:11:49 SRV2 pvestatd[1266]: status update time (7.193 seconds)
Jul 31 02:11:52 SRV2 pvestatd[1266]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5
Jul 31 02:11:59 SRV2 pvestatd[1266]: PBS-DATASTORE: error fetching datastores - 500 Can't connect to 192.168.1.30:8007 (Connection timed out)
Jul 31 02:11:59 SRV2 pvestatd[1266]: status update time (7.206 seconds)

Chris · Jul 31, 2023

command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: exit code 5

These errors seem unrelated; they should be related to an LVM storage, but your local-lvm seems to be fine? Check via pvesm status. Can you also share the storage config (cat /etc/pve/storage.cfg) and double-check the ARP neighbors via ip neigh?

Is there something in the logs on the PBS side? The symptoms you describe are consistent with duplicate IP addresses. Please double-check that you can ping the host, and maybe verify via tcpdump that the packets are reaching the correct host.
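
One way to look for the duplicate-address symptom is to sample the neighbor table repeatedly and see whether the PBS address ever resolves to more than one MAC. This is a sketch under the assumption that `ip neigh` prints lines of the form `<ip> dev <if> lladdr <mac> <state>`. The helper `macs_for_ip` and the sample file path are hypothetical; the address and the bridge name vmbr1 are taken from the SRV2 output earlier in the thread:

```shell
#!/usr/bin/env bash
# macs_for_ip IP: reads `ip neigh` output on stdin and prints the distinct
# MAC addresses recorded for IP. More than one line hints at an address conflict.
macs_for_ip() {
    awk -v ip="$1" '
        $1 == ip {
            for (i = 1; i < NF; i++)
                if ($i == "lladdr") print $(i + 1)
        }' | sort -u
}

# Usage: collect samples over time, then list the MACs seen for the PBS address.
#   ip neigh >> /tmp/neigh-samples.txt      # repeat while the issue comes and goes
#   macs_for_ip 192.168.1.30 < /tmp/neigh-samples.txt
#
# To see whether packets for the PBS port leave this node at all:
#   tcpdump -ni vmbr1 host 192.168.1.30 and tcp port 8007
```

A single MAC across all samples does not rule out every conflict, but two different MACs for the same address would be a strong hint.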

noire · Aug 2, 2023

Hey Chris, sorry for the late answer, I was travelling.

I shut down the PBS and pinged its IP address to check for a conflict, but got no reply.
I shut down the TrueNAS and pinged its IP address to check for a conflict, but got no reply.

The PBS NFS share keeps switching to a question mark and then reappearing as correctly reachable. It is now happening on all nodes, even on a 4th node (SRV3) which I just added today.

As far as I can tell, PBS isn't showing any errors under Administration > Syslogs. I have even just restored a backup from 07/31, but right afterwards the PBS share went back to a question mark.

Here is pvesm status SRV2 node

Here is SRV2 node cat /etc/pve/storage.cfg

Here is ip neigh SRV2 node

Thanks

Chris · Aug 3, 2023

There is an intermittent connectivity issue. Is the PBS VM part of a backup job itself? Is the PBS web UI reachable while the node shows the connection errors? Try pinging the VM while the issue is visible.
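
To answer both questions at the same moment, ping and the web UI can be probed together and logged with a timestamp. This is a sketch, assuming `ping` and `curl` are installed on the PVE node and that the PBS web UI answers over HTTPS on port 8007 (the port from the error message); `probe_once` and the log path are hypothetical:

```shell
#!/usr/bin/env bash
# probe_once HOST [PORT]: prints one timestamped line with the ICMP and web UI state.
probe_once() {
    local host="$1" port="${2:-8007}" ping_state=fail webui_state=fail
    # One echo request, wait at most 2 seconds for the reply.
    ping -c 1 -W 2 "$host" >/dev/null 2>&1 && ping_state=ok
    # -k accepts a self-signed certificate, -m 5 caps the request at 5 seconds.
    curl -k -s -o /dev/null -m 5 "https://$host:$port/" 2>/dev/null && webui_state=ok
    printf '%s ping=%s webui=%s\n' "$(date '+%F %T')" "$ping_state" "$webui_state"
}

# Usage: log a line every 10 seconds while the storage shows the question mark.
#   while true; do probe_once 192.168.1.30; sleep 10; done | tee -a /tmp/pbs-probe.log
```

Lines with ping=ok and webui=fail would point towards the PBS service or TCP path, while ping=fail would point towards the VM or the network underneath it.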

noire · Aug 3, 2023

Hey Chris, the PBS web UI becomes unresponsive after a while. I've just checked, and the PBS itself isn't being backed up. I logged in via shell and pinged around, and everything works except the web UI; I restarted it and it worked again. I then did a fresh install, and after the post-installation reboot it wasn't reachable again, so I logged in via shell, rebooted, and it was reachable via the web UI again. So now I'm setting everything up from scratch.

noire · Aug 7, 2023

Hey Chris, after many reinstalls, the PBS web UI still always ends up unreachable. I have even tried reinstalling it on different nodes, but no luck. Should I go bare metal? Thanks

Chris · Aug 8, 2023

You could try a bare metal install, yes. However, I find it strange that there are no errors in the journal of your PBS VM which might indicate where the issue is. Please attach the full journal since boot once the issue appears again, without rebooting first. You can get it on the CLI via journalctl -b > journal.txt

uzair · May 1, 2024

Hello,

Sorry for intervening in your conversation. I am facing an issue similar to the one noire describes. I'm not sure what the routing issue is, as one of my other nodes is working fine and I am facing the issue with this node only. Your assistance will be highly appreciated. The screenshot is attached below.

uzair · May 1, 2024

Also, I see that I have some pending changes to be applied in my PBS, but what confuses me is that my other node still works without these pending changes being applied. So far I haven't applied them, as I am not sure what effect they could have on my other nodes.

Chris · May 2, 2024

Hi,
can you ping the PBS host from the PVE host? The screenshot suggests that your current IP configuration may not match what you expect it to be. Please check the output of ip address on the PBS host and double-check the storage configuration on both PVE hosts by running cat /etc/pve/storage.cfg.

uzair · May 2, 2024

Hi,

Yes, I am able to ping the IPv4 address of the PBS. Below is the storage configuration output from the PVE hosts, one of which is working fine while the other is not. Thank you for your timely response.

Configuration output of Node A, for which the PBS datastore is showing a question mark.

dir: local
    path /var/lib/vz
    content iso,backup,vztmpl

lvmthin: local-lvm
    thinpool data
    vgname pve
    content rootdir,images

zfspool: HDD_POOL
    pool HDD_POOL
    content images,rootdir
    mountpoint /HDD_POOL
    nodes node-D

pbs: BACKUPs_POOL
    datastore Storage01
    server fd00:dc:ce:192:168:90::15
    content backup
    fingerprint 0a:65:70:f3:0d:6d:69:d4:5c:60:30:9a:ae:04:79:6b:bf:1d:68:8b:ad:65:d5:b1:cc:97:70:8f:78:4c:7d:48
    prune-backups keep-all=1
    username vebackups@pbs

zfspool: local_ssd_mirror
    pool local_ssd_mirror
    content images,rootdir
    mountpoint /local_ssd_mirror
    nodes node-A

dir: nodeD-zfs-backups
    path /HDD_POOL/backups
    content backup
    nodes node-D
    prune-backups keep-all=1
    shared 0

Configuration of Node D for which PBS datastore is working fine.
dir: local
    path /var/lib/vz
    content iso,backup,vztmpl

lvmthin: local-lvm
    thinpool data
    vgname pve
    content rootdir,images

zfspool: HDD_POOL
    pool HDD_POOL
    content images,rootdir
    mountpoint /HDD_POOL
    nodes node-D

pbs: BACKUPs_POOL
    datastore Storage01
    server fd00:dc:ce:192:168:90::15
    content backup
    fingerprint 0a:65:70:f3:0d:6d:69:d4:5c:60:30:9a:ae:04:79:6b:bf:1d:68:8b:ad:65:d5:b1:cc:97:70:8f:78:4c:7d:48
    prune-backups keep-all=1
    username vebackups@pbs

zfspool: local_ssd_mirror
    pool local_ssd_mirror
    content images,rootdir
    mountpoint /local_ssd_mirror
    nodes node-A

dir: nodeD-zfs-backups
    path /HDD_POOL/backups
    content backup
    nodes node-D
    prune-backups keep-all=1
    shared 0

Chris · May 2, 2024

uzair said:
ping my ipv4 address of the PBS

But you are using IPv6 to connect to the host... Check IPv6 connectivity or use IPv4 to connect to the host instead.
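
A quick way to confirm which address family the storage entry actually uses is to read the `server` value of each `pbs:` section and classify it. This is only a sketch: `pbs_servers` and `addr_family` are hypothetical helpers, and the classification is a simple heuristic (a colon means IPv6, only digits and dots means IPv4, anything else is treated as a hostname):

```shell
#!/usr/bin/env bash
# pbs_servers FILE: prints the "server" value of every "pbs:" section in a storage.cfg.
pbs_servers() {
    awk '
        $1 == "pbs:"              { in_pbs = 1; next }   # start of a PBS storage section
        /^[a-z]+:/                { in_pbs = 0 }         # any other section header ends it
        in_pbs && $1 == "server"  { print $2 }
    ' "$1"
}

# addr_family ADDR: prints ipv6, ipv4 or hostname (heuristic only).
addr_family() {
    case "$1" in
        *:*)        echo ipv6 ;;
        *[!0-9.]*)  echo hostname ;;
        *)          echo ipv4 ;;
    esac
}

# Usage on a PVE node:
#   for s in $(pbs_servers /etc/pve/storage.cfg); do
#       echo "$s -> $(addr_family "$s")"
#   done
# Then test the matching family, e.g. `ping -6 <address>` for an IPv6 server.
```

For the configuration posted above this would report the server as IPv6, so a successful IPv4 ping says nothing about the path the storage actually uses.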

uzair · May 2, 2024

I am not getting a ping response from Node A to the PBS. The output is below. Can you please suggest what changes I need to make for it to work? I am new to the Proxmox environment, so I don't have in-depth knowledge of the settings shown in the screenshots below. Moreover, just for your info, we have a very critical VM running on Node A, which is why I am reluctant to do any testing myself and am asking for an expert opinion. Thanks for the timely response.

Ping from Node A to PBS.

Network configuration of Node A

Network configuration of Node D which is working fine with PBS

Chris · May 2, 2024

Well,
without knowing your network layout, I can only guess based on what you provided. Is the IPv6 setup correct? Double-check the network mask, routes etc., and any firewall rules that might be blocking traffic... Is your physical setup correct? I see you attach enp1s0f1 as bridge port to vmbr0 on node D, but enp1s0f0 on node A. You can use ethtool -p <ifname> to identify the NIC by blinking its LED.
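
To compare which physical NIC backs each bridge on the two nodes, the `bridge-ports` lines can be pulled out of /etc/network/interfaces. This is a sketch: `bridge_ports` is a hypothetical helper that assumes the usual ifupdown layout of an `iface <name> ...` line followed by its options:

```shell
#!/usr/bin/env bash
# bridge_ports FILE: prints "<bridge> <port...>" for every bridge-ports line.
bridge_ports() {
    awk '
        $1 == "iface"        { iface = $2 }               # remember the current stanza
        $1 == "bridge-ports" { $1 = ""; print iface $0 }  # rest of the line = member ports
    ' "$1"
}

# Usage: run on both nodes and compare the output.
#   bridge_ports /etc/network/interfaces
# Routing and neighbor state for the PBS address can then be checked with:
#   ip -6 route get <PBS IPv6 address>
#   ip -6 neigh show dev vmbr0
```

A different port name per node is not wrong in itself; it only matters if the cable for the PBS network is plugged into the other NIC.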

uzair · May 2, 2024

There are no firewall rules. Thank you Chris, I will again go through all these settings.

uzair · May 2, 2024

I have checked the routing on Node A as well, and it is set up correctly. There are no firewall rules configured, and the physical connectivity is also set up correctly. Moreover, I have verified the IPv6 addresses, and those are correct too, but the error I am facing is "no route to host", as described above. Any suggestions?
