[SOLVED] PVE can not connect to PBS; Error 500 can not get datastores

Jul 23, 2021
Hello everyone,

For my problem, I found this "old" thread, but unfortunately the only solution proposed there applied neither to the original poster's situation nor to mine, so I'd like to give this topic a bump.
I already had this problem in a different setup and was able to resolve it with a fresh install of PBS. This time, however, a fresh install did not solve the problem, neither on the PVE nor on the PBS side. In the cluster, 2 out of 3 nodes cannot connect to the PBS; one of these nodes was newly installed with PVE 7 at the time.

According to ssh -vv, no secure connection can be established between a PVE server and the PBS.

The attached files contain the package versions of the PBS and of the two affected PVE servers, as well as the ssh -vv output.

I'd be very grateful for any comments and tips.
 

PVE -> PBS communication does not go over SSH.

do you get the error only when attempting to set up/use PBS inside PVE, or also when you communicate with plain 'proxmox-backup-client' (e.g., proxmox-backup-client list --repository "USER@REALM@HOST:DATASTORE")?
 
Yes, I know; the ssh test was someone's recommendation for checking connectivity in the first place. All nodes can ping each other and the PBS, but the two nodes that cannot connect to the PBS cannot ssh into it either. The node that works properly can also establish an ssh connection to the PBS.

And yes, I also get a timeout error when I try:

Bash:
root@middle-hp:~# proxmox-backup-client list --repository  10.0.1.11:Backups
Password for "root@pam": **************
Error: http request timed out
 
that does sound like a networking or firewall issue IMHO. note that there might be some rule that does not block ping, but does block all (or most) other traffic, which would match up with your symptoms.

I suggest the following next steps:
- check firewall rules (iptables-save is a good starting point)
- check network setup/topology (any hops in between that might do firewalling?)
- check with tcpdump on both ends - does the traffic reach PBS at all?
- check with nc -l -p 1234 on one end and nc IP 1234 on the other (provided no firewall rules block that ;))
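
The checks listed above can be collected into a small helper that only prints the commands, so nothing touches the network until a line is copied into a root shell. This is a sketch under assumptions, not part of the original reply: the function name `print_checks` is hypothetical, port 8007 is the default PBS API port, `nft list ruleset` is added for hosts that use nftables instead of iptables, and 1234 is the arbitrary test port from the list.

```shell
#!/usr/bin/env bash
# print_checks is a hypothetical helper: it prints the diagnostic commands
# for a given PBS address and executes none of them.
# Assumptions: PBS listens on its default port 8007; 1234 is an arbitrary free port.
print_checks() {
  local pbs_host="$1"
  local pbs_port="${2:-8007}"
  local test_port=1234

  cat <<EOF
# 1. Firewall rules (run on the PVE node and on PBS)
iptables-save
nft list ruleset

# 2. Does the traffic reach PBS? (run on both ends while retrying the client)
tcpdump -ni any host ${pbs_host} and port ${pbs_port}

# 3. Plain TCP test: first line on PBS (listener), second line on the PVE node
nc -l -p ${test_port}
nc ${pbs_host} ${test_port}
EOF
}

# PBS_IP is a placeholder; pass the real address as the first argument.
print_checks "${1:-PBS_IP}"
```

Anything typed into the sending nc should appear on the listening side; a connection that opens but shows nothing simply means nothing was typed.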
 
I just checked all of that:
- iptables was not installed on the PBS in the first place, so iptables-save did not yield any output
- The cluster and the PBS are connected only via a Juniper switch and 10G DAC cables; the traffic is contained in a VLAN on this switch. There is definitely no firewalling happening on the switch
- Running tcpdump on the PBS while attempting to connect from a PVE host via CLI with proxmox-backup-client list --repository 10.0.1.11:Backups resulted in the attached txt file
- netcat (nc) in both directions did not yield any output, but when the sending side was closed with Ctrl+C, the listening side also ended automatically.

Any more ideas?
Otherwise I'm seriously considering a complete reinstall of all machines, given the overall unexpected behavior described in my other open thread on PVE...
 

are you using jumbo frames? are you sure those are set up correctly everywhere (all hosts + switch ports)? (the tcpdump shows mss 8960, which would indicate an MTU of 9000.) could you re-run the test with tcpdump running on both nodes, filtering for 'host 10.0.1.9' (PBS) / 'host 10.0.1.1' (PVE) without 'src'?

nc with the commands I gave will simply mirror input on one end at the other, so no input == no output is expected. that the connection worked is already a hint that it's not networking in general that is broken ;) but a simple nc connection might not generate a big enough packet to trigger any MTU issues, for example, while a full TLS HTTP handshake easily can.
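
The step from "mss 8960" to "MTU of 9000" is plain header arithmetic, and the same arithmetic gives a ping size that exercises the full frame. The sketch below assumes IPv4 and TCP headers of 20 bytes each (no options) and an 8-byte ICMP header; the function names are hypothetical, and `PBS_IP` is a placeholder. It only prints the ping commands and does not send anything.

```shell
#!/usr/bin/env bash
# Assumed header sizes: IPv4 and TCP without options, plus the ICMP echo header.
IPV4_HEADER=20
TCP_HEADER=20
ICMP_HEADER=8

# MTU implied by the TCP MSS announced in a SYN packet.
mss_to_mtu() {
  echo $(( $1 + IPV4_HEADER + TCP_HEADER ))
}

# Largest ICMP echo payload that fits into one unfragmented packet of the given MTU.
ping_payload_for_mtu() {
  echo $(( $1 - IPV4_HEADER - ICMP_HEADER ))
}

mtu="$(mss_to_mtu 8960)"
echo "MSS 8960 implies MTU ${mtu}"

# Linux iputils ping: -M do sets the don't-fragment bit, -s sets the payload size.
echo "jumbo test:    ping -M do -s $(ping_payload_for_mtu "$mtu") -c 3 PBS_IP"
echo "standard test: ping -M do -s $(ping_payload_for_mtu 1500) -c 3 PBS_IP"
```

If the standard-size ping gets replies while the jumbo-size one does not, some hop between the two hosts is likely not passing 9000-byte frames, which would fit a small nc exchange working while a TLS handshake times out.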
 
Yes, I was using jumbo frames, and yes, switching to normal frames resolved the issue :D

The switch was probably not configured properly for jumbo frames, which caused the issue.

Thanks a lot for the support, I'm marking this thread as solved.
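
For anyone hitting the same symptom, a quick way to see which interfaces on a host are set to jumbo frames is to read the MTU values from sysfs. This is a minimal sketch assuming a Linux host; `list_mtus` is a hypothetical helper, and its optional directory argument exists only so it can be pointed at a test directory instead of /sys/class/net.

```shell
#!/usr/bin/env bash
# list_mtus (hypothetical helper): print "<interface> <mtu>" for every interface,
# marking values above the standard Ethernet MTU of 1500.
list_mtus() {
  local root="${1:-/sys/class/net}"
  local dir name mtu
  for dir in "$root"/*; do
    [ -r "$dir/mtu" ] || continue      # skip entries without an mtu file
    name="${dir##*/}"
    [ "$name" = "lo" ] && continue     # loopback normally has a very large MTU
    mtu="$(cat "$dir/mtu")"
    if [ "$mtu" -gt 1500 ]; then
      echo "$name $mtu (larger than 1500)"
    else
      echo "$name $mtu"
    fi
  done
}

list_mtus
```

Running this on every PVE node and on the PBS shows whether the hosts agree with each other; the switch ports have to be checked separately on the switch itself. An interface can be set back to the standard size for a test with `ip link set dev IFACE mtu 1500`, where IFACE is a placeholder for the interface name.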
 