Frequent errors after upgrading to PVE 8.0.3 / PBS 3.0-1

Jun 28, 2023
Hello,
after we upgraded our PVE to 8.0.3, we started getting frequent error messages on all PVE nodes.
We haven't noticed any actual problems so far: the GUI doesn't show any errors and is accessible, and backups and restores are still running fine.
We didn't change any configuration or network settings.
We never got any of these errors before.

So we tried to fix the problem by upgrading PBS to 3.0-1. Nothing changed; the errors persist:

Code:
2023-07-16T00:03:48.880490+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:04:08.966170+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:10:58.590090+02:00 proxmox-05 pvestatd[1679]: pbs-01-90: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:15:28.800254+02:00 proxmox-05 pvestatd[1679]: proxmox-backup-client failed: Error: error trying to connect: Connection reset by peer (os error 104)
2023-07-16T00:22:18.469544+02:00 proxmox-05 pvestatd[1679]: pbs-01-30: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:44:39.148728+02:00 proxmox-05 pvestatd[1679]: pbs-01-90: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:45:48.661338+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:46:48.384470+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:51:19.271210+02:00 proxmox-05 pvestatd[1679]: proxmox-backup-client failed: Error: error trying to connect: Connection reset by peer (os error 104)
2023-07-16T00:57:58.551048+02:00 proxmox-05 pvestatd[1679]: pbs-01-180: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)

As you can see, these errors happen very often.
On the PBS side we can't see any errors that would help us.

How should we proceed from here?
 
this seems like an occasional network hiccup - is "pbs-01" the same host for all those storage definitions?
 
Yes. All PVE nodes use the PBS-01 host.

We didn't change anything on the network side. These messages started at the same time as the update, so that can't be a coincidence.
We searched older logs and these errors did not appear even once.

Wouldn't we also expect errors while using the web interface or running tasks? Our network monitoring didn't see any problems either.
 
well, pvestatd will query each storage every 10s, and you only see the error every few minutes and only for a single storage at a time, so it's only a tiny percentage of requests that get their connection reset. any errors on the PBS side in the access log?
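To put the "tiny percentage" into numbers, here is a rough Python sketch that parses the log excerpt from the first post, counts errors per storage, and estimates the share of failed polls. It assumes the 10 s polling interval mentioned above, treats the four storage IDs seen in the log (pbs-01-30, pbs-01-90, pbs-01-180, pbs-01-365) as the complete set being polled, and counts the two proxmox-backup-client failures against the same total. The function names are hypothetical, and the result is an estimate, not a measurement.

```python
import re
from collections import Counter
from datetime import datetime

POLL_INTERVAL_S = 10  # assumption: pvestatd polls each storage about every 10 s

# Log excerpt copied from the first post.
LOG = """\
2023-07-16T00:03:48.880490+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:04:08.966170+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:10:58.590090+02:00 proxmox-05 pvestatd[1679]: pbs-01-90: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:15:28.800254+02:00 proxmox-05 pvestatd[1679]: proxmox-backup-client failed: Error: error trying to connect: Connection reset by peer (os error 104)
2023-07-16T00:22:18.469544+02:00 proxmox-05 pvestatd[1679]: pbs-01-30: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:44:39.148728+02:00 proxmox-05 pvestatd[1679]: pbs-01-90: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:45:48.661338+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:46:48.384470+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:51:19.271210+02:00 proxmox-05 pvestatd[1679]: proxmox-backup-client failed: Error: error trying to connect: Connection reset by peer (os error 104)
2023-07-16T00:57:58.551048+02:00 proxmox-05 pvestatd[1679]: pbs-01-180: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
"""

# Matches both message variants seen in the excerpt.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) \S+ pvestatd\[\d+\]: "
    r"(?:(?P<storage>[\w-]+): error fetching datastores|proxmox-backup-client failed)"
)


def summarize(log_text):
    """Return (errors per storage, observed time window in seconds)."""
    per_storage = Counter()
    stamps = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue
        stamps.append(datetime.fromisoformat(m["ts"]))
        # Lines without a storage ID are the proxmox-backup-client variant.
        per_storage[m["storage"] or "proxmox-backup-client"] += 1
    window_s = (max(stamps) - min(stamps)).total_seconds()
    return per_storage, window_s


def estimate_failure_rate(per_storage, window_s, poll_interval_s=POLL_INTERVAL_S):
    """Rough share of failed polls, assuming each named storage is polled at the given interval."""
    storages = [s for s in per_storage if s != "proxmox-backup-client"]
    expected_polls = len(storages) * window_s / poll_interval_s
    return sum(per_storage.values()) / expected_polls


per_storage, window_s = summarize(LOG)
rate = estimate_failure_rate(per_storage, window_s)
print(dict(per_storage))
print(f"{window_s / 60:.0f} min window, ~{rate:.2%} of polls failed")
```

For this excerpt it finds 10 errors in a window of roughly 54 minutes, against roughly 1,300 expected polls, so well under 1% of requests fail, which fits the "tiny percentage" described above.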
 
Thanks for this information!
We didn't see any errors in the access logs; all requests returned 200.

We added additional monitoring for this specific case, and you are right: all our PVE nodes occasionally have network problems between themselves and the PBS-01 host. We will look further into it.

Nevertheless, we find the behavior strange, because we never got these messages before, only directly after the upgrade.
Were these checks changed with the PVE upgrade?
Also, any idea why the error messages differ from each other? Are there multiple checks involved?
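The thread doesn't say what the additional monitoring looked like. One possible approach, sketched here in Python under our own assumptions, is a probe that opens a fresh connection to the PBS API port at a fixed interval and prints only failures with a timestamp, so that resets can be lined up against the pvestatd log. The function names `probe_once` and `monitor` are hypothetical; the address and port 8007 come from the log above. The optional TLS step disables certificate verification because it only tests reachability, not trust.

```python
import socket
import ssl
import time
from datetime import datetime


def probe_once(host, port, timeout=5.0, tls=False):
    """Open one TCP (optionally TLS) connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if tls:
                ctx = ssl.create_default_context()
                ctx.check_hostname = False       # reachability test only
                ctx.verify_mode = ssl.CERT_NONE  # e.g. a self-signed PBS certificate
                # wrap_socket on a connected socket performs the handshake immediately
                with ctx.wrap_socket(sock, server_hostname=host):
                    pass
        return "ok"
    # Specific OSError subclasses first, generic OSError last.
    except ConnectionResetError:
        return "reset"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


def monitor(host, port, interval=10.0, count=None, tls=True):
    """Probe every `interval` seconds; print a timestamped line for each failure."""
    n = 0
    while count is None or n < count:
        result = probe_once(host, port, tls=tls)
        if result != "ok":
            print(f"{datetime.now().isoformat()} {host}:{port} {result}", flush=True)
        n += 1
        time.sleep(interval)
```

On each PVE node one could run, for example, `monitor("192.168.19.2", 8007, interval=10)`. A "reset" line that coincides with a pvestatd entry suggests the problem is on the network path rather than in PBS itself, which matches what the monitoring in this thread eventually showed.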
 
no, but the different error messages might just come from the different places where the error is encountered. I don't think there was any change directly related to that part of the code, but of course the whole Debian base changed, so there are lots of components that might behave slightly differently now.
 

Final conclusion:
We found a misconfigured VLAN to be the cause of these errors. It had always been wrong. We think the update changed the behavior of the checks in some way, so that the problem became visible.
Thanks for your help!
 
