Frequent errors after Upgrade to PVE8.0.3/PBS3.0-1

ITmarcapo · Jul 17, 2023

Hello,
after we upgraded our PVE to 8.0.3, we started getting frequent error messages on all PVE nodes.
We haven't had any actual problems so far: the GUI doesn't indicate any errors and is accessible, and backups and restores are still running fine.
We didn't change any configuration or network settings.
We never got any of these errors before.

So we tried to fix the problem by upgrading PBS to 3.0-1. Nothing changed; the errors are still there:

Code:

2023-07-16T00:03:48.880490+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:04:08.966170+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:10:58.590090+02:00 proxmox-05 pvestatd[1679]: pbs-01-90: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:15:28.800254+02:00 proxmox-05 pvestatd[1679]: proxmox-backup-client failed: Error: error trying to connect: Connection reset by peer (os error 104)
2023-07-16T00:22:18.469544+02:00 proxmox-05 pvestatd[1679]: pbs-01-30: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:44:39.148728+02:00 proxmox-05 pvestatd[1679]: pbs-01-90: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:45:48.661338+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:46:48.384470+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:51:19.271210+02:00 proxmox-05 pvestatd[1679]: proxmox-backup-client failed: Error: error trying to connect: Connection reset by peer (os error 104)
2023-07-16T00:57:58.551048+02:00 proxmox-05 pvestatd[1679]: pbs-01-180: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)

As you can see, these errors happen very often.
On the PBS side we can't see any errors that could help us.

How can we proceed?
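
To see at a glance which storage entries are affected, the log lines can be tallied per storage ID. The following is only a rough sketch: the regular expression is an assumption derived from the line format in the excerpt above, `tally` is a hypothetical helper name, and only three of the lines are embedded to keep it short.

```python
import re
from collections import Counter

# Three lines copied from the excerpt above; paste the full log to tally everything.
LOG = """\
2023-07-16T00:03:48.880490+02:00 proxmox-05 pvestatd[1679]: pbs-01-365: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:10:58.590090+02:00 proxmox-05 pvestatd[1679]: pbs-01-90: error fetching datastores - 500 Can't connect to 192.168.19.2:8007 (Connection reset by peer)
2023-07-16T00:15:28.800254+02:00 proxmox-05 pvestatd[1679]: proxmox-backup-client failed: Error: error trying to connect: Connection reset by peer (os error 104)
"""

# Assumed pattern: "pvestatd[PID]: <storage-id>: error fetching datastores ..."
STORAGE_ERROR = re.compile(
    r"pvestatd\[\d+\]: (?P<storage>\S+): error fetching datastores"
)


def tally(log_text):
    """Count 'Connection reset by peer' lines per storage ID."""
    counts = Counter()
    for line in log_text.splitlines():
        if "Connection reset by peer" not in line:
            continue
        match = STORAGE_ERROR.search(line)
        # The proxmox-backup-client lines do not name a storage.
        key = match.group("storage") if match else "(no storage named)"
        counts[key] += 1
    return counts


for storage, count in tally(LOG).most_common():
    print(f"{storage}: {count}")
```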

fabian · Jul 17, 2023

this seems like an occasional network hiccup - is the "pbs-01" the same for all those storage definitions?

ITmarcapo · Jul 17, 2023

fabian said:
this seems like an occasional network hiccup - is the "pbs-01" the same for all those storage definitions?

Yes. All PVE nodes use the PBS-01 host.

On the network side we didn't change anything. These messages started appearing exactly with the update, so that can't be a coincidence.
We searched older logs and these errors do not appear even once.

Wouldn't we also expect errors while using the web interface or running tasks? Our network monitoring didn't see any problems either.

fabian · Jul 17, 2023

well, pvestatd will query each storage every 10s, and you only see the error every few minutes and only for a single storage at a time, so it's only a tiny percentage of requests that get their connection reset. any errors on the PBS side in the access log?
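
To put a number on "a tiny percentage", here is a back-of-the-envelope sketch using the log excerpt from the first post. It assumes four PBS storage entries (pbs-01-30, pbs-01-90, pbs-01-180 and pbs-01-365 are the ones named in the excerpt), one status request per storage per 10s cycle, and that the excerpt is complete for that time window. `failure_fraction` is just an illustrative helper.

```python
from datetime import datetime

POLL_INTERVAL_S = 10  # polling interval as stated in the reply above
STORAGES = 4          # assumption: the four pbs-01-* IDs named in the excerpt
ERRORS = 10           # error lines in the excerpt

# First and last timestamps of the excerpt.
first = datetime.fromisoformat("2023-07-16T00:03:48.880490+02:00")
last = datetime.fromisoformat("2023-07-16T00:57:58.551048+02:00")


def failure_fraction(errors, window_s, storages, interval_s):
    """Share of status queries that failed within the observed window."""
    polls = window_s / interval_s * storages
    return errors / polls


window_s = (last - first).total_seconds()
fraction = failure_fraction(ERRORS, window_s, STORAGES, POLL_INTERVAL_S)
print(f"{fraction:.2%} of status queries failed")
```

Under these assumptions the result comes out below one percent, which fits the picture of a rare, intermittent problem rather than a broken connection.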

ITmarcapo · Jul 17, 2023

fabian said:
well, pvestatd will query each storage every 10s, and you only see the error every few minutes and only for a single storage at a time, so it's only a tiny percentage of requests that get their connection reset. any errors on the PBS side in the access log?

Thanks for this information!
In the access logs we didn't see any errors. All 200.

We added additional monitoring for this specific case, and you are right: all our PVE nodes occasionally have network problems between themselves and the PBS-01 host. We will look further into it.
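
For anyone who wants to set up a similar check: the thread does not say what the monitoring looked like, so the following is only a minimal sketch of one possible approach, a repeated TCP connect probe against the PBS port. `probe` and `run_probes` are hypothetical helper names. Note that it only tests the TCP connect; a reset that happens later, for example during the TLS handshake, would not be caught.

```python
import socket
import time


def probe(host, port, timeout=3.0):
    """Try one TCP connection; return None on success, else the error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return f"{type(exc).__name__}: {exc}"


def run_probes(host, port, count, interval_s=1.0):
    """Probe `count` times and return a list of (timestamp, error) failures."""
    failures = []
    for i in range(count):
        err = probe(host, port)
        if err is not None:
            failures.append((time.strftime("%Y-%m-%dT%H:%M:%S"), err))
        if i + 1 < count:
            time.sleep(interval_s)
    return failures
```

Calling `run_probes("192.168.19.2", 8007, count=360, interval_s=10.0)` from a PVE node would mimic the 10s polling rhythm for an hour and return every failed attempt with its timestamp, which can then be compared against the pvestatd log.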

Nevertheless, we find the behavior strange because we never got these messages before, only directly after the upgrade.
Did these checks change with the PVE upgrade?
Also, any idea why the error messages differ from one another? Are there multiple checks here?

fabian · Jul 17, 2023

no, but the different error messages might just be different places where the error is encountered. I don't think there was any change directly related to that part of the code, but of course, the whole Debian base changed, so there are lots of components that might have slightly different behaviour now.

ITmarcapo · Jul 18, 2023

fabian said:
no, but the different error messages might just be different places where the error is encountered. I don't think there was any change directly related to that part of the code, but of course, the whole Debian base changed, so there are lots of components that might have slightly different behaviour now.

Final Conclusion:
We found a misconfigured VLAN as the reason for these errors. It had always been wrong. We think the update changed the behavior of the checks in some way, so that this became visible.
Thanks for your help!
