Backups Fail with Connection Reset

spetrillo

Member
Feb 15, 2024
Hello all,

I set up a PBS and began backing up my VMs from two locations. The first is a remote location, and I am having no issues with it. The second is local to the PBS, and I seem to be having issues with it. I am getting the following error messages:

ERROR: backup write data failed: command error: write_data upload error: pipelined request failed: connection reset
INFO: aborting backup job
INFO: stopping kvm after backup task
ERROR: Backup of VM 100 failed - backup write data failed: command error: write_data upload error: pipelined request failed: connection reset


I thought that maybe it was conflicting with the remote location backups, so I tried again just now and it failed again. In looking at the backup jobs on each of the PVE servers I noticed one difference. On the remote location server I am running LVM and using fleecing to local-lvm. On the local server I am running ZFS and using fleecing to local-zfs. The failed backup is doing something, as it's approximately 15 minutes from the start of the backup to the failure point.

Is there anything known about fleecing with ZFS? Does anyone know why I am getting this?

Thanks,
Steve
 
I turned off fleecing and tried a backup of one VM. It again failed at approximately 14 minutes. It begins to write and then appears to do nothing more.
 
Hi,
IIRC, the timeout for data requests to the PBS is 15 minutes and the connection reset error also hints that it might be some network-related issue. Please share the full backup task log from both the Proxmox VE side and from the PBS side. Please also post the output of pveversion -v and proxmox-backup-manager versions --verbose respectively. How exactly is the local PBS set up, i.e. running on dedicated hardware/co-installed with Proxmox VE/as a VM?
 
Hi Fiona,

The PBS is on dedicated hardware and is connected to the same switch as the PVE in question. I am also backing up a remote PVE over a VPN connection. If any server were going to give me problems, I would have thought it would be that one, but it backs up to the PBS flawlessly.

Let me give you some info on the config of my environment. Both the PBS and PVE are connected to my mgmt vlan, respectively 192.168.1.5(PVE) and 192.168.1.6(PBS). The PBS is connected to an iSCSI datastore from my NAS, for storing my backups. This datastore is on a different vlan, with the NAS configured for 192.168.2.162 and the PBS is 192.168.2.163. The PVE is also part of this subnet, at 192.168.2.164. I have connected the PBS storage to the PVE, using the PBS address of 192.168.2.163. I attached the screenshot from the PVE.

I am working on getting the logs and other info requested and will respond this evening.

Thanks,
Steve
 

Attachments

  • Screenshot 2024-07-01 154013.png
Attached please find the output of the commands you asked for. I am going to run a backup now, so I can obtain the logs from each side.
 

OK here are the logs of the failed backup attempt. Let me know if you need anything additional.
Unfortunately, the journal on the Proxmox VE side does not contain much information. You can get the backup task log by double-clicking the task at the bottom of the UI. It should be named VM/CT 100 - Backup or Backup Job.

The PBS log also talks about a connection error:
Jul 01 16:25:57 pbs01 proxmox-backup-proxy[942]: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out

So it is best to investigate/test the network further.
 
Hi Fiona,

Not sure what to investigate here. As mentioned, both devices are connected to the same switch. Maybe I have a switch or port problem. I will keep an eye on it, but the same PBS switch port used for this is also used to back up a remote PVE, and I have no issues backing up that one. As mentioned, the only difference between the PVE in question and the remote PVE is the file system: the remote PVE is LVM and the local PVE is ZFS. Is there anything I should be checking on the ZFS front?

As per your recommendation, here is the full log from the PVE side.

Thanks,
Steve
 

To check whether it is storage-related, you can create a dummy VM on a different storage (e.g. the local directory storage) and back up that VM to compare.
 
Hi Fiona,

I am back, and I think I might understand why the connection was having an issue. I want to pass a config question by you.

As I mentioned earlier, my PBS has an iSCSI mount for the backup storage. It's on a separate storage vlan from the front-door IP of the PBS. The iSCSI mount is coming from my NAS, and the IP of the iSCSI mount is 192.168.2.162. My PBS has an IP on the storage vlan, which is 192.168.2.163. I then add the PBS storage to the PVE, and so the PVE also has an IP on the storage vlan, which is 192.168.2.164.

When the backup starts, how does the PVE send its data? Will it use the storage vlan where the storage is, or does it use the IP of the PVE front door? I am starting to believe it's going out the front door, and since the storage vlan does not talk to any other vlans, it's failing with a connection failure. Can I configure the PVE to use the storage vlan for storage-related items, like backup? I could open the storage vlan up to specific IPs, but that sort of defeats the purpose of an isolated storage vlan.

Thanks,
Steve
 
AFAIK, PVE only connects to PBS and does not talk to the iSCSI storage directly; only PBS talks to the storage. PVE will use the IP/hostname that is configured in /etc/pve/storage.cfg for the PBS storage. But isn't the remote location connecting to the "front door IP" of the PBS too? You can try changing the IP in /etc/pve/storage.cfg to make sure, but I'd be a bit surprised if that is the issue.
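Which VLAN the traffic uses is decided by the kernel's route lookup for the destination address, not by which service initiates the connection. One way to check this empirically is the connected-UDP trick, sketched below (no packets are actually sent; 192.168.2.163 is the PBS storage-vlan IP from this thread):

```python
import socket

def source_ip_for(dest_ip: str, dest_port: int = 8007) -> str:
    """Return the local IP the kernel would pick to reach dest_ip.

    Connecting a UDP socket sends no packets, but it forces a route
    lookup, so getsockname() reveals the chosen source address.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
    finally:
        s.close()

# Shown against localhost so it runs anywhere; on the PVE host you
# would pass "192.168.2.163" and expect a 192.168.2.x answer if the
# storage vlan is used, or 192.168.1.x if it goes out the front door.
print(source_ip_for("127.0.0.1"))  # -> 127.0.0.1
```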
 
So I checked both the local and remote PVEs and they are configured identically:

pbs: proxbkp
	datastore backups
	server 192.168.2.163
	content backup
	fingerprint 26:4a:8f:0b:c4:48:58:c8:cc:be:5a:96:c6:85:35:43:e6:28:a4:9f>
	prune-backups keep-all=1
	username root@pam


This IP is on the storage vlan, so they are indeed backing up to the PBS across the storage vlan? Can I assume this from the above config? I am starting to think I have a switch port problem.
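For what it's worth, the stanza format is simple enough to check mechanically: a `type: name` header followed by indented `key value` lines. A small sketch (not a full storage.cfg parser; the values are the ones from the excerpt above, with the fingerprint omitted):

```python
# Minimal parser for one PVE storage.cfg stanza, based on the excerpt
# above (a sketch under that assumption, not a complete parser).
def parse_storage_stanza(text: str) -> dict:
    lines = [l for l in text.strip().splitlines() if l.strip()]
    header_type, header_name = lines[0].split(":", 1)
    entry = {"type": header_type.strip(), "name": header_name.strip()}
    for line in lines[1:]:
        key, _, value = line.strip().partition(" ")
        entry[key] = value.strip()
    return entry

stanza = """\
pbs: proxbkp
    datastore backups
    server 192.168.2.163
    content backup
    prune-backups keep-all=1
    username root@pam
"""
cfg = parse_storage_stanza(stanza)
print(cfg["server"])  # -> 192.168.2.163
```

If the server key holds a 192.168.2.x address, the PVE is indeed dialing the storage-vlan side of the PBS.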
 
Yes, the communication between PVE <-> PBS should happen via 192.168.2.163, using whatever interface is configured for that IP on the PBS and configured to reach that IP on the PVE. It's strange, though, that the initial connection works without issues; it only fails later when the data comes. Can you check with e.g. iperf that the connection is fine? A wild guess, but other weird network issues come to mind where the MTU was the culprit.

What happens if you use the "front door IP" of the PBS instead?
 
As mentioned, I think it's a network port that's going bad. I have moved the local PVE to a new switch and just tested. It works just fine... ugh!

Sorry for the wild goose chase, but I am glad I got this going.
 
Fiona,

Before I let you go, I have a question on iSCSI and network access.

As mentioned, I created a separate IP subnet so that the iSCSI storage from my NAS could be mounted to the PBS across this dedicated subnet. I didn't want storage traffic and network traffic to be intermixed. This means the NAS has its own IP on this subnet, and the PBS also has an IP on the subnet. Is there a way to ensure that when I mount the iSCSI storage to the PBS, it is accessed via this separate IP?

Thanks,
Steve
 
I'm by no means a network/iSCSI expert, but I presume you can just use that IP in the configuration for the iSCSI portal.
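To confirm the portal is actually reached from the storage-vlan address rather than the management one, you can force the source address of a test connection. A sketch (3260 is the standard iSCSI port; the commented addresses are the ones from this thread):

```python
import socket

def reachable_from(source_ip: str, dest_ip: str, dest_port: int,
                   timeout: float = 3.0) -> bool:
    """TCP-connect to dest, forcing the given local source address."""
    try:
        conn = socket.create_connection(
            (dest_ip, dest_port), timeout=timeout,
            source_address=(source_ip, 0))
        conn.close()
        return True
    except OSError:
        return False

# On the PBS, check the NAS portal from the storage-vlan IP:
#   reachable_from("192.168.2.163", "192.168.2.162", 3260)
# If this succeeds while the same check from the management IP
# (192.168.1.6) fails, the vlan isolation is working as intended.
```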
 
