[SOLVED] Proxmox Backup Server connected on PVE's secondary NIC

joscha · Nov 4, 2022

Hi,

I have a standalone PVE server. That server has two IP addresses. One of them has the default gateway set in the GUI; the other has a second routing table which contains the second default gateway.

The config looks similar to this:

Code:

auto vmbr1
iface vmbr1 inet static
        address 10.1.0.10/24
        gateway 10.1.0.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#externalManagement

auto vmbr3
iface vmbr3 inet static
        address 10.0.0.10/24
        bridge-ports ens2f0
        bridge-stp off
        bridge-fd 0
        post-up ip route add default via 10.0.0.1 dev vmbr3 table 1001
        post-up ip rule add from 10.0.0.0/24 table 1001

The newly installed Proxmox Backup Server is located in the same subnet as the second interface.
Connections to both machines, between both machines and from both machines to other hosts work normally as far as I can tell.

When I start a backup to the Proxmox Backup Server, it hangs after a short time and then takes a long time to time out.
To rule out other problems, I moved the backup server into the same subnet as the first interface, and all problems were gone. After moving it back into the subnet of the second interface, I can still do very small backups. The connection seems to break after about 200 MB of transmitted data.

Below is what happens in syslog at that time.

Code:

pvedaemon[3546437]: INFO: starting new backup job: vzdump 110 --mode snapshot --node xx --storage xx --mailto xx --remove 0 --notes-template '{{guestname}}'
pvedaemon[3546437]: INFO: Starting Backup of VM 110 (qemu)
QEMU[10433]: HTTP/2.0 connection failed
pvedaemon[3546437]: ERROR: Backup of VM 110 failed - backup write data failed: command error: protocol canceled
pvedaemon[3546437]: INFO: Backup job finished with errors
pvedaemon[3546437]: job errors

The "HTTP/2.0 connection failed" message from the QEMU process always appears when a backup fails. I am pretty sure it is related to the problem.
My best guess is that QEMU transfers the data via HTTP but sends it out of the wrong interface, even though the backup server is in the same subnet as the PVE server's second interface.

Code:

root@s01:/var/log# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         xx              0.0.0.0         UG    0      0        0 vmbr1
xx(2nd subnet)  0.0.0.0         255.255.255.0   U     0      0        0 vmbr3
xx(1st subnet)  0.0.0.0         255.255.255.0   U     0      0        0 vmbr1
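
Note that the "route" output above only lists the main routing table; the policy rule and table 1001 created by the post-up lines in the vmbr3 stanza do not show up in it. A sketch of iproute2 commands that would make them visible, assuming the addresses from the example config above (10.0.0.20 is a hypothetical placeholder for the backup server's address, which isn't given in this thread):

Code:

# List the policy rules; the "from 10.0.0.0/24 lookup 1001" rule from the
# post-up line should appear with a priority number below main (32766).
ip rule show

# Show the second routing table; with the config above it would only
# contain the default route via 10.0.0.1.
ip route show table 1001

# Ask the kernel how it routes a packet to the backup server.
# Without "from", the main table's connected route for vmbr3 is used:
ip route get 10.0.0.20
# With the vmbr3 address as source, the "from" rule matches and table 1001
# is consulted instead (10.0.0.20 is a placeholder address):
ip route get 10.0.0.20 from 10.0.0.10

If the second lookup reports a route via 10.0.0.1 while the first reports a direct route on vmbr3, traffic that the host sends from its vmbr3 address to a same-subnet peer is going through the gateway rather than being delivered directly.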

On the backup-server the task log looks like this:

Code:

starting new backup on datastore 'xx': "ns/xx/vm/110/2022-11-04T08:49:48Z"
GET /previous: 400 Bad Request: no valid previous backup
created new fixed index 1 ("ns/xx/vm/110/2022-11-04T08:49:48Z/drive-scsi0.img.fidx")
add blob "/mnt/datastore/xx/ns/xx/vm/110/2022-11-04T08:49:48Z/qemu-server.conf.blob" (345 bytes, comp: 345)
backup failed: connection error: timed out
removing failed backup
TASK ERROR: connection error: timed out
POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out

On the PVE-server the task log looks like this:

Code:

INFO: starting new backup job: vzdump 110 --storage xx --notes-template '{{guestname}}' --remove 0 --node xx --mode snapshot
INFO: Starting Backup of VM 110 (qemu)
INFO: Backup started at 2022-11-04 09:49:48
INFO: status = running
INFO: VM Name: xx
INFO: include disk 'scsi0' 'local-zfs:vm-110-disk-0' 32G
INFO: backup mode: snapshot
INFO: bandwidth limit: 22000 KB/s
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/110/2022-11-04T08:49:48Z'
INFO: started backup task '19bcc41e-4a3f-4533-807b-f906563d9cc5'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO:   0% (68.0 MiB of 32.0 GiB) in 3s, read: 22.7 MiB/s, write: 12.0 MiB/s
INFO:   0% (212.0 MiB of 32.0 GiB) in 13m 25s, read: 183.9 KiB/s, write: 122.6 KiB/s
ERROR: backup write data failed: command error: protocol canceled
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 110 failed - backup write data failed: command error: protocol canceled
INFO: Failed at 2022-11-04 10:03:13
INFO: Backup job finished with errors
TASK ERROR: job errors

I hope you can point me in the right direction. At the moment I have no idea why the second interface could cause such problems.

Regards
Joscha

joscha · Nov 6, 2022

So, I found the solution.
After some packet captures etc., I finally checked my routing table again.
The problem was the second routing table I had created to handle routed traffic coming in on the second interface.
It only contained a default route that sent everything to the gateway, so traffic to hosts within the same subnet was also sent to the gateway instead of directly to them.
After I added another routing rule for traffic within the same subnet, everything works fine.
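
The exact rule isn't posted in the thread. A minimal sketch of one way to do it, assuming the addresses from the vmbr3 stanza in the first post: give table 1001 its own route for the directly connected subnet, so that traffic from 10.0.0.10 to other hosts in 10.0.0.0/24 (such as the backup server) is delivered on-link instead of via 10.0.0.1.

Code:

# Hypothetical fix (addresses taken from the vmbr3 example above):
# add the connected subnet to table 1001 alongside its default route,
# so same-subnet peers are reached directly on vmbr3.
ip route add 10.0.0.0/24 dev vmbr3 src 10.0.0.10 table 1001

# To keep it across reboots, the same command can go on an additional
# post-up line in the vmbr3 stanza of /etc/network/interfaces:
#   post-up ip route add 10.0.0.0/24 dev vmbr3 src 10.0.0.10 table 1001

A rule-based alternative with the same effect would be a rule evaluated before the "from 10.0.0.0/24" rule that sends same-subnet destinations to the main table, for example "ip rule add to 10.0.0.0/24 table main priority 1000" (lower priority numbers are evaluated first).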
