Hi,
I have a PVE server running standalone. That server has two IP addresses. One has the default gateway set in the GUI. The other one has a second routing table which contains the second default gateway.
The config looks similar to this:
Code:
auto vmbr1
iface vmbr1 inet static
        address 10.1.0.10/24
        gateway 10.1.0.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
#externalManagement

auto vmbr3
iface vmbr3 inet static
        address 10.0.0.10/24
        bridge-ports ens2f0
        bridge-stp off
        bridge-fd 0
        post-up ip route add default via 10.0.0.1 dev vmbr3 table 1001
        post-up ip rule add from 10.0.0.0/24 table 1001
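To make the lookup order of this setup easier to reason about, here is a small Python sketch of how policy routing picks a route for the two tables above. It is a deliberately simplified model, not the kernel's real implementation: the kernel's "local" table is left out, the names `TABLES`, `RULES` and `lookup` are made up for illustration, and 10.0.0.20 is only a placeholder for the backup server's address.

```python
from ipaddress import ip_address, ip_network

# Routes per table, taken from the config above.
# Each route is (destination, gateway or None for on-link, device).
TABLES = {
    "main": [
        ("0.0.0.0/0", "10.1.0.1", "vmbr1"),
        ("10.1.0.0/24", None, "vmbr1"),
        ("10.0.0.0/24", None, "vmbr3"),
    ],
    "1001": [
        ("0.0.0.0/0", "10.0.0.1", "vmbr3"),
    ],
}

# Rules in priority order: (source selector, table to consult).
RULES = [
    ("10.0.0.0/24", "1001"),  # post-up ip rule add from 10.0.0.0/24 table 1001
    ("0.0.0.0/0", "main"),    # the default "from all lookup main" rule
]


def lookup(src, dst):
    """Return (table, gateway, device) from the first matching rule whose table has a route."""
    src_ip, dst_ip = ip_address(src), ip_address(dst)
    for selector, table in RULES:
        if src_ip not in ip_network(selector):
            continue
        matches = [r for r in TABLES[table] if dst_ip in ip_network(r[0])]
        if matches:
            # Inside one table the longest matching prefix wins.
            _, gateway, dev = max(matches, key=lambda r: ip_network(r[0]).prefixlen)
            return table, gateway, dev
    return None


PBS = "10.0.0.20"  # hypothetical backup server address in the second subnet

# Source address already fixed to the second interface's address.
print(lookup("10.0.0.10", PBS))
# Source address not chosen yet (unspecified).
print(lookup("0.0.0.0", PBS))
```

In this model, a packet with source 10.0.0.10 going to a host in 10.0.0.0/24 is matched by table 1001 and handed to the gateway 10.0.0.1, because that table only has a default route and no entry for the local subnet. With an unspecified source the main table is used and the host is reached directly on vmbr3. Whether this difference has anything to do with the stalled backups is not verified.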
The newly installed proxmox-backup-server is located in the same subnet as the second interface.
Connections to both machines, between both machines, and from both machines to other hosts work normally as far as I can tell.
When I start a backup to the proxmox-backup-server, it hangs after a short time and then takes a long time to time out.
To rule out potential problems I put the backup-server in the same subnet as the first interface, and all problems were gone. Now that it is back in the subnet of the second interface, I can still do very small backups. The connection seems to break at about 200 MB of transmitted data.
Below is what happens in syslog at that time.
Code:
pvedaemon[3546437]: INFO: starting new backup job: vzdump 110 --mode snapshot --node xx --storage xx --mailto xx --remove 0 --notes-template '{{guestname}}'
pvedaemon[3546437]: INFO: Starting Backup of VM 110 (qemu)
QEMU[10433]: HTTP/2.0 connection failed
pvedaemon[3546437]: ERROR: Backup of VM 110 failed - backup write data failed: command error: protocol canceled
pvedaemon[3546437]: INFO: Backup job finished with errors
pvedaemon[3546437]: job errors
The "connection failed" from the qemu process is always there when a backup fails. I am pretty sure it is related to the problem.
My best guess is that qemu is supposed to transfer the data via HTTP but sends it out the wrong interface, even though the backup-server is on the same subnet as the PVE-server.
Code:
root@s01:/var/log# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         xx              0.0.0.0         UG    0      0        0 vmbr1
xx(2nd subnet)  0.0.0.0         255.255.255.0   U     0      0        0 vmbr3
xx(1st subnet)  0.0.0.0         255.255.255.0   U     0      0        0 vmbr1
On the backup-server the task log looks like this:
Code:
starting new backup on datastore 'xx': "ns/xx/vm/110/2022-11-04T08:49:48Z"
GET /previous: 400 Bad Request: no valid previous backup
created new fixed index 1 ("ns/xx/vm/110/2022-11-04T08:49:48Z/drive-scsi0.img.fidx")
add blob "/mnt/datastore/xx/ns/xx/vm/110/2022-11-04T08:49:48Z/qemu-server.conf.blob" (345 bytes, comp: 345)
backup failed: connection error: timed out
removing failed backup
TASK ERROR: connection error: timed out
POST /fixed_chunk: 400 Bad Request: error reading a body from connection: timed out
On the PVE-server the task log looks like this:
Code:
INFO: starting new backup job: vzdump 110 --storage xx --notes-template '{{guestname}}' --remove 0 --node xx --mode snapshot
INFO: Starting Backup of VM 110 (qemu)
INFO: Backup started at 2022-11-04 09:49:48
INFO: status = running
INFO: VM Name: xx
INFO: include disk 'scsi0' 'local-zfs:vm-110-disk-0' 32G
INFO: backup mode: snapshot
INFO: bandwidth limit: 22000 KB/s
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/110/2022-11-04T08:49:48Z'
INFO: started backup task '19bcc41e-4a3f-4533-807b-f906563d9cc5'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 0% (68.0 MiB of 32.0 GiB) in 3s, read: 22.7 MiB/s, write: 12.0 MiB/s
INFO: 0% (212.0 MiB of 32.0 GiB) in 13m 25s, read: 183.9 KiB/s, write: 122.6 KiB/s
ERROR: backup write data failed: command error: protocol canceled
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 110 failed - backup write data failed: command error: protocol canceled
INFO: Failed at 2022-11-04 10:03:13
INFO: Backup job finished with errors
TASK ERROR: job errors
I hope you can point me in the right direction. At the moment I have no idea why the second interface could cause such problems.
Regards
Joscha