sync group vm failed: error trying to connect: error connecting - tcp connect error: deadline has elapsed

Rainerle

Hi,

I am playing around with PBS v1.0-5, using a sync job against a remote second installation running the same version.

Versions:
Code:
proxmox-backup: 1.0-4 (running kernel: 5.4.78-1-pve)
proxmox-backup-server: 1.0.5-1 (running version: 1.0.5)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
ifupdown2: 3.0.0-1+pve3
libjs-extjs: 6.0.1-10
proxmox-backup-docs: 1.0.4-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-xtermjs: 4.7.0-3
smartmontools: 7.1-pve2
zfsutils-linux: 0.8.5-pve1

Some of the QEMU VMs are not synced from the remote PBS. The errors look like this:
Code:
sync group vm/102 failed - error trying to connect: error connecting to https://10.33.40.12:8007/ - tcp connect error: deadline has elapsed

There is no specific error message in the journal on either side.

I have no idea where to look further.

Attached is the full log, obtained with proxmox-backup-manager task log.
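
For reference, roughly these commands produce it (the UPID below is just a placeholder for the real task ID):

Code:
# list recent tasks to find the UPID of the failed sync job
proxmox-backup-manager task list
# dump the full log of that task
proxmox-backup-manager task log 'UPID:pbs01:...'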

Thanks
Rainer
 

Attachments

  • log.txt (45 KB)
If you retry the sync, does it fail for the same groups?
 
OK, I had a look at Administration -> Tasks and the logs there. The error occurs with different VMs each time. We only have KVM/QEMU VMs, no CTs.
 
Then I guess the connection (sometimes) establishes too slowly. What kind of network link do you have between the PBS servers?
 
pbs01 -> 10G -> Firewall (IPSec Tunnel) -> 1G -> Internet -> 1G -> Firewall (IPsec Tunnel End) -> 40G -> pbs02

Code:
root@pbs01:~# ping -M do -s 1472 10.33.40.12
PING 10.33.40.12 (10.33.40.12) 1472(1500) bytes of data.
1480 bytes from 10.33.40.12: icmp_seq=1 ttl=62 time=8.37 ms
1480 bytes from 10.33.40.12: icmp_seq=2 ttl=62 time=8.31 ms
1480 bytes from 10.33.40.12: icmp_seq=3 ttl=62 time=8.36 ms
1480 bytes from 10.33.40.12: icmp_seq=4 ttl=62 time=8.13 ms
1480 bytes from 10.33.40.12: icmp_seq=5 ttl=62 time=8.26 ms
1480 bytes from 10.33.40.12: icmp_seq=6 ttl=62 time=8.13 ms
^C
--- 10.33.40.12 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 12ms
rtt min/avg/max/mdev = 8.126/8.259/8.369/0.111 ms
root@pbs01:~#
 
The connect timeout is 10 seconds. Maybe the firewall/VPN interferes sometimes?
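
One way to see how long the handshake to the PBS API port actually takes would be a quick curl probe (just a rough check, not an official tool):

Code:
# measure TCP and TLS handshake times against the PBS API port
curl -sk -o /dev/null \
    -w 'tcp connect: %{time_connect}s, tls handshake done: %{time_appconnect}s\n' \
    https://10.33.40.12:8007/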
 
The IPsec tunnel is always up and both firewalls are well below their maximum session limit.

I have seen the 10-second wait between the error and the next action. Is there something like a retry setting I could use?

I could run a ping/date job during the sync window to rule out network problems.
 
The 10 s connect timeout is currently hard-coded, but the next sync should pick the skipped group up again without any issue.
 
If you can collect some debugging information (e.g. a tcpdump from both ends, trimmed to a time range of +/- 30 s around the error) and file a bug over at https://bugzilla.proxmox.com, it would be appreciated.
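
Something along these lines on both hosts should be enough to capture the relevant traffic (interface and file name are just examples):

Code:
# capture only the PBS traffic between the two hosts, on both ends
tcpdump -ni any -w /root/pbs-sync.pcap host 10.33.40.12 and tcp port 8007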
 
OK, so it would have been clever to just let the tcpdump run and wait for the first tcp connect error...
Code:
2020-12-10T22:00:00+01:00: Starting datastore sync job 'pbs02:pve-infra-onsite:pve-infra:s-920ce45c-1279'
2020-12-10T22:00:00+01:00: task triggered by schedule 'mon..fri 22:00'
2020-12-10T22:00:00+01:00: Sync datastore 'pve-infra' from 'pbs02/pve-infra-onsite'
2020-12-10T22:00:01+01:00: re-sync snapshot "vm/100/2020-12-07T18:00:01Z"
2020-12-10T22:00:01+01:00: no data changes
2020-12-10T22:00:01+01:00: re-sync snapshot "vm/100/2020-12-07T18:00:01Z" done
2020-12-10T22:00:01+01:00: percentage done: 0.65%
2020-12-10T22:00:01+01:00: sync snapshot "vm/100/2020-12-08T18:00:01Z"
2020-12-10T22:00:01+01:00: sync archive qemu-server.conf.blob
2020-12-10T22:00:01+01:00: sync archive drive-scsi1.img.fidx
2020-12-10T22:00:01+01:00: downloaded 1764823 bytes (8.97 MiB/s)
2020-12-10T22:00:01+01:00: got backup log file "client.log.blob"
2020-12-10T22:00:01+01:00: sync snapshot "vm/100/2020-12-08T18:00:01Z" done
2020-12-10T22:00:01+01:00: percentage done: 0.78%
2020-12-10T22:00:11+01:00: sync group vm/100 failed - error trying to connect: error connecting to https://10.33.40.12:8007/ - tcp connect error: deadline has elapsed
2020-12-10T22:00:11+01:00: re-sync snapshot "vm/101/2020-12-09T18:00:01Z"

It would have only taken seconds...

I had the ping running though...
Code:
ping -D -M do -s 1472 10.33.40.12 >/var/log/ping-pbs02.log

Code:
[1607634000.063282] 1480 bytes from 10.33.40.12: icmp_seq=45422 ttl=62 time=8.09 ms
[1607634001.064451] 1480 bytes from 10.33.40.12: icmp_seq=45423 ttl=62 time=8.06 ms
[1607634002.065516] 1480 bytes from 10.33.40.12: icmp_seq=45424 ttl=62 time=8.15 ms
[1607634003.066981] 1480 bytes from 10.33.40.12: icmp_seq=45425 ttl=62 time=8.31 ms
[1607634004.068354] 1480 bytes from 10.33.40.12: icmp_seq=45426 ttl=62 time=8.25 ms
[1607634005.069754] 1480 bytes from 10.33.40.12: icmp_seq=45427 ttl=62 time=8.37 ms
[1607634006.071111] 1480 bytes from 10.33.40.12: icmp_seq=45428 ttl=62 time=8.24 ms
[1607634007.072502] 1480 bytes from 10.33.40.12: icmp_seq=45429 ttl=62 time=8.27 ms
[1607634008.073619] 1480 bytes from 10.33.40.12: icmp_seq=45430 ttl=62 time=8.21 ms
[1607634009.075161] 1480 bytes from 10.33.40.12: icmp_seq=45431 ttl=62 time=8.40 ms
[1607634010.076599] 1480 bytes from 10.33.40.12: icmp_seq=45432 ttl=62 time=8.30 ms
[1607634011.078123] 1480 bytes from 10.33.40.12: icmp_seq=45433 ttl=62 time=8.40 ms
[1607634012.079841] 1480 bytes from 10.33.40.12: icmp_seq=45434 ttl=62 time=8.46 ms
[1607634013.081458] 1480 bytes from 10.33.40.12: icmp_seq=45435 ttl=62 time=8.54 ms
[1607634014.082348] 1480 bytes from 10.33.40.12: icmp_seq=45436 ttl=62 time=7.82 ms
[1607634015.084710] 1480 bytes from 10.33.40.12: icmp_seq=45437 ttl=62 time=8.25 ms
[1607634016.085642] 1480 bytes from 10.33.40.12: icmp_seq=45438 ttl=62 time=7.86 ms
[1607634017.089715] 1480 bytes from 10.33.40.12: icmp_seq=45439 ttl=62 time=9.96 ms
[1607634018.090848] 1480 bytes from 10.33.40.12: icmp_seq=45440 ttl=62 time=10.0 ms
[1607634019.090331] 1480 bytes from 10.33.40.12: icmp_seq=45441 ttl=62 time=8.57 ms
[1607634020.090314] 1480 bytes from 10.33.40.12: icmp_seq=45442 ttl=62 time=7.87 ms
[1607634021.093129] 1480 bytes from 10.33.40.12: icmp_seq=45443 ttl=62 time=8.73 ms
[1607634022.094175] 1480 bytes from 10.33.40.12: icmp_seq=45444 ttl=62 time=8.84 ms
[1607634023.095011] 1480 bytes from 10.33.40.12: icmp_seq=45445 ttl=62 time=7.71 ms
[1607634024.097773] 1480 bytes from 10.33.40.12: icmp_seq=45446 ttl=62 time=8.64 ms
[1607634025.099022] 1480 bytes from 10.33.40.12: icmp_seq=45447 ttl=62 time=8.15 ms

There seems to be no problem around that time with the network...
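
For reference, the epoch timestamps from ping -D can be lined up with the task log like this (on a host in the same timezone); 1607634011 falls exactly on the 22:00:11 failure above:

Code:
# convert a ping -D epoch timestamp to local time
date -d @1607634011 '+%Y-%m-%dT%H:%M:%S%z'
# -> 2020-12-10T22:00:11+0100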
 
Hi,

Same problem for me, but in my case it happens when backing up my firewall's VM: the VPN goes down during the backup, even in suspend mode.
 
That's not the same problem at all.
 
Yes, sorry, I did not read post #10 properly. I had a log like the one in post #1 and thought VM 102 might perhaps be a VPN.
 
Oddly, I am getting this exact same issue. What is even weirder is that I can back up to the exact same remote backup server with the CLI backup client, from the same host that fails below.

Edit: To add, the ping RTT is only 30 ms.

ERROR: Backup of VM 100 failed - VM 100 qmp command 'backup' failed - backup connect failed: command error: error trying to connect: error connecting to https://xxxxx:8007/ - tcp connect error: deadline has elapsed

From the CLI:
Code:
root.pxar: had to upload 113.43 MiB of 5.89 GiB in 25.36s, average speed 4.47 MiB/s).
root.pxar: backup was done incrementally, reused 5.78 GiB (98.1%)
Uploaded backup catalog (1.79 MiB)
Duration: 26.58s
End Time: Wed Jan 6 06:38:22 2021
 
Apologies for the double post, but this just got even weirder. Oddly, containers back up fine to the exact same host, in the same backup job. The container is currently off; perhaps there is something I am missing with snapshotting running VMs?

Code:
INFO: starting new backup job: vzdump 100 101 --mailnotification failure --mode snapshot --quiet 1 --storage proxmox-remote-bak --node xxxx --all 0
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2021-01-07 07:08:37
INFO: status = running
INFO: VM Name: opnsense
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/100/2021-01-07T07:08:37Z'
ERROR: VM 100 qmp command 'backup' failed - backup connect failed: command error: error trying to connect: error connecting to https://xxxxx:8007/ - tcp connect error: deadline has elapsed
INFO: aborting backup job
ERROR: Backup of VM 100 failed - VM 100 qmp command 'backup' failed - backup connect failed: command error: error trying to connect: error connecting to https://xxxxx:8007/ - tcp connect error: deadline has elapsed
INFO: Failed at 2021-01-07 07:08:47
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2021-01-07 07:08:47
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: generic
INFO: including mount point rootfs ('/') in backup
INFO: creating Proxmox Backup Server archive 'ct/101/2021-01-07T07:08:47Z'
INFO: run: /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp21109_101/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 101 --backup-time 1610003327 --repository xxx@pbs@xxxx:backup
INFO: Starting backup: ct/101/2021-01-07T07:08:47Z
INFO: Client name: xxxxx
INFO: Starting backup protocol: Thu Jan  7 07:08:48 2021
INFO: Downloading previous manifest (Thu Jan  7 07:00:58 2021)
INFO: Upload config file '/var/tmp/vzdumptmp21109_101/etc/vzdump/pct.conf' to 'xxxx@pbs@xxxx:8007:backup' as pct.conf.blob
INFO: Upload directory '/mnt/vzsnap0' to 'xxxx@pbs@xxxx:8007:backup' as root.pxar.didx
INFO: root.pxar: had to upload 0 B of 2.34 GiB in 12.24s, average speed 0 B/s).
INFO: root.pxar: backup was done incrementally, reused 2.34 GiB (100.0%)
INFO: Uploaded backup catalog (663.45 KiB)
INFO: Duration: 12.81s
INFO: End Time: Thu Jan  7 07:09:01 2021
INFO: Finished Backup of VM 101 (00:00:14)
INFO: Backup finished at 2021-01-07 07:09:01
INFO: Backup job finished with errors
TASK ERROR: job errors
 
Please provide a tcpdump from both sides, otherwise it will be impossible to analyze what's going on.
 
In order to obfuscate the remote IP, I added a port forward so that 192.168.1.254 forwards to the remote IP. I tested this beforehand to ensure it's working as intended, and I can access it fine, so it isn't interfering :) The same error happens and the VM fails, but the CT is fine.
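
The forward itself is nothing special; a throwaway relay along these lines does the job (socat here is just an example, with remote-pbs-ip-address standing in for the real address):

Code:
# simple TCP relay: expose the remote PBS behind the local 192.168.1.254 address
socat TCP-LISTEN:8007,fork,reuseaddr TCP:remote-pbs-ip-address:8007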

pve host: 192.168.1.100
pbs: 192.168.1.254 (which is actually a remote IP)

The remote IP has been replaced with remote-pbs-ip-address in the pbs dump.

Code:
INFO: starting new backup job: vzdump 100 101 --mailnotification always --mode snapshot --quiet 1 --all 0 --node xxx --storage testremote
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2021-01-07 09:04:13
INFO: status = running
INFO: VM Name: opnsense
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/100/2021-01-07T09:04:13Z'
ERROR: VM 100 qmp command 'backup' failed - backup connect failed: command error: error trying to connect: error connecting to https://192.168.1.254:8007/ - tcp connect error: deadline has elapsed
INFO: aborting backup job
ERROR: Backup of VM 100 failed - VM 100 qmp command 'backup' failed - backup connect failed: command error: error trying to connect: error connecting to https://192.168.1.254:8007/ - tcp connect error: deadline has elapsed
INFO: Failed at 2021-01-07 09:04:23
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2021-01-07 09:04:23
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: generic
INFO: including mount point rootfs ('/') in backup
INFO: creating Proxmox Backup Server archive 'ct/101/2021-01-07T09:04:23Z'
INFO: run: /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/tmp/vzdumptmp30042_101/etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 101 --backup-time 1610010263 --repository xxx@pbs@192.168.1.254:backup
INFO: Starting backup: ct/101/2021-01-07T09:04:23Z
INFO: Client name: billxxxyprox
INFO: Starting backup protocol: Thu Jan  7 09:04:24 2021
INFO: Downloading previous manifest (Thu Jan  7 07:08:47 2021)
INFO: Upload config file '/var/tmp/vzdumptmp30042_101/etc/vzdump/pct.conf' to 'bakuser@pbs@192.168.1.254:8007:backup' as pct.conf.blob
INFO: Upload directory '/mnt/vzsnap0' to 'bakuser@pbs@192.168.1.254:8007:backup' as root.pxar.didx
INFO: root.pxar: had to upload 0 B of 2.34 GiB in 11.88s, average speed 0 B/s).
INFO: root.pxar: backup was done incrementally, reused 2.34 GiB (100.0%)
INFO: Uploaded backup catalog (663.45 KiB)
INFO: Duration: 12.54s
INFO: End Time: Thu Jan  7 09:04:36 2021
INFO: Finished Backup of VM 101 (00:00:14)
INFO: Backup finished at 2021-01-07 09:04:37
INFO: Backup job finished with errors
TASK ERROR: job errors

Version info:
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

qm status 100 --verbose output

Code:
blockstat:
        ide2:
                unmap_operations: 0
                rd_total_time_ns: 10418244
                wr_highest_offset: 0
                failed_flush_operations: 0
                invalid_rd_operations: 0
                failed_rd_operations: 0
                flush_total_time_ns: 0
                failed_wr_operations: 0
                timed_stats:
                wr_total_time_ns: 0
                idle_time_ns: 11139765199707
                failed_unmap_operations: 0
                unmap_total_time_ns: 0
                unmap_merged: 0
                account_failed: 1
                unmap_bytes: 0
                wr_operations: 0
                rd_bytes: 148116
                wr_merged: 0
                flush_operations: 0
                rd_merged: 0
                account_invalid: 1
                wr_bytes: 0
                invalid_wr_operations: 0
                rd_operations: 100
                invalid_unmap_operations: 0
                invalid_flush_operations: 0
        scsi0:
                failed_wr_operations: 0
                flush_total_time_ns: 0
                wr_total_time_ns: 2923175937
                timed_stats:
                unmap_operations: 0
                rd_total_time_ns: 1682399177
                failed_rd_operations: 0
                invalid_rd_operations: 0
                failed_flush_operations: 0
                wr_highest_offset: 8263651328
                wr_bytes: 347197440
                account_invalid: 1
                flush_operations: 0
                rd_merged: 0
                invalid_flush_operations: 0
                invalid_wr_operations: 0
                invalid_unmap_operations: 0
                rd_operations: 9288
                wr_operations: 9166
                unmap_bytes: 0
                unmap_total_time_ns: 0
                unmap_merged: 0
                account_failed: 1
                idle_time_ns: 1810386025
                failed_unmap_operations: 0
                wr_merged: 0
                rd_bytes: 147186688
cpus: 1
disk: 0
diskread: 147334804
diskwrite: 347197440
maxdisk: 10737418240
maxmem: 1073741824
mem: 708528093
name: opnsense
netin: 0
netout: 0
pid: 2643
proxmox-support:
        pbs-library-version: 1.0.2 (18d5b98ab1bec4004178a0db6f2debb83bfa9165)
        pbs-dirty-bitmap: 1
        query-bitmap-info: 1
        pbs-dirty-bitmap-migration: 1
qmpstatus: running
status: running
template:
uptime: 11158
vmid: 100
 

Attachments

  • local-host-tcpdump.log (166.7 KB)
  • remote-pbs-tcpdump.log (139.5 KB)
Can you also include "pveversion -v" and "qm status 100 --verbose" output?
 
Code:
name: opnsense

Is this by chance the VM that provides your connection to the PBS server? ;)
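
For anyone else landing here: if the route to your PBS host runs through a guest (e.g. a firewall VM), the connection will naturally stall the moment that VM is suspended or snapshotted for backup. A quick way to check which gateway the backup traffic actually uses:

Code:
# show the route (and gateway) used to reach the PBS server
ip route get 192.168.1.254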
 
