Intermittent backup problems, need some hints on how to debug

Ovidiu

Active Member
Apr 27, 2014
311
10
38
hi there,

1 Proxmox server and 1 PBS with different providers. Both servers are up to date.
Backups worked fine through tailscale (wireguard based VPN) then went from taking on average 10 minutes to 5-8 hours with similar backup sizes.
Switched back to connecting to PBS via public IP, same problems. Eventually backups worked fine again with public IP and VPN.
A few weeks later they completely stopped working in any way.

tried this from PX to PBS:

proxmox-backup-client benchmark --repository root@pam@PBS-public-IP:datastore-name

and that command just freezes - I aborted after 15 minutes. The task shows up on the PBS but nothing happens.
Tried a few times consecutively and eventually it worked and gave these numbers:

Code:
Uploaded 10 chunks in 14 seconds.
Time per request: 1432043 microseconds.
TLS speed: 2.93 MB/s
SHA256 speed: 1579.20 MB/s
Compression speed: 590.72 MB/s
Decompress speed: 1048.82 MB/s
AES256/GCM speed: 2739.27 MB/s
Verify speed: 613.92 MB/s
┌───────────────────────────────────┬────────────────────┐
│ Name                              │ Value              │
╞═══════════════════════════════════╪════════════════════╡
│ TLS (maximal backup upload speed) │ 2.93 MB/s (0%)     │
├───────────────────────────────────┼────────────────────┤
│ SHA256 checksum computation speed │ 1579.20 MB/s (78%) │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 compression speed    │ 590.72 MB/s (79%)  │
├───────────────────────────────────┼────────────────────┤
│ ZStd level 1 decompression speed  │ 1048.82 MB/s (88%) │
├───────────────────────────────────┼────────────────────┤
│ Chunk verification speed          │ 613.92 MB/s (81%)  │
├───────────────────────────────────┼────────────────────┤
│ AES256 GCM encryption speed       │ 2739.27 MB/s (75%) │
└───────────────────────────────────┴────────────────────┘

trying a few times again, the benchmark simply freezes again with nothing happening on the PBS aka no CPU/DISK/network load.

Code:
apt-cache policy proxmox-backup-client
proxmox-backup-client:
  Installed: 1.1.6-2
  Candidate: 1.1.6-2

I really need a hint at where to look? Network problems? origin or target? Proxmox problems? PBS problems? I'm a bit lost right now :-(
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
8,062
992
163
34
Vienna
aside from it being just *very* slow you could check what the process does

when the benchmark hangs, do
Code:
ps ax | grep proxmox-backup-client

this gives you the PID in the first column
with the pid do
Code:
grep -r '' /proc/<PID>/task/*/stack

edit: the output of the second command might not help though.., you could also run it with 'strace', like 'strace -f -yy proxmox-backup-client ....'
 
Last edited:

Ovidiu

Active Member
Apr 27, 2014
311
10
38
thanks. Does this help?

Code:
ps ax | grep proxmox-backup-client
30426 pts/2    Sl+    0:00 proxmox-backup-client benchmark --repository root@pam@PBS-public-IP:datastore
30816 pts/3    S+     0:00 grep proxmox-backup-client


grep -r '' /proc/30426/task/*/stack
/proc/30426/task/30426/stack:[<0>] futex_wait_queue_me+0xb9/0x120
/proc/30426/task/30426/stack:[<0>] futex_wait+0x151/0x270
/proc/30426/task/30426/stack:[<0>] do_futex+0x32c/0xb80
/proc/30426/task/30426/stack:[<0>] __x64_sys_futex+0x143/0x180
/proc/30426/task/30426/stack:[<0>] do_syscall_64+0x57/0x190
/proc/30426/task/30426/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
/proc/30426/task/30431/stack:[<0>] futex_wait_queue_me+0xb9/0x120
/proc/30426/task/30431/stack:[<0>] futex_wait+0x151/0x270
/proc/30426/task/30431/stack:[<0>] do_futex+0x32c/0xb80
/proc/30426/task/30431/stack:[<0>] __x64_sys_futex+0x143/0x180
/proc/30426/task/30431/stack:[<0>] do_syscall_64+0x57/0x190
/proc/30426/task/30431/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
/proc/30426/task/30432/stack:[<0>] ep_poll+0x40a/0x430
/proc/30426/task/30432/stack:[<0>] do_epoll_wait+0xb7/0xd0
/proc/30426/task/30432/stack:[<0>] __x64_sys_epoll_wait+0x1e/0x30
/proc/30426/task/30432/stack:[<0>] do_syscall_64+0x57/0x190
/proc/30426/task/30432/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
/proc/30426/task/30433/stack:[<0>] futex_wait_queue_me+0xb9/0x120
/proc/30426/task/30433/stack:[<0>] futex_wait+0x151/0x270
/proc/30426/task/30433/stack:[<0>] do_futex+0x32c/0xb80
/proc/30426/task/30433/stack:[<0>] __x64_sys_futex+0x143/0x180
/proc/30426/task/30433/stack:[<0>] do_syscall_64+0x57/0x190
/proc/30426/task/30433/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
/proc/30426/task/30434/stack:[<0>] futex_wait_queue_me+0xb9/0x120
/proc/30426/task/30434/stack:[<0>] futex_wait+0x151/0x270
/proc/30426/task/30434/stack:[<0>] do_futex+0x32c/0xb80
/proc/30426/task/30434/stack:[<0>] __x64_sys_futex+0x143/0x180
/proc/30426/task/30434/stack:[<0>] do_syscall_64+0x57/0x190
/proc/30426/task/30434/stack:[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
 

Ovidiu

Active Member
Apr 27, 2014
311
10
38
edit: the output of the second command might not help though.., you could also run it with 'strace', like 'strace -f -yy proxmox-backup-client ....'
Shall I pipe it to a log file and attach it here or what am I looking for?

something like
Code:
strace -f -yy proxmox-backup-client benchmark --repository root@pam@PBS-public-IP:datastore > logfile.log

btw. I've just had 5-10 benchmarks working consecutively. I have no idea what is going on since all I did so far was execute the commands mentioned in this thread :-(
 

Ovidiu

Active Member
Apr 27, 2014
311
10
38
oh, I have news. The latest frozen benchmark just produced an error:

Code:
proxmox-backup-client benchmark --repository root@pam@PBS-public-IP:datastore


Message from syslogd@proxmox-host at May 26 16:33:46 ...
 kernel:[112192.699946] watchdog: BUG: soft lockup - CPU#1 stuck for 25s! [kvm:14958]

btw. proxmox is installed on a VPS with nested virtualization enabled in case it matters. This is just a test-case currently, no mission-critical production server I just needed a few small VMs and the VPS provider did not have any "fitting" options so I took a bigger one and created 3 small VMs with Proxmox. There is no load, there are basically 3 idle Ubuntu-based VMs running on this Proxmox instance.
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
8,062
992
163
34
Vienna

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
8,062
992
163
34
Vienna
mhmm ok, from the strace looks like the connections works, but very slowly (there is a 'Resouce temporarily unavailable' error every few kb of data sent) maybe the connection/server is simply overloaded? broken cable/disk?

is there anything in the syslog/journal on the client and/or server side?
 

Ovidiu

Active Member
Apr 27, 2014
311
10
38
talking about ressources there is almost no noticable usage. I am analyzing both server's stats with netdata and I see no peak during the benchmarks. at all. they are idle.

checked the logs and didn't find much on both.

on the target (PBS) I see a few failed benchmarks which say "broken pipe"

Code:
May 26 14:59:20 ict proxmox-backup-proxy[997]: starting new benchmark on datastore 'hyper': "host/benchmark/2021-05-26T12:59:20Z"
May 26 14:59:29 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:30 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:31 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:32 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:33 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:33 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:34 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:34 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:35 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 14:59:35 ict proxmox-backup-proxy[997]: benchmark finished successfully
May 26 14:59:35 ict proxmox-backup-proxy[997]: removing backup snapshot "/backup/pbs/hyper/host/benchmark/2021-05-26T12:59:20Z"
May 26 14:59:35 ict proxmox-backup-proxy[997]: TASK OK
May 26 14:59:57 ict proxmox-backup-proxy[997]: starting new benchmark on datastore 'hyper': "host/benchmark/2021-05-26T12:59:56Z"
May 26 15:03:14 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:06:24 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:07:28 ict proxmox-backup-proxy[997]: benchmark finished successfully
May 26 15:07:28 ict proxmox-backup-proxy[997]: TASK OK
May 26 15:07:28 ict proxmox-backup-proxy[997]: Upload error: error reading a body from connection: broken pipe
May 26 15:07:28 ict proxmox-backup-proxy[997]: removing backup snapshot "/backup/pbs/hyper/host/benchmark/2021-05-26T12:59:56Z"
May 26 15:07:28 ict proxmox-backup-proxy[997]: starting new benchmark on datastore 'hyper': "host/benchmark/2021-05-26T13:07:27Z"
May 26 15:08:45 ict proxmox-backup-proxy[997]: benchmark finished successfully
May 26 15:08:45 ict proxmox-backup-proxy[997]: Upload error: error reading a body from connection: broken pipe
May 26 15:08:45 ict proxmox-backup-proxy[997]: removing backup snapshot "/backup/pbs/hyper/host/benchmark/2021-05-26T13:07:27Z"
May 26 15:08:45 ict proxmox-backup-proxy[997]: TASK OK
May 26 15:17:01 ict CRON[13604]: pam_unix(cron:session): session opened for user root by (uid=0)
May 26 15:17:01 ict CRON[13605]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 26 15:17:01 ict CRON[13604]: pam_unix(cron:session): session closed for user root
May 26 15:41:06 ict proxmox-backup-proxy[997]: starting new benchmark on datastore 'hyper': "host/benchmark/2021-05-26T13:41:06Z"
May 26 15:41:14 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:16 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:17 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:18 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:19 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:19 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:20 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:20 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:21 ict proxmox-backup-proxy[997]: UPLOAD END 4194304 bytes
May 26 15:41:21 ict proxmox-backup-proxy[997]: benchmark finished successfully
May 26 15:41:21 ict proxmox-backup-proxy[997]: removing backup snapshot "/backup/pbs/hyper/host/benchmark/2021-05-26T13:41:06Z"
May 26 15:41:21 ict proxmox-backup-proxy[997]: TASK OK
May 26 15:41:30 ict proxmox-backup-proxy[997]: starting new benchmark on datastore 'hyper': "host/benchmark/2021-05-26T13:41:30Z"
May 26 15:41:44 ict proxmox-backup-proxy[997]: benchmark finished successfully
May 26 15:41:44 ict proxmox-backup-proxy[997]: Upload error: error reading a body from connection: broken pipe
May 26 15:41:44 ict proxmox-backup-proxy[997]: removing backup snapshot "/backup/pbs/hyper/host/benchmark/2021-05-26T13:41:30Z"
May 26 15:41:44 ict proxmox-backup-proxy[997]: TASK OK

while on the source (PX) the only noteworthy entry was about that watchdog kernel error I mentioned earlier but that only occured once so I does seem unlikely to be related to often failing benchmarks.

I've already tried a few mtr tests and I see no packet loss. Proxmox does hardly use any ressoruces, like I said its an idle test system.

Any other pointers about where to look for the cause of the problem? Since the benchmark tests work fine from another PX server I had guessed the problem was with the server where I am running the benchmark tests and not with the PBS system.
 

dcsapak

Proxmox Staff Member
Staff member
Feb 1, 2016
8,062
992
163
34
Vienna
Any other pointers about where to look for the cause of the problem?
for now it looks like a network issue... how does your network look like between the client/server? anything unusual in between? maybe anything that could intervene in tls/http2 traffic?
 

Ovidiu

Active Member
Apr 27, 2014
311
10
38
for now it looks like a network issue... how does your network look like between the client/server? anything unusual in between? maybe anything that could intervene in tls/http2 traffic?
nothing special. PX is on a VPS from a German provider, PBS is on a KVM from a US provider.
tried mtr - no issues found.
tried iperf3 between the two and TCP is ok, plenty of lost packets when using udp hence I switched back from using the wireguard VPN solution to plain simply connecting to the public IP of the PBS.

:-( ok thanks for taking the time to look into it, I guess I'll have to do a few more test from different server s and see whether I can figure out if the problem lies with the PX or the PBS servers or their providers.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get your own in 60 seconds.

Buy now!