[SOLVED] Using backup client from within 1 core VM stuck with FUTEX_WAIT_PRIVATE

HansTheAdmin

New Member
Mar 18, 2021
3
0
1
54
Hi.
I have problem using proxmox-backup-client backup ... from within instance on the OpenStack.
There is proper connection between client and server, there is proper permission (can do proxmox-backup-client login).
However after few seconds of copying files, whole process stop but there is no exit from pbc with any error.
Debugging with strace, gave me some more info, however - still don't have where could be a problem.
Trying stap futexes.stp giving me some addresses with "lock", but in strace log, there is no error I can understand.

Environment:
VM type qemu, running on OpenStack.
Ubuntu: 20.04 (installed with apt install as in guide on this forum).
Volume: extended partition, no LVMs
PBC: 1.0.9-1 (however I tried also 1.0.10-1 from testing and 1.0.10-1 from pve repo).
[Edit]
Meanwhile I've tested on different instances with clean Ubuntu 20.04 and 18.04 - the same behavior (process just stopped on different files).

It's strange, because first it stopped while trying to archive some System.map from old kernel, then after autoremove old kernels it stops on some grub module in /boot. Then I was trying to backup only /etc for example - again, starts copying and stuck on some .conf file in apparmor/.
When I'm trying only pxar whole volume with pxar create ... - no problem at all. Rsync whole volume to the same PBS - again - no problems. Only using proxmox-backup-client backup - always wait forever (I've waited once 12h).

Here is part from stap futexes.stp:
Code:
proxmox-backup-[1826124] lock 0x7f61dbf419d0 contended 1 times, 510 avg us
proxmox-backup-[1826124] lock 0x7f61dbd40510 contended 1 times, 10000091 avg us

And here are parts related in strace output:
Code:
     0.000091 [00007f61dbf63496] rt_sigaction(SIGRTMIN, {sa_handler=0x7f61dbf55bf0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f61dbf633c0}, NULL, 8) = 0
     0.000182 [00007f61dbf63496] rt_sigaction(SIGRT_1, {sa_handler=0x7f61dbf55c90, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f61dbf633c0}, NULL, 8) = 0
     0.000097 [00007f61dbf55f24] rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
     0.000098 [00007f61dc55ced4] prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
     0.000322 [00007f61dc55d3eb] brk(NULL) = 0x55dc92b63000
     0.000094 [00007f61dc55d3eb] brk(0x55dc92b84000) = 0x55dc92b84000
     0.000239 [00007f61dbf604a7] futex(0x7f61dc8216bc, FUTEX_WAKE_PRIVATE, 2147483647) = 0
     0.000210 [00007f61dbf604a7] futex(0x7f61dc8216c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
     0.000168 [00007f61dc55bac7] poll([{fd=0, events=0}, {fd=1, events=0}, {fd=2, events=0}], 3, 0) = 0 (Timeout)
     0.000124 [00007f61dc48c2e6] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[PIPE], sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7f61dc48c210}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
     0.000276 [00007f61dc55c65c] openat(AT_FDCWD, "/proc/self/maps", O_RDONLY|O_CLOEXEC) = 4
     0.000280 [00007f61dc55ced4] prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
     0.000103 [00007f61dc556689] fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
     0.000113 [00007f61dc55c768] read(4, "55dc90f94000-55dc9102b000 r--p 0"..., 1024) = 1024
     0.000148 [00007f61dc55c768] read(4, "ibdl-2.31.so\n7f61dbf4c000-7f61db"..., 1024) = 1024
     0.000108 [00007f61dc55c768] read(4, "0f000 fc:01 17690               "..., 1024) = 1024
     0.000118 [00007f61dc55c768] read(4, " 2988                       /usr"..., 1024) = 1024
     0.000116 [00007f61dc55c768] read(4, "bssl.so.1.1\n7f61dc437000-7f61dc4"..., 1024) = 1024
     0.000104 [00007f61dc55c768] read(4, "1dc63b000 r--p 00000000 fc:01 17"..., 1024) = 1024
     0.000103 [00007f61dc55c768] read(4, "c812000-7f61dc813000 ---p 001cf0"..., 1024) = 1024
     0.000106 [00007f61dc55c768] read(4, "0 fc:01 5509                    "..., 1024) = 1024
     0.000122 [00007f61dc55c768] read(4, "00-7f61dc916000 rw-p 00009000 fc"..., 1024) = 1024
     0.000103 [00007f61dc55c4ab] close(4) = 0
     0.000151 [00007f61dbf63684] sched_getaffinity(1826124, 32, [0]) = 8
     0.000159 [00007f61dc48c2e6] rt_sigaction(SIGSEGV, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
     0.000101 [00007f61dc48c2e6] rt_sigaction(SIGSEGV, {sa_handler=0x55dc915e2b70, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_SIGINFO, sa_restorer=0x7f61dc48c210}, NULL, 8) = 0
     0.000093 [00007f61dc48c2e6] rt_sigaction(SIGBUS, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
     0.000089 [00007f61dc48c2e6] rt_sigaction(SIGBUS, {sa_handler=0x55dc915e2b70, sa_mask=[], sa_flags=SA_RESTORER|SA_ONSTACK|SA_SIGINFO, sa_restorer=0x7f61dc48c210}, NULL, 8) = 0
     0.000092 [00007f61dc48cb4b] sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
     0.000143 [00007f61dc561a46] mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f61dc91c000
     0.000114 [00007f61dc561b0b] mprotect(0x7f61dc91c000, 4096, PROT_NONE) = 0
     0.000090 [00007f61dc48cb4b] sigaltstack({ss_sp=0x7f61dc91d000, ss_flags=0, ss_size=8192}, NULL) = 0
     0.000294 [00007f61dc56189d] getrandom("\xa5\xba\x68\x3e\xb2\xcf\x83\xff\x73\x52\x8f\x10\xb5\x88\x82\x6d", 16, GRND_NONBLOCK) = 16
     0.000809 [00007f61dc55d3eb] brk(0x55dc92ba8000) = 0x55dc92ba8000
     0.000169 [00007f61dc55d3eb] brk(0x55dc92ba0000) = 0x55dc92ba0000
     0.000384 [00007f61dc556eab] openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 4
     0.000212 [00007f61dc557142] read(4, "12:freezer:/\n11:blkio:/\n10:rdma:"..., 8192) = 349
     0.000192 [00007f61dc557987] close(4) = 0
     0.000093 [00007f61dc556eab] openat(AT_FDCWD, "/proc/self/mountinfo", O_RDONLY|O_CLOEXEC) = 4
     0.000124 [00007f61dc557142] read(4, "25 30 0:23 / /sys rw,nosuid,node"..., 8192) = 3423
     0.000316 [00007f61dc557987] close(4) = 0
     0.000116 [00007f61dc556eab] openat(AT_FDCWD, "/sys/fs/cgroup/cpu,cpuacct/cpu.cfs_quota_us", O_RDONLY|O_CLOEXEC) = 4
     0.000129 [00007f61dc557142] read(4, "-1\n", 32) = 3
     0.000133 [00007f61dc557142] read(4, "", 29) = 0
     0.000099 [00007f61dc557987] close(4) = 0
     0.000099 [00007f61dc54b96e] sched_getaffinity(0, 128, [0]) = 8
     0.000133 [00007f61dc568c2b] epoll_create1(EPOLL_CLOEXEC) = 4
     0.000100 [00007f61dc56848b] eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 5
     0.000095 [00007f61dc568c5e] epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=2147483648, u64=2147483648}}) = 0
     0.000113 [00007f61dc557a0b] dup(4) = 6
     0.000190 [00007f61dc56993e] socketpair(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0, [7, 8]) = 0
     0.000169 [00007f61dc55c5a4] fcntl(7, F_DUPFD_CLOEXEC, 0) = 9
     0.000121 [00007f61dc568c5e] epoll_ctl(6, EPOLL_CTL_ADD, 9, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=0, u64=0}}) = 0
     0.000173 [00007f61dc48c2e6] rt_sigaction(SIGCHLD, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
     0.000120 [00007f61dc48c2e6] rt_sigaction(SIGCHLD, {sa_handler=0x55dc915c9b30, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO|SA_NOCLDSTOP, sa_restorer=0x7f61dc48c210}, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
     0.000192 [00007f61dbf604a7] futex(0x7f61dbf4d0c8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
     0.000130 [00007f61dc561a46] mmap(NULL, 2101248, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f61dbd41000
     0.000105 [00007f61dc561b0b] mprotect(0x7f61dbd42000, 2097152, PROT_READ|PROT_WRITE) = 0
     0.000109 [00007f61dc568285] clone(child_stack=0x7f61dbf40d70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[1826125], tls=0x7f61dbf41700, child_tidptr=0x7f61dbf419d0) = 1826125
     0.000237 [00007f61dc56189d] futex(0x7f61dbf44bd0, FUTEX_WAIT_PRIVATE, 1, NULLstrace: Process 1826125 attached
<unfinished ...>
[pid 1826125]      0.000163 [00007f61dbf5759a] set_robust_list(0x7f61dbf419e0, 24) = 0
[pid 1826125]      0.000146 [00007f61dc48cb4b] sigaltstack(NULL, {ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=0}) = 0
[pid 1826125]      0.000182 [00007f61dc561a46] mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f61dc919000
[pid 1826125]      0.000226 [00007f61dc561b0b] mprotect(0x7f61dc919000, 4096, PROT_NONE) = 0
[pid 1826125]      0.000154 [00007f61dc48cb4b] sigaltstack({ss_sp=0x7f61dc91a000, ss_flags=0, ss_size=8192}, NULL) = 0
[pid 1826125]      0.000144 [00007f61dc568e9e] prctl(PR_SET_NAME, "tokio-runtime-w"...) = 0
[pid 1826125]      0.000100 [00007f61dc561a46] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f61d3d41000
[pid 1826125]      0.000094 [00007f61dc561adb] munmap(0x7f61d3d41000, 2879488) = 0
[pid 1826125]      0.000106 [00007f61dc561adb] munmap(0x7f61d8000000, 64229376) = 0
[pid 1826125]      0.000090 [00007f61dc561b0b] mprotect(0x7f61d4000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 1826125]      0.000117 [00007f61dbf63684] sched_getaffinity(1826125, 32, [0]) = 8
[pid 1826125]      0.000158 [00007f61dc56189d] futex(0x7f61dbf44bd0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 1826124]      0.000103 [00007f61dc56189d] <... futex resumed>) = 0
[pid 1826125]      0.000055 [00007f61dc56189d] <... futex resumed>) = 1
[pid 1826124]      0.000066 [00007f61dbf58cd7] futex(0x7f61dbf419d0, FUTEX_WAIT, 1826125, NULL <unfinished ...>
[pid 1826125]      0.000101 [00007f61dc48cb4b] sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
[pid 1826125]      0.000091 [00007f61dc561adb] munmap(0x7f61dc919000, 12288) = 0
[pid 1826125]      0.000120 [00007f61dc561bdb] madvise(0x7f61dbd41000, 2076672, MADV_DONTNEED) = 0
[pid 1826125]      0.000109 [00007f61dbf576c6] exit(0) = ?
[pid 1826124]      0.000137 [00007f61dbf58cd7] <... futex resumed>) = 0
[pid 1826124]      0.000106 [00007f61dc54b96e] sched_getaffinity(0, 128, [0]) = 8
[pid 1826124]      0.000133 [00007f61dc568c2b] epoll_create1(EPOLL_CLOEXEC) = 10
[pid 1826124]      0.000111 [00007f61dc56848b] eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 11
[pid 1826124]      0.000088 [00007f61dc568c5e] epoll_ctl(10, EPOLL_CTL_ADD, 11, {EPOLLIN|EPOLLRDHUP|EPOLLET, {u32=2147483648, u64=2147483648}}) = 0
[pid 1826124]      0.000110 [00007f61dc557a0b] dup(10) = 12
[pid 1826124]      0.000115 [00007f61dc55c5a4] fcntl(7, F_DUPFD_CLOEXEC, 0) = 13
[pid 1826124]      0.000096 [00007f61dc568c5e] epoll_ctl(12, EPOLL_CTL_ADD, 13, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=0, u64=0}}) = 0
[pid 1826124]      0.000158 [00007f61dc568285] clone(child_stack=0x7f61dbf40d70, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[1826126], tls=0x7f61dbf41700, child_tidptr=0x7f61dbf419d0) = 1826126
[pid 1826124]      0.000413 [00007f61dc55d3eb] brk(0x55dc92bc1000) = 0x55dc92bc1000
[pid 1826124]      0.000604 [00007f61dc55d3eb] brk(0x55dc92be2000) = 0x55dc92be2000
[pid 1826124]      0.000809 [00007f61dc55d3eb] brk(0x55dc92c03000) = 0x55dc92c03000
[pid 1826124]      0.001031 [00007f61dc561a46] mmap(NULL, 200704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f61dbd10000
[pid 1826124]      0.000394 [00007f61dc561adb] munmap(0x7f61dbd10000, 200704) = 0
[pid 1826124]      0.000231 [00007f61dc55d3eb] brk(0x55dc92c29000) = 0x55dc92c29000
[pid 1826124]      0.000289 [00007f61dc55d3eb] brk(0x55dc92c58000) = 0x55dc92c58000
[pid 1826124]      0.000713 [00007f61dc56189d] statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL,  <unfinished ...>
[pid 1826125]      0.000164 [????????????????] +++ exited with 0 +++

Code:
[pid 1826124]      0.000097 [00007f61dc56990b] socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 14
[pid 1826124]      0.000449 [00007f61dc55c5a4] fcntl(14, F_GETFL) = 0x2 (flags O_RDWR)
[pid 1826124]      0.000203 [00007f61dc55c5a4] fcntl(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 1826124]      0.000159 [00007f61dc56933b] connect(14, {sa_family=AF_INET, sin_port=htons(8007), sin_addr=inet_addr("10.10.10.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
[pid 1826124]      0.000298 [00007f61dc568c5e] epoll_ctl(12, EPOLL_CTL_ADD, 14, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1, u64=1}}) = 0
[pid 1826124]      0.000177 [00007f61dc55721f] write(11, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1826126]      0.000184 [00007f61dc5685ce] <... epoll_wait resumed>[{EPOLLIN, {u32=2147483648, u64=2147483648}}], 1024, 118764) = 1
[pid 1826124]      0.000129 [00007f61dc56189d] futex(0x7f61dbf44bd0, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 1826126]      0.000145 [00007f61dc54b89b] sched_yield() = 0
[pid 1826126]      0.000192 [00007f61dc54b89b] sched_yield() = 0
[pid 1826126]      0.000148 [00007f61dc54b89b] sched_yield() = 0
[pid 1826126]      0.000155 [00007f61dc5685ce] epoll_wait(10, [{EPOLLOUT, {u32=1, u64=1}}], 1024, 8103) = 1
[pid 1826126]      0.000257 [00007f61dc56189d] futex(0x7f61dbf44bd0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 1826124]      0.000154 [00007f61dc56189d] <... futex resumed>) = 0
[pid 1826126]      0.000095 [00007f61dc56189d] <... futex resumed>) = 1
[pid 1826124]      0.000101 [00007f61dc5693fe] getsockopt(14, SOL_SOCKET, SO_ERROR,  <unfinished ...>
[pid 1826126]      0.000139 [00007f61dc54b89b] sched_yield( <unfinished ...>
[pid 1826124]      0.000121 [00007f61dc5693fe] <... getsockopt resumed>[0], [4]) = 0
[pid 1826126]      0.000116 [00007f61dc54b89b] <... sched_yield resumed>) = 0
[pid 1826124]      0.000098 [00007f61dc5698ae] setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4 <unfinished ...>
[pid 1826126]      0.000144 [00007f61dc54b89b] sched_yield( <unfinished ...>
[pid 1826124]      0.000121 [00007f61dc5698ae] <... setsockopt resumed>) = 0
[pid 1826126]      0.000111 [00007f61dc54b89b] <... sched_yield resumed>) = 0
[pid 1826124]      0.000156 [00007f61dc5698ae] setsockopt(14, SOL_SOCKET, SO_KEEPALIVE, [1], 4 <unfinished ...>
[pid 1826127]      0.000153 [00007f61dc56189d] futex(0x7f61dbd40510, FUTEX_WAIT_PRIVATE, 1, {tv_sec=9, tv_nsec=999977103} <unfinished ...>
[pid 1826126]      0.000140 [00007f61dc54b89b] sched_yield( <unfinished ...>
[pid 1826124]      0.000123 [00007f61dc5698ae] <... setsockopt resumed>) = 0
[pid 1826126]      0.000081 [00007f61dc54b89b] <... sched_yield resumed>) = 0
[pid 1826124]      0.000094 [00007f61dc5698ae] setsockopt(14, SOL_TCP, TCP_KEEPIDLE, [120], 4 <unfinished ...>
[pid 1826126]      0.000138 [00007f61dc5685ce] epoll_wait(10,  <unfinished ...>
[pid 1826124]      0.000123 [00007f61dc5698ae] <... setsockopt resumed>) = 0
 
Last edited:
Ok, I found the problem.
If openstack instance has only one core, then backup will get FUTEX_WAIT_PRIVATE paused forever. However if I start backup as image not as pxar - then backup will be finished.
So proxmox-backup-client + pxar + openstack one core = never finished backup.
 
I have changed the subject thread, as I think this has nothing to do with OpenStack or PVE VM.
This is probably a bug and can be replicated with any system but with only one core available (I don't have x64 bare metal machine to test with, so I'm guessing here).
Or should be mentioned in "requirements", that proxmox-backup-client backup arch.pxar: require at least 2 cores.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!