[SOLVED] "vma create" segfaulting in libglib-2.0.so.0.6600.8 - multi-threading bug?

Nov 25, 2019
Hi!

I have a PBS system (latest v2.1-1 with kernel 5.13.19-4-pve) running with ZFS raidz2 on a ProLiant DL380 Gen9 server (Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz, 1 socket with 8 cores + 2 threads per core AKA 16 CPUs).
The system also has pve-qemu-kvm v6.1.1-1 installed (from http://download.proxmox.com/debian/pve) so I can run vma create to restore VMs for direct import into PVE environments.
This basically works fine, but for a few VMs and under certain(?) situations I get immediate segfaults when running vma create:

Code:
root@pbs01:~# vma create /mnt/offsite-backup/vzdump-qemu-103-2022_01_27-00_04_36.vma -v -c /srv/restore/103/fw.conf -c /srv/restore/103/qemu-server.conf drive-virtio0=/srv/restore/103/drive-virtio0.img drive-virtio1=/srv/restore/103/drive-virtio1.img drive-virtio2=/srv/restore/103/drive-virtio2.img
vma: vma_writer_register_stream 'drive-virtio2' failed
Trace/breakpoint trap (core dumped)
root@pbs01:~# dmesg -T | tail -1
[Fri Feb 11 14:54:05 2022] traps: vma[3258736] trap int3 ip:7f9e73366332 sp:7ffd45559170 error:0 in libglib-2.0.so.0.6600.8[7f9e73329000+88000]
root@pbs01:~# dpkg -l libglib2.0-0\* | grep '^ii'
ii  libglib2.0-0:amd64        2.66.8-1     amd64        GLib library of C routines
ii  libglib2.0-0-dbgsym:amd64 2.66.8-1     amd64        debug symbols for libglib2.0-0
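As a side note, the dmesg trap line above already pins down where in the library the trap fired: subtracting the mapping base address from the instruction pointer gives the file offset, which can then be resolved against the debug symbols. A minimal sketch, using the values from the dmesg output above (the addr2line path is only an example, not the exact path on this system):

```shell
# ip and base as reported by dmesg:
#   "ip:7f9e73366332 ... in libglib-2.0.so.0.6600.8[7f9e73329000+88000]"
ip=0x7f9e73366332
base=0x7f9e73329000
printf 'offset: %#x\n' $((ip - base))
# resolve the offset with the dbgsym package installed, e.g.:
#   addr2line -f -e /usr/lib/x86_64-linux-gnu/libglib-2.0.so.0.6600.8 0x3d332
```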

I have a bt full from such a coredump (taken with the dbg packages installed; slightly stripped down in the paste):

https://paste.grml.org/hidden/5fdfb37c/

What's interesting is that this vma create ... segfaults, but when executed under gdb it works perfectly fine:

Code:
root@pbs01:~# gdb --args vma create /mnt/offsite-backup/vzdump-qemu-103-2022_01_27-00_04_36.vma -v -c /srv/restore/103/fw.conf -c /srv/restore/103/qemu-server.conf drive-virtio0=/srv/restore/103/drive-virtio0.img drive-virtio1=/srv/restore/103/drive-virtio1.img drive-virtio2=/srv/restore/103/drive-virtio2.img
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vma...
Reading symbols from /usr/lib/debug/.build-id/3a/977ef179bb3be80ff7c2afff3ef350aaad5e9f.debug...
(gdb) run
Starting program: /usr/bin/vma create /mnt/offsite-backup/vzdump-qemu-103-2022_01_27-00_04_36.vma -v -c /srv/restore/103/fw.conf -c /srv/restore/103/qemu-server.conf drive-virtio0=/srv/restore/103/drive-virtio0.img drive-virtio1=/srv/restore/103/drive-virtio1.img drive-virtio2=/srv/restore/103/drive-virtio2.img
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffeca05700 (LWP 3259172)]
[New Thread 0x7fffe7c4e700 (LWP 3259173)]
[New Thread 0x7fffe6f48700 (LWP 3259174)]
progress 0% 393216/416611827712 258048
[New Thread 0x7fffe6747700 (LWP 3259175)]
[New Thread 0x7fffe5f46700 (LWP 3261574)]
[New Thread 0x7fffe5745700 (LWP 3261575)]
progress 1% 4166254592/416611827712 2710716416
progress 2% 8332247040/416611827712 4807213056
progress 3% 12498370560/416611827712 6867615744
progress 4% 16664494080/416611827712 10424430592
progress 5% 20830617600/416611827712 12500537344
progress 6% 24996806656/416611827712 14348218368
progress 7% 29162930176/416611827712 17227669504
progress 8% 33329053696/416611827712 18972573696
[...]
progress 98% 408279646208/416611827712 109400383488
progress 99% 412445769728/416611827712 109400383488
progress 100% 416611827712/416611827712 109400383488
image drive-virtio0: size=34359738368 zeros=608661504 saved=33751076864
image drive-virtio1: size=274877906944 zeros=1423118336 saved=273454788608
image drive-virtio2: size=107374182400 zeros=107368603648 saved=5578752
[Thread 0x7fffe5f46700 (LWP 3261574) exited]
[Thread 0x7fffe6747700 (LWP 3259175) exited]
[Thread 0x7fffe6f48700 (LWP 3259174) exited]
[Thread 0x7fffe7c4e700 (LWP 3259173) exited]
[Thread 0x7fffeca05700 (LWP 3259172) exited]
[Thread 0x7fffecb66cc0 (LWP 3259168) exited]
[Inferior 1 (process 3259168) exited normally]

Based on this finding, I tried binding the vma create process to a single CPU (via taskset 1 vma create ...), and that also seems to work reliably.
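For reference, taskset's bare argument is a hexadecimal affinity mask, while -c takes a CPU list; both forms below pin the process to CPU 0 (shown with a placeholder command rather than the full vma invocation):

```shell
# pin to CPU 0 via hex mask (0x1 = first CPU) -- same as the bare "taskset 1" form
taskset 0x1 sh -c 'echo pinned'
# same thing with an explicit CPU list
taskset -c 0 sh -c 'echo pinned'
```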

Now this looks like a bug related to threading?
Any ideas what's going wrong here?
I have a reproducible command line available, and can also share such a coredump file in private if that would help.
More than happy to provide any further information. :)
 
Hi,
thank you for the detailed report! I think I was able to reproduce the issue and sent a patch that should fix it.
 
This should be fixed, right?
I had spontaneous reboots recently and found this in the logs while investigating.

Code:
Apr 20 02:53:41 proxmox kernel: [20761.448291] show_signal_msg: 2 callbacks suppressed
Apr 20 02:53:41 proxmox kernel: [20761.448294] kvm[15747]: segfault at 51 ip 00007f4f21328f63 sp 00007fffd99d9020 error 4 in libglib-2.0.so.0.6600.8[7f4f212f5000+88000] likely on CPU 4 (core 4, socket 0)
Apr 20 02:53:41 proxmox kernel: [20761.448312] Code: 8b 7b 18 48 85 ff 74 ae 8b 43 08 85 c0 74 a7 48 8b 33 ba 01 00 00 00 e8 cb e2 ff ff eb 98 66 0f 1f 84 00 00 00 00 00 48 8b 03 <48> 8b 68 50 eb ac 0f 1f 80 00 00 00 00 41 55 41 54 55 53 48 83 ec
Apr 20 02:53:41 proxmox kernel: [20761.467286]  zd48: p1 p2 p3 p4 p5 p6 p7 p8
Apr 20 02:53:41 proxmox kernel: [20761.502282] vmbr0: port 4(tap102i0) entered disabled state
Apr 20 02:53:41 proxmox kernel: [20761.502407] vmbr0: port 4(tap102i0) entered disabled state
Apr 20 02:53:41 proxmox systemd[1]: 102.scope: Succeeded.
Apr 20 02:53:41 proxmox systemd[1]: 102.scope: Consumed 54min 23.314s CPU time.
Apr 20 02:53:42 proxmox qmeventd[805800]: Starting cleanup for 102
Apr 20 02:53:42 proxmox qmeventd[805800]: Finished cleanup for 102
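For what it's worth, the opcode bytes in the kernel's "Code:" line can be disassembled to see the faulting instruction (the byte in angle brackets marks the instruction pointer). A quick sketch, assuming binutils' objdump is available, feeding it just the marked instruction:

```shell
# bytes <48> 8b 68 50 from the Code: line above, written out in octal for
# portable printf (0x48=\110, 0x8b=\213, 0x68=\150, 0x50=\120)
printf '\110\213\150\120' > /tmp/insn.bin
objdump -D -b binary -m i386:x86-64 /tmp/insn.bin | tail -1
# decodes to mov 0x50(%rax),%rbp -- consistent with "segfault at 51":
# a near-NULL pointer in %rax (0x51 - 0x50 = 1) being dereferenced
```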

# pveversion --verbose
proxmox-ve: 7.4-1 (running kernel: 6.2.6-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-6.2: 7.3-8
pve-kernel-5.15: 7.3-3
pve-kernel-5.19: 7.2-15
pve-kernel-5.4: 6.4-20
pve-kernel-6.2.6-1-pve: 6.2.6-1
pve-kernel-5.19.17-2-pve: 5.19.17-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
Hi,
This should be fixed, right?
yes, but your error is different.

I had spontaneous reboots recently and found this in the logs while investigating.

Code:
Apr 20 02:53:41 proxmox kernel: [20761.448291] show_signal_msg: 2 callbacks suppressed
Apr 20 02:53:41 proxmox kernel: [20761.448294] kvm[15747]: segfault at 51 ip 00007f4f21328f63 sp 00007fffd99d9020 error 4 in libglib-2.0.so.0.6600.8[7f4f212f5000+88000] likely on CPU 4 (core 4, socket 0)
It's for the kvm binary, not the vma binary. And the segfault error code is 4, not 0.
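For context, the error value in the kernel's segfault lines is the x86 page-fault error code, a small bit field: bit 0 set means a protection violation on a present page (clear means the page was not present), bit 1 set means a write access, and bit 2 set means the fault happened in user mode. A tiny decoder (the helper name is made up for illustration):

```shell
# decode the x86 page-fault error code printed in kernel segfault lines
# (hypothetical helper, not an existing tool)
decode_pf() {
    e=$1
    [ $((e & 1)) -ne 0 ] && echo 'protection violation' || echo 'page not present'
    [ $((e & 2)) -ne 0 ] && echo 'write access' || echo 'read access'
    [ $((e & 4)) -ne 0 ] && echo 'user mode' || echo 'kernel mode'
}
decode_pf 4   # error 4: a user-mode read of a non-present page
```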

But this alone should not lead to a spontaneous reboot, and it apparently didn't, because the log goes on. When did the reboot happen? Is there anything else in the logs? I'd also suggest you run a memtest.
 
Thanks, you are right, I should have looked a bit closer.
I will do a memtest soon, but the crashes coincide exactly with an upgrade from 6.4 to 7.4.
 