pveproxy Segfault error 4 in EV.so

CamoYoshi

Nov 21, 2016
Hello all.

I'm running Proxmox 4.3 on my FX-6300 system with 8GB of RAM just to try some things out. After about a day of letting the system idle (no running VMs or anything), I noticed that the server had a red X in the web interface. I SSH'd in and checked the syslog to find this snippet:

Code:
[ 1769.981028] pveproxy worker[1331]: segfault at 7f706a9c3df4 ip 00007f70697a57d6 sp 00007ffffedf48d0 error 4 in EV.so[7f706979b000+27000]
[10479.826583] pvestatd[1307]: segfault at 0 ip 00007f1789471534 sp 00007ffe068c23b0 error 4 in libperl.so.5.20.2[7f17893b3000+1b7000]
[11745.945249] BUG: unable to handle kernel paging request at 0000000000040034
[11745.945268] IP: [<ffffffff811cc474>] unlink_anon_vmas+0xd4/0x1e0
[11745.945281] PGD 2139f4067 PUD 214bb1067 PMD 0
[11745.945290] Oops: 0002 [#1] SMP
[11745.945297] Modules linked in: ip_set ip6table_filter ip6_tables iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c amdkfd kvm_amd amd_iommu_v2 mxm_wmi kvm irqbypass radeon crct10dif_pclmul crc32_pclmul snd_hda_codec_realtek snd_hda_codec_generic ttm aesni_intel aes_x86_64 lrw gf128mul glue_helper drm_kms_helper ablk_helper snd_hda_intel cryptd drm snd_hda_codec serio_raw edac_mce_amd pcspkr snd_hda_core input_leds k10temp snd_hwdep i2c_algo_bit fam15h_power edac_core fb_sys_fops snd_pcm syscopyarea
[11745.945439]  sysfillrect snd_timer sysimgblt snd soundcore i2c_piix4 shpchp wmi 8250_fintek tpm_infineon mac_hid vhost_net vhost macvtap macvlan autofs4 btrfs xor raid6_pq hid_generic usbmouse usbkbd usbhid hid psmouse r8169 mii ahci libahci fjes
[11745.945489] CPU: 2 PID: 1295 Comm: server Tainted: P           O    4.4.19-1-pve #1
[11745.945500] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./GA-990FXA-UD3 Ultra-CF, BIOS F1 04/11/2016
[11745.945515] task: ffff8800dd179b80 ti: ffff880213a44000 task.ti: ffff880213a44000
[11745.945524] RIP: 0010:[<ffffffff811cc474>]  [<ffffffff811cc474>] unlink_anon_vmas+0xd4/0x1e0
[11745.945536] RSP: 0018:ffff880213a47d60  EFLAGS: 00010206
[11745.945543] RAX: 0000000000040000 RBX: ffff8800dcea91f8 RCX: 00007f8214f1b000
[11745.945552] RDX: 00007f8214f06000 RSI: ffff8800dcea9208 RDI: ffff8800dcea9190
[11745.945560] RBP: ffff880213a47d98 R08: 000000000001abd0 R09: ffffffffffffffff
[11745.945569] R10: 00007f8214346040 R11: 0000000000000217 R12: 0000000000000000
[11745.945578] R13: ffff8800dcea91f8 R14: ffff8800dcea9208 R15: 0000000000000000
[11745.945588] FS:  00007f820d811700(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000
[11745.945598] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11745.945605] CR2: 0000000000040034 CR3: 00000002140c9000 CR4: 00000000000406e0
[11745.945614] Stack:
[11745.945618]  fffffffd1291df18 ffff8800dcea9190 ffff8800dcea9190 00007f8214f0b000
[11745.945629]  00007f8214f06000 ffff880213a47df0 0000000000000000 ffff880213a47dd8
[11745.945640]  ffffffff811bd9ab 00007f8214f1b000 00007f8214f0b000 00007f8214f0e000
[11745.945652] Call Trace:
[11745.945659]  [<ffffffff811bd9ab>] free_pgtables+0x3b/0x110
[11745.945666]  [<ffffffff811c4e70>] unmap_region+0xd0/0x120
[11745.945675]  [<ffffffff8120f6c0>] ? __fput+0x190/0x220
[11745.945682]  [<ffffffff811c53e3>] ? vma_rb_erase+0x113/0x210
[11745.945693]  [<ffffffff811c7181>] do_munmap+0x1f1/0x430
[11745.945700]  [<ffffffff811c7401>] vm_munmap+0x41/0x60
[11745.945707]  [<ffffffff811c8342>] SyS_munmap+0x22/0x30
[11745.945714]  [<ffffffff818544b6>] entry_SYSCALL_64_fastpath+0x16/0x75
[11745.945722] Code: 77 40 48 89 df e8 cd ce fe ff 49 83 7f 40 00 75 80 49 8b 47 38 83 68 34 01 eb aa 48 8b 45 d0 48 8b 80 88 00 00 00 48 85 c0 74 04 <83> 68 34 01 4d 85 e4 74 0a 49 8d 7c 24 08 e8 49 e4 ef ff 48 8b
[11745.945802] RIP  [<ffffffff811cc474>] unlink_anon_vmas+0xd4/0x1e0
[11745.945811]  RSP <ffff880213a47d60>
[11745.945818] CR2: 0000000000040034
[11745.949308] ---[ end trace dd4f5bfce4c35a1d ]---
[51619.652485] swap_free: Unused swap offset entry 00004000
[51619.652499] BUG: Bad page map in process ksmtuned  pte:00800000 pmd:c5127067
[51619.652509] addr:00007fcacea47000 vm_flags:08000070 anon_vma:          (null) mapping:ffff88021229da40 index:84
[51619.652527] file:libtinfo.so.5.9 fault:ext4_filemap_fault mmap:ext4_file_mmap readpage:ext4_readpage
[51619.652540] CPU: 2 PID: 15971 Comm: ksmtuned Tainted: P      D    O    4.4.19-1-pve #1
[51619.652551] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./GA-990FXA-UD3 Ultra-CF, BIOS F1 04/11/2016
[51619.652566]  0000000000000286 0000000018e76e2a ffff880036067c60 ffffffff813f3de3
[51619.652579]  00007fcacea47000 ffff880213ba6af0 ffff880036067cb0 ffffffff811bc79f
[51619.652590]  ffff880036067c88 ffffffff811d3baa 0000000000004000 00007fcacea47000
[51619.652601] Call Trace:
[51619.652608]  [<ffffffff813f3de3>] dump_stack+0x63/0x90
[51619.652617]  [<ffffffff811bc79f>] print_bad_pte+0x1ef/0x2b0
[51619.652625]  [<ffffffff811d3baa>] ? swap_info_get+0x9a/0xe0
[51619.652633]  [<ffffffff811be280>] unmap_single_vma+0x510/0x840
[51619.653256]  [<ffffffff811bf03a>] unmap_vmas+0x4a/0xa0
[51619.653849]  [<ffffffff811c8747>] exit_mmap+0xa7/0x170
[51619.654459]  [<ffffffff8107eb07>] mmput+0x57/0x110
[51619.655038]  [<ffffffff81084473>] do_exit+0x323/0xb30
[51619.655639]  [<ffffffff8106b50a>] ? __do_page_fault+0x1ba/0x410
[51619.656228]  [<ffffffff81084d03>] do_group_exit+0x43/0xc0
[51619.656835]  [<ffffffff81084d94>] SyS_exit_group+0x14/0x20
[51619.657417]  [<ffffffff818544b6>] entry_SYSCALL_64_fastpath+0x16/0x75
[51619.658177] BUG: Bad rss-counter state mm:ffff8800dbf79e00 idx:2 val:-1

The important bits that stood out to me were the two segfaults at the beginning.
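As a rough aid for reading those two lines: the `ip` value minus the library's base address (the first number in the square brackets) gives the offset of the faulting instruction inside the library, and the `error` number is the x86 page-fault error code, whose low three bits encode the cause, the access type, and the CPU mode. The sketch below decodes both. The function names `segv_offset` and `segv_error` are made up for this illustration and are not part of any tool.

```shell
#!/bin/sh
# Offset of the faulting instruction inside the mapped library.
# $1 = ip, $2 = library base address; both hex, as printed in the log.
segv_offset() {
    printf '0x%x\n' $(( 0x${1#0x} - 0x${2#0x} ))
}

# Decode the low three bits of an x86 page-fault error code.
# bit 0: 0 = page not present, 1 = protection violation
# bit 1: 0 = read access,      1 = write access
# bit 2: 0 = kernel mode,      1 = user mode
segv_error() {
    if [ $(( $1 & 4 )) -ne 0 ]; then mode=user; else mode=kernel; fi
    if [ $(( $1 & 2 )) -ne 0 ]; then access=write; else access=read; fi
    if [ $(( $1 & 1 )) -ne 0 ]; then cause=protection-violation; else cause=page-not-present; fi
    echo "$mode $access $cause"
}

segv_offset 00007f70697a57d6 7f706979b000   # pveproxy worker in EV.so
segv_offset 00007f1789471534 7f17893b3000   # pvestatd in libperl.so.5.20.2
segv_error 4                                # both segfaults report "error 4"
```

So "error 4" means a user-mode read of an unmapped page, and the offsets (0xa7d6 in EV.so, 0xbe534 in libperl) could be fed to a disassembler if anyone wants to dig further.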

Previously I had trouble booting Linux at all because of something related to the IOMMU, so after some googling I had to boot the kernel with iommu=soft (otherwise neither the NIC nor any of the USB ports would work).
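To confirm that the workaround is actually in effect on a given boot, the running kernel's command line can be checked for the parameter. This is only a minimal sketch: `has_kernel_param` is a hypothetical helper name, and the sample command line is illustrative rather than copied from this machine.

```shell
#!/bin/sh
# Succeed if a parameter appears as a whole word on a kernel command line.
# $1 = parameter (e.g. iommu=soft)
# $2 = command line to search; defaults to the running kernel's /proc/cmdline
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative command line (an assumption, not taken from the affected host).
sample="ro quiet iommu=soft"
if has_kernel_param iommu=soft "$sample"; then
    echo "iommu=soft is set"
else
    echo "iommu=soft is not set"
fi

# On a GRUB-booted Debian-based system the parameter is usually made
# persistent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
# and then running update-grub.
```

On the host itself, calling `has_kernel_param iommu=soft` with no second argument reads `/proc/cmdline` directly.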

I let Memtest86+ run for about 8 hours and it found no errors. The SSD that Proxmox boots from is good, the PSU is healthy, and the CPU works in other systems I've tried. The system also runs Windows 7 and 10 without any issues, so I'm a bit lost right now. Googling some of the segfault messages gives me results that are either too vague or nothing at all.

I'm not sure what other debug info I should include here, but if more is needed I'd be happy to provide it.
 
You could try reinstalling the package that contains the library.

It is either
libcoro-perl
or
libev-perl

so running
Code:
apt-get install --reinstall libcoro-perl libev-perl

could help

Edit:

Only libev-perl is installed here,
so I guess that is the one.
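Instead of guessing, dpkg can report which installed package ships the library and whether its files still match the packaged checksums. The following is a sketch under a few assumptions: `verify_owner` is a hypothetical helper name, the host is Debian-based, and its dpkg is new enough to support `--verify` (added in the 1.17 series).

```shell
#!/bin/sh
# Find the package(s) that ship a given library file and verify their
# installed files against the checksums recorded by dpkg.
# $1 = file name to look for, e.g. EV.so
verify_owner() {
    lib=$1
    command -v dpkg >/dev/null 2>&1 || { echo "dpkg not found" >&2; return 2; }

    # `dpkg -S` prints lines of the form "package: /path/to/file".
    pkgs=$(dpkg -S "*/$lib" 2>/dev/null | sed 's/: .*//' | tr ',' '\n' | tr -d ' ' | sort -u)
    [ -n "$pkgs" ] || { echo "no installed package owns $lib" >&2; return 1; }

    for p in $pkgs; do
        echo "verifying $p"
        # Prints nothing further when every file matches its checksum.
        dpkg --verify "$p"
    done
}

# Example call on the Proxmox host (not executed here):
#   verify_owner EV.so
```

If the verification reports a mismatch for the library, the reinstall above is the obvious next step. If it reports nothing, the file on disk is intact and the fault is more likely somewhere else.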
 
Tried that, but as soon as I run apt-get, I get this on my terminal about every 20 seconds:

Code:
 kernel:[41439.551464] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [vgs:11999]

Google suggests trying different kernel versions, but I don't know if there's a way to do that with Proxmox.
 
After much googling and trying different things, I have determined that this motherboard simply does not work with Linux in general. Kernel 4.8.5 on System Rescue CD (which is Gentoo-based) does not work either. The issue is not so much with Proxmox as with the hardware itself.