pve-firewall: General protection fault triggered in ebtables-restore

thenickdude

My host has crashed completely twice today while a guest was busy compiling and installing a batch of software packages (that load may be unrelated to the actual problem).

Before this started happening, I ran Proxmox updates and rebooted the host, which included an update from pve-kernel-5.4.78-1-pve to pve-kernel-5.4.78-2-pve. After completing that I was trying to upgrade my MacPorts packages inside my Big Sur VM, but after about 20 minutes the host would die.

Both times Proxmox logged the same general protection fault in ebtables-restore before the host died completely and stopped writing logs. Here's the first incident:

Code:
Dec 16 13:10:01 proxmox systemd[1]: Started Proxmox VE replication runner.
Dec 16 13:10:28 proxmox kernel: [ 2477.109806] general protection fault: 0000 [#1] SMP PTI
Dec 16 13:10:28 proxmox kernel: [ 2477.116235] CPU: 13 PID: 16588 Comm: ebtables-restor Tainted: P           O      5.4.78-2-pve #1
Dec 16 13:10:28 proxmox kernel: [ 2477.141017] RIP: 0010:__kmalloc_node+0x19d/0x330
Dec 16 13:10:28 proxmox kernel: [ 2477.147086] Code: 75 0e 4d 89 f9 41 f6 47 0b 04 0f 84 ef fe ff ff 4c 89 ff e8 25 f8 01 00 49 89 c1 e9 df fe ff ff 41 8b 41 20 49 8b 39 4c 01 d0 <48> 8b 18 48 89 c1 49 33 99 70 01 00 00 4c 89 d0 48 0f c9 48 31 cb
Dec 16 13:10:28 proxmox kernel: [ 2477.171446] RAX: 5c691476e7e1ef27 RBX: 0000000000000000 RCX: 0000000000000000
Dec 16 13:10:28 proxmox kernel: [ 2477.183486] RBP: ffff9fc231d7fbd0 R08: ffff8bd35f970040 R09: ffff8bcb5f407b80
Dec 16 13:10:28 proxmox kernel: [ 2477.195246] R13: 0000000000000008 R14: 00000000ffffffff R15: ffff8bcb5f407b80
Dec 16 13:10:28 proxmox kernel: [ 2477.212439] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 16 13:10:28 proxmox kernel: [ 2477.223792] Call Trace:
Dec 16 13:10:28 proxmox kernel: [ 2477.234838]  __vmalloc_node_range+0xd4/0x270
Dec 16 13:10:28 proxmox kernel: [ 2477.245659]  ? translate_table+0x5a0/0x710 [ebtables]
Dec 16 13:10:28 proxmox kernel: [ 2477.256102]  do_replace_finish+0x232/0x730 [ebtables]
Dec 16 13:10:28 proxmox kernel: [ 2477.266130]  ? __vmalloc_node_range+0x1eb/0x270
Dec 16 13:10:28 proxmox kernel: [ 2477.275773]  do_ebt_set_ctl+0x69/0x80 [ebtables]
Dec 16 13:10:28 proxmox kernel: [ 2477.285091]  ip_setsockopt+0x66/0x90
Dec 16 13:10:28 proxmox kernel: [ 2477.293961]  sock_common_setsockopt+0x1a/0x20
Dec 16 13:10:28 proxmox kernel: [ 2477.302542]  __x64_sys_setsockopt+0x24/0x30
Dec 16 13:10:28 proxmox kernel: [ 2477.310705]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 13:10:28 proxmox kernel: [ 2477.318667] Code: ff ff ff c3 48 8b 15 25 04 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b1 0f 1f 80 00 00 00 00 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f6 03 0c 00 f7 d8 64 89 01 48
Dec 16 13:10:28 proxmox kernel: [ 2477.339310] RAX: ffffffffffffffda RBX: 0000000000000e38 RCX: 00007fbaec9a9a6a
Dec 16 13:10:28 proxmox kernel: [ 2477.347877] RBP: 0000562fe2455150 R08: 0000000000000e38 R09: 0000562fe24551d0
Dec 16 13:10:28 proxmox kernel: [ 2477.356114] R13: 00007fbaecaa9468 R14: 0000562fe2453750 R15: 0000562fe2455fd0
Dec 16 13:10:28 proxmox kernel: [ 2477.360137]  hwmon_vid coretemp vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) hid_logitech_hidpp hid_logitech_dj usbmouse hid_generic usbkbd usbhid hid uas usb_storage raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear xhci_pci ahci isci ehci_pci xhci_hcd i2c_i801 libahci lpc_ich libsas e1000e ehci_hcd scsi_transport_sas wmi
Dec 16 13:10:28 proxmox kernel: [ 2477.615334] RIP: 0010:__kmalloc_node+0x19d/0x330
Dec 16 13:10:28 proxmox kernel: [ 2477.639440] RSP: 0018:ffff9fc231d7fb90 EFLAGS: 00010206
Dec 16 13:10:28 proxmox kernel: [ 2477.645396] RAX: 5c691476e7e1ef27 RBX: 0000000000000000 RCX: 0000000000000000
Dec 16 13:10:28 proxmox kernel: [ 2477.657557] RBP: ffff9fc231d7fbd0 R08: ffff8bd35f970040 R09: ffff8bcb5f407b80
Dec 16 13:10:28 proxmox kernel: [ 2477.669778] R13: 0000000000000008 R14: 00000000ffffffff R15: ffff8bcb5f407b80
Dec 16 13:10:28 proxmox kernel: [ 2477.687526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 16 13:10:28 proxmox pve-firewall[6736]: status update error: ebtables_restore_cmdlist: got signal 11
Dec 16 13:18:04 proxmox systemd-modules-load[1411]: Inserted module 'vfio'
Dec 16 13:18:04 proxmox kernel: [    0.000000] microcode: microcode updated early to revision 0x718, date = 2019-05-21

And the second:

Code:
Dec 16 15:37:01 proxmox systemd[1]: Started Proxmox VE replication runner.
Dec 16 15:37:15 proxmox kernel: [ 8404.719136] CPU: 20 PID: 9752 Comm: ebtables-restor Tainted: P           O      5.4.78-2-pve #1
Dec 16 15:37:15 proxmox kernel: [ 8404.768147] RSP: 0018:ffffb65653833b90 EFLAGS: 00010202
Dec 16 15:37:15 proxmox kernel: [ 8404.786276] RBP: ffffb65653833bd0 R08: ffff94c7dfb30040 R09: ffff94c7df407b80
Dec 16 15:37:15 proxmox kernel: [ 8404.803820] FS:  00007fc0af2df740(0000) GS:ffff94c7dfb00000(0000) knlGS:0000000000000000
Dec 16 15:37:15 proxmox kernel: [ 8404.826467] Call Trace:
Dec 16 15:37:15 proxmox kernel: [ 8404.842788]  vmalloc+0x4c/0x50
Dec 16 15:37:15 proxmox kernel: [ 8404.858314]  do_replace_finish+0x232/0x730 [ebtables]
Dec 16 15:37:15 proxmox kernel: [ 8404.873081]  do_replace+0x15f/0x1e0 [ebtables]
Dec 16 15:37:15 proxmox kernel: [ 8404.887084]  ip_setsockopt+0x66/0x90
Dec 16 15:37:15 proxmox kernel: [ 8404.900174]  __sys_setsockopt+0xcc/0x180
Dec 16 15:37:15 proxmox kernel: [ 8404.912539]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 16 15:37:15 proxmox kernel: [ 8404.932768] RSP: 002b:00007ffff5545278 EFLAGS: 00000206 ORIG_RAX: 0000000000000036
Dec 16 15:37:15 proxmox kernel: [ 8404.949667] RBP: 00005654ba2a3150 R08: 0000000000000e38 R09: 00005654ba2a31d0
Dec 16 15:37:15 proxmox kernel: [ 8404.961913] Modules linked in: veth 8021q garp mrp ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac xt_NFLOG xt_limit ipt_REJECT nf_reject_ipv4 xt_physdev xt_addrtype xt_multiport xt_conntrack xt_set xt_tcpudp xt_comment xt_mark ip_set_hash_net ip_set iptable_filter iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter softdog nfnetlink_log nfnetlink dm_crypt algif_skcipher af_alg intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd vhost_net btusb glue_helper vhost btrtl tap rapl btbcm drm_vram_helper btintel intel_cstate ttm pcspkr bluetooth drm_kms_helper vendor_reset(O) ipmi_ssif ecdh_generic joydev input_leds ecc hid_magicmouse drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei ioatdma dca mac_hid ipmi_si ipmi_devintf ipmi_msghandler nct6775 hwmon_vid
Dec 16 15:37:15 proxmox kernel: [ 8405.217041] RIP: 0010:__kmalloc_node+0x19d/0x330
Dec 16 15:37:15 proxmox kernel: [ 8405.240771] RSP: 0018:ffffb65653833b90 EFLAGS: 00010202
Dec 16 15:37:15 proxmox kernel: [ 8405.264982] R10: 3b053d0a7e892d04 R11: ffffb655c0000000 R12: 0000000000000dc0
Dec 16 15:37:15 proxmox kernel: [ 8405.277079] FS:  00007fc0af2df740(0000) GS:ffff94c7dfb00000(0000) knlGS:0000000000000000
Dec 16 15:37:15 proxmox kernel: [ 8405.288781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 16 15:37:15 proxmox pve-firewall[7646]: status update error: ebtables_restore_cmdlist: got signal 11
Dec 16 15:37:17 proxmox pvedaemon[7702]: <root@pam> successful auth for user 'root@pam'
Dec 16 15:37:17 proxmox kernel: [ 8406.806612] kvm [30317]: vcpu29, guest rIP: 0xffffff800d05dd75 ignored rdmsr: 0x3f9
Dec 16 15:37:17 proxmox kernel: [ 8406.829684] kvm [30317]: vcpu29, guest rIP: 0xffffff800d05dd92 ignored rdmsr: 0x630
Dec 16 17:25:06 proxmox systemd-modules-load[1409]: Inserted module 'vfio'
Dec 16 17:25:06 proxmox kernel: [    0.000000] microcode: microcode updated early to revision 0x718, date = 2019-05-21

It's really interesting to me that the crash was identical both times.
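To check whether several crashes share the same faulting address, the kernel-mode RIP lines can be pulled out of the log and counted. This is only a minimal sketch: the function name `crash_rips` and the default path `/var/log/kern.log` are assumptions, not something taken from the logs above.

```shell
#!/bin/sh
# List the distinct kernel-mode faulting addresses found in a log file.
# Usage: crash_rips [logfile]   (the default path is an assumption)
crash_rips() {
    logfile="${1:-/var/log/kern.log}"
    # Keep only "RIP: 0010:<symbol+offset>" matches, strip the prefix,
    # then count how many times each symbol+offset occurs.
    grep -o 'RIP: 0010:[^ ]*' "$logfile" \
        | sed 's/^RIP: 0010://' \
        | sort \
        | uniq -c
}
```

If every crash reports the same symbol and offset, as with `__kmalloc_node+0x19d/0x330` here, the output is a single line with a count.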

pveversion: pve-manager/6.3-3/eee5f901 (running kernel: 5.4.78-2-pve)
 
> Before this started happening, I ran Proxmox updates and rebooted the host, which included an update from pve-kernel-5.4.78-1-pve to pve-kernel-5.4.78-2-pve. After completing that I was trying to upgrade my MacPorts packages inside my Big Sur VM, but after about 20 minutes the host would die.
So, it works with "pve-kernel-5.4.78-1-pve" for sure? Can you boot into that one and test it? 5.4.78-2-pve differs from 5.4.78-1-pve in only two patches, and neither of them comes close to touching network- or firewall-related code, so that would be a bit surprising to me.

Did any other package updates come in with that problematic upgrade? You could check /var/log/apt/history.log
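One way to pull a single day's transactions out of that file is sketched below. The helper name `apt_history_on` is hypothetical, and the sketch assumes the usual layout where each transaction is a blank-line-separated record that begins with a `Start-Date:` line. Rotated, compressed logs are not covered.

```shell
#!/bin/sh
# Print the apt transactions that started on a given day (YYYY-MM-DD).
# Usage: apt_history_on DATE [logfile]
apt_history_on() {
    day="$1"
    logfile="${2:-/var/log/apt/history.log}"
    # RS="" puts awk in paragraph mode, so each blank-line-separated
    # transaction is one record; print those that begin with the date.
    awk -v day="$day" '
        BEGIN { RS = ""; ORS = "\n\n" }
        index($0, "Start-Date: " day) == 1
    ' "$logfile"
}
```

For the day in question this would be called as `apt_history_on 2020-12-16`.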


Which iptables and ebtables versions are in use? Can you post the output of:
Bash:
iptables -V
ebtables -V
 
>So, it works with "pve-kernel-5.4.78-1-pve" for sure?

No, I haven't positively identified that yet. But I haven't encountered this fault before today. As you say, the changes between -1 and -2 look small and totally irrelevant.

I'm changing one variable at a time to see if I can re-trigger the fault. So far I've unloaded the third-party "vendor-reset" kernel module, which I had installed and loaded on the 4th of December, and then completed one full MacPorts package rebuild in my macOS VM without triggering the fault, still on 5.4.78-2-pve. If I can repeat this successfully a couple of times, I'll put vendor-reset back and see whether the fault returns.
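For anyone repeating this kind of test, a small check that a module really is unloaded before each run can help. The sketch below is an illustration only: `module_loaded` is a hypothetical helper that reads a listing in `/proc/modules` format, and the second argument exists only so it can be pointed at a different file.

```shell
#!/bin/sh
# Succeed if the named module appears in a /proc/modules-style listing.
# Usage: module_loaded NAME [modules_file]
module_loaded() {
    # The kernel lists module names with underscores, so normalise dashes.
    name=$(printf '%s' "$1" | sed 's/-/_/g')
    modules_file="${2:-/proc/modules}"
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example (run as root): unload the module only if it is currently loaded.
# module_loaded vendor-reset && modprobe -r vendor-reset
```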

The total updates I've installed today amount to this:

Install: linux-headers-4.19.0-13-amd64:amd64 (4.19.160-2, automatic), pve-kernel-5.4.78-2-pve:amd64 (5.4.78-2, automatic), linux-headers-4.19.0-13-common:amd64 (4.19.160-2, automatic), pve-headers-5.4.78-2-pve:amd64 (5.4.78-2, automatic)
Upgrade: pve-kernel-5.4:amd64 (6.3-2, 6.3-3), libefivar1:amd64 (37-2, 37-2+deb10u1), libcups2:amd64 (2.2.10-6+deb10u3, 2.2.10-6+deb10u4), linux-libc-dev:amd64 (4.19.152-1, 4.19.160-2), libapt-inst2.0:amd64 (1.8.2.1, 1.8.2.2), libpve-storage-perl:amd64 (6.3-2, 6.3-3), openssl:amd64 (1.1.1d-0+deb10u3, 1.1.1d-0+deb10u4), libjpeg-dev:amd64 (1:1.5.2-2, 1:1.5.2-2+deb10u1), libsystemd0:amd64 (241-7~deb10u4, 241-7~deb10u5), apt:amd64 (1.8.2.1, 1.8.2.2), mariadb-common:amd64 (1:10.3.25-0+deb10u1, 1:10.3.27-0+deb10u1), libcups2-dev:amd64 (2.2.10-6+deb10u3, 2.2.10-6+deb10u4), tcpdump:amd64 (4.9.3-1~deb10u1, 4.9.3-1~deb10u2), libsqlite3-0:amd64 (3.27.2-3, 3.27.2-3+deb10u1), libcupsimage2-dev:amd64 (2.2.10-6+deb10u3, 2.2.10-6+deb10u4), libefiboot1:amd64 (37-2, 37-2+deb10u1), python-apt-common:amd64 (1.8.4.1, 1.8.4.2), libjpeg62-turbo-dev:amd64 (1:1.5.2-2+b1, 1:1.5.2-2+deb10u1), libproxmox-acme-perl:amd64 (1.0.5, 1.0.7), udev:amd64 (241-7~deb10u4, 241-7~deb10u5), linux-compiler-gcc-8-x86:amd64 (4.19.152-1, 4.19.160-2), pve-container:amd64 (3.3-1, 3.3-2), proxmox-backup-client:amd64 (1.0.5-1, 1.0.6-1), libcpupower1:amd64 (4.19.152-1, 4.19.160-2), libapt-pkg5.0:amd64 (1.8.2.1, 1.8.2.2), libudev1:amd64 (241-7~deb10u4, 241-7~deb10u5), sqlite3:amd64 (3.27.2-3, 3.27.2-3+deb10u1), pve-manager:amd64 (6.3-2, 6.3-3), libimobiledevice6:amd64 (1.2.1~git20181030.92c5462-2, 1.2.1~git20181030.92c5462-2+deb10u1), systemd-sysv:amd64 (241-7~deb10u4, 241-7~deb10u5), libxml2-dev:amd64 (2.9.4+dfsg1-7+b3, 2.9.4+dfsg1-7+deb10u1), libpve-common-perl:amd64 (6.3-1, 6.3-2), python-apt:amd64 (1.8.4.1, 1.8.4.2), libpam-systemd:amd64 (241-7~deb10u4, 241-7~deb10u5), distro-info-data:amd64 (0.41+deb10u2, 0.41+deb10u3), linux-headers-amd64:amd64 (4.19+105+deb10u7, 4.19+105+deb10u8), systemd:amd64 (241-7~deb10u4, 241-7~deb10u5), qemu-server:amd64 (6.3-1, 6.3-2), libssl-dev:amd64 (1.1.1d-0+deb10u3, 1.1.1d-0+deb10u4), libssl-doc:amd64 (1.1.1d-0+deb10u3, 1.1.1d-0+deb10u4), apt-utils:amd64 (1.8.2.1, 1.8.2.2), libnss-systemd:amd64 (241-7~deb10u4, 241-7~deb10u5), pve-headers-5.4:amd64 (6.3-2, 6.3-3), pve-kernel-helper:amd64 (6.3-2, 6.3-3), libxml2:amd64 (2.9.4+dfsg1-7+b3, 2.9.4+dfsg1-7+deb10u1), libmariadb3:amd64 (1:10.3.25-0+deb10u1, 1:10.3.27-0+deb10u1), libpve-http-server-perl:amd64 (3.0-6, 3.1-1), linux-kbuild-4.19:amd64 (4.19.152-1, 4.19.160-2), apt-transport-https:amd64 (1.8.2.1, 1.8.2.2), libssl1.1:amd64 (1.1.1d-0+deb10u3, 1.1.1d-0+deb10u4), libcupsimage2:amd64 (2.2.10-6+deb10u3, 2.2.10-6+deb10u4), libpve-apiclient-perl:amd64 (3.1-1, 3.1-3), libjpeg62-turbo:amd64 (1:1.5.2-2+b1, 1:1.5.2-2+deb10u1), python3-apt:amd64 (1.8.4.1, 1.8.4.2), linux-cpupower:amd64 (4.19.152-1, 4.19.160-2), base-files:amd64 (10.3+deb10u6, 10.3+deb10u7)
End-Date: 2020-12-16 12:23:56

My iptables and ebtables versions:

- iptables v1.8.2 (legacy)
- ebtables v2.0.10.4 (legacy) (December 2011)
 
I've now had the system crash again, I believe while the VM was just idling, and this time without the vendor-reset module loaded. The crash stack trace still contains setsockopt:

Code:
Dec 17 02:59:54 proxmox kernel: [31419.688435] kvm [22443]: vcpu0, guest rIP: 0xffffff800965dd92 ignored rdmsr: 0x621
Dec 17 03:02:43 proxmox kernel: [31588.059561] R13: 0000000000000008 R14: 00000000ffffffff R15: ffff98ee1f407b80
Dec 17 03:02:43 proxmox kernel: [31588.087910] Call Trace:
Dec 17 03:02:43 proxmox kernel: [31588.104202]  vmalloc+0x4c/0x50
Dec 17 03:02:43 proxmox kernel: [31588.119696]  do_replace_finish+0x232/0x730 [ebtables]
Dec 17 03:02:43 proxmox kernel: [31588.134414]  do_replace+0x15f/0x1e0 [ebtables]
Dec 17 03:02:43 proxmox kernel: [31588.148505]  ip_setsockopt+0x66/0x90
Dec 17 03:02:43 proxmox kernel: [31588.161529]  __sys_setsockopt+0xcc/0x180
Dec 17 03:02:43 proxmox kernel: [31588.173857]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Dec 17 03:02:43 proxmox kernel: [31588.193991] RSP: 002b:00007fff43f6f548 EFLAGS: 00000206 ORIG_RAX: 0000000000000036
Dec 17 03:02:43 proxmox kernel: [31588.210842] RBP: 0000560bdfa1b150 R08: 0000000000000e38 R09: 0000560bdfa1b1d0
Dec 17 03:02:43 proxmox kernel: [31588.223028] Modules linked in: veth 8021q garp mrp ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac xt_NFLOG xt_limit ipt_REJECT nf_reject_ipv4 xt_physdev xt_addrtype xt_multiport xt_conntrack xt_set xt_tcpudp xt_comment xt_mark ip_set_hash_net ip_set iptable_filter iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter softdog nfnetlink_log nfnetlink dm_crypt algif_skcipher af_alg intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper btusb btrtl btbcm drm_vram_helper btintel rapl ttm drm_kms_helper intel_cstate bluetooth pcspkr ecdh_generic joydev input_leds ecc hid_magicmouse drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me mei ioatdma dca ipmi_ssif vhost_net vhost tap mac_hid sunrpc ipmi_si ipmi_devintf ipmi_msghandler nct6775 hwmon_vid coretemp
Dec 17 03:02:43 proxmox kernel: [31588.518646] RBP: ffffb7805704fbd0 R08: ffff98ee1f8f0040 R09: ffff98ee1f407b80
Dec 17 03:02:43 proxmox kernel: [31588.524785] R10: d37e8f1a2aea7ba8 R11: ffffb78000000000 R12: 0000000000000dc0
Dec 17 03:02:43 proxmox kernel: [31588.530794] R13: 0000000000000008 R14: 00000000ffffffff R15: ffff98ee1f407b80
Dec 17 03:02:43 proxmox kernel: [31588.536742] FS:  00007f62ee13b740(0000) GS:ffff98ee1f8c0000(0000) knlGS:0000000000000000
Dec 17 03:02:43 proxmox kernel: [31588.554183] CR2: 0000560bdfa1b018 CR3: 00000001f95e4004 CR4: 00000000000626e0
Dec 17 04:31:06 proxmox kernel: [    0.000000] microcode: microcode updated early to revision 0x718, date = 2019-05-21

I'll try reverting to pve-kernel-5.4.78-1-pve now.
 
OK, thanks for confirming that. It makes this all the weirder for me. I'll see if I can spot any possible correlation with this issue in the two patches applied between that version and the next one.
 
On the hypothesis that the kernel memory pool is being corrupted and this is what triggers a death in vmalloc, I'll see if I can figure out a debug build of -2 with memory bounds checking enabled.
 
Argh, my machine died overnight while running 5.4.78-1-pve, so my crashes definitely weren't caused by the transition between 5.4.78-1-pve and 5.4.78-2-pve. This time it didn't manage to write the crash log to disk and I could only see the tail end of it on the monitor. It didn't look the same as the earlier crashes, but it could have been a subsequent crash. I guess I have more debugging to do.
 
I'm now running 5.4.78-2-pve recompiled to enable the KASAN kernel address sanitiser, so fingers crossed that it crashes again for me now!
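Before waiting on a crash, it is worth confirming that the booted kernel was actually built with the sanitiser. A minimal sketch, assuming the kernel's build config is installed as `/boot/config-$(uname -r)` (the usual Debian location); `kasan_enabled` is a hypothetical helper name.

```shell
#!/bin/sh
# Succeed if the given kernel config file has KASAN built in.
# Usage: kasan_enabled [config_file]
kasan_enabled() {
    config="${1:-/boot/config-$(uname -r)}"
    # An enabled option appears as "CONFIG_KASAN=y"; a disabled one
    # appears as the comment "# CONFIG_KASAN is not set".
    grep -q '^CONFIG_KASAN=y$' "$config"
}

# Example:
# kasan_enabled && echo "KASAN is enabled in the running kernel's config"
```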
 
I had been running crash-free with KASAN enabled for nearly a month. But I just got my first KASAN error detection (a double-free) caused by my Magic Trackpad 2 being "unplugged" when the host USB controller it was connected to was detached to be attached to a VM. It looks like that's a reported bug in the kernel handling of the Magic Trackpad 2 specifically:

https://bugzilla.kernel.org/show_bug.cgi?id=210241

I'm not sure if this was what was triggering my crashes in December, but it's very probable because I can see from Proxmox's kern.log that the first time the trackpad was seen by Proxmox on USB was on Dec 16, the date of my first crashes (it's usually connected wirelessly and the host doesn't see it).
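Finding the first time a device or driver shows up in the log can be done with a simple search like the sketch below. The helper name `first_seen`, the default log path, and the idea of searching for "magicmouse" are all assumptions; the exact text the driver logs is not shown in this thread, and older rotated logs would need to be searched separately.

```shell
#!/bin/sh
# Print the first log line that mentions a pattern (case-insensitive).
# Usage: first_seen PATTERN [logfile]
first_seen() {
    pattern="$1"
    logfile="${2:-/var/log/kern.log}"
    # -m 1 stops after the first match, which is the earliest entry
    # because the log is written in chronological order.
    grep -i -m 1 -- "$pattern" "$logfile"
}

# Example:
# first_seen magicmouse
```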

I think this is solved now by adding "blacklist hid-magicmouse" to a file under /etc/modprobe.d/.
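As a rough sketch of that step: the helper below writes a one-line blacklist file. The function name `blacklist_module` and the file name `blacklist-<module>.conf` are hypothetical choices, and the second argument exists only so the target directory can be overridden.

```shell
#!/bin/sh
# Write a modprobe blacklist entry for a module and print the file path.
# Usage: blacklist_module MODULE [conf_dir]
blacklist_module() {
    module="$1"
    conf_dir="${2:-/etc/modprobe.d}"
    conf_file="$conf_dir/blacklist-$module.conf"   # hypothetical file name
    printf 'blacklist %s\n' "$module" > "$conf_file"
    echo "$conf_file"
}

# Example (run as root). Rebuilding the initramfs is only needed if the
# module is included there, which is an assumption about the setup:
# blacklist_module hid-magicmouse && update-initramfs -u
```

A blacklist entry only stops the module from being loaded automatically; a module that is already loaded stays loaded until it is removed or the host reboots.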
 