Kernel panic caused by skbuff

paul20

Member
May 30, 2018
Every now and then my server kernel panics, so I set up netconsole to see what the culprit is, and it seems to be related to the firewall(?)
Any idea how to fix this?
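For anyone wanting to capture panics the same way: netconsole takes a single module parameter of the form src-port@src-ip/dev,dst-port@dst-ip/dst-mac (see the kernel's netconsole documentation). A minimal sketch; every address, port, and interface name below is a placeholder, not taken from this setup:

```shell
# Assemble the netconsole parameter string (all values are placeholders
# you must adjust to your own network):
SRC_PORT=6665; SRC_IP=192.168.1.10; DEV=eth0
DST_PORT=6666; DST_IP=192.168.1.20; DST_MAC=aa:bb:cc:dd:ee:ff
OPTS="netconsole=${SRC_PORT}@${SRC_IP}/${DEV},${DST_PORT}@${DST_IP}/${DST_MAC}"
echo "$OPTS"

# Loading the module requires root, so it is only shown as a comment:
#   modprobe netconsole "$OPTS"
# On the receiving host, capture the messages with e.g.:
#   nc -u -l 6666 | tee netconsole.log
```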

Here are the logs:
Code:
[494221.511181] skbuff: skb_under_panic: text:00000000f0c6c75f len:74 put:14 head:000000001bebe299 data:000000007de59a6c tail:0x3c end:0xc0 dev:fwln7270i0
[494221.511366] ------------[ cut here ]------------
[494221.511475] kernel BUG at net/core/skbuff.c:108!
[494221.511589] invalid opcode: 0000 [#1] SMP NOPTI
[494221.511700] CPU: 11 PID: 0 Comm: swapper/11 Kdump: loaded Not tainted 5.4.128-1-pve #1
[494221.511972] RIP: 0010:skb_panic+0x4c/0x4e
[494221.512077] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 f8 25 83 8a e8 0f ba fb ff <0f> 0b 48 8b 55 08 48 c7 c1 a0 6d 55 8a e8 a2 ff ff ff 48 c7 c6 e0
[494221.512295] RSP: 0018:ffffa5ca40464780 EFLAGS: 00010246
[494221.512404] RAX: 000000000000008a RBX: ffff9ac6625b4858 RCX: 0000000000000000
[494221.512548] RDX: 0000000000000000 RSI: ffff9acc8ead78c8 RDI: ffff9acc8ead78c8
[494221.512692] RBP: ffffa5ca404647a0 R08: 000000000000066c R09: 0000000000000a20
[494221.512835] R10: 00000000000000c5 R11: 0000000000000000 R12: 000000000000003c
[494221.512979] R13: ffff9acae7042000 R14: 00000000000086dd R15: ffff9ac6625b485e
[494221.513123] FS:  0000000000000000(0000) GS:ffff9acc8eac0000(0000) knlGS:0000000000000000
[494221.513270] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[494221.513381] CR2: 0000008b9b930020 CR3: 0000001dde30e000 CR4: 0000000000340ee0
[494221.513524] Call Trace:
[494221.513624]  <IRQ>
[494221.513724]  skb_push.cold.95+0x10/0x10
[494221.513830]  eth_header+0x2b/0xc0
[494221.513936]  nf_send_reset6+0x2b3/0xb40 [nf_reject_ipv6]
[494221.514048]  reject_tg6+0x91/0xd0 [ip6t_REJECT]
[494221.514156]  ip6t_do_table+0x2d8/0x6b0 [ip6_tables]
[494221.514266]  ip6table_filter_hook+0x1f/0x30 [ip6table_filter]
[494221.514379]  nf_hook_slow+0x49/0xd0
[494221.514484]  br_nf_forward_ip+0x20a/0x450
[494221.514589]  ? br_nf_hook_thresh+0xf0/0xf0
[494221.514695]  nf_hook_slow+0x49/0xd0
[494221.514800]  __br_forward+0xc2/0x1e0
[494221.514904]  ? br_dev_queue_push_xmit+0x1a0/0x1a0
[494221.515012]  br_forward+0xcc/0xe0
[494221.515114]  br_handle_frame_finish+0x2d5/0x440
[494221.515222]  ? br_pass_frame_up+0x150/0x150
[494221.515329]  br_nf_hook_thresh+0xda/0xf0
[494221.515434]  ? br_pass_frame_up+0x150/0x150
[494221.515540]  br_nf_pre_routing_finish_ipv6+0x140/0x200
[494221.515649]  ? br_pass_frame_up+0x150/0x150
[494221.515755]  ? nf_hook_slow+0x49/0xd0
[494221.515858]  br_nf_pre_routing_ipv6+0x174/0x190
[494221.515966]  ? br_nf_pre_routing+0x4e0/0x4e0
[494221.516072]  br_nf_pre_routing+0x3c4/0x4e0
[494221.516180]  ? ip6_rcv_finish_core.isra.22+0xc0/0xc0
[494221.516289]  br_handle_frame+0x1c4/0x370
[494221.516393]  ? br_pass_frame_up+0x150/0x150
[494221.516500]  __netif_receive_skb_core+0x3a0/0xc50
[494221.516609]  ? netif_receive_skb_list_internal+0x188/0x290
[494221.516719]  __netif_receive_skb_one_core+0x3e/0xa0
[494221.516827]  __netif_receive_skb+0x18/0x60
[494221.516933]  process_backlog+0xb8/0x170
[494221.517037]  net_rx_action+0x138/0x380
[494221.517143]  __do_softirq+0xdc/0x2d4
[494221.517248]  irq_exit+0xa9/0xb0
[494221.517350]  do_IRQ+0x59/0xe0
[494221.517452]  common_interrupt+0xf/0xf
[494221.517555]  </IRQ>
[494221.517655] RIP: 0010:cpuidle_enter_state+0xbd/0x450
[494221.517763] Code: ff e8 47 66 88 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 7a 72 8e ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8d 02 00 00 49 63 cd 48 8b 75 d0 48 2b 75 c8 48 8d
[494221.517976] RSP: 0018:ffffa5ca4017fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[494221.518122] RAX: ffff9acc8eaeae00 RBX: ffffffff8ab66a00 RCX: 000000000000001f
[494221.518265] RDX: 0001c17de99c46d2 RSI: 00000000295e2e4e RDI: 0000000000000000
[494221.518408] RBP: ffffa5ca4017fe88 R08: 0000000000000002 R09: 000000000002a680
[494221.518552] R10: 00056efffe2a4e29 R11: ffff9acc8eae9aa0 R12: ffff9acc791d0c00
[494221.518696] R13: 0000000000000002 R14: ffffffff8ab66ad8 R15: ffffffff8ab66ac0
[494221.518843]  ? cpuidle_enter_state+0x99/0x450
[494221.518953]  cpuidle_enter+0x2e/0x40
[494221.519059]  call_cpuidle+0x23/0x40
[494221.519165]  do_idle+0x22c/0x270
[494221.519270]  cpu_startup_entry+0x1d/0x20
[494221.519378]  start_secondary+0x166/0x1c0
[494221.519487]  secondary_startup_64+0xa4/0xb0

PVE version:
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.119-1-pve: 5.4.119-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksmtuned: 4.20150325+b1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
 
I also experience kernel crashes since I migrated to Proxmox 7: three times in two weeks. There were no hardware changes compared to Proxmox 4, just a fresh installation. memtest has already been run, and HP iLO shows good system health. Unfortunately, my system then becomes inaccessible and no longer writes any corresponding logs to disk, so I've attached a screenshot of the iLO console. I suspect the problem is related to this thread:

https://www.spinics.net/lists/stable/msg474252.html

This commit:

https://git.kernel.org/pub/scm/linu.../?id=5c37711d9f27bdc83fd5980446be7f4aa2106230

However, based on the changelog of the PVE kernel, I was not yet able to determine whether the fix already exists there or not. If there is anything I can do to further investigate the problem, feel free to contact me.

PVE version:

proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.11.22-4-pve: 5.11.22-8
pve-kernel-5.11.22-3-pve: 5.11.22-7
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 

Attachments

  • pve-crash.png (56.7 KB)

But the thread refers to a different commit than the one you linked, i.e. this one:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=fb32856b16ad

(see below)


The commit I linked is a potential bad one, while the one you linked fixes the bad one. But the first one isn't in our 5.11 git tree, so the second one makes little sense on its own (and wouldn't even apply).
Or do you suggest that both would be required? That would seem a bit odd to me, but I haven't checked that closely yet.
 
Many thanks for the answer. I have to admit that I don't really know anything about kernel development, but I also didn't want to post in the forum without having done at least a little research; I'm sorry if that caused confusion. In any case, I use virtio drivers in all VMs, and as I understand it the problem has only existed since kernel 5.x. I had no stability problems with Proxmox 4: 322 days of uptime before the reinstallation. I haven't changed anything on the hardware, and the dump looks similar. I'm looking for a way to get my Proxmox 7 as stable as possible.
 
No worries, related pointers to kernel mailing list discussions or even commits directly are much appreciated. Those do not sound too far off; it just seems there should be more to it, i.e. something older that caused the regression in the 5.x (or whenever) series in the first place.

With VirtIO the guest kernel also has some say, so the issue here could at least be correlated to what runs in the guest(s).
Can you please give us some details regarding that? OS, Linux kernel version (if applicable).

I'm a bit puzzled: if firewall + VirtIO net were the issue with the 5.4 or 5.11 kernel, as this thread suggests, it would be reported much more frequently (and our production workloads would definitely notice it too). So I think some additional factor is required for this issue to show up, and it would be good to find out what that is in order to locate the actual regression.
 
It seems to happen when I add a firewall rule with action "REJECT"; after replacing the action with "DROP" I no longer seem to get kernel panics.
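That observation fits the stack trace: REJECT has to build and send a reply packet (the nf_send_reset6 frames above), while DROP discards the packet without ever allocating one. In plain ip6tables terms the two variants would look roughly like this; the chain and port are made up purely for illustration:

```shell
# REJECT generates an answer (TCP RST / ICMPv6 unreachable); building that
# reply skb is the code path that crashes in nf_send_reset6. These commands
# require root and are shown as comments only:
#   ip6tables -A FORWARD -p tcp --dport 443 -j REJECT --reject-with tcp-reset
# DROP silently discards the packet and never constructs a reply:
#   ip6tables -A FORWARD -p tcp --dport 443 -j DROP
```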
 
Unfortunately the problem still occurs. In the meantime there were a few months of silence, but now the server has gone down several times within 10 days. So I set up netconsole and captured the following dump:

Code:
[451923.338181] skbuff: skb_under_panic: text:ffffffffafcc27db len:74 put:14 head:ffff8b796a883000 data:ffff8b796a882ff2 tail:0x3c end:0xc0 dev:fwln101i0
[451923.347939] ------------[ cut here ]------------
[451923.352922] kernel BUG at net/core/skbuff.c:112!
[451923.357871] invalid opcode: 0000 [#1] SMP PTI
[451923.362774] CPU: 17 PID: 0 Comm: swapper/17 Tainted: P           O      5.15.30-2-pve #1
[451923.367283] Hardware name: HP ProLiant DL380 G7, BIOS P67 05/21/2018
[451923.373022] RIP: 0010:skb_panic+0x4c/0x4e
[451923.378215] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 28 fb 89 b0 e8 f1 a2 f7 ff <0f> 0b 48 8b 55 08 48 c7 c1 c0 75 52 b0 e8 a2 ff ff ff 48 8b 55 08
[451923.389524] RSP: 0018:ffffa1f64674c718 EFLAGS: 00010246
[451923.395219] RAX: 0000000000000089 RBX: ffff8b796a88045a RCX: 0000000000000000
[451923.401004] RDX: 0000000000000000 RSI: ffff8b7dafc20980 RDI: 0000000000000300
[451923.406817] RBP: ffffa1f64674c738 R08: 0000000000000003 R09: 0000000000000001
[451923.412711] R10: ffff8b66d1e08000 R11: 0000000000000402 R12: 000000000000003c
[451923.418629] R13: ffff8b66c4b15000 R14: 00000000000086dd R15: ffff8b796a880460
[451923.424496] FS:  0000000000000000(0000) GS:ffff8b7dafc00000(0000) knlGS:0000000000000000
[451923.430449] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[451923.436463] CR2: 00007ff6b371a3f8 CR3: 0000001592c10001 CR4: 00000000000226e0
[451923.442550] Call Trace:
[451923.448551]  <IRQ>
[451923.454526]  skb_push.cold+0x10/0x10
[451923.460476]  eth_header+0x2b/0xc0
[451923.466455]  nf_send_reset6+0x3a7/0x4b0 [nf_reject_ipv6]
[451923.472465]  reject_tg6+0xa3/0x100 [ip6t_REJECT]
[451923.478443]  ip6t_do_table+0x2e5/0x870 [ip6_tables]
[451923.484489]  ? __dev_queue_xmit+0x367/0xb30
[451923.490508]  ip6table_filter_hook+0x1a/0x20 [ip6table_filter]
[451923.496659]  nf_hook_slow+0x44/0xb0
[451923.502846]  br_nf_forward_ip+0x36e/0x4a0
[451923.509086]  ? br_nf_hook_thresh+0x110/0x110
[451923.515271]  nf_hook_slow+0x44/0xb0
[451923.521389]  __br_forward+0xd2/0x1e0
[451923.527540]  ? iov_iter_alignment+0x132/0x140
[451923.533674]  ? br_dev_queue_push_xmit+0x1a0/0x1a0
[451923.539789]  br_forward+0xf1/0x110
[451923.545927]  br_handle_frame_finish+0x25b/0x520
[451923.552046]  ? br_pass_frame_up+0x180/0x180
[451923.558194]  br_nf_hook_thresh+0xd9/0x110
[451923.564290]  ? nf_conntrack_in+0xf9/0x6f0 [nf_conntrack]
[451923.570392]  ? br_pass_frame_up+0x180/0x180
[451923.576512]  br_nf_pre_routing_finish_ipv6+0x146/0x210
[451923.582662]  ? br_pass_frame_up+0x180/0x180
[451923.588861]  ? nf_hook_slow+0x44/0xb0
[451923.595092]  br_nf_pre_routing_ipv6+0x124/0x1a0
[451923.601399]  ? br_nf_pre_routing+0x540/0x540
[451923.607542]  br_nf_pre_routing+0x36b/0x540
[451923.613708]  ? ip6_output+0x75/0x130
[451923.619634]  br_handle_frame+0x20d/0x3c0
[451923.625343]  ? ip6_forward+0x568/0x9f0
[451923.630998]  ? br_pass_frame_up+0x180/0x180
[451923.636654]  ? br_handle_frame_finish+0x520/0x520
[451923.642311]  __netif_receive_skb_core+0x232/0xee0
[451923.647906]  ? ipv6_rcv+0x154/0x160
[451923.653382]  __netif_receive_skb_one_core+0x3f/0xa0
[451923.658858]  __netif_receive_skb+0x15/0x60
[451923.664117]  process_backlog+0xa2/0x170
[451923.669195]  __napi_poll+0x33/0x180
[451923.674143]  net_rx_action+0x126/0x280
[451923.679014]  __do_softirq+0xd9/0x2e6
[451923.683889]  irq_exit_rcu+0x8c/0xb0
[451923.688703]  common_interrupt+0x8a/0xa0
[451923.693526]  </IRQ>
[451923.698285]  <TASK>
[451923.702887]  asm_common_interrupt+0x1e/0x40
[451923.707472] RIP: 0010:cpuidle_enter_state+0xd9/0x620
[451923.712030] Code: 3d 94 ae 43 50 e8 27 d8 71 ff 49 89 c7 0f 1f 44 00 00 31 ff e8 88 e4 71 ff 80 7d d0 00 0f 85 5a 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 66 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e1 03 00 00
[451923.721812] RSP: 0018:ffffa1f646357e38 EFLAGS: 00000246
[451923.726836] RAX: ffff8b7dafc30f40 RBX: ffffc1f63fc00000 RCX: 0000000000000000
[451923.732004] RDX: 000000000000cd98 RSI: 000000002dba3071 RDI: 0000000000000000
[451923.737169] RBP: ffffa1f646357e88 R08: 00019b059a0911f6 R09: 00019b05069814c2
[451923.742350] R10: 00019b0505e0f9c2 R11: 071c71c71c71c71c R12: ffffffffb10d3700
[451923.747584] R13: 0000000000000004 R14: 0000000000000004 R15: 00019b059a0911f6
[451923.752844]  ? cpuidle_enter_state+0xc8/0x620
[451923.758060]  cpuidle_enter+0x2e/0x40
[451923.763213]  do_idle+0x209/0x2b0
[451923.768328]  cpu_startup_entry+0x20/0x30
[451923.773454]  start_secondary+0x12a/0x180
[451923.778574]  secondary_startup_64_no_verify+0xc2/0xcb
[451923.783773]  </TASK>
[451923.788907] Modules linked in: tcp_diag inet_diag veth nfsv3 nfs_acl nfs lockd grace fscache netfs ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_NFLOG xt_limit xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nf_tables netconsole sit tunnel4 ip_tunnel bonding tls softdog nfnetlink_log nfnetlink intel_powerclamp radeon coretemp ipmi_ssif drm_ttm_helper kvm_intel ttm drm_kms_helper kvm cec rc_core i2c_algo_bit fb_sys_fops irqbypass syscopyarea sysfillrect sysimgblt crct10dif_pclmul ghash_clmulni_intel aesni_intel acpi_ipmi crypto_simd ipmi_si ipmi_devintf cryptd ipmi_msghandler serio_raw mac_hid hpilo intel_cstate pcspkr i7core_edac acpi_power_meter zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net
[451923.788977]  vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear simplefb raid1 gpio_ich crc32_pclmul psmouse pata_acpi hpsa ehci_pci uhci_hcd lpc_ich ehci_hcd bnx2 scsi_transport_sas
[451923.869107] ---[ end trace ef8cfbab5b9e6d98 ]---
[451923.876775] RIP: 0010:skb_panic+0x4c/0x4e
[451923.884445] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 28 fb 89 b0 e8 f1 a2 f7 ff <0f> 0b 48 8b 55 08 48 c7 c1 c0 75 52 b0 e8 a2 ff ff ff 48 8b 55 08
[451923.900600] RSP: 0018:ffffa1f64674c718 EFLAGS: 00010246
[451923.907778] RAX: 0000000000000089 RBX: ffff8b796a88045a RCX: 0000000000000000
[451923.917076] RDX: 0000000000000000 RSI: ffff8b7dafc20980 RDI: 0000000000000300
[451923.925346] RBP: ffffa1f64674c738 R08: 0000000000000003 R09: 0000000000000001
[451923.933634] R10: ffff8b66d1e08000 R11: 0000000000000402 R12: 000000000000003c
[451923.941959] R13: ffff8b66c4b15000 R14: 00000000000086dd R15: ffff8b796a880460
[451923.950252] FS:  0000000000000000(0000) GS:ffff8b7dafc00000(0000) knlGS:0000000000000000
[451923.958673] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[451923.967116] CR2: 00007ff6b371a3f8 CR3: 0000001592c10001 CR4: 00000000000226e0
[451923.975685] Kernel panic - not syncing: Fatal exception in interrupt
[451923.984409] Kernel Offset: 0x2e200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[451923.993367] Rebooting in 10 seconds..

The mentioned interface "fwln101i0" belongs to a VM with a virtio NIC. The VM is Debian 10.12 with a Linux 4.19.0-20-amd64 x86_64 kernel. The VM has IPv4 and IPv6 addresses, and a firewall is set up in the Proxmox web interface with an incoming DROP policy; various ports are then allowed for IPv4 and IPv6 destination (and sometimes source) addresses. The outgoing policy is ACCEPT.

I have now tested the 5.11, 5.13 and 5.15 PVE kernels, and the problem occurs with all of them. Below is the PVE version with which the above crash dump was created.

Code:
# pveversion -v
proxmox-ve: 7.1-2 (running kernel: 5.15.30-2-pve)
pve-manager: 7.1-12 (running version: 7.1-12/b3c09de3)
pve-kernel-helper: 7.2-2
pve-kernel-5.15: 7.2-1
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.30-2-pve: 5.15.30-3
pve-kernel-5.15.30-1-pve: 5.15.30-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.6-1
proxmox-backup-file-restore: 2.1.6-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.1-3
pve-container: 4.1-5
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-1
pve-ha-manager: 3.3-4
pve-i18n: 2.6-3
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-5
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Are there any approaches other than turning off the firewalls on the VMs (and, if that's not enough, on the host systems as well)? I would like to avoid that, because it's nice to have all the settings, including the firewall rules, in one interface. But of course stability is more important.
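Incidentally, the skb_under_panic line itself already shows the underflow: data ends up exactly put bytes (14, the size of an Ethernet header) below head. A quick sanity check with the pointer values from the dump above:

```shell
# Pointer values copied verbatim from the skb_under_panic line above:
HEAD=ffff8b796a883000
DATA=ffff8b796a882ff2
PUT=14
# skb_push() moved data below the start of the buffer; the gap is exactly
# the 14-byte Ethernet header that eth_header() tried to prepend:
GAP=$(( 0x$HEAD - 0x$DATA ))
echo "$GAP"
```

So nothing is corrupted at random: the reply skb handed to nf_send_reset6 simply has no headroom left for the Ethernet header.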
 
Could it be related to IPv6 or dual-stack? In both of the dumps I have, I can see a lot of references to IPv6, such as:

Code:
[451923.472465]  reject_tg6+0xa3/0x100 [ip6t_REJECT]
[451923.478443]  ip6t_do_table+0x2e5/0x870 [ip6_tables]
[451923.490508]  ip6table_filter_hook+0x1a/0x20 [ip6table_filter]
[451923.613708]  ? ip6_output+0x75/0x130
[451923.625343]  ? ip6_forward+0x568/0x9f0
[451923.647906]  ? ipv6_rcv+0x154/0x160

And the dump from @paul20 also seems to reference IPv6. @paul20: are you using IPv6 or dual-stack in the VMs, and/or do you have IPv6 routed and firewalled on the Proxmox nodes?
 
The problem has occurred again, and it looks very similar, though on a different interface belonging to another VM. I have now switched off the firewalls on the VMs and hope that makes things more stable. Any ideas what else I could try? Here is the stack trace:

Code:
[362086.626721] skbuff: skb_under_panic: text:ffffffff918c27db len:74 put:14 head:ffff9a348e6e7a00 data:ffff9a348e6e79f2 tail:0x3c end:0xc0 dev:fwln106i0
[362086.637313] ------------[ cut here ]------------
[362086.642769] kernel BUG at net/core/skbuff.c:112!
[362086.648278] invalid opcode: 0000 [#1] SMP PTI
[362086.653710] CPU: 20 PID: 31182 Comm: vhost-30990 Tainted: P           O      5.15.30-2-pve #1
[362086.659298] Hardware name: HP ProLiant DL380 G7, BIOS P67 05/21/2018
[362086.664975] RIP: 0010:skb_panic+0x4c/0x4e
[362086.670707] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 28 fb 49 92 e8 f1 a2 f7 ff <0f> 0b 48 8b 55 08 48 c7 c1 c0 75 12 92 e8 a2 ff ff ff 48 8b 55 08
[362086.682985] RSP: 0018:ffffbf55067e8728 EFLAGS: 00010246
[362086.689154] RAX: 0000000000000089 RBX: ffff9a400a8fa85a RCX: 0000000000000000
[362086.695514] RDX: 0000000000000000 RSI: ffff9a3f2fca0980 RDI: ffff9a3f2fca0980
[362086.701851] RBP: ffffbf55067e8748 R08: 0000000000000003 R09: 0000000000000001
[362086.708214] R10: ffff9a3f6619c000 R11: 0000000000000c02 R12: 000000000000003c
[362086.714604] R13: ffff9a3444001000 R14: 00000000000086dd R15: ffff9a400a8fa860
[362086.720909] FS:  0000000000000000(0000) GS:ffff9a3f2fc80000(0000) knlGS:0000000000000000
[362086.727387] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[362086.733828] CR2: 00007ff537d275f2 CR3: 0000000c44074002 CR4: 00000000000226e0
[362086.740335] Call Trace:
[362086.746826]  <IRQ>
[362086.753244]  skb_push.cold+0x10/0x10
[362086.759740]  eth_header+0x2b/0xc0
[362086.766262]  nf_send_reset6+0x3a7/0x4b0 [nf_reject_ipv6]
[362086.772899]  reject_tg6+0xa3/0x100 [ip6t_REJECT]
[362086.779493]  ip6t_do_table+0x2e5/0x870 [ip6_tables]
[362086.786062]  ? ip6_protocol_deliver_rcu+0x436/0x520
[362086.792707]  ? ip6_input+0xb7/0xd0
[362086.799278]  ip6table_filter_hook+0x1a/0x20 [ip6table_filter]
[362086.805924]  nf_hook_slow+0x44/0xb0
[362086.812587]  br_nf_forward_ip+0x36e/0x4a0
[362086.819214]  ? unix_dgram_bpf_update_proto+0x42/0xd0
[362086.825936]  ? br_nf_hook_thresh+0x110/0x110
[362086.832609]  nf_hook_slow+0x44/0xb0
[362086.839210]  __br_forward+0xd2/0x1e0
[362086.845824]  ? iov_iter_alignment+0x132/0x140
[362086.852448]  ? br_dev_queue_push_xmit+0x1a0/0x1a0
[362086.859136]  br_forward+0xf1/0x110
[362086.865851]  br_handle_frame_finish+0x25b/0x520
[362086.872637]  ? br_pass_frame_up+0x180/0x180
[362086.879236]  br_nf_hook_thresh+0xd9/0x110
[362086.885853]  ? nf_conntrack_in+0xf9/0x6f0 [nf_conntrack]
[362086.892557]  ? br_pass_frame_up+0x180/0x180
[362086.899264]  br_nf_pre_routing_finish_ipv6+0x146/0x210
[362086.905854]  ? br_pass_frame_up+0x180/0x180
[362086.912351]  ? nf_hook_slow+0x44/0xb0
[362086.918722]  br_nf_pre_routing_ipv6+0x124/0x1a0
[362086.925037]  ? br_nf_pre_routing+0x540/0x540
[362086.931151]  br_nf_pre_routing+0x36b/0x540
[362086.937044]  ? update_cfs_group+0x9c/0xb0
[362086.942888]  br_handle_frame+0x20d/0x3c0
[362086.948710]  ? br_pass_frame_up+0x180/0x180
[362086.954510]  ? br_handle_frame_finish+0x520/0x520
[362086.960247]  __netif_receive_skb_core+0x232/0xee0
[362086.965886]  ? try_to_wake_up+0x214/0x5c0
[362086.971468]  __netif_receive_skb_one_core+0x3f/0xa0
[362086.976926]  __netif_receive_skb+0x15/0x60
[362086.982149]  process_backlog+0xa2/0x170
[362086.987249]  __napi_poll+0x33/0x180
[362086.992254]  net_rx_action+0x126/0x280
[362086.997240]  ? clockevents_program_event+0xa7/0x120
[362087.002208]  __do_softirq+0xd9/0x2e6
[362087.007172]  do_softirq+0x75/0xa0
[362087.012113]  </IRQ>
[362087.016894]  <TASK>
[362087.021538]  __local_bh_enable_ip+0x50/0x60
[362087.026176]  tun_sendmsg+0x294/0x610
[362087.030722]  vhost_tx_batch.constprop.0+0x68/0x1e0 [vhost_net]
[362087.035417]  handle_tx_copy+0x194/0x690 [vhost_net]
[362087.040149]  handle_tx+0xb0/0xe0 [vhost_net]
[362087.044883]  handle_tx_kick+0x15/0x20 [vhost_net]
[362087.049583]  vhost_worker+0x7e/0xc0 [vhost]
[362087.054245]  ? vhost_exceeds_weight+0x50/0x50 [vhost]
[362087.058941]  kthread+0x12a/0x150
[362087.063587]  ? set_kthread_struct+0x50/0x50
[362087.068253]  ret_from_fork+0x22/0x30
[362087.072862]  </TASK>
[362087.077374] Modules linked in: tcp_diag inet_diag veth nfsv3 nfs_acl nfs lockd grace fscache netfs ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_NFLOG xt_limit xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nf_tables netconsole sit tunnel4 ip_tunnel bonding tls softdog nfnetlink_log nfnetlink intel_powerclamp radeon drm_ttm_helper ttm coretemp drm_kms_helper cec kvm_intel rc_core kvm irqbypass i2c_algo_bit fb_sys_fops crct10dif_pclmul syscopyarea ghash_clmulni_intel sysfillrect aesni_intel sysimgblt crypto_simd cryptd intel_cstate ipmi_ssif hpilo acpi_ipmi i7core_edac ipmi_si ipmi_devintf serio_raw pcspkr ipmi_msghandler acpi_power_meter mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net
[362087.077447]  vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc drm ip_tables x_tables autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear simplefb raid1 gpio_ich crc32_pclmul psmouse pata_acpi lpc_ich hpsa ehci_pci uhci_hcd ehci_hcd bnx2 scsi_transport_sas
[362087.150482] ---[ end trace 11fa9f96729156c2 ]---
[362087.157592] RIP: 0010:skb_panic+0x4c/0x4e
[362087.164672] Code: 4f 70 50 8b 87 bc 00 00 00 50 8b 87 b8 00 00 00 50 ff b7 c8 00 00 00 4c 8b 8f c0 00 00 00 48 c7 c7 28 fb 49 92 e8 f1 a2 f7 ff <0f> 0b 48 8b 55 08 48 c7 c1 c0 75 12 92 e8 a2 ff ff ff 48 8b 55 08
[362087.179621] RSP: 0018:ffffbf55067e8728 EFLAGS: 00010246
[362087.187181] RAX: 0000000000000089 RBX: ffff9a400a8fa85a RCX: 0000000000000000
[362087.194845] RDX: 0000000000000000 RSI: ffff9a3f2fca0980 RDI: ffff9a3f2fca0980
[362087.202524] RBP: ffffbf55067e8748 R08: 0000000000000003 R09: 0000000000000001
[362087.210267] R10: ffff9a3f6619c000 R11: 0000000000000c02 R12: 000000000000003c
[362087.217982] R13: ffff9a3444001000 R14: 00000000000086dd R15: ffff9a400a8fa860
[362087.225711] FS:  0000000000000000(0000) GS:ffff9a3f2fc80000(0000) knlGS:0000000000000000
[362087.233574] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[362087.241509] CR2: 00007ff537d275f2 CR3: 0000000c44074002 CR4: 00000000000226e0
[362087.249488] Kernel panic - not syncing: Fatal exception in interrupt
[362087.257568] Kernel Offset: 0xfe00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[362087.265880] Rebooting in 10 seconds..
 
I did some more research but didn't really get any further. Any ideas on how to proceed? For the time being I have disabled the VMs' firewalls and hope the system is stable now.
 
This thread is pretty old, but I ran into a similar issue on a host node where I use IPv6 subnets from a tunnel broker that are routed directly. The VMs have an IPv6-only setup. With a firewall enabled on a VM, it ran into the following some hours later:

Mar 29 04:40:03 virt-node13 kernel: skbuff: skb_under_panic: text:ffffffff84ba465b len:74 put:14 head:ffff8ea491011900 data:ffff8ea4910118f2 tail:0x3c end:0x140 dev:ens18

As well as:
Mar 30 17:46:57 virt-node13 kernel: skbuff: skb_under_panic: text:ffffffffb81a465b len:74 put:14 head:ffff95c4115d3700 data:ffff95c4115d36f2 tail:0x3c end:0x140 dev:fwln115i0
Mar 30 17:46:57 virt-node13 kernel: ------------[ cut here ]------------
Mar 30 17:46:57 virt-node13 kernel: kernel BUG at net/core/skbuff.c:192!
PVE: 8.1.5
Kernel: 6.5.13-3-pve

Other nodes without the host firewall activated do not have this issue. All nodes run firewall rulesets for addresses that terminate directly on the nodes themselves, without any issues.
 
After only one of my multiple host nodes kept crashing again and again, I was able to isolate what caused it: only this node runs a slightly different, special IPv6 configuration in which multiple IPv6 networks are tunnelled by different tunnel brokers with different MTU sizes, and jumbo frames are also in use.
 
OK, I finally managed to make it reproducible and track it down.

In my case it's related to a different MTU size on IPv6 (also dual-stack) links to VMs that use the e1000 network driver while firewalling is active in general (even if only on the host node). I'll try to dig up more details on a different system when I find the time, but in this case it is solvable by switching to the VirtIO driver (which you should use anyway unless there's a specific reason not to).
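For anyone wanting to try the same workaround, a VM's NIC model can be switched from e1000 to VirtIO on the Proxmox CLI. This is only a sketch from memory, so double-check the option names against man qm; the VM ID, MAC address, bridge, and MTU value below are placeholders:

```shell
# Inspect the current NIC line of a (hypothetical) VM 101. Requires a PVE
# host and root, so shown as comments only:
#   qm config 101 | grep '^net0'
# Replace an e1000 NIC with a VirtIO one on the same bridge; mtu= pins the
# guest-visible MTU (1280 is just an example for a tunnelled link):
#   qm set 101 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,firewall=1,mtu=1280
```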
 
