Unexplained neighbor table overflow causing CPU lockup followed by reset

May 3, 2022
I observed the following logs in the system journal:
Code:
Jan 19 10:44:01 argynvostholt kernel: neighbour: ndisc_cache: neighbor table overflow!
Jan 19 10:44:02 argynvostholt kernel: Route cache is full: consider increasing sysctl net.ipv6.route.max_size.
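For reference, both messages point at kernel limits. The second names `net.ipv6.route.max_size` directly. As I understand it, the first is printed when the IPv6 neighbor (NDP) table has reached roughly `net.ipv6.neigh.default.gc_thresh3` entries and a forced garbage collection could not free any. Below is a minimal Python sketch that reads those limits alongside the current table size. It assumes the standard `/proc/sys` layout and that `/proc/net/stat/ndisc_cache` holds a header line followed by one row of hexadecimal counters per CPU. The helper names `read_sysctl`, `ndisc_entries` and `report` are hypothetical and are not part of any Proxmox tool.

```python
from pathlib import Path

# Limits named by the "Route cache is full" message, plus the neighbor-table
# GC thresholds assumed to govern the "neighbor table overflow" message.
SYSCTLS = [
    "net.ipv6.route.max_size",
    "net.ipv6.neigh.default.gc_thresh1",
    "net.ipv6.neigh.default.gc_thresh2",
    "net.ipv6.neigh.default.gc_thresh3",
]


def read_sysctl(name: str, root: str = "/proc/sys") -> int:
    """Read an integer sysctl by its dotted name, e.g. net.ipv6.route.max_size.

    Naive dot-to-slash mapping: interface names that contain dots
    (VLANs such as vmbr0.100) are not handled.
    """
    path = Path(root, *name.split("."))
    return int(path.read_text().split()[0])


def ndisc_entries(stat_file: str = "/proc/net/stat/ndisc_cache") -> int:
    """Return the current number of IPv6 neighbor entries.

    Assumes one header line, then one line of hex counters per CPU;
    the 'entries' column is the table-wide total, so the first row suffices.
    """
    lines = Path(stat_file).read_text().splitlines()
    header = lines[0].split()
    first_row = lines[1].split()
    return int(first_row[header.index("entries")], 16)


def report(root: str = "/proc/sys",
           stat_file: str = "/proc/net/stat/ndisc_cache") -> dict:
    """Collect the limits and the current neighbor count into one dict (read-only)."""
    values = {name: read_sysctl(name, root) for name in SYSCTLS}
    values["ndisc_cache.entries"] = ndisc_entries(stat_file)
    return values
```

On the host, `for key, value in report().items(): print(key, value)` prints each limit next to the current entry count. The sketch only reads values and changes nothing. `sysctl net.ipv6.route.max_size` shows the same route limit from the shell.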

The overflow messages were then followed by:

Code:
Jan 19 10:44:23 argynvostholt kernel: watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [swapper/25:0]
Jan 19 10:44:23 argynvostholt kernel: Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables xt_MASQUERADE iptable_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp bpfilter bonding tls softdog nfnetlink_log nfnetlink ipmi_ssif zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd ib_cm ib_core kvm_amd iscsi_tcp libiscsi_tcp libiscsi kvm scsi_transport_iscsi nct6775 crct10dif_pclmul ghash_clmulni_intel hwmon_vid aesni_intel crypto_simd cryptd snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi drm_vram_helper snd_hda_codec drm_ttm_helper ttm snd_hda_core snd_hwdep drm_kms_helper rapl snd_pcm cec snd_timer rc_core wmi_bmof snd cdc_ether fb_sys_fops pcspkr efi_pstore syscopyarea usbnet soundcore k10temp sysfillrect ccp joydev input_leds
Jan 19 10:44:23 argynvostholt kernel:  sysimgblt mii acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler mac_hid vfio_pci vfio_pci_core vfio_virqfd irqbypass vfio_iommu_type1 vfio vendor_reset(O) drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid0 multipath linear simplefb hid_generic usbmouse usbkbd dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c usbhid hid raid1 crc32_pclmul i2c_piix4 nvme xhci_pci xhci_pci_renesas nvme_core ahci xhci_hcd igb libahci i2c_algo_bit dca wmi gpio_amdpt gpio_generic
Jan 19 10:44:23 argynvostholt kernel: CPU: 25 PID: 0 Comm: swapper/25 Tainted: P           O      5.15.74-1-pve #1
Jan 19 10:44:23 argynvostholt kernel: Hardware name: To Be Filled By O.E.M. B550D4-4L/B550D4-4L, BIOS L1.29 06/13/2022
Jan 19 10:44:23 argynvostholt kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x1f5/0x240
Jan 19 10:44:23 argynvostholt kernel: Code: c5 40 19 03 00 49 81 fe ff 1f 00 00 77 49 4e 03 2c f5 e0 fa cb a5 4d 89 65 00 41 8b 44 24 08 85 c0 75 0b f3 90 41 8b 44 24 08 <85> c0 74 f5 49 8b 0c 24 48 85 c9 0f 84 47 ff ff ff 0f 0d 09 e9 3f
Jan 19 10:44:23 argynvostholt kernel: RSP: 0018:ffffbe41c077cb40 EFLAGS: 00000246
Jan 19 10:44:23 argynvostholt kernel: RAX: 0000000000000000 RBX: ffffffffa6c05400 RCX: 0000000000000009
Jan 19 10:44:23 argynvostholt kernel: RDX: 0000000000680000 RSI: 0000000000680000 RDI: ffffffffa6c05400
Jan 19 10:44:23 argynvostholt kernel: RBP: ffffbe41c077cb68 R08: 0000000000000000 R09: 0000000000000000
Jan 19 10:44:23 argynvostholt kernel: R10: 0000000000000020 R11: ffffffff80000000 R12: ffff959fbf071940
Jan 19 10:44:23 argynvostholt kernel: R13: ffff959fbec31940 R14: 0000000000000008 R15: 0000000000680000
Jan 19 10:44:23 argynvostholt kernel: FS:  0000000000000000(0000) GS:ffff959fbf040000(0000) knlGS:0000000000000000
Jan 19 10:44:23 argynvostholt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 19 10:44:23 argynvostholt kernel: CR2: 000000c0008fd000 CR3: 000000010c590000 CR4: 0000000000750ee0
Jan 19 10:44:23 argynvostholt kernel: PKRU: 55555554
Jan 19 10:44:23 argynvostholt kernel: Call Trace:
Jan 19 10:44:23 argynvostholt kernel:  <IRQ>
Jan 19 10:44:23 argynvostholt kernel:  _raw_spin_lock_bh+0x2d/0x40
Jan 19 10:44:23 argynvostholt kernel:  fib6_run_gc+0x43/0x110
Jan 19 10:44:23 argynvostholt kernel:  ip6_dst_gc+0x95/0x160
Jan 19 10:44:23 argynvostholt kernel:  dst_alloc+0x126/0x170
Jan 19 10:44:23 argynvostholt kernel:  ip6_dst_alloc+0x27/0x90
Jan 19 10:44:23 argynvostholt kernel:  icmp6_dst_alloc+0x76/0x220
Jan 19 10:44:23 argynvostholt kernel:  ndisc_send_skb+0x96/0x380
Jan 19 10:44:23 argynvostholt kernel:  ? __kmalloc_node_track_caller+0x16f/0x3a0
Jan 19 10:44:23 argynvostholt kernel:  ? ksize+0x30/0x50
Jan 19 10:44:23 argynvostholt kernel:  ? __build_skb_around+0xb4/0xc0
Jan 19 10:44:23 argynvostholt kernel:  ndisc_send_ns+0xcd/0x200
Jan 19 10:44:23 argynvostholt kernel:  ndisc_solicit+0xc1/0x170
Jan 19 10:44:23 argynvostholt kernel:  ? __skb_clone+0x2e/0x140
Jan 19 10:44:23 argynvostholt kernel:  neigh_probe+0x52/0x70
Jan 19 10:44:23 argynvostholt kernel:  neigh_timer_handler+0x218/0x300
Jan 19 10:44:23 argynvostholt kernel:  ? neigh_changeaddr+0x50/0x50
Jan 19 10:44:23 argynvostholt kernel:  call_timer_fn+0x2b/0x120
Jan 19 10:44:23 argynvostholt kernel:  __run_timers.part.0+0x1e1/0x270
Jan 19 10:44:23 argynvostholt kernel:  ? ktime_get+0x46/0xc0
Jan 19 10:44:23 argynvostholt kernel:  ? native_x2apic_icr_read+0x20/0x20
Jan 19 10:44:23 argynvostholt kernel:  ? lapic_next_event+0x21/0x30
Jan 19 10:44:23 argynvostholt kernel:  ? clockevents_program_event+0xab/0x130
Jan 19 10:44:23 argynvostholt kernel:  run_timer_softirq+0x2a/0x60
Jan 19 10:44:23 argynvostholt kernel:  __do_softirq+0xd9/0x2ea
Jan 19 10:44:23 argynvostholt kernel:  irq_exit_rcu+0x94/0xc0
Jan 19 10:44:23 argynvostholt kernel:  sysvec_apic_timer_interrupt+0x80/0x90
Jan 19 10:44:23 argynvostholt kernel:  </IRQ>
Jan 19 10:44:23 argynvostholt kernel:  <TASK>
Jan 19 10:44:23 argynvostholt kernel:  asm_sysvec_apic_timer_interrupt+0x1b/0x20
Jan 19 10:44:23 argynvostholt kernel: RIP: 0010:cpuidle_enter_state+0xd9/0x620
Jan 19 10:44:23 argynvostholt kernel: Code: 3d 64 64 ff 5a e8 27 24 6e ff 49 89 c7 0f 1f 44 00 00 31 ff e8 68 31 6e ff 80 7d d0 00 0f 85 5e 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 f6 0f 88 6a 01 00 00 4d 63 ee 49 83 fd 09 0f 87 e5 03 00 00
Jan 19 10:44:23 argynvostholt kernel: RSP: 0018:ffffbe41c022fe38 EFLAGS: 00000246
Jan 19 10:44:23 argynvostholt kernel: RAX: ffff959fbf070bc0 RBX: ffff958104948400 RCX: 0000000000000000
Jan 19 10:44:23 argynvostholt kernel: RDX: 0000000000009235 RSI: 0000000025b7c068 RDI: 0000000000000000
Jan 19 10:44:23 argynvostholt kernel: RBP: ffffbe41c022fe88 R08: 0001244744b8c72c R09: 0000000000008ca0
Jan 19 10:44:23 argynvostholt kernel: R10: 0000000000000003 R11: 071c71c71c71c71c R12: ffffffffa64e6cc0
Jan 19 10:44:23 argynvostholt kernel: R13: 0000000000000001 R14: 0000000000000001 R15: 0001244744b8c72c
Jan 19 10:44:23 argynvostholt kernel:  ? cpuidle_enter_state+0xc8/0x620
Jan 19 10:44:23 argynvostholt kernel:  cpuidle_enter+0x2e/0x50
Jan 19 10:44:23 argynvostholt kernel:  do_idle+0x20d/0x2b0
Jan 19 10:44:23 argynvostholt kernel:  cpu_startup_entry+0x20/0x30
Jan 19 10:44:23 argynvostholt kernel:  start_secondary+0x12a/0x180
Jan 19 10:44:23 argynvostholt kernel:  secondary_startup_64_no_verify+0xc2/0xcb
Jan 19 10:44:23 argynvostholt kernel:  </TASK>

This soft lockup warning was logged several times for every CPU core, and the server's power draw spiked from 80 W to 180 W (likely because the cores were stuck spinning on a lock?). The system then reset.

Kernel version 5.15.74-1-pve, PVE version 7.3-3

This is not the first time this has happened, and I am not sure why. The complete logs are here: https://cdn.discordapp.com/attachments/329653697422295040/1065611992766877818/journal_2
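To check the "every core" observation against the complete journal, a small Python sketch can tally the watchdog reports per CPU. It assumes the journal was saved as plain text in the same format as the excerpt, with one message per line. `soft_lockups_per_cpu` is a hypothetical helper, and the regular expression only matches the `soft lockup - CPU#N stuck for Ns!` wording shown in the log.

```python
import re
from collections import Counter

# Matches the kernel watchdog message seen in the journal, e.g.
# "watchdog: BUG: soft lockup - CPU#25 stuck for 22s! [swapper/25:0]"
SOFT_LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s!")


def soft_lockups_per_cpu(lines):
    """Return {cpu: (report_count, longest_stall_seconds)}, sorted by CPU number."""
    counts = Counter()
    longest = {}
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match is None:
            continue  # not a soft lockup report
        cpu, seconds = int(match.group(1)), int(match.group(2))
        counts[cpu] += 1
        longest[cpu] = max(longest.get(cpu, 0), seconds)
    return {cpu: (counts[cpu], longest[cpu]) for cpu in sorted(counts)}
```

Usage: `with open("journal_2") as f: print(soft_lockups_per_cpu(f))`, where `journal_2` is assumed to be the downloaded log file. CPUs missing from the result never logged a soft lockup.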

Any ideas as to why this is happening?
 
