Hi everyone,
First post; I'll try to provide as much as I can.
A stock install of 7.0, fresh off the ISO, was running stable for me for a few weeks. I then did the usual process of updating packages and a reboot. Now my prox host has random crashes about once every 6-18 hours even with minimal workload or reason.
First, some background on my hardware:
CompuLab Fitlet2
Intel Celeron J3455
8 Gigs DDR 1866 RAM
Transcend mSATA M.2 SSD
Additional igb network adapters (via FACET card) beyond the two on board
Latest BIOS installed
And my Prox environment:
I am running two LXC containers on Debian 11
The only VM running is opnsense on the near-latest version - my box has 1.5 - 2 gigs free RAM at all times
lvm-thin provisioning; XFS for root (I also had similar crashes when using root on zfs) - 4 gig swap partition
5.11.22-4-pve #1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200) x86_64 GNU/Linux
Latest intel microcode package from Debian non-free (also crashed without the microcode update)
processor.max_cstate=1 intel_idle.max_cstate=1 set as boot options - when the system was stable, it was only stable with these enabled... similar to the Braswell C-state bug
The pain
My most recent crash is below. I have tried setting aio=native for my sole VM based on a few other posts I saw, however that has not helped the situation much. My two crash logs are attached - I managed to capture both via netconsole without issue. I've also posted my VM config, dmesg output and pveversion --verbose output. Again, the crashes are happening as the VM (and prox host) sits mostly idle, as I have moved my production opnsense install to another box.
Crash #1 is attached for length considerations; aio=native was not set.
[96996.156090] BUG: unable to handle page fault for address: ffffffff9b32e5e0
[96996.156138] #PF: supervisor instruction fetch in kernel mode
[96996.156151] #PF: error_code(0x0010) - not-present page
[96996.156162] PGD 1f5215067 P4D 1f5215067 PUD 1f5216063 PMD 0
[96996.156182] Oops: 0010 [#1] SMP NOPTI
[96996.156197] CPU: 0 PID: 6940 Comm: kvm Tainted: P W O 5.11.22-4-pve #1
[96996.156213] Hardware name: N/A N/A/N/A, BIOS FLT2.NBR.0.46.02.01 03/07/2021
[96996.156224] RIP: 0010:0xffffffff9b32e5e0
[96996.156242] Code: Unable to access opcode bytes at RIP 0xffffffff9b32e5b6.
[96996.156253] RSP: 0018:ffffb58c05397f00 EFLAGS: 00010246
[96996.156266] RAX: 0000000000000000 RBX: ffff8bae8a38de01 RCX: 0000000000000000
[96996.156278] RDX: 000000000000ae80 RSI: 000000000000001a RDI: ffff8bae8a38de00
[96996.156290] RBP: ffffb58c05397f30 R08: 0000000000004000 R09: 000000000000001a
[96996.156301] R10: 0000000000000003 R11: 0000000000000000 R12: 000000000000001a
[96996.156312] R13: 000000000000ae80 R14: 0000000000000000 R15: ffff8bae8a38de00
[96996.156324] FS: 00007f4ccbfff700(0000) GS:ffff8baff7c00000(0000) knlGS:0000000000000000
[96996.156339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96996.156351] CR2: ffffffff9b32e5b6 CR3: 0000000103db6000 CR4: 00000000003526f0
[96996.156366] Call Trace:
[96996.156377] ? __x64_sys_ioctl+0x6f/0xc0
[96996.156397] do_syscall_64+0x38/0x90
[96996.156414] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[96996.156430] RIP: 0033:0x7f4cdaf98cc7
[96996.156443] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
[96996.156464] RSP: 002b:00007f4ccbffa288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[96996.156479] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4cdaf98cc7
[96996.156490] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001a
[96996.156501] RBP: 000055e6a1485c90 R08: 000055e69f810b38 R09: 00000000ffffffff
[96996.156512] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[96996.156523] R13: 000055e69fc61e60 R14: 0000000000000000 R15: 0000000000000000
[96996.156539] Modules linked in: nft_counter nft_chain_nat cfg80211 nft_compat nf_tables 8021q garp mrp veth tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_telemetry_pltdrv intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel mei_hdcp aesni_intel crypto_simd cryptd glue_helper pcspkr efi_pstore at24 rapl intel_cstate 8250_dw i915 drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me intel_xhci_usb_role_switch mei mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp drm sunrpc ip_tables x_tables
[96996.156746] autofs4 xfs btrfs blake2b_generic xor raid6_pq netconsole dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c lpc_ich crc32_pclmul i2c_i801 i2c_smbus ahci igb xhci_pci xhci_pci_renesas i2c_algo_bit intel_lpss_pci intel_lpss idma64 xhci_hcd virt_dma libahci dca video intel_pmc_bxt pinctrl_broxton
[96996.156888] CR2: ffffffff9b32e5e0
[96996.156900] ---[ end trace d2ba67674b61f60c ]---
[96996.156901] BUG: unable to handle page fault for address: ffffffff9b32e5e0
[96996.156995] #PF: supervisor instruction fetch in kernel mode
[96996.159804] #PF: error_code(0x0010) - not-present page
[96996.159816] PGD 1f5215067 P4D 1f5215067 PUD 1f5216063 PMD 0
[96996.159836] Oops: 0010 [#2] SMP NOPTI
[96996.159849] CPU: 1 PID: 6941 Comm: kvm Tainted: P D W O 5.11.22-4-pve #1
[96996.159864] Hardware name: N/A N/A/N/A, BIOS FLT2.NBR.0.46.02.01 03/07/2021
[96996.159875] RIP: 0010:0xffffffff9b32e5e0
[96996.159891] Code: Unable to access opcode bytes at RIP 0xffffffff9b32e5b6.
[96996.159902] RSP: 0018:ffffb58c0532bf00 EFLAGS: 00010246
[96996.159916] RAX: 0000000000000000 RBX: ffff8bae8a38dd01 RCX: 0000000000000000
[96996.159928] RDX: 000000000000ae80 RSI: 000000000000001b RDI: ffff8bae8a38dd00
[96996.159940] RBP: ffffb58c0532bf30 R08: 0000000000004000 R09: 000000000000001b
[96996.159951] R10: 0000000000000003 R11: 0000000000000000 R12: 000000000000001b
[96996.159962] R13: 000000000000ae80 R14: 0000000000000000 R15: ffff8bae8a38dd00
[96996.159974] FS: 00007f4ccb7fe700(0000) GS:ffff8baff7c80000(0000) knlGS:0000000000000000
[96996.159988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96996.160000] CR2: ffffffff9b32e5b6 CR3: 0000000103db6000 CR4: 00000000003526e0
[96996.160012] Call Trace:
[96996.160022] ? __x64_sys_ioctl+0x6f/0xc0
[96996.160039] do_syscall_64+0x38/0x90
[96996.162276] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[96996.165078] RIP: 0033:0x7f4cdaf98cc7
[96996.165093] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
[96996.165114] RSP: 002b:00007f4ccb7f9288 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[96996.165132] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f4cdaf98cc7
[96996.165144] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001b
[96996.165156] RBP: 000055e6a14bca90 R08: 000055e69f810b38 R09: 00000000ffffffff
[96996.165167] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[96996.165179] R13: 000055e69fc61e60 R14: 0000000000000000 R15: 0000000000000000
[96996.165194] Modules linked in: nft_counter nft_chain_nat cfg80211 nft_compat nf_tables 8021q garp mrp veth tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_nat xt_REDIRECT nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_telemetry_pltdrv intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel mei_hdcp aesni_intel crypto_simd cryptd glue_helper pcspkr efi_pstore at24 rapl intel_cstate 8250_dw i915 drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me intel_xhci_usb_role_switch mei mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi coretemp drm sunrpc ip_tables x_tables
[96996.170304] autofs4 xfs btrfs blake2b_generic xor raid6_pq netconsole dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c lpc_ich crc32_pclmul i2c_i801 i2c_smbus ahci igb xhci_pci xhci_pci_renesas i2c_algo_bit intel_lpss_pci intel_lpss idma64 xhci_hcd virt_dma libahci dca video intel_pmc_bxt pinctrl_broxton
[96996.175287] CR2: ffffffff9b32e5e0
[96996.175302] ---[ end trace d2ba67674b61f60d ]---
[96996.238182] RIP: 0010:0xffffffff9b32e5e0
[96996.238202] RIP: 0010:0xffffffff9b32e5e0
[96996.238228] Code: Unable to access opcode bytes at RIP 0xffffffff9b32e5b6.
[96996.238244] Code: Unable to access opcode bytes at RIP 0xffffffff9b32e5b6.
[96996.238250] RSP: 0018:ffffb58c05397f00 EFLAGS: 00010246
[96996.238262] RSP: 0018:ffffb58c05397f00 EFLAGS: 00010246
[96996.238276] RAX: 0000000000000000 RBX: ffff8bae8a38de01 RCX: 0000000000000000
[96996.238286] RAX: 0000000000000000 RBX: ffff8bae8a38de01 RCX: 0000000000000000
[96996.238298] RDX: 000000000000ae80 RSI: 000000000000001a RDI: ffff8bae8a38de00
[96996.238308] RDX: 000000000000ae80 RSI: 000000000000001a RDI: ffff8bae8a38de00
[96996.238318] RBP: ffffb58c05397f30 R08: 0000000000004000 R09: 000000000000001a
[96996.238329] RBP: ffffb58c05397f30 R08: 0000000000004000 R09: 000000000000001a
[96996.238339] R10: 0000000000000003 R11: 0000000000000000 R12: 000000000000001a
[96996.238348] R10: 0000000000000003 R11: 0000000000000000 R12: 000000000000001a
[96996.238358] R13: 000000000000ae80 R14: 0000000000000000 R15: ffff8bae8a38de00
[96996.238368] R13: 000000000000ae80 R14: 0000000000000000 R15: ffff8bae8a38de00
[96996.238379] FS: 00007f4ccbfff700(0000) GS:ffff8baff7c00000(0000) knlGS:0000000000000000
[96996.238388] FS: 00007f4ccb7fe700(0000) GS:ffff8baff7c80000(0000) knlGS:0000000000000000
[96996.238399] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96996.238410] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[96996.238419] CR2: ffffffff9b32e5b6 CR3: 0000000103db6000 CR4: 00000000003526f0
[96996.238428] CR2: ffffffff9b32e5b6 CR3: 0000000103db6000 CR4: 00000000003526e0
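For anyone reading the log above: the "#PF: error_code(0x0010)" line can be decoded by hand. Below is a minimal sketch in Python, assuming the standard x86 page-fault error code bit layout; the function names are made up for illustration and are not part of any tool. For 0x0010 it gives a supervisor-mode instruction fetch from a not-present page, which matches what the kernel printed, and fits with RIP and the fault address being the same value in the log.

```python
import re

# x86 page-fault error code bits
PF_PROT = 0x01   # 0 = page not present, 1 = protection violation
PF_WRITE = 0x02  # 1 = write access
PF_USER = 0x04   # 1 = fault happened in user mode
PF_RSVD = 0x08   # 1 = reserved bit set in a page table entry
PF_INSTR = 0x10  # 1 = fault was an instruction fetch


def error_code_from_log(line):
    """Pull the hex error code out of a '#PF: error_code(0x....)' log line."""
    m = re.search(r"error_code\((0x[0-9a-fA-F]+)\)", line)
    return int(m.group(1), 16) if m else None


def decode_pf_error_code(code):
    """Describe a page-fault error code in words."""
    who = "user" if code & PF_USER else "supervisor"
    if code & PF_INSTR:
        what = "instruction fetch"
    elif code & PF_WRITE:
        what = "write access"
    else:
        what = "read access"
    if not code & PF_PROT:
        why = "not-present page"
    elif code & PF_RSVD:
        why = "reserved bit violation"
    else:
        why = "permissions violation"
    return f"{who} {what}, {why}"


if __name__ == "__main__":
    line = "[96996.156151] #PF: error_code(0x0010) - not-present page"
    print(decode_pf_error_code(error_code_from_log(line)))
```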
agent: 1
balloon: 0
boot: order=scsi0;net0
cores: 3
cpu: host
hotplug: 0
machine: q35
memory: 4096
name: opnsense
net0: virtio=[MACADDRESS],bridge=vmbr1
net1: virtio=[MACADDRESS],bridge=vmbr2
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-100-disk-0,cache=none,discard=on,iothread=1,size=26G,ssd=1,aio=native
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=632d2c94-4981-4810-8151-68ed2c125b70
sockets: 1
startup: order=1
vmgenid: f24a78c1-d0ea-464d-9719-5c0baf54730b
proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.11.22-4-pve: 5.11.22-8
ceph-fuse: 15.2.13-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
I've run the tool stress-ng (stressing CPU, HDD, memory) and Passmark's stress testing tool to try to trigger the crash early - both run super stable and the system remains completely operational. This leads me to believe, with a very limited education on the matter, that the VM is doing something cheeky and oopsing or panicking the kernel.
I am currently going on a whim and temporarily disabling all Spectre / PTI mitigations (mitigations=off for grub) on the chance I have some sort of BIOS or microcode bug biting me. My next debug step, if the mitigation thing doesn't work out, is to downgrade to the lowest/oldest 5.11 kernel that I can, e.g. the stock one on the installer ISO. The kernel on the prox ISO was super stable, which leads me to believe that some sort of backported fix is causing my issues.
Edit: The VM in question has been running stable for a week on another, differently-specced proxmox install with the same package list as the crashing host.
Edit 2: Noted 4 gig swap partition
The reason for this post: if anyone on the forum recognizes anything in the crash logs above that could be helpful.
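A rough sketch of how a boot option such as mitigations=off could be added without hand-editing, assuming the host boots via GRUB and keeps its options in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub. The helper name add_kernel_param is hypothetical; it only transforms text and leaves reading and writing the file to the caller.

```python
import re

# Matches the whole GRUB_CMDLINE_LINUX_DEFAULT="..." line.
_CMDLINE_RE = re.compile(
    r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")[ \t]*$', re.MULTILINE
)


def add_kernel_param(grub_text, param):
    """Return grub_text with param appended to the default kernel
    command line, unless it is already present."""

    def repl(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = _CMDLINE_RE.subn(repl, grub_text)
    if count == 0:
        raise ValueError("no GRUB_CMDLINE_LINUX_DEFAULT line found")
    return new_text
```

To use it, read /etc/default/grub, pass the text through add_kernel_param(text, "mitigations=off"), write the result back as root, then run update-grub and reboot. Keeping a backup copy of the file first is a good idea.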
Edit 3: Update: Solved!
nopti needed to be set as a kernel boot option, as the kernel mitigation and the microcode were seemingly in conflict with each other - the kernel should have disabled its mitigations based on some syscalls returned by the CPU, but this wasn't happening it seems..? The 0x32 microcode package (for Apollo Lake) or newer needs to be baked in to the BIOS of our PC -or- install intel-microcode from the Debian non-free repo to negate the need for kernel PTI.
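For anyone wanting to check the same two things on their own box, here is a minimal sketch in Python. It assumes the "microcode : 0x.." lines that x86 Linux prints in /proc/cpuinfo and the boot options as they appear in /proc/cmdline. The 0x32 threshold is simply the revision mentioned above for Apollo Lake, and the function names are made up for illustration.

```python
import re


def microcode_revisions(cpuinfo_text):
    """Collect the microcode revisions reported per logical CPU."""
    found = re.findall(
        r"^microcode\s*:\s*(0x[0-9a-fA-F]+)", cpuinfo_text, re.MULTILINE
    )
    return {int(rev, 16) for rev in found}


def has_boot_param(cmdline_text, param):
    """True if param appears as its own word on the kernel command line."""
    return param in cmdline_text.split()


def report(cpuinfo_text, cmdline_text, min_rev=0x32):
    """Summarise whether the microcode is new enough and nopti is set."""
    revs = microcode_revisions(cpuinfo_text)
    return {
        # False when no revision was found or any CPU is below min_rev
        "microcode_ok": bool(revs) and min(revs) >= min_rev,
        "nopti": has_boot_param(cmdline_text, "nopti"),
    }
```

Feed it the contents of /proc/cpuinfo and /proc/cmdline, e.g. report(open("/proc/cpuinfo").read(), open("/proc/cmdline").read()). The file /sys/devices/system/cpu/vulnerabilities/meltdown is also worth a look, as it shows what the running kernel thinks about the PTI mitigation.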