QEMU Guest Kernel Panic

patrickli
Oct 24, 2022
I've got a new tiny Celeron N5105 box for my home lab, used as a router and to run some LXC containers. The router is VyOS (basically Debian) running in a QEMU VM.

I've noticed that it has started rebooting itself fairly often, with nothing in the logs to indicate why. This feels like the automatic reboot after a kernel panic, so I set up a serial terminal to capture the panic output. I managed to capture one panic but can't make any sense of it. Any ideas would be much appreciated.
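For reference, this is roughly how I'm capturing the serial console from the Proxmox host (a minimal sketch, assuming the socket that the serial0: socket setting creates under /var/run/qemu-server/ and a VM ID of 101):

Code:
# Attach to the guest's serial socket and keep a copy of everything on disk,
# so the panic survives the automatic reboot. Adjust the VM ID to your setup.
socat UNIX-CONNECT:/var/run/qemu-server/101.serial0 STDOUT | tee -a /root/vyos-serial.log

(qm terminal 101 also works interactively, but piping through tee makes sure nothing is lost when the VM resets.) The panic I captured this way is below.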

Code:
[ 3875.691337] BUG: unable to handle page fault for address: 0000000060ef0144
[ 3875.692675] #PF: supervisor write access in kernel mode
[ 3875.693631] #PF: error_code(0x0002) - not-present page
[ 3875.694591] PGD 0 P4D 0
[ 3875.695131] Oops: 0002 [#1] SMP NOPTI
[ 3875.695840] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.218-amd64-vyos #1
[ 3875.697083] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 3875.699119] RIP: 0010:rcu_irq_enter+0x22/0x60
[ 3875.699946] Code: 0f 1f 84 00 00 00 00 00 48 c7 c0 c0 6d 02 00 48 89 c1 65 48 03 0d 76 86 f4 74 48 8b 91 d8 00 00 00 48 85 d2 78 3a 65 48 03 05 <62> 86 f4 74 8b b0 e0 00 00 00 b8 02 00 00 00 83 e6 02 75 11 e8 a5
[ 3875.703161] RSP: 0018:ffffb17cc00e4b40 EFLAGS: 00010006
[ 3875.704118] RAX: 0000000060ef0144 RBX: 000000000001bec0 RCX: 0000000000000000
[ 3875.705377] RDX: 0000000000000000 RSI: ffffffff8b801a6a RDI: ffffb17cc00e4b78
[ 3875.706639] RBP: ffffb17cc007be28 R08: 0000000000000000 R09: 0000000000000000
[ 3875.707894] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 3875.709149] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3875.710417] FS:  0000000000000000(0000) GS:ffff9a647eb00000(0000) knlGS:0000000000000000
[ 3875.711884] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3875.712925] CR2: 0000000060ef0144 CR3: 0000000038334000 CR4: 0000000000340ee0
[ 3875.714223] Call Trace:
[ 3875.714771]  <IRQ>
[ 3875.715232]  irq_enter+0x5/0x50
[ 3875.715882]  smp_apic_timer_interrupt+0x1e/0x90
[ 3875.716771]  apic_timer_interrupt+0xf/0x20
[ 3875.717573]  ? apic_timer_interrupt+0xa/0x20
[ 3875.718423]  ? __copy_skb_header+0x9c/0x180
[ 3875.719241]  ? __skb_clone+0x24/0x100
[ 3875.719975]  ? dev_queue_xmit_nit+0xf8/0x2a0
[ 3875.720809]  ? dev_hard_start_xmit+0x64/0x110
[ 3875.721671]  ? __dev_queue_xmit+0x7ce/0x950
[ 3875.722514]  ? vlan_dev_hard_header+0x55/0x130 [8021q]
[ 3875.723494]  ? ip_finish_output2+0x17b/0x560
[ 3875.724320]  ? ip_output+0x64/0xe0
[ 3875.725007]  ? __ip_finish_output+0x230/0x230
[ 3875.725858]  ? ip_forward+0x378/0x480
[ 3875.726594]  ? ip4_key_hashfn+0xb0/0xb0
[ 3875.727351]  ? ip_rcv+0xb7/0xc0
[ 3875.727993]  ? ip_rcv_finish_core.isra.19+0x370/0x370
[ 3875.728966]  ? __netif_receive_skb_one_core+0x80/0x90
[ 3875.729942]  ? netif_receive_skb+0x2a/0xa0
[ 3875.730745]  ? net_rx_action+0x1d8/0x2e0
[ 3875.731526]  ? ifb_ri_tasklet+0x160/0x259 [ifb]
[ 3875.732403]  ? tasklet_action_common.isra.20+0x4c/0xb0
[ 3875.733386]  ? __do_softirq+0xd2/0x227
[ 3875.734135]  ? irq_exit+0x9e/0xa0
[ 3875.734810]  ? do_IRQ+0x49/0xd0
[ 3875.735463]  ? common_interrupt+0xf/0xf
[ 3875.736229]  </IRQ>
[ 3875.736707]  ? __sched_text_end+0x6/0x6
[ 3875.737471]  ? native_safe_halt+0xe/0x10
[ 3875.738247]  ? default_idle+0x5/0x10
[ 3875.738963]  ? do_idle+0x1ce/0x250
[ 3875.739658]  ? cpu_startup_entry+0x14/0x20
[ 3875.740458]  ? start_secondary+0x15f/0x1b0
[ 3875.741270]  ? secondary_startup_64+0xa4/0xb0
[ 3875.742127] Modules linked in: sch_fq_codel sch_htb act_mirred cls_u32 sch_ingress pppoe pppox af_packet ppp_generic slhc 8021q garp mrp ifb xt_MASQUERADE xt_set xt_multiport xt_comment xt_state xt_conntrack ip_set_hash_net ip_set_hash_ip ip_set fuse nft_chain_nat xt_CT xt_tcpudp nft_compat nfnetlink_cthelper nft_counter nf_tables nfnetlink nf_nat_pptp nf_conntrack_pptp nf_nat_h323 nf_conntrack_h323 nf_nat_sip nf_conntrack_sip nf_nat_tftp nf_nat_ftp nf_nat nf_conntrack_tftp nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper virtio_balloon virtio_console pcspkr evdev sg button mpls_iptunnel mpls_router ip_tunnel mpls_gso br_netfilter bridge stp llc ip_tables x_tables autofs4 usb_storage ohci_hcd squashfs zstd_decompress lz4_decompress loop overlay ext4 crc32c_generic crc16 mbcache jbd2 nls_ascii sd_mod sr_mod cdrom virtio_net net_failover failover virtio_scsi ata_generic
[ 3875.742287]  crc32c_intel ata_piix libata virtio_pci i2c_piix4 uhci_hcd virtio_ring virtio scsi_mod ehci_hcd
[ 3875.759194] CR2: 0000000060ef0144
[ 3875.759889] ---[ end trace cadf75336e0897c9 ]---
[ 3875.760819] RIP: 0010:rcu_irq_enter+0x22/0x60
[ 3875.761715] Code: 0f 1f 84 00 00 00 00 00 48 c7 c0 c0 6d 02 00 48 89 c1 65 48 03 0d 76 86 f4 74 48 8b 91 d8 00 00 00 48 85 d2 78 3a 65 48 03 05 <62> 86 f4 74 8b b0 e0 00 00 00 b8 02 00 00 00 83 e6 02 75 11 e8 a5
[ 3875.765067] RSP: 0018:ffffb17cc00e4b40 EFLAGS: 00010006
[ 3875.766085] RAX: 0000000060ef0144 RBX: 000000000001bec0 RCX: 0000000000000000
[ 3875.767403] RDX: 0000000000000000 RSI: ffffffff8b801a6a RDI: ffffb17cc00e4b78
[ 3875.778652] RBP: ffffb17cc007be28 R08: 0000000000000000 R09: 0000000000000000
[ 3875.779966] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 3875.781285] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 3875.782619] FS:  0000000000000000(0000) GS:ffff9a647eb00000(0000) knlGS:0000000000000000
[ 3875.784151] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3875.785239] CR2: 0000000060ef0144 CR3: 0000000038334000 CR4: 0000000000340ee0
[ 3875.786569] Kernel panic - not syncing: Fatal exception in interrupt
[ 3875.787855] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 3875.789747] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
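(In case someone wants to dig deeper: if a vmlinux with debug info matching 5.4.218-amd64-vyos can be obtained, the faulting address can in principle be resolved to a source line with the kernel's own faddr2line script. A sketch, assuming such a debug build is available:)

Code:
# scripts/faddr2line ships with the kernel source tree and needs a vmlinux
# built with debug info for exactly this kernel version.
./scripts/faddr2line vmlinux rcu_irq_enter+0x22/0x60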

VM configuration:
Code:
agent: 1
boot: order=scsi0
cores: 4
cpu: host
ide2: none,media=cdrom
memory: 1024
meta: creation-qemu=6.2.0,ctime=1660006189
name: vyos.mgmt.palace
net0: virtio=xxxx,bridge=vmbr0,queues=4
net1: virtio=xxxx,bridge=vmbr1,queues=4,mtu=1
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: ssd:vm-101-disk-0,discard=on,size=4G,ssd=1
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=xxxx
sockets: 1
startup: order=1
tablet: 0
vmgenid: xxxx
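Since the reboots look like the usual reboot-on-panic behaviour, the automatic reboot can also be disabled in the guest while debugging, so that the panic stays on the serial console. This is plain Linux sysctl, nothing VyOS-specific:

Code:
# Show the current reboot-after-panic timeout (seconds; 0 means stay halted on panic)
sysctl kernel.panic
# Disable the automatic reboot for the current boot
sysctl -w kernel.panic=0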
 
Well, I'm glad it's not just me, but sad because this sucks. I'll play around with a few things.
 
Hi there, I don't know which version you are running, but I have had similar issues, combined with freezing VMs, in my HA setup.
Most of the time it seemed to be related to taking a snapshot for failover or backup, sometimes even every other day.
A couple of months ago I found a possible resolution: upgrading to a newer kernel that is not yet generally available, and since then I have not had any issues anymore!

My setup is at the v7 level (pve-manager/7.4-3/9002ab8a); the kernel was on 5.14.something, I think, and is now on 5.19. You may try what I did:

Code:
apt update
apt install pve-kernel-5.19
reboot

It will upgrade your kernel to: Linux 5.19.17-2-pve #1 SMP PREEMPT_DYNAMIC PVE 5.19.17-2 (Sat, 28 Jan 2023 16:40:25
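After the reboot you can double-check which kernel is actually running; if I remember correctly, the opt-in kernel can also be pinned so the host keeps booting it (syntax from memory, please double-check):

Code:
# Confirm the running kernel after the reboot
uname -r
# List the kernels the boot helper knows about, then pin the opt-in one
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.19.17-2-pve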
 
Hi,
Just noting that the current opt-in kernel is 6.2; the 5.19 one won't receive any updates from us anymore.
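For reference, installing the 6.2 opt-in kernel works the same way as the 5.19 steps above (assuming the package repositories are already configured):

Code:
apt update
apt install pve-kernel-6.2
reboot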
 
Hi Fiona, thanks for sharing. I will try that. Reverting to the default is not an option ;-)
 
