KVM Internal Error. Suberror: 3 VM fails with exclamation mark

smith844

Apr 16, 2023
I am running Proxmox on an AliExpress N5105 box with 4 x i226 NICs and 16 GB RAM (1 x 16 GB module), with OPNSense in a VM. Every 5 days or so the VM fails, and I cannot restart it without powering off the box with the physical switch. I have tried restarting the VM in Proxmox and rebooting Proxmox with the reboot command from the front end.
I have seen several threads on this, none of which seem to give a conclusive answer. As the VM is my firewall and router, my whole network goes down when it fails, but I do not want to create a cluster for high availability; I just want it not to fail. Can anyone shed any light, given the details below?

PVE Version:

Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-8
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
qm config:
Code:
balloon: 0
boot: order=scsi0;ide2;net0
cores: 2
cpu: host
ide2: none,media=cdrom
memory: 12288
meta: creation-qemu=7.1.0,ctime=1679739692
name: OPNSense
net0: virtio=06:27:C6:DE:3A:86,bridge=vmbr1
net1: virtio=12:A3:54:B1:40:F7,bridge=vmbr2
numa: 0
onboot: 1
ostype: l26
parent: saturday
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=16G
scsihw: virtio-scsi-single
smbios1: uuid=b6863876-3c22-4c0e-be21-508fa2c51e84
sockets: 1
tablet: 0
vmgenid: 5ddd6cdc-8235-491d-9d8b-7520ad5b5181
Now two examples of the syslog entries which appear just before it fails:
Code:
Apr 10 17:22:33 n5105 QEMU[1012]: KVM internal error. Suberror: 3
Apr 10 17:22:33 n5105 QEMU[1012]: extra data[0]: 0x0000000080000b0e
Apr 10 17:22:33 n5105 QEMU[1012]: extra data[1]: 0x0000000000000031
Apr 10 17:22:33 n5105 QEMU[1012]: extra data[2]: 0x0000000000000083
Apr 10 17:22:33 n5105 QEMU[1012]: extra data[3]: 0x000000082900cfe0
Apr 10 17:22:33 n5105 QEMU[1012]: extra data[4]: 0x0000000000000001
Apr 10 17:22:33 n5105 QEMU[1012]: RAX=000000082900c948 RBX=fffffe0017754090 RCX=00000000c0000101 RDX=00000000ffffffff
Apr 10 17:22:33 n5105 QEMU[1012]: RSI=0000000000000031 RDI=fffffe0017754090 RBP=fffffe0017754080 RSP=fffffe0017753fb0
Apr 10 17:22:33 n5105 QEMU[1012]: R8 =000000c0006632e0 R9 =000000c00044b9c8 R10=0000000000000008 R11=0000000000000010
Apr 10 17:22:33 n5105 QEMU[1012]: R12=000000c000663200 R13=0000000000000040 R14=000000c0002fc9c0 R15=000000082900c948
Apr 10 17:22:33 n5105 QEMU[1012]: RIP=ffffffff81132fd1 RFL=00010082 [--S----] CPL=0 II=0 A20=1 SMM=0 HLT=0
Apr 10 17:22:33 n5105 QEMU[1012]: ES =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 10 17:22:33 n5105 QEMU[1012]: CS =0020 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
Apr 10 17:22:33 n5105 QEMU[1012]: SS =0000 0000000000000000 ffffffff 00c00000
Apr 10 17:22:33 n5105 QEMU[1012]: DS =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 10 17:22:33 n5105 QEMU[1012]: FS =0013 000000c000053498 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 10 17:22:33 n5105 QEMU[1012]: GS =001b ffffffff82611000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 10 17:22:33 n5105 QEMU[1012]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 10 17:22:33 n5105 QEMU[1012]: TR =0048 ffffffff82611384 00002068 00008b00 DPL=0 TSS64-busy
Apr 10 17:22:33 n5105 QEMU[1012]: GDT=     ffffffff826113ec 00000067
Apr 10 17:22:33 n5105 QEMU[1012]: IDT=     ffffffff81f5d710 00000fff
Apr 10 17:22:33 n5105 QEMU[1012]: CR0=80050033 CR2=ffffffff81132fd1 CR3=000000082900c948 CR4=003506e0
Apr 10 17:22:33 n5105 QEMU[1012]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 10 17:22:33 n5105 QEMU[1012]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 10 17:22:33 n5105 QEMU[1012]: EFER=0000000000000d01
Apr 10 17:22:33 n5105 QEMU[1012]: Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??

Apr 15 06:52:03 n5105 QEMU[1009]: KVM internal error. Suberror: 3
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[0]: 0x0000000080000b0e
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[1]: 0x0000000000000031
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[2]: 0x0000000000000083
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[3]: 0x0000000800578ff8
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[4]: 0x0000000000000003
Apr 15 06:52:03 n5105 QEMU[1009]: KVM internal error. Suberror: 3
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[0]: 0x0000000080000b0e
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[1]: 0x0000000000000031
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[2]: 0x0000000000000083
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[3]: 0x000000080057dfe0
Apr 15 06:52:03 n5105 QEMU[1009]: extra data[4]: 0x0000000000000002
Apr 15 06:52:03 n5105 QEMU[1009]: RAX=0000000000000004 RBX=00000008d9829a58 RCX=00000008011e96da RDX=00000000000001ea
Apr 15 06:52:03 n5105 QEMU[1009]: RSI=00000008dbfff000 RDI=0000000000000092 RBP=00007fffdcecc1f0 RSP=34362f3a3a313970
Apr 15 06:52:03 n5105 QEMU[1009]: R8 =00000000000001ea R9 =0000000816cb5570 R10=00000008dbfff000 R11=0000000000000246
Apr 15 06:52:03 n5105 QEMU[1009]: R12=0000000000000092 R13=00000008d987ee00 R14=00000000000001ea R15=00000008dbfff000
Apr 15 06:52:03 n5105 QEMU[1009]: RIP=ffffffff811335fc RFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
Apr 15 06:52:03 n5105 QEMU[1009]: ES =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: CS =0020 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
Apr 15 06:52:03 n5105 QEMU[1009]: SS =0028 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: DS =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: FS =0013 00000008e1250120 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: GS =001b ffffffff82610000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 15 06:52:03 n5105 QEMU[1009]: TR =0048 ffffffff82610384 00002068 00008b00 DPL=0 TSS64-busy
Apr 15 06:52:03 n5105 QEMU[1009]: GDT=     ffffffff826103ec 00000067
Apr 15 06:52:03 n5105 QEMU[1009]: IDT=     ffffffff81f5d710 00000fff
Apr 15 06:52:03 n5105 QEMU[1009]: CR0=80050033 CR2=ffffffff811335fc CR3=0000000800578620 CR4=003506e0
Apr 15 06:52:03 n5105 QEMU[1009]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 15 06:52:03 n5105 QEMU[1009]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 15 06:52:03 n5105 QEMU[1009]: EFER=0000000000000d01
Apr 15 06:52:03 n5105 QEMU[1009]: Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
Apr 15 06:52:03 n5105 QEMU[1009]: RAX=000000080057d6d0 RBX=fffffe0017754090 RCX=00000000c0000101 RDX=00000000ffffffff
Apr 15 06:52:03 n5105 QEMU[1009]: RSI=0000000805a70080 RDI=fffffe0017754090 RBP=fffffe0017754080 RSP=fffffe0017753fb0
Apr 15 06:52:03 n5105 QEMU[1009]: R8 =0000000000000000 R9 =3639313a31636162 R10=34362f3a3a313a30 R11=0000000000000246
Apr 15 06:52:03 n5105 QEMU[1009]: R12=00000000000000a4 R13=0000000000000040 R14=0000000000000018 R15=000000080057d6d0
Apr 15 06:52:03 n5105 QEMU[1009]: RIP=ffffffff81132fd1 RFL=00010082 [--S----] CPL=0 II=0 A20=1 SMM=0 HLT=0
Apr 15 06:52:03 n5105 QEMU[1009]: ES =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: CS =0020 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
Apr 15 06:52:03 n5105 QEMU[1009]: SS =0000 0000000000000000 ffffffff 00c00000
Apr 15 06:52:03 n5105 QEMU[1009]: DS =003b 0000000000000000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: FS =0013 0000000800a69120 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: GS =001b ffffffff82611000 ffffffff 00c0f300 DPL=3 DS   [-WA]
Apr 15 06:52:03 n5105 QEMU[1009]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 15 06:52:03 n5105 QEMU[1009]: TR =0048 ffffffff82611384 00002068 00008b00 DPL=0 TSS64-busy
Apr 15 06:52:03 n5105 QEMU[1009]: GDT=     ffffffff826113ec 00000067
Apr 15 06:52:03 n5105 QEMU[1009]: IDT=     ffffffff81f5d710 00000fff
Apr 15 06:52:03 n5105 QEMU[1009]: CR0=80050033 CR2=ffffffff81132fd1 CR3=000000080057d6d0 CR4=003506e0
Apr 15 06:52:03 n5105 QEMU[1009]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 15 06:52:03 n5105 QEMU[1009]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 15 06:52:03 n5105 QEMU[1009]: EFER=0000000000000d01
Apr 15 06:52:03 n5105 QEMU[1009]: Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ??
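In case it helps anyone reading the dumps, here is a rough sketch of how the extra data[0] value (0x80000b0e, the same in every dump) could be unpacked. It assumes that field follows the Intel VMX interruption-information bit layout (bits 7:0 vector, bits 10:8 type, bit 11 error code valid, bit 31 valid). That is my reading of it, not something the logs confirm, and the function name is just made up for illustration.

```python
# Sketch only: assumes extra data[0] uses the Intel VMX
# interruption-information layout. This is an assumption, not confirmed
# by the syslog output itself.

# Interruption types from the same layout (bits 10:8).
INTERRUPTION_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "hardware exception",
    4: "software interrupt",
    5: "privileged software exception",
    6: "software exception",
}


def decode_interruption_info(value: int) -> dict:
    """Split a 32-bit interruption-information value into its bit fields."""
    type_code = (value >> 8) & 0x7
    return {
        "valid": bool((value >> 31) & 1),
        "error_code_valid": bool((value >> 11) & 1),
        "type": type_code,
        "type_name": INTERRUPTION_TYPES.get(type_code, "reserved"),
        "vector": value & 0xFF,
    }


info = decode_interruption_info(0x80000B0E)
print(info)
```

Under that assumption the value decodes to a valid entry of type 3 (hardware exception) with vector 14, which on x86 is the page fault vector, and with an error code present.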
The answer is probably to run OPNSense on bare metal rather than under Proxmox. But in order of preference, I would like to:
stop it happening at all, or
have the VM restart automatically if it fails.
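For the auto-restart option, something like the following is roughly what I have in mind, run periodically on the Proxmox host (from cron, for example). It is only a sketch with unverified assumptions: the VM ID is 100 (taken from the disk name vm-100-disk-0 above), `qm status 100 --verbose` prints a `qmpstatus:` line, and a failed VM shows something other than `running` there. The helper names are hypothetical.

```python
import subprocess

VMID = "100"  # assumption: inferred from the disk name vm-100-disk-0


def parse_qmpstatus(output: str):
    """Return the value of the 'qmpstatus' line, or None if it is absent."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "qmpstatus":
            return value.strip()
    return None


def needs_restart(output: str) -> bool:
    """Treat any reported state other than 'running' as a failure.

    Assumption: a VM that hit the KVM internal error no longer reports
    'running'. A deliberately paused VM would also match, so adjust as needed.
    """
    status = parse_qmpstatus(output)
    return status is not None and status != "running"


def check_and_restart(vmid: str = VMID) -> bool:
    """Query the VM state and stop/start the VM if it looks failed."""
    result = subprocess.run(
        ["qm", "status", vmid, "--verbose"],
        capture_output=True, text=True, check=True,
    )
    if not needs_restart(result.stdout):
        return False
    # A stop followed by a start, since a plain reset may not clear the state.
    subprocess.run(["qm", "stop", vmid], check=False)
    subprocess.run(["qm", "start", vmid], check=True)
    return True


if __name__ == "__main__":
    check_and_restart()
```

Given that restarting the VM from Proxmox has not worked for me so far without a power cycle, this may not be enough on its own.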
Any help gratefully received.
 
And just to say, I tried updating the microcode, and so far this has led to more frequent (daily) failures, so I will be trying further steps.