After Update - Kernel panic and filesystem damage on all VMs with UEFI

fireon

Distinguished Member
Hello all,

since the latest update, VMs with OVMF run into a kernel panic after some minutes, hours or days. Really strange. When starting the repair process with a Clonezilla ISO, the live ISO also gets a kernel panic... what's going on? After rebooting into the old kernel, everything looks fine again...

Code:
pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
pve-zsync: 2.0-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1
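
In case someone wants to stay on the previous kernel while this is being debugged, here is a minimal sketch of how to pick an older pve-kernel as the default on hosts booting via GRUB; the exact menu entry string below is only an example and has to be taken from your own grub.cfg:
Code:
# list the installed Proxmox kernel images
dpkg -l 'pve-kernel-5.0.21-*'

# show the boot entries GRUB currently knows about
grep "menuentry '" /boot/grub/grub.cfg | cut -d"'" -f2

# then set GRUB_DEFAULT in /etc/default/grub to the wanted entry, for example:
#   GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.0.21-2-pve"
# and regenerate the GRUB configuration
update-grub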

Here is one VM config:
Code:
agent: 1
bios: ovmf
bootdisk: scsi0
cores: 8
cpu: kvm64,flags=+pcid;+spec-ctrl
description: UCS Active Directory Slave DC%0A%0A- BlueSpice MediaWiki%0A- Wekan Kanbanboard
efidisk0: SSD-vmdata:vm-110-disk-0,size=1M
ide2: none,media=cdrom
memory: 5120
name: app.supertux.lan
net0: virtio=FA:8C:56:C1:BE:3C,bridge=vmbr0,firewall=1
numa: 1
onboot: 1
ostype: l26
scsi0: SSD-vmdata:vm-110-disk-1,discard=on,size=40G,ssd=1
scsi1: SSD-vmdata:vm-110-disk-2,discard=on,size=8G,ssd=1
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=1315bc07-3f6d-4e4c-a5eb-c6c46429e28c
sockets: 1
vga: qxl
vmgenid: 570312bc-e910-440a-adc2-33cd2cd9add2
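
For reference, the UEFI-relevant parts of such a config can be inspected and changed with qm; a small sketch (the VM ID 110 is taken from the disk names above, and switching a test clone to SeaBIOS is only an idea for comparison, not something from this thread):
Code:
# show the current configuration of the VM
qm config 110

# the UEFI-related settings are the OVMF BIOS and the EFI vars disk:
#   bios: ovmf
#   efidisk0: SSD-vmdata:vm-110-disk-0,size=1M

# e.g. switch a test clone back to SeaBIOS to compare behaviour
qm set 110 --bios seabios
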
The only way to fix these filesystem errors was to restore the whole backup.
I have remote logging activated, so the kernel log from the crash is attached.


Maybe a bad kernel, 5.0.21-3-pve?


Thanks a lot.
 

Hi,
please reboot your system and run on 5.0.21-3.
There are bad kernels in the 5.0.21-2 kernel series, or more precisely, a bug in the ZFS module.
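
To verify which kernel and ZFS module are actually loaded after the reboot, something like this should be enough (a sketch, the sysfs path is exposed by the ZFS module itself):
Code:
# currently running kernel
uname -r

# version of the loaded ZFS kernel module
cat /sys/module/zfs/version

# ZFS-related messages from the current boot
dmesg | grep -i zfs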
 
Hmm, I could not reproduce this here on an Intel host with an Ubuntu 19.10 OVMF installation (I tested the kernel with some OVMF VMs, and that one was still available to just run).

What I see in your log is that the error happens in the page-fault code path, and that the kernel cannot immediately allocate memory, so it needs to free memory first. Possibly high memory usage?

Can you tell me a bit more about the host HW?
I'll try to boot a plain Debian VM, to match your setup more closely.
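
If memory pressure is the suspect, a quick look at the host around the time of a crash could help; a minimal sketch:
Code:
# overall memory situation on the host
free -h

# the kernel's own view of free and available memory
grep -E 'MemFree|MemAvailable|SwapTotal' /proc/meminfo

# any OOM killer activity in the kernel log of this boot
journalctl -k | grep -i 'out of memory'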
 
If you use ZFS the issue could also be the following:
The data corruption could have happened while running the ZFS-problematic kernels with ABI 5.0.21-1 and 5.0.21-2, but only the reboot into a newer kernel, and thus the reboot of the VM(?), made the issue show up. So while one may suspect the new kernel, it could have been the previous one, and the crash is just a side effect of the kernel-update-related reboot. An educated guess, only valid if the host really uses ZFS.
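
If that is what happened, a scrub of the pool and a filesystem check inside the affected guests should confirm it; a sketch, the pool name rpool is only an example:
Code:
# check pool health and any known data errors
zpool status -v

# verify all data on the pool (pool name is an example)
zpool scrub rpool
zpool status rpool

# inside an affected Linux guest, check the filesystem from a rescue/live system, e.g.:
#   fsck -f /dev/sda1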
 
Very strange...

Hello all, and thanks for the replies. On the host machine I did have high memory usage at the first crash, yes. I have no swap, only zram. Normally this should not be a problem (I have run it like this for a long time), because the host kernel releases that memory when it is needed. But at the second crash, after the reboot with kernel 5.0.21-3, one VM (which had been repaired before) crashed again, and a few minutes later another VM did too, and at that time there was no high memory usage.

I rebooted the system into the current kernel.
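
To see how much of the zram swap is actually in use when a VM dies, something like the following would show it (zramctl is part of util-linux; this assumes zram is set up as swap):
Code:
# active swap devices and how much of them is used
swapon --show

# per-device zram statistics (original vs. compressed data size)
zramctl

# overall memory picture on the host
free -h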
 
Supermicro board with Intel(R) Xeon(R) CPU E3-1265L v3 @ 2.50GHz (VMs died), ZFS and LVM-Thin
HP ML310 with Intel(R) Xeon(R) CPU E31220 @ 3.10GHz (VMs died), QCOW2
Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz (VMs did not die), ZFS
Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz (VMs did not die), ZFS

LXCs and local ZFS do not seem to be affected.
 
Kernel 5.3 looks like the solution. It has been running stable for the last two days on all affected servers.
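
For anyone who wants to try the same, the opt-in 5.3 kernel can be installed from the Proxmox package repositories; a sketch, assuming the pve-kernel-5.3 meta-package is already available in your configured repository:
Code:
# install the opt-in 5.3 kernel series and reboot into it
apt update
apt install pve-kernel-5.3
reboot

# after the reboot, verify the running kernel
uname -r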
 
