Weird kernel dumps?

Jan 12, 2015
Since upgrading to 5.2, I've been getting intermittent VM crashes. They don't seem to be tied to any one hypervisor, and I can't tell what's wrong from the console message (below). Does anyone have ideas on where to look for the problem? It seems like we used to get these years ago when the lzo pipe from vzdump would hit a buffer underrun, or something like that. I'm not sure how vzdump differs in 5.2. Thanks.
 

Attachment: shot.png (42.6 KB)
Which version are you on (output of 'pveversion -v')? And please post more of the trace; the screenshot sadly shows only half of it. As a first guess, check your storage.
 
Hi,
Is there a way to get more info on the crash? The screenshot is all that was on the console. I couldn't page up/down or log in to the VM. I grepped the logs for 'Task' after rebooting the VM, but nothing turned up. pveversion output below.
Thanks

# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-1-pve)
pve-manager: 5.2-5 (running version: 5.2-5/eb24855a)
pve-kernel-4.15: 5.2-4
pve-kernel-4.15.18-1-pve: 4.15.18-15
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-35
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-28
pve-container: 2.0-24
pve-docs: 5.2-4
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9
 
So the VMs crash, not the hosts. Check your underlying storage; it may well have issues (slow, defective blocks, ...).
 
OK, thank you. Any idea where to find logs on the Proxmox server that might explain what made the VM crash? Also, the storage server backing the VMs does not seem to be heavily utilized, except for one 60% utilization spike, but maybe that would do it (https://imgur.com/a/U9oaFvH). The Proxmox and storage servers were rebooted on 7/28, and they are configured to run `fsck` on boot. The crash happened before 8am on the graph, but I don't know exactly when.
 
Any idea where to find logs on Proxmox server that might explain what happened to make the VM crash?
Only the standard logs (syslog, kernel log, journal), on PVE and inside the VM. Depending on the issue, it may well be that nothing is logged on either side.
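
For what it's worth, here is a rough sketch of how one might search those standard logs for lines that often show up around storage-related guest hangs. The helper name `scan_log` and the pattern list are assumptions for illustration; they are not taken from the screenshot in this thread, so adjust them to whatever the visible part of the trace says.

```shell
#!/bin/sh
# Hypothetical helper: print matching lines (with line numbers) from a log file.
# The pattern list is a guess at common hang/storage messages, not a complete set.
scan_log() {
    # $1: path to a log file, e.g. /var/log/syslog or /var/log/kern.log
    grep -n -i -E 'blocked for more than|I/O error|call trace|hung_task' "$1"
}

# Possible usage, on the PVE host and inside the VM (as root):
#   scan_log /var/log/syslog
#   journalctl -k -b -1 > /tmp/prev-boot.log && scan_log /tmp/prev-boot.log
# The journalctl line reads kernel messages from the previous boot, which
# only works if the journal is stored persistently on that system.
```

If nothing matches on either side, that fits the point above: the guest may have lost its disk before it could write anything out.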

Also, the storage server backing the VMs does not seem to be heavily utilized except for one 60% utilization spike - but maybe that would do it..(https://imgur.com/a/U9oaFvH)
I don't understand the graph. What kind of utilization does it show: bandwidth, I/O, fill level? The granularity of the graph might also hide values. Another important factor is storage latency, especially when the network is involved.
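
To get a rough latency number rather than a utilization percentage, one option is to read the cumulative counters in /proc/diskstats on the machine that owns the disks. The sketch below assumes the standard field layout of that file (reads completed and milliseconds spent reading in fields 4 and 7, the write equivalents in fields 8 and 11); `avg_io_latency` is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: average milliseconds per completed read and write,
# per block device, from the since-boot counters in /proc/diskstats.
avg_io_latency() {
    # $1: stats file to parse; defaults to /proc/diskstats
    awk '{
        r = ($4 > 0) ? $7 / $4 : 0     # ms spent reading / reads completed
        w = ($8 > 0) ? $11 / $8 : 0    # ms spent writing / writes completed
        printf "%s read_ms=%.2f write_ms=%.2f\n", $3, r, w
    }' "${1:-/proc/diskstats}"
}

# Possible usage on the storage server or inside the VM:
#   avg_io_latency
```

These are averages since boot, so a short stall gets smoothed out just like in a coarse graph. Taking two snapshots a few seconds apart and comparing them would narrow the window.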
 
