Random KVM crash

f4242

Dec 19, 2016
Hello,

I have a strange problem with a VM. It crashes randomly (the kvm process crashes and the VM shows as stopped in the web UI). There is nothing useful in the guest logs, and in the host logs I only have:

Dec 27 00:06:35 pve-ext1 kernel: [1766571.271950] vmbr0: port 3(tap5033i0) entered disabled state

It often crashes around midnight while a pg_dump is running (but it doesn't always crash). The dump generates high CPU and network use.

I tried to build a test VM with some stress test tools to use CPU and network resources, but I never managed to reproduce the crash manually. I also tried running pg_dump in a loop, but still no crash...

Is there any way to start a KVM VM in some debug mode? I would like to get a good stack trace that I could forward to the Proxmox developers.

Thanks!
 
Hi,

You can redirect the output from KVM's stderr to a file with the '-D' option.
To enable it for a VM, use:
Code:
qm set VMID --args '-D /path/to/file.log'
(Note: this overrides any manually set 'args' option; by default there is none.)
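
Since setting 'args' replaces whatever is already there, it may be worth checking first whether the VM config carries an 'args' line. A minimal sketch, assuming `qm config VMID` prints one `key: value` pair per line; the helper name `existing_args` is made up for this example:

```shell
# Hypothetical helper: reads `qm config VMID` output on stdin and prints
# the value of an existing 'args:' line, or nothing if there is none.
existing_args() {
    sed -n 's/^args: //p'
}

# Usage on a PVE host (VMID is a placeholder):
#   qm config VMID | existing_args
```

If this prints something, those options would need to be repeated in the new 'args' value to keep them.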

Otherwise, gdb is an option. Install it on the PVE host together with the pve-qemu-kvm-dbg package: `apt install gdb pve-qemu-kvm-dbg`

Now get the slightly modified VM command (we remove -daemonize and the kvm binary itself) with:
Code:
qm showcmd VMID | sed 's/^\/usr\/bin\/kvm//;s/ -daemonize//'
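
To see what the sed expression does without touching a real VM, it can be tried on a sample string. This is only an illustration: the wrapper name `strip_kvm_cmd` and the shortened command line are made up, not real `qm showcmd` output.

```shell
# Hypothetical wrapper around the same sed expression, reading from stdin.
strip_kvm_cmd() {
    sed 's/^\/usr\/bin\/kvm//;s/ -daemonize//'
}

# Made-up, shortened command line for illustration only.
echo '/usr/bin/kvm -id 101 -daemonize -m 1024' | strip_kvm_cmd
```

What remains is only the argument list (with a leading space), which is the part that goes after `run` in gdb.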

Start gdb with:
Code:
gdb /usr/bin/kvm

# [gdb starting output]

run ARGS_FROM_showcmd

Replace "ARGS_FROM_showcmd" with the output of the qm showcmd command above.

In my case, for example, I did:
Code:
(gdb) run -id 101 -chardev 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid -smbios 'type=1,uuid=f685cdf7-151a-4ce5-87e7-ff1ae0ebdd81' -name donatello-min-dhcp -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga cirrus -vnc unix:/var/run/qemu-server/101.vnc,x509,password -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi -m 1024 -k de -s -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:ca269bd14bdc' -drive 'file=/mnt/pve/iso/template/iso/alpine-extended-3.4.4-x86_64.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/rbd/test/vm-101-disk-1,if=none,id=drive-virtio0,cache=writeback,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap101i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=62:61:38:32:63:66,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'

Now wait until the VM crashes, then get a backtrace with:

Code:
set logging file backtrace.log
set logging on
thread apply all bt full
set logging off
quit

As the backtrace may be big, we write it to a file (backtrace.log).
 
The VM crashed again while doing a pg_dump. The log file only contains:
main-loop: WARNING: I/O thread spun for 1000 iterations

I'm not sure if this is related to the crash. The log was overwritten when I rebooted the VM, so I had to retrieve it from the backup (and I lost the modification time).
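
One way to avoid losing the log and its timestamp on the next restart would be to copy it aside before starting the VM again. A rough sketch, assuming GNU coreutils on the host; the helper name `archive_log` is made up and the path is a placeholder:

```shell
# Hypothetical helper: copy a log next to itself under a timestamped name.
# 'cp -p' preserves the modification time of the original file.
archive_log() {
    src="$1"
    [ -f "$src" ] || return 1
    dest="$src.$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage before restarting the VM (path is a placeholder):
#   archive_log /path/to/file.log
```

The copy keeps the modification time of the original, so the time of the last log write before the crash is still visible afterwards.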
 
Could you post the output of "pveversion -v"?
 
proxmox-ve: 4.3-72 (running kernel: 4.4.24-1-pve)
pve-manager: 4.3-12 (running version: 4.3-12/6894c9d9)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.16-1-pve: 4.4.16-64
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-47
qemu-server: 4.0-96
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-68
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.3-17
pve-qemu-kvm: 2.7.0-8
pve-container: 1.0-85
pve-firewall: 2.0-31
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-1
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
 
I suggest updating to the current PVE 4.4 packages and checking whether the problem still exists.
 
It hadn't crashed since upgrading to PVE 4.4... but I have had two crashes in the last week :(

EDIT: I just started the VM with gdb. Maybe I will get something useful at the next crash.
 