VMs keep crashing, showing strange errors

peterge

Member
Apr 3, 2019
Hey all,
I'm having big problems on my Proxmox (single-node) box.
I run two Ubuntu 18 server VMs with a few Docker containers inside. On these VMs I often get errors like the ones below. The whole VM then stops and needs to be restarted. Sometimes my Proxmox host stops responding too.

Any idea what causes these errors?

My hardware:
Ryzen 2200G
LVM on a Crucial 500 GB SSD
Proxmox installed on top of Debian
8 GB RAM

This one showed up inside the Ubuntu server VM:
IMG_20190423_222000.jpg
And this one on the Proxmox host itself:
IMG_20190424_214336.jpg
 
Is the latest BIOS applied?

Are you on the latest Proxmox VE kernel? Please post the output of `pveversion -v`.
 
First, the BIOS is on the newest version.

I took my time and tried 3 kernels:
4.15.18-13-pve on a custom Debian installation with Proxmox on top (as described in the wiki).
Then I installed a custom 4.19 kernel, but the problem continued. After testing both kernels for 2+ days, I reinstalled Proxmox from the latest ISO. There I am on 4.15.18-13-pve again and bam, the VM crashed again this evening after running fine for ~8 hours.

Here is pveversion:

root@pve:~# pveversion -v
proxmox-ve: 5.4-1 (running kernel: 4.15.18-13-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-1
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-12-pve: 4.15.18-36
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-50
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
 
This error occurred on the VM yesterday evening:
 

Attachments

  • Screenshot_20190503-081422.jpg (463.6 KB)

Please post your VM config.

> qm config VMID
 
root@pve:~# qm config 200
agent: 1
boot: cdn
bootdisk: virtio1
cores: 4
memory: 2048
name: UbuntuPlex
net0: virtio=C6:A3:8B:9A:23:BD,bridge=vmbr0,firewall=1 numa: 0 ostype: l26 scsihw: virtio-scsi-pci
smbios1: uuid=e0ea3f3e-4b9b-4407-9126-08686294cd86
sockets: 1
unused0: DS216j:200/vm-200-disk-0.qcow2
virtio1: local-lvm:vm-200-disk-0,size=32G
vmgenid: 67e1a0f3-f322-4339-b244-373daf6eaae7
 
net0: virtio=C6:A3:8B:9A:23:BD,bridge=vmbr0,firewall=1 numa: 0 ostype: l26 scsihw: virtio-scsi-pci
Shouldn't this be on 4 separate lines? Is this a copy-and-paste error?
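
One way to tell a paste artifact from a genuinely broken config is to look at the file on the host itself. The following is only a rough sketch: `find_fused_lines` is a made-up helper name, the pattern is a simple heuristic (a line that starts with `key: ` and contains another ` key: ` later on), and the path in the example assumes the usual Proxmox VE layout with the VM ID 200 from the post above.

```shell
# Sketch: report config lines that hold more than one "key: value" pair.
# find_fused_lines is a hypothetical helper, not part of Proxmox VE.
find_fused_lines() {
  # $1 = path to a VM config file
  # Prints "lineno:content" for every line that looks like several
  # settings joined into one; prints nothing if all lines look fine.
  grep -nE '^[a-z][a-z0-9]*: .* [a-z][a-z0-9]*: ' "$1"
}

# Example (assumed path, VM ID taken from the thread):
# find_fused_lines /etc/pve/qemu-server/200.conf
```

If this prints nothing for the real file, the fused line only happened while copying the output into the forum.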
 
This time a Fedora server VM crashed, after running fine for ~2 days.
 

Attachments

  • Unbenannt.PNG (218.8 KB)

Do you have any entries in the hypervisor's logs (`journalctl -r`, `dmesg`) from when those crashes happen?

* Since it's not one particular guest or guest OS that has the problems (and the stack traces look independent of each other), I would suggest checking for potential hardware problems (e.g. run memtest86 on the host for a few passes).

Hope this helps!
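
To narrow the host logs down to the time of a crash, something along these lines might help. It is a minimal sketch, not a fixed recipe: the function names and the `JOURNALCTL` override variable are made up for illustration, and only standard `journalctl` options are used (`--since`, `--until`, `-k`, `-b`, `-p`, `--no-pager`).

```shell
# Sketch: pull host log entries around a guest crash.
# JOURNALCTL is a hypothetical override so the binary can be swapped out.
JOURNALCTL="${JOURNALCTL:-journalctl}"

crash_window_logs() {
  # $1 = window start, $2 = window end, in any format journalctl
  # accepts, e.g. "19:15" or "2019-05-03 19:15"
  "$JOURNALCTL" --no-pager --since "$1" --until "$2"
}

kernel_errors_this_boot() {
  # -k: kernel messages only, -b: current boot,
  # -p err: priority "err" and anything more severe
  "$JOURNALCTL" --no-pager -k -b -p err
}

# Example: crash_window_logs "19:15" "21:30"
```

After a hard reset, `-b -1` instead of `-b` selects the previous boot, but only if the journal is stored persistently on the host; otherwise the older entries have to come from the syslog files.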
 
The same crash happened again on the Fedora server VM. It must have been sometime between 19:15 and 21:30, after running fine for ~7 hours.
I almost thought I had fixed the problem by disabling Cool'n'Quiet in the BIOS (Ryzen), but that does not seem to have been the cause.

I ran memtest ~1 month ago when I bought that factory-new DDR4 RAM, with no errors, but I can try again.


Syslog:

(I shortened some of the duplicate "Started/Starting Proxmox VE" lines because the message is too long)
 

Attachments

  • syslog.txt (8.5 KB)

No, I'm not live migrating. I'm currently using kvm64. Which CPU type would you recommend for Ryzen: EPYC, host, or kvm64?
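
For anyone who wants to compare the CPU types on their own box, switching is a one-liner with `qm set`. This is only a sketch of how to change the setting, not a recommendation for any of the three types; `set_cpu_type` and the `QM` override variable are made-up names, and 200 is the VM ID from the config posted earlier.

```shell
# Sketch: change the emulated CPU type of a VM.
# QM is a hypothetical override so the binary can be swapped out.
QM="${QM:-qm}"

set_cpu_type() {
  # $1 = VM ID, $2 = CPU type such as kvm64, host or EPYC
  "$QM" set "$1" --cpu "$2"
}

# Example: set_cpu_type 200 host
```

As far as I know, the VM has to be fully stopped and started again before the new CPU type takes effect; a reboot from inside the guest is not enough.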
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!