[SOLVED] PVE occasionally crashes upon starting a VM

ErikL

New Member
Dec 5, 2017
Problem
When starting a VM the host occasionally crashes.
/var/log/syslog and /var/log/daemon do not seem to show suspicious activities.
We have tried reinstalling the host once, which yielded no improvement.
The next thing we will try is checking for hardware problems.

Has anybody else experienced this or has a fix? If you need additional information please let me know.

Thank you in advance!
Erik


General information

Code:
#pveversion
pve-manager/5.1-36/131401db (running kernel: 4.13.8-2-pve)

Hardware:
  • Intel Xeon E3-1275 v5 (4 Core + HT)
  • 64GiB DDR4 ECC RAM
  • 2 x 480 GB SATA 6 Gb/s SSD Data Center Series
Setup:
  • Enterprise Repositories
  • Installation Up-To-Date
  • Installation via ISO or Upgraded Debian (tried both)
  • / is on a mirrored btrfs
  • VMs are on a mirrored ZFS
  • No LVM
Configuration VMs:
  • 2x HDD (Swap and root) both with 'discard' enabled
  • Fixed-size RAM
 
When starting a VM the host occasionally crashes.


i.e. performs a restart?

/var/log/syslog and /var/log/daemon do not seem to show suspicious activities.

However, posting what they show for the time period in question may help us understand the problem better.
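To illustrate pulling out the relevant time period, here is a minimal sketch that prints the lines logged just before a boot. It assumes that a (re)boot shows up in the log as a kernel line containing "Linux version" and that GNU grep is available. The function name lines_before_boot is made up for this example and is not part of PVE or Debian.

```shell
# Hypothetical helper: print the N lines that precede each boot banner.
# Assumption: the kernel logs a line containing "Linux version" at every boot.
lines_before_boot() {
    file="$1"          # log file to search, e.g. /var/log/syslog
    count="${2:-50}"   # number of lines of context before the banner
    # -B prints COUNT lines of leading context before every matching line
    grep -B "$count" -- 'Linux version' "$file"
}
```

Called as `lines_before_boot /var/log/syslog 100`, this should print the 100 lines leading up to each boot banner. After an abrupt reset those are likely the last entries that reached the disk.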

We have tried reinstalling the host once, which yielded no improvement.
The next thing we will try is checking for hardware problems.

Also possible, and not necessarily directly related to starting a VM: when starting kvm (or any other resource-consuming application), unstable hardware (or an unstable power supply) may cause an immediate reset.

Has anybody else experienced this or has a fix? If you need additional information please let me know.

Yes: in my case it was an evidently unstable mainboard that reset as described above when connected to a poor-quality power supply circuit. Other servers on that circuit had no problems, and the affected server had no problems either when connected to a different power supply circuit.
 
Thanks for your reply!

The problem persists.
The system has all the latest updates applied to it (up to today).

So far I have had two crashes today (after multiple days w/o):

1. Upon starting a VM during the reboot after switching off ballooning for RAM. Beforehand I restarted the same VM multiple times without changing its configuration and w/o the host crashing. Also, I was not able to reproduce the crash afterwards.

2. With no input from me and without rebooting a VM.

-> 2. leads to the conclusion that the problem is not restricted to rebooting VMs although it mostly occurs during reboots (around 95%) or within a few hours after (around 5%).


More answers are inline,
thanks again.



i.e. performs a restart?
>> Exactly that.

However, posting what they show for the time period in question may help us understand the problem better.
>> See both attachments: daemon and syslog.

Also possible, and not necessarily directly related to starting a VM: when starting kvm (or any other resource-consuming application), unstable hardware (or an unstable power supply) may cause an immediate reset.
>> A hardware check by the provider yielded no errors.


daemon.png syslog.png
 
-> 2. leads to the conclusion that the problem is not restricted to rebooting VMs although it mostly occurs during reboots (around 95%) or within a few hours after (around 5%).

Sounds like the problem I mentioned in my previous post:

in my case it was an evidently unstable mainboard that reset as described above when connected to a poor-quality power supply circuit. Other servers on that circuit had no problems, and the affected server had no problems either when connected to a different power supply circuit.

At that time it was exactly the same: as long as there was no (or little) server load, it never happened. As soon as I started VMs, I sometimes had the problem within a couple of minutes, sometimes after hours or the next day, but at the latest within a week ...
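If the resets depend on load as described, one way to probe this is to generate load on purpose and watch whether the host resets. Below is a crude sketch, assuming a POSIX shell with `nproc` and `date` available. burn_cpu is a hypothetical helper written for this example, and dedicated tools such as stress-ng or Memtest86+ exercise the hardware far more thoroughly.

```shell
# Hypothetical helper: keep WORKERS busy loops running for SECONDS seconds.
# Only run this on a host where an unexpected reset is acceptable.
burn_cpu() {
    workers="${1:-$(nproc)}"   # default: one worker per logical CPU
    seconds="${2:-60}"         # default: run for one minute
    pids=""
    i=0
    while [ "$i" -lt "$workers" ]; do
        (
            # Each worker spins until its deadline has passed
            end=$(( $(date +%s) + seconds ))
            while [ "$(date +%s)" -lt "$end" ]; do :; done
        ) &
        pids="$pids $!"
        i=$((i + 1))
    done
    # Wait for all workers; $pids is left unquoted on purpose to split it
    wait $pids
    echo "ran $workers workers for ${seconds}s"
}
```

For example, `burn_cpu 8 600` would keep eight workers busy for ten minutes. If the host resets during such a run without any VM being started, that points away from PVE and toward hardware or power.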



>> See both attachments: daemon and syslog.

Even though they are not very readable, I cannot see any specific event before the reboot.

>> A hardware check by the provider yielded no errors.

That does not automatically mean that there is no error. Recommendation: run it (temporarily) on different hardware.
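Independently of the provider's check, the logs can be searched for hardware error reports from the kernel. The following is only a sketch: hw_error_lines is a hypothetical helper, and the keyword list (machine check, hardware error, mce, EDAC) is an assumption about what such messages typically contain, not a complete list.

```shell
# Hypothetical helper: show log lines that look like kernel hardware error reports.
# Assumption: such lines mention machine checks, "Hardware Error", "mce:" or EDAC.
hw_error_lines() {
    file="$1"   # log file to search, e.g. /var/log/syslog
    # -i ignores case, -E enables the alternation with |
    grep -iE 'machine check|hardware error|mce:|edac' -- "$file"
}
```

Usage would be `hw_error_lines /var/log/syslog`. Note that an abrupt reset often leaves no trace at all, so an empty result does not rule out a hardware fault.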
 
After the system crashed even in rescue mode w/o PVE running, I concluded that it must be a hardware issue and had the server replaced. This solved the problem.

Thanks for the answers!
 
