VM goes down, no logs, no error message in console

Sebastian2000

Member
Oct 31, 2017
80
1
8
41
Hello

I have 5 VMs on a dual Xeon E5-2650 server with 64G of RAM and hardware RAID10.
One of them went down today. When I logged into our Proxmox, the VM showed as running, but the console stayed black (no error, no message). A VM reset did not work; I did a stop and then a start, and the VM came up again. I did not receive any error message while the VM booted.
The only thing I noticed is very high CPU usage on this VM in the Proxmox graph: about 90% when it is normally about 10% (6 cores for each VM, as the server has 32 cores). I did not find any log of a real problem (nothing strange in the VM's messages log).
Is there something that can help us detect the problem? I suppose it is a CPU freeze in this VM (the other 4 VMs and the Proxmox host kept working without any problem), but I would like to see the exact message to find out whether there is a solution...

Thanks for your help!
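
Since the only visible symptom is CPU usage jumping from about 10% to about 90% and staying there, one way to catch the moment of the hang is to watch for a sustained spike. The following is only a minimal sketch in Python, not a Proxmox tool: the function name, the threshold, and the sample list are hypothetical, and it assumes you already collect one CPU usage value per interval for the VM from some source of your own.

```python
from typing import Iterable, Optional


def first_sustained_spike(samples: Iterable[float],
                          threshold: float = 0.9,
                          min_consecutive: int = 5) -> Optional[int]:
    """Return the index where the first run of `min_consecutive` samples
    at or above `threshold` starts, or None if there is no such run."""
    run_start = None
    run_length = 0
    for index, value in enumerate(samples):
        if value >= threshold:
            if run_length == 0:
                run_start = index  # a new run of high samples begins here
            run_length += 1
            if run_length >= min_consecutive:
                return run_start
        else:
            run_length = 0  # the run is broken, start counting again
    return None


# Hypothetical samples: CPU fraction of one VM, one value per minute.
cpu = [0.10, 0.12, 0.09, 0.95, 0.97, 0.93, 0.99, 0.96, 0.98]
print(first_sustained_spike(cpu))
```

Requiring several consecutive high samples avoids flagging a short, normal burst of load; the start index tells you which timestamp to look at in the guest and host logs.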
 

Sebastian2000

Today I had the same problem again with another VM on another node :-(.
I have tried changing the CPU type from "host" to "Default (kvm64)" and adding some more RAM to each VM (from 10G to 11.7G). I don't think it is a RAM problem, and I am also not sure that the CPU change will solve it, as I have absolutely no details about the problem. I only know that when the VM goes down, CPU usage is at about 90%-100%...

Please, if someone has had a similar experience and knows which log to check, I would appreciate the help very much...

Thanks
 

Sebastian2000

Still investigating the problem... 5 of my 20 servers have a few filesystem errors (I moved the disks of some of them and got a few minor errors), so I tried to reboot and repair one. On reboot, CPU and RAM went to 100% and KVM hung. I had to do a stop and a start again; then I repaired the filesystem (fsck) and the server rebooted normally.

I really don't understand the problem, or whether it is a bug in the latest version of Proxmox (pve-manager/5.1-35/722cc488 (running kernel: 4.13.4-1-pve))... At least today the failure happened after a voluntary reboot. I will wait to see whether the bug appears again without any action on my part; if it does, I will have to try downgrading to Proxmox 4.4 :(.

I have read this: https://forum.proxmox.com/threads/vm-centos7-process-sporadically-ended-without-helping-logs.34691/ and it seems to be a similar problem...
Also https://forum.proxmox.com/threads/virtual-machines-kernel-panic-with-proxmox-5-1.38145/
but there is no real solution for now. All the servers run CloudLinux 7 and worked fine before on Proxmox 4.4...

Has no one else had a similar problem?
 

fireon

Famous Member
Oct 25, 2010
3,850
314
103
40
Austria/Graz
iteas.at
I had a similar problem here on all Ubuntu VMs and one CentOS VM. I noticed it because they all have automatic updates and automatic reboots enabled, and after a reboot I got the same error picture. But that was, I think, 2 or 3 weeks ago, and now it has gone away. The problem really occurred every day. Here is my thread.
 

Sebastian2000

Well, at least someone else has a similar problem!
What I have not seen is what you did to solve it...
 

speedbird

Member
Nov 3, 2017
45
5
13
Having the same problem, just chiming in here and waiting for a fix or solution. Sometimes my Ubuntu LTS VMs just get stuck on reboot with 100% load and nothing but a power down helps. No reset possible.
 

Sebastian2000

There are already quite a few of us with this problem. Is it possible for the Proxmox team to investigate the cause?
 

tom

Proxmox Staff Member
Staff member
Aug 29, 2006
15,486
887
163
You need to isolate the issue, then you can report the bug and we can fix it.

Problems can be in several places, so you have to find a test case that lets a developer reproduce the same behaviour in the test lab.
We can also help by directly checking your setup via remote SSH login (as long as you have the needed support subscription).
 

Sebastian2000

Well, in our case it is a Dell C6220 with 2 E5-2650 CPUs in each of the enclosure's 4 nodes, 64G of RAM in each node, 6 Kingston DC400 SSDs in hardware RAID10, set up with LVM, and a fresh Proxmox 5.1 install from about 2 weeks ago.

We have 4 or 5 VPS on each node, running CloudLinux/Plesk, nothing else special...

In the first days with this setup, 2 VMs went down. When I logged into the Proxmox host, the other 4 VMs running on the node had no problem, and when I opened the console of the one that was down there was only a black screen: no logs, no error message, just CPU and RAM at 100% in the Proxmox summary graph.
I changed the CPU type on all 19 VMs (from "host" to "default cpu") and also added 1.5G of RAM to each one (from 10G to 11.5G), and I have had no more downtime so far (1 week).

But on some attempts to reboot a server, I get the same problem as the other members: CPU and RAM at 100%, black console screen, reset does not work, and the only solution is a "stop" and "start" action.

I am sure our first problem is related to this second problem, as in both cases there is no message in the console, the VM is not reachable over SSH, the Proxmox graph shows CPU and RAM at 100% in the VPS summary, and the reset function does not work.
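
For reference, the "stop" and "start" actions from the web GUI correspond to `qm stop` and `qm start` on the host command line. Below is a minimal Python sketch of that workaround, not a fix: the helper names are hypothetical, it defaults to a dry run that only prints the commands, and it assumes it is run as root on the Proxmox node that hosts the VM.

```python
import subprocess
from typing import List


def hard_restart_commands(vmid: int) -> List[List[str]]:
    """Build the `qm stop` / `qm start` command pair for one VM."""
    if vmid <= 0:
        raise ValueError("vmid must be a positive integer")
    return [["qm", "stop", str(vmid)], ["qm", "start", str(vmid)]]


def hard_restart(vmid: int, dry_run: bool = True) -> None:
    """Print the commands; run them on the host only if dry_run is False."""
    for command in hard_restart_commands(vmid):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence if `qm stop` fails.
            subprocess.run(command, check=True)


hard_restart(118)  # dry run: only prints the two commands
```

A hard stop is equivalent to pulling the power on the guest, so it can leave filesystem errors behind; it is only worth scripting for VMs that are already confirmed hung.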
 

tom

please also send:
  1. pveversion -v
  2. qm config VMID
 

Sebastian2000

Of course:

# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90

# qm config 118
boot: cdn
bootdisk: scsi0
cores: 6
ide2: none,media=cdrom
keyboard: es
memory: 12000
name: 018.xxxxxx.com
net0: virtio=F6:E2:38:0D:76:49,bridge=vmbr0
numa: 0
ostype: l26
scsi0: local-lvm:vm-118-disk-2,cache=writethrough,size=200G
scsi1: local-lvm:vm-118-disk-1,cache=writethrough,size=1G
scsihw: virtio-scsi-pci
smbios1: uuid=5fa4fa83-a2eb-4575-91bd-ab8d5a4dfa06
sockets: 1


Thanks for your help!
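
When several VMs are affected, it can help to compare their `qm config` output and look for the settings the hanging VMs have in common. The following Python sketch parses the `key: value` lines shown above and lists the keys that differ between two configs. The function names are hypothetical, the first sample is an excerpt of the VM 118 config posted above, and the second sample is invented purely for illustration.

```python
from typing import Dict, Optional, Tuple


def parse_qm_config(text: str) -> Dict[str, str]:
    """Parse `qm config` output into a dict, splitting at the first colon."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            config[key.strip()] = value.strip()
    return config


def diff_configs(a: Dict[str, str],
                 b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Excerpt of the VM 118 config posted above.
vm118 = parse_qm_config("""cores: 6
memory: 12000
net0: virtio=F6:E2:38:0D:76:49,bridge=vmbr0
scsihw: virtio-scsi-pci
""")

# Hypothetical second VM, invented for this example only.
other = parse_qm_config("""cores: 6
memory: 10000
scsihw: virtio-scsi-pci
""")

for key, (left, right) in diff_configs(vm118, other).items():
    print(f"{key}: {left!r} vs {right!r}")
```

Splitting at the first colon only keeps values such as the MAC address in `net0` and the storage name in `scsi0` intact.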
 

tom

Try our latest 4.10 kernel (instead of the latest 4.13); some users reported issues with 4.13 on some hardware.
 

speedbird

Alright, I tried the downgrade to 4.10 but it didn't help at all. The problem is, that there is no way to reliably reproduce the problem. Sometimes the machine just reboots as intended, sometimes it just hangs with 100% cpu load and a black console screen, no log output, no console output, no nothing to really investigate anything at all.

This makes it really hard for support to create a setup where this can be reproduced.

My hardware config is a Xeon E3-1275v6, 64GB ECC RAM, 4x480GB SSD RAID10... running latest Proxmox 5.1 with all patches applied.

There's really got to be a fix because it is absolutely annoying having to forcefully shutdown and start the vm from the webgui again when we just wanted to do a quick reboot via ssh and wonder why the machine doesn't come back again and times out. Looking on the host you then see the 100% load and nothing else than a black screen.
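
Because the hang is intermittent, one way to gather data for a bug report is to record every reboot after which the VM does not come back. The Python sketch below polls a TCP port (SSH, port 22, in the usage comment) until it accepts a connection or a deadline passes. The function name, host name, and timings are hypothetical; it is a detection aid under those assumptions, not a fix.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  interval: float = 5.0) -> bool:
    """Return True once host:port accepts a TCP connection, or False if it
    still refuses or times out when `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself limited to `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: try again later
        if time.monotonic() >= deadline:
            return False
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))


# Usage after issuing a reboot over SSH (hypothetical host name):
#     if not wait_for_port("vm.example.com", 22, timeout=300):
#         print("VM did not come back; check the host for 100% CPU")
```

Logging the time of each failed reboot alongside the host's state at that moment gives support something concrete to compare across incidents, even if the hang itself cannot be triggered on demand.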
 

Sebastian2000

What bad news... so the kernel is not the cause of the problem :-(
 
