Kernel panic

gosha

A 3-node Proxmox 4.0 cluster (HP DL380 Gen8).

pic1.png

I needed to stop one of the servers.
I migrated all VMs from this server to another one and pressed the Shutdown button in the GUI.
After servicing the server, I turned it back on. The server booted normally, but after a short time it stopped responding (in the GUI).
On the server's console (via iLO), I found a kernel panic (see picture):

pic2.png

I rebooted the server, and then it worked fine.
I repeated the same steps on another server and also got a kernel panic after booting, with normal operation after another reboot. :(
I repeated them on the third server... the same kernel panic...

On Proxmox 3.x, such situations never happened on these same servers...

Why is this happening? Is this the new fencing? :confused:
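
(For reference, the migration was done in the GUI; the CLI equivalent should be something like the following, with a hypothetical VMID and target node:

# qm migrate 100 cn2 --online

i.e. live-migrate the running VM 100 to node cn2 before shutting this node down.)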
 
# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
fence-agents-pve: not correctly installed
^^^^^

Wow!!! :confused:
 
We uploaded a new package to the pve-no-subscription repository. Can you please test with the new kernel?
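
(A minimal sketch of pulling that update, assuming the pve-no-subscription repository is already configured for Debian Jessie; the file name below is hypothetical, the entry may also sit directly in /etc/apt/sources.list:

# cat /etc/apt/sources.list.d/pve-no-subscription.list
deb http://download.proxmox.com/debian jessie pve-no-subscription
# apt-get update && apt-get dist-upgrade
# reboot

The reboot is needed so that the new pve-kernel actually becomes the running kernel.)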
 
I just installed the latest updates and rebooted all servers:

# pveversion -v
proxmox-ve: 4.1-25 (running kernel: 4.2.6-1-pve)
pve-manager: 4.0-64 (running version: 4.0-64/fc76ac6c)
pve-kernel-4.2.6-1-pve: 4.2.6-25
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-39
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-37
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-16
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
fence-agents-pve: not correctly installed


I tried to shut down and boot one of the servers again, and the kernel panic did not happen.
But I again see "fence-agents-pve: not correctly installed" on the last line. :(
Is this normal?
 
I tried to shut down and boot one of the servers again, and the kernel panic did not happen.

great

But I again see "fence-agents-pve: not correctly installed" on the last line. :(
Is this normal?

That is a useless warning, because the package is currently not used. Either ignore the warning, or install the package:

# apt-get install fence-agents-pve

I will fix that with the next release.
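
(To double-check the package state afterwards, dpkg can be queried directly; pveversion presumably derives the "not correctly installed" message from this status:

# dpkg -s fence-agents-pve | grep Status
Status: install ok installed

The Status line above is what a correctly installed package is expected to report.)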
 
Dietmar, thanks!

I installed this package and...

# pveversion -v
....
fence-agents-pve: 4.0.20-1

---
Best Regards!
Gosha
 
Gosha, we had the same problem. Could you please confirm that the latest version resolves it?

Have you tried migrating VMs from one node to another using this latest version? No kernel panics or reboots?

Thanks!
H
 

Hi!

I tried to shut down and start one server again, but only once, and I did not get a kernel panic.
I will try to repeat these steps several times in the near future.

I also just tried to online-migrate a VM (twice). It's OK. See picture.

pic3.png

No kernel panic and reboots.

--
Best regards!
 
Hi!

Today I shut down one server (cn1) again.
A few minutes later, all running VMs on the remaining servers (cn2, cn3) were stopped:

pic1.png

All the VMs were stopped with "stop" (not "shutdown").
This is very bad...
:(


P.S.
I booted the first server and, a few minutes later (after the Ceph storage had recovered), shut it down again; then the two remaining servers rebooted...

pic2.png

Horror! :(
 

Third attempt.

I repeated the procedure: booted the first server and, a few minutes later (after the Ceph storage had recovered), shut it down again.
This time the remaining servers were fine. No stopped VMs, no reboots.

Hm...
:confused:
In summary: the latest update solved the kernel panic problem, but it does not solve the problem of the servers restarting (maybe the watchdog is misbehaving?).
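
(One way to look at the watchdog side, assuming a default PVE 4.x setup where the HA stack feeds the software watchdog via the watchdog-mux service:

# systemctl status watchdog-mux
# journalctl -b -u watchdog-mux
# lsmod | grep softdog

If the HA services stop updating the watchdog, the node is expected to be hard-reset by it; the logs should show whether that is what happened.)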
 
Maybe you were just connected to the dead node? A shutdown of one node does not stop VMs on the other nodes.
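
(One way to test that hypothesis, assuming the reboots are quorum-related: after shutting down one node, check quorum on a surviving node with

# pvecm status

A 3-node cluster should stay quorate with 2 of 3 votes while one node is down; if it does not, self-fencing of the remaining HA nodes would explain the reboots.)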


Maybe this is what happened: after shutting down cn1, I stepped away for a while. Suppose that while I was away the remaining two servers rebooted,
and HA had not yet had time to restart the VMs when I came back...
However, after my return I spent a lot of time writing a forum message and taking screenshots... so HA would have had time to start all the VMs by then...

For now I cannot come up with an explanation for this situation... :confused:
 

In addition to the above...
Maybe after the two servers booted, HA did not have time to start the VMs because of the Ceph storage recovery, which also takes some time (my OSDs have no SSD journal...).
:confused:
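
(A minimal way to watch that recovery with standard Ceph commands:

# ceph -s
# ceph -w

ceph -s shows whether there are still degraded or recovering PGs; ceph -w follows the recovery progress live.)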
 
You did not even mention that you use HA in the initial post, so I have no real idea what you are doing.

Sorry. I did not know that it mattered...
I assumed that a cluster without HA makes no sense... at least for me...
Do I really have to go without automatically migrated VMs whenever a server is stopped?
:(
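
(For context, a minimal sketch of putting a VM under HA on PVE 4.x, with a hypothetical VMID:

# ha-manager add vm:100
# ha-manager status

Only resources added this way are restarted on another node after a failure; VMs that are not HA resources stay down until started manually.)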
 
A cluster without HA still gives you:
- a single management UI
- live/offline VM migration

To be honest, I see platform-level HA as ancient and obsolete. Automatically starting the VM somewhere else is pretty useless in most cases I've encountered (of course not all). Examples:

- Databases: you do app-level HA (master/slave, master/master, or replica sets)
- Stateless apps: why wouldn't you start multiple instances from the beginning?
- Stateful realtime apps (e.g. PBX): you will lose state (current calls) anyway, but you can start multiple instances from the beginning

What's more, with shared storage, why would you risk a crashed hypervisor corrupting the single data instance shared between your HA instances?

I may be wrong, of course, but that's my view on HA with regard to the apps I've worked with over the years.
 
A cluster without HA still gives you:
- a single management UI
- live/offline VM migration

HA is one of the key features. In my last messages I described a problem that is most likely related to node fencing, and fencing is an essential part of Proxmox VE HA.
It's true that I did not mention using HA, but I also did not mention using the cluster without HA. And I did not know that this would matter.
And such a response seems strange to me: "so I have no real idea what you are doing." :(

 
And such a response seems strange to me: "so I have no real idea what you are doing." :(

The topic is "Kernel panic" - I guess that problem is solved?

I suggest you open a new topic for the HA-related problem, describing exactly what you do, what behaviour you expect, and what you think the bug is.
 
