random kernel panics, possibly caused by e1000e module

olaszfiu

Renowned Member
Feb 23, 2015
Hello,
I'm experiencing random kernel panics on a 6-node Proxmox VE cluster, recently updated to 3.4 (they also occurred with version 3.3).
My current PVE config is:

# pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1


I managed to capture a screenshot of one of those kernel panics (see attached netconsole.zip).
It seems to me they occur during interrupt handling in the e1000e module.
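
For anyone who wants to capture panics the same way: netconsole streams kernel messages over UDP to another host. A minimal setup looks roughly like this (the addresses, interface, and MAC below are placeholders for your own nodes, not my actual values).

On the panicking node:

modprobe netconsole netconsole=6666@192.168.0.10/eth0,6666@192.168.0.20/aa:bb:cc:dd:ee:ff

On the collecting host (exact flags depend on the netcat variant):

nc -u -l 6666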

Each node has:

- 2x Intel 82574L NICs (module: e1000e).
The first NIC is unused; the second one is bridged on vmbr0, where several VMs are attached, each using a different VLAN tag.
The VMs' virtual disks are hosted on a replicated GlusterFS volume.

- 2x Intel 82576 NICs (module: igb).
Both NICs are bonded to bond0.
This is a dedicated interface used for client-server Gluster communication; a rough sketch of this layout in /etc/network/interfaces follows below.
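
For context, this layout in /etc/network/interfaces looks roughly like the following (interface names, addresses, and the bond mode here are illustrative placeholders rather than my exact values):

auto bond0
iface bond0 inet static
        address 10.0.0.10
        netmask 255.255.255.0
        slaves eth2 eth3
        bond_miimon 100
        bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet static
        address 192.168.0.10
        netmask 255.255.255.0
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0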

Unfortunately, the kernel panics are not easily reproducible.
Sometimes they happen on boot, a few minutes after all the VMs have started.
A few times they happened during VM live migration.
They have also occurred while I was installing the virtio network driver inside a Windows 8.1 VM.


Do you have any clue what the problem could be?

Thanks,
Rosario
 

Attachments

  • lspci.txt (4.1 KB)
  • netconsole.zip (4.8 KB)
  • cpuinfo.zip (1.3 KB)

I had issues using e1000 network drivers in KVM Windows VMs; they would just freeze. I only noticed it on 3.3 and 3.4. Changing the drivers to virtio (network and disk) worked for me (running 3.4).
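
For an existing VM the NIC model can be switched to virtio from the CLI, e.g. (vmid 100 is a placeholder, and note this assigns a new MAC address unless you specify one):

qm set 100 -net0 virtio,bridge=vmbr0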

Stephen
 
Thanks Stephen,
but unfortunately the kernel panics occur on the hypervisors themselves, which have physical Intel network cards. The VMs already use virtio drivers and work fine.
Cheers, Rosario
 
Thanks Stephen, but the info in that thread doesn't seem related to my case.


As a further test, today I moved the vmbr0 bridge on top of an Intel 82576 NIC (instead of the Intel 82574L NIC), and after a while, once all the VMs had started, I got the same kind of kernel panic (see attached netconsole-igb.zip).
This time the "Fatal exception in interrupt" seems to come from the "igb" kernel module.
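
For reference, which kernel module drives a given port can be double-checked with ethtool (eth2 here is a placeholder for the bridge port):

ethtool -i eth2

The "driver:" line of the output names the module; lspci -k shows the same information per PCI device.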


This is getting really complicated to debug...
Could any of the Proxmox VE developers have a look at this, please?


Thanks, Rosario
 

Attachments

  • netconsole-igb.zip (4.4 KB)
It looks like disabling GRO (generic receive offload) on the bridged NIC (eth2) has mitigated the problem.
I added the following to /etc/network/interfaces:

pre-up /sbin/ethtool -K eth2 gro off

No kernel panics observed since yesterday.
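
To verify that the setting actually took effect after a reboot:

ethtool -k eth2 | grep generic-receive-offload

which should report "generic-receive-offload: off".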
 
