Hello,
I'm experiencing random kernel panics on a 6-node Proxmox VE cluster, recently updated to 3.4 (they also occurred with version 3.3).
My current PVE config is:
# pveversion -v
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-37-pve: 2.6.32-147
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
I managed to capture a screenshot of one of those kernel panics (see the attached netconsole.zip).
They appear to occur during interrupt handling in the e1000e module.
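For context, the trace in netconsole.zip was collected with the kernel's netconsole module; a minimal setup looks roughly like the following (the IP addresses, UDP ports, interface name and MAC are placeholders, not my actual values):

# on the panicking node: forward kernel messages over UDP to a log host
modprobe netconsole netconsole=6665@192.168.0.2/eth0,6666@192.168.0.10/00:11:22:33:44:55
# on the log host: capture whatever arrives on UDP port 6666
nc -l -u -p 6666 | tee netconsole.log   # exact nc syntax depends on the netcat variant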
Each node has:
- 2x Intel 82574L NICs (module: e1000e).
The first NIC is unused; the second one is bridged on vmbr0, where several VMs are attached, each using a different VLAN tag.
The VMs' virtual disks are hosted on a GlusterFS replicated volume.
- 2x Intel 82576 NICs (module: igb).
Both NICs are bonded to bond0.
This is a dedicated interface used for client-server Gluster communication (a rough sketch of the relevant configuration follows below).
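For clarity, the relevant parts of the node configuration look roughly like the sketch below; the interface names, addresses, bonding mode and volume name are illustrative placeholders, not necessarily my exact values (eth1 stands for the second 82574L port, eth2/eth3 for the two 82576 ports).

/etc/network/interfaces (excerpt):

# bond0: dedicated Gluster network on the two igb ports
auto bond0
iface bond0 inet static
        address 10.10.10.11
        netmask 255.255.255.0
        slaves eth2 eth3
        bond_miimon 100
        bond_mode active-backup

# vmbr0: VM bridge on the second e1000e port
auto vmbr0
iface vmbr0 inet static
        address 192.168.0.11
        netmask 255.255.255.0
        gateway 192.168.0.254
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0

/etc/pve/storage.cfg (excerpt):

glusterfs: glusterstore
        server 10.10.10.11
        volume gv0
        content images

The VLAN tags are set per VM in the guest configuration, e.g. net0: virtio=<MAC>,bridge=vmbr0,tag=100 in /etc/pve/qemu-server/<vmid>.conf.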
Unfortunately, the kernel panics are not easily reproducible.
Sometimes they happen at boot, a few minutes after all VMs have started.
A few times they happened during VM live migration.
They have also occurred while installing the virtio network driver inside a Windows 8.1 VM.
Do you have any clue what the problem could be?
Thanks,
Rosario