Big error with 2.6.32-19-pve on HP server

amace

Hello, we updated our DL380 G7 to Proxmox version 2.3 over the weekend and have had to roll back to the pve-16 kernel because of a high load average and random kernel panics. The big killer seems to be intel-iommu, though there are other oopses logged.

Mar 26 18:48:59 proxmox kernel: ------------[ cut here ]------------
Mar 26 18:48:59 proxmox kernel: WARNING: at drivers/pci/intel-iommu.c:2775 intel_unmap_page+0x15f/0x180() (Not tainted)
Mar 26 18:48:59 proxmox kernel: Hardware name: ProLiant DL380 G7
Mar 26 18:48:59 proxmox kernel: Driver unmaps unmatched page at PFN 0
Mar 26 18:48:59 proxmox kernel: Modules linked in: radeon ttm drm_kms_helper drm shpchp snd_pcsp i2c_algo_bit serio_raw i2c_core snd_pcm snd_timer i7core_edac edac_core hpwdt hpilo tpm_tis snd soundcore tpm tpm_bios power_meter snd_page_alloc ext3 jbd mbcache sg ata_generic pata_acpi ata_piix bnx2 e1000e hpsa [last unloaded: scsi_wait_scan]
Mar 26 18:48:59 proxmox kernel: Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-19-pve #1
Mar 26 18:48:59 proxmox kernel: Call Trace:
Mar 26 18:48:59 proxmox kernel: <IRQ> [<ffffffff8106d6c8>] ? warn_slowpath_common+0x88/0xc0
Mar 26 18:48:59 proxmox kernel: [<ffffffff8106d7b6>] ? warn_slowpath_fmt+0x46/0x50
Mar 26 18:48:59 proxmox kernel: [<ffffffff812a7c1b>] ? find_iova+0x5b/0x90
Mar 26 18:48:59 proxmox kernel: [<ffffffff812abe5f>] ? intel_unmap_page+0x15f/0x180
Mar 26 18:48:59 proxmox kernel: [<ffffffffa0076a65>] ? bnx2_poll_work+0x155/0x11d0 [bnx2]
Mar 26 18:48:59 proxmox kernel: [<ffffffff810eb300>] ? handle_IRQ_event+0x60/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffff810ed9e8>] ? handle_edge_irq+0x98/0x180
Mar 26 18:48:59 proxmox kernel: [<ffffffff8111dd86>] ? group_sched_in+0x26/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffffa0077b1d>] ? bnx2_poll_msix+0x3d/0xd0 [bnx2]
Mar 26 18:48:59 proxmox kernel: [<ffffffff81458f83>] ? net_rx_action+0x103/0x2f0
Mar 26 18:48:59 proxmox kernel: [<ffffffff81076573>] ? __do_softirq+0x103/0x260
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100c2ac>] ? call_softirq+0x1c/0x30
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100def5>] ? do_softirq+0x65/0xa0
Mar 26 18:48:59 proxmox kernel: [<ffffffff8107639d>] ? irq_exit+0xcd/0xd0
Mar 26 18:48:59 proxmox kernel: [<ffffffff81526545>] ? do_IRQ+0x75/0xf0
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
Mar 26 18:48:59 proxmox kernel: <EOI> [<ffffffff812d1dbe>] ? intel_idle+0xde/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffff812d1da1>] ? intel_idle+0xc1/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffff8109e52d>] ? sched_clock_cpu+0xcd/0x110
Mar 26 18:48:59 proxmox kernel: [<ffffffff81421827>] ? cpuidle_idle_call+0xa7/0x140
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100a023>] ? cpu_idle+0xb3/0x110
Mar 26 18:48:59 proxmox kernel: [<ffffffff81505555>] ? rest_init+0x85/0x90
Mar 26 18:48:59 proxmox kernel: [<ffffffff81c2ef6e>] ? start_kernel+0x412/0x41e
Mar 26 18:48:59 proxmox kernel: [<ffffffff81c2e33a>] ? x86_64_start_reservations+0x125/0x129
Mar 26 18:48:59 proxmox kernel: [<ffffffff81c2e438>] ? x86_64_start_kernel+0xfa/0x109
Mar 26 18:48:59 proxmox kernel: ---[ end trace 83e11cbc4ff8ba9c ]---

Steps we have taken:

- Updated firmware (BIOS, RAID)
- Compiled in the newest hpsa driver (3.2.0-3)
- Disabled edac_core and i7core_edac

None of these changes helped, so we're running 2.6.32-16-pve for now.

Has anyone else seen this error? Are there any fixes available?

- - - Updated - - -

root@proxmox:~# pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.3-93
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-19-pve: 2.6.32-93
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-18
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-8
ksm-control-daemon: 1.1-1
 
We wound up having some problems on pve-16 as well. Removing intel_iommu=on from /etc/default/grub cleared them up, and we are now running smoothly on pve-16. Cheers!
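For anyone who wants to make the same change without opening an editor, here is a rough sketch. The function name strip_intel_iommu is made up for illustration, and it assumes the parameter appears literally as intel_iommu=on in the file and that GNU sed is available (as on Debian/Proxmox). It works on whatever file you pass in, so you can try it on a copy first.

```shell
# Hypothetical helper: remove "intel_iommu=on" from a GRUB defaults file.
# Assumes GNU sed (for the -i option).
strip_intel_iommu() {
    file="$1"
    # Keep a backup next to the original before editing in place.
    cp "$file" "$file.bak"
    # First drop the parameter with its leading space; then catch the case
    # where it is the first item on the line (optional trailing space).
    sed -i -e 's/ intel_iommu=on//g' -e 's/intel_iommu=on \{0,1\}//g' "$file"
}

# Example (run as root, then regenerate grub.cfg and reboot):
#   strip_intel_iommu /etc/default/grub
#   update-grub
```

After the reboot, cat /proc/cmdline should no longer show the parameter.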
 
In case you are wondering how to reboot into an older kernel, here is how I did it.
I changed the default value in GRUB to 2, which selects the third entry in the GRUB menu (the 2.6.32-17-pve kernel on our servers). Note that the numbering starts at 0, not 1.

vi /etc/default/grub

GRUB_DEFAULT=2

Then update grub with this command.
update-grub
The pve-17 kernel will be used after the next reboot.
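If you would rather script the edit than use vi, something along these lines should do it. This is only a sketch: set_grub_default is a hypothetical name, it assumes GNU sed, and it takes the file as an argument so it can be tested on a copy before touching /etc/default/grub.

```shell
# Hypothetical helper: set GRUB_DEFAULT to the given menu index.
set_grub_default() {
    file="$1"
    index="$2"
    # Keep a backup before changing anything.
    cp "$file" "$file.bak"
    if grep -q '^GRUB_DEFAULT=' "$file"; then
        # Replace the existing setting in place.
        sed -i "s/^GRUB_DEFAULT=.*/GRUB_DEFAULT=$index/" "$file"
    else
        # No setting present yet, so append one.
        echo "GRUB_DEFAULT=$index" >> "$file"
    fi
}

# Example (run as root):
#   set_grub_default /etc/default/grub 2
#   update-grub
```

Once the machine is back up, uname -r shows which kernel actually booted.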




You can check the order of your kernel listing with this command.

less /boot/grub/grub.cfg

With our servers it was like this.
(Ignore the code on top of the grub.cfg file.)
### BEGIN /etc/grub.d/10_linux ###
menuentry 'Proxmox Virtual Environment GNU/Linux, with Linux 2.6.32-19-pve' --class proxmox --class gnu-linux --class gnu --class os {
insmod part_msdos
insmod ext2
set root='(hd0,msdos1)'
search --no-floppy --fs-uuid --set 60faf10c-6b1c-4247-bb4a-c21efedbb59a
echo 'Loading Linux 2.6.32-19-pve ...'
linux /vmlinuz-2.6.32-19-pve root=/dev/mapper/pve-root ro quiet
echo 'Loading initial ramdisk ...'
initrd /initrd.img-2.6.32-19-pve
}
menuentry 'Proxmox Virtual Environment GNU/Linux, with Linux 2.6.32-18-pve' --class proxmox --class gnu-linux --class gnu --class os {
insmod part_msdos
insmod ext2
set root='(hd0,msdos1)'
search --no-floppy --fs-uuid --set 60faf10c-6b1c-4247-bb4a-c21efedbb59a
echo 'Loading Linux 2.6.32-18-pve ...'
linux /vmlinuz-2.6.32-18-pve root=/dev/mapper/pve-root ro quiet
echo 'Loading initial ramdisk ...'
initrd /initrd.img-2.6.32-18-pve
}
menuentry 'Proxmox Virtual Environment GNU/Linux, with Linux 2.6.32-17-pve' --class proxmox --class gnu-linux --class gnu --class os {
insmod part_msdos
insmod ext2
set root='(hd0,msdos1)'
search --no-floppy --fs-uuid --set 60faf10c-6b1c-4247-bb4a-c21efedbb59a
echo 'Loading Linux 2.6.32-17-pve ...'
linux /vmlinuz-2.6.32-17-pve root=/dev/mapper/pve-root ro quiet
echo 'Loading initial ramdisk ...'
initrd /initrd.img-2.6.32-17-pve
}
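Rather than counting entries by eye, you can have awk print each menu entry with its index. This is a sketch with a made-up function name (list_grub_entries), and it assumes a flat menu like the one in the listing, with every menuentry starting at the beginning of a line and no submenus.

```shell
# Hypothetical helper: print each menuentry title with its 0-based index,
# i.e. the number to use for GRUB_DEFAULT.
list_grub_entries() {
    # Split on single quotes so that field 2 is the entry title.
    awk -F"'" '/^menuentry /{ printf "%d: %s\n", n++, $2 }' "$1"
}

# Example:
#   list_grub_entries /boot/grub/grub.cfg
```

On a menu like ours, the 2.6.32-17-pve entry comes out with index 2, which matches the GRUB_DEFAULT=2 setting.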
 