Hello, we updated our DL380 G7 to Proxmox VE 2.3 over the weekend and have had to roll back to the 2.6.32-16-pve kernel because of a high load average and random kernel panics. The main culprit seems to be intel-iommu, though other oopses have been logged as well.
Mar 26 18:48:59 proxmox kernel: ------------[ cut here ]------------
Mar 26 18:48:59 proxmox kernel: WARNING: at drivers/pci/intel-iommu.c:2775 intel_unmap_page+0x15f/0x180() (Not tainted)
Mar 26 18:48:59 proxmox kernel: Hardware name: ProLiant DL380 G7
Mar 26 18:48:59 proxmox kernel: Driver unmaps unmatched page at PFN 0
Mar 26 18:48:59 proxmox kernel: Modules linked in: radeon ttm drm_kms_helper drm shpchp snd_pcsp i2c_algo_bit serio_raw i2c_core snd_pcm snd_timer i7core_edac edac_core hpwdt hpilo tpm_tis snd soundcore tpm tpm_bios power_meter snd_page_alloc ext3 jbd mbcache sg ata_generic pata_acpi ata_piix bnx2 e1000e hpsa [last unloaded: scsi_wait_scan]
Mar 26 18:48:59 proxmox kernel: Pid: 0, comm: swapper veid: 0 Not tainted 2.6.32-19-pve #1
Mar 26 18:48:59 proxmox kernel: Call Trace:
Mar 26 18:48:59 proxmox kernel: <IRQ> [<ffffffff8106d6c8>] ? warn_slowpath_common+0x88/0xc0
Mar 26 18:48:59 proxmox kernel: [<ffffffff8106d7b6>] ? warn_slowpath_fmt+0x46/0x50
Mar 26 18:48:59 proxmox kernel: [<ffffffff812a7c1b>] ? find_iova+0x5b/0x90
Mar 26 18:48:59 proxmox kernel: [<ffffffff812abe5f>] ? intel_unmap_page+0x15f/0x180
Mar 26 18:48:59 proxmox kernel: [<ffffffffa0076a65>] ? bnx2_poll_work+0x155/0x11d0 [bnx2]
Mar 26 18:48:59 proxmox kernel: [<ffffffff810eb300>] ? handle_IRQ_event+0x60/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffff810ed9e8>] ? handle_edge_irq+0x98/0x180
Mar 26 18:48:59 proxmox kernel: [<ffffffff8111dd86>] ? group_sched_in+0x26/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffffa0077b1d>] ? bnx2_poll_msix+0x3d/0xd0 [bnx2]
Mar 26 18:48:59 proxmox kernel: [<ffffffff81458f83>] ? net_rx_action+0x103/0x2f0
Mar 26 18:48:59 proxmox kernel: [<ffffffff81076573>] ? __do_softirq+0x103/0x260
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100c2ac>] ? call_softirq+0x1c/0x30
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100def5>] ? do_softirq+0x65/0xa0
Mar 26 18:48:59 proxmox kernel: [<ffffffff8107639d>] ? irq_exit+0xcd/0xd0
Mar 26 18:48:59 proxmox kernel: [<ffffffff81526545>] ? do_IRQ+0x75/0xf0
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100ba93>] ? ret_from_intr+0x0/0x11
Mar 26 18:48:59 proxmox kernel: <EOI> [<ffffffff812d1dbe>] ? intel_idle+0xde/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffff812d1da1>] ? intel_idle+0xc1/0x170
Mar 26 18:48:59 proxmox kernel: [<ffffffff8109e52d>] ? sched_clock_cpu+0xcd/0x110
Mar 26 18:48:59 proxmox kernel: [<ffffffff81421827>] ? cpuidle_idle_call+0xa7/0x140
Mar 26 18:48:59 proxmox kernel: [<ffffffff8100a023>] ? cpu_idle+0xb3/0x110
Mar 26 18:48:59 proxmox kernel: [<ffffffff81505555>] ? rest_init+0x85/0x90
Mar 26 18:48:59 proxmox kernel: [<ffffffff81c2ef6e>] ? start_kernel+0x412/0x41e
Mar 26 18:48:59 proxmox kernel: [<ffffffff81c2e33a>] ? x86_64_start_reservations+0x125/0x129
Mar 26 18:48:59 proxmox kernel: [<ffffffff81c2e438>] ? x86_64_start_kernel+0xfa/0x109
Mar 26 18:48:59 proxmox kernel: ---[ end trace 83e11cbc4ff8ba9c ]---
Steps we have taken:
- firmware updates (BIOS, RAID)
- Compiled in the newest hpsa driver (3.2.0-3)
- disabled edac_core and i7core_edac
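For reference, one common way to disable the EDAC modules on a Debian-based host such as Proxmox VE 2.x is a modprobe blacklist file. The post doesn't say which method was used, and the file name below is only an assumption:

```
# /etc/modprobe.d/blacklist-edac.conf  (hypothetical file name)
# Stop the EDAC modules from being auto-loaded at boot.
blacklist i7core_edac
blacklist edac_core
```

After adding the file, running `update-initramfs -u` and rebooting should keep both modules out of the `lsmod | grep edac` output. Note that `blacklist` only prevents alias-based auto-loading. A blacklisted module can still be loaded as a dependency of another module or by an explicit `modprobe`.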
None of these changes helped, so we're running 2.6.32-16-pve for now.
Has anyone else seen this error? Are there any fixes available?
- - - Updated - - -
root@proxmox:~# pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.3-93
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-19-pve: 2.6.32-93
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-18
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-6
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-8
ksm-control-daemon: 1.1-1