Proxmox 4.2 random reboots on HP ProLiant BL460c

melvinzen

New Member
May 24, 2016
Hi,

We have 2 cluster nodes with 4 HA VMs (online migration works well) and a NAS that provides NFS shared storage for the VMs. I have already disabled the NMI watchdog and blacklisted hpwdt; both are inactive. But we still get a reboot every 24 hours. We don't have any network problems. Has anyone experienced this kind of problem? Thank you. (Sorry for my English :D)

# pveversion -v

proxmox-ve: 4.2-48 (running kernel: 4.4.6-1-pve)
pve-manager: 4.2-2 (running version: 4.2-2/725d76f0)
pve-kernel-4.4.6-1-pve: 4.4.6-48
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-72
pve-firmware: 1.1-8
libpve-common-perl: 4.0-59
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-14
pve-container: 1.0-62
pve-firewall: 2.0-25
pve-ha-manager: 1.0-28
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
 
* syslog from webui

May 24 06:49:13 Blade01 kernel: Linux version 4.4.6-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Thu Apr 21 11:25:40 CEST 2016 ()
May 24 06:49:13 Blade01 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.6-1-pve root=/dev/mapper/pve-root ro quiet nmi_watchdog=0
May 24 06:49:13 Blade01 kernel: SMBIOS 2.7 present.
May 24 06:49:13 Blade01 kernel: DMI: HP ProLiant BL460c G7, BIOS I27 07/02/2013
May 24 06:49:13 Blade01 kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
May 24 06:49:13 Blade01 kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
May 24 06:49:13 Blade01 kernel: e820: last_pfn = 0x180bfff max_arch_pfn = 0x400000000
May 24 06:49:13 Blade01 kernel: MTRR default type: write-back
May 24 06:49:13 Blade01 kernel: MTRR fixed ranges enabled:
May 24 06:49:13 Blade01 kernel: 00000-9FFFF write-back
May 24 06:49:13 Blade01 kernel: A0000-BFFFF uncachable
May 24 06:49:13 Blade01 kernel: C0000-FFFFF write-protect
May 24 06:49:13 Blade01 kernel: MTRR variable ranges enabled:
May 24 06:49:13 Blade01 kernel: 0 base 00F4000000 mask FFFC000000 uncachable
May 24 06:49:13 Blade01 kernel: 1 base 00F8000000 mask FFF8000000 uncachable
May 24 06:49:13 Blade01 kernel: x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT
May 24 06:49:13 Blade01 kernel: e820: last_pfn = 0xf363d max_arch_pfn = 0x400000000
May 24 06:49:13 Blade01 kernel: found SMP MP-table at [mem 0x000f4f80-0x000f4f8f] mapped at [ffff8800000f4f80]
May 24 06:49:13 Blade01 kernel: Scanning 1 areas for low memory corruption
May 24 06:49:13 Blade01 kernel: Base memory trampoline at [ffff880000099000] 99000 size 24576
May 24 06:49:13 Blade01 kernel: Using GB pages for direct mapping
May 24 06:49:13 Blade01 kernel: RAMDISK: [mem 0x34e80000-0x36737fff]
May 24 06:49:13 Blade01 kernel: ACPI: Early table checksum verification disabled
May 24 06:49:13 Blade01 kernel: ACPI: RSDP 0x00000000000F4F00 000024 (v02 HP )
May 24 06:49:13 Blade01 kernel: [81B blob data]
May 24 06:49:13 Blade01 kernel: [81B blob data]
May 24 06:49:13 Blade01 kernel: ACPI BIOS Warning (bug): Invalid length for FADT/Pm1aControlBlock: 32, using default 16 (20150930/tbfadt-704)
May 24 06:49:13 Blade01 kernel: ACPI BIOS Warning (bug): Invalid length for FADT/Pm2ControlBlock: 32, using default 8 (20150930/tbfadt-704)
May 24 06:49:13 Blade01 kernel: ACPI: DSDT 0x00000000F3630240 00204D (v01 HP DSDT 00000001 INTL 20030228)
May 24 06:49:13 Blade01 kernel: ACPI: FACS 0x00000000F362F100 000040
May 24 06:49:13 Blade01 kernel: ACPI: FACS 0x00000000F362F100 000040
May 24 06:49:13 Blade01 kernel: [81B blob data]
May 24 06:49:13 Blade01 kernel: ACPI: MCFG 0x00000000F362F1C0 00003C (v01 HP ProLiant 00000001 00000000)
May 24 06:49:13 Blade01 kernel: ACPI: APIC 0x00000000F362F500 00015E (v01 HP ProLiant 00000002 00000000)
May 24 06:49:13 Blade01 kernel: Initmem setup node 0 [mem 0x0000000000001000-0x0000000c0bffffff]
May 24 06:49:13 Blade01 kernel: On node 0 totalpages: 12580302
May 24 06:49:13 Blade01 kernel: ACPI: PM-Timer IO Port: 0x908
May 24 06:49:13 Blade01 kernel: ACPI: Local APIC address 0xfee00000
May 24 06:49:13 Blade01 kernel: ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
May 24 06:49:13 Blade01 kernel: IOAPIC[0]: apic_id 8, version 32, address 0xfec00000, GSI 0-23
May 24 06:49:13 Blade01 kernel: IOAPIC[1]: apic_id 0, version 32, address 0xfec80000, GSI 24-47
May 24 06:49:13 Blade01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
May 24 06:49:13 Blade01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
May 24 06:49:13 Blade01 kernel: ACPI: IRQ0 used by override.
May 24 06:49:13 Blade01 kernel: ACPI: IRQ9 used by override.
May 24 06:49:13 Blade01 kernel: Using ACPI (MADT) for SMP configuration information
May 24 06:49:13 Blade01 kernel: ACPI: HPET id: 0x8086a201 base: 0xfed00000
May 24 06:49:13 Blade01 kernel: smpboot: Allowing 32 CPUs, 8 hotplug CPUs
May 24 06:49:13 Blade01 kernel: Booting paravirtualized kernel on bare hardware
May 24 06:49:13 Blade01 kernel: clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
May 24 06:49:13 Blade01 kernel: setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:32 nr_node_ids:2
May 24 06:49:13 Blade01 kernel: PERCPU: Embedded 34 pages/cpu @ffff880bdb800000 s99160 r8192 d31912 u262144
May 24 06:49:13 Blade01 kernel: pcpu-alloc: s99160 r8192 d31912 u262144 alloc=1*2097152
May 24 06:49:13 Blade01 kernel: pcpu-alloc: [0] 00 02 04 06 08 10 12 14 [0] 16 18 20 22 25 27 29 31
May 24 06:49:13 Blade01 kernel: pcpu-alloc: [1] 01 03 05 07 09 11 13 15 [1] 17 19 21 23 24 26 28 30
May 24 06:49:13 Blade01 kernel: Built 2 zonelists in Node order, mobility grouping on. Total pages: 24769208
May 24 06:49:13 Blade01 kernel: Policy zone: Normal
May 24 06:49:13 Blade01 kernel: Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.4.6-1-pve root=/dev/mapper/pve-root ro quiet nmi_watchdog=0
May 24 06:49:13 Blade01 kernel: PID hash table entries: 4096 (order: 3, 32768 bytes)
May 24 06:49:13 Blade01 kernel: Calgary: detecting Calgary via BIOS EBDA area
May 24 06:49:13 Blade01 kernel: Calgary: Unable to locate Rio Grande table in EBDA - bailing!
May 24 06:49:13 Blade01 kernel: Memory: 98963140K/100652852K available (8496K kernel code, 1369K rwdata, 3940K rodata, 1496K init, 1300K bss, 1689712K reserved, 0K cma-reserved)
May 24 06:49:13 Blade01 kernel: SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=32, Nodes=2
May 24 06:49:13 Blade01 kernel: Hierarchical RCU implementation.
May 24 06:49:13 Blade01 kernel: Build-time adjustment of leaf fanout to 64.
May 24 06:49:13 Blade01 kernel: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=32.
May 24 06:49:13 Blade01 kernel: RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=32
May 24 06:49:13 Blade01 kernel: NR_IRQS:16640 nr_irqs:1088 16
May 24 06:49:13 Blade01 kernel: Console: colour VGA+ 80x25
May 24 06:49:13 Blade01 kernel: console [tty0] enabled
May 24 06:49:13 Blade01 kernel: mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
May 24 06:49:13 Blade01 kernel: clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
May 24 06:49:13 Blade01 kernel: hpet clockevent registered
May 24 06:49:13 Blade01 kernel: tsc: Fast TSC calibration using PIT
May 24 06:49:13 Blade01 kernel: tsc: Detected 2533.453 MHz processor
May 24 06:49:13 Blade01 kernel: Calibrating delay loop (skipped), value calculated using timer frequency.. 5066.90 BogoMIPS (lpj=10133812)
May 24 06:49:13 Blade01 kernel: pid_max: default: 32768 minimum: 301
May 24 06:49:13 Blade01 kernel: ACPI: Core revision 20150930
May 24 06:49:13 Blade01 kernel: ACPI: 4 ACPI AML tables successfully acquired and loaded
May 24 06:49:13 Blade01 kernel: Security Framework initialized
May 24 06:49:13 Blade01 kernel: Yama: becoming mindful.
May 24 06:49:13 Blade01 kernel: AppArmor: AppArmor initialized
May 24 06:49:13 Blade01 kernel: Dentry cache hash table entries: 16777216 (order: 15, 134217728 bytes)
May 24 06:49:13 Blade01 kernel: Inode-cache hash table entries: 8388608 (order: 14, 67108864 bytes)
May 24 06:49:13 Blade01 kernel: Mount-cache hash table entries: 262144 (order: 9, 2097152 bytes)
May 24 06:49:13 Blade01 kernel: Mountpoint-cache hash table entries: 262144 (order: 9, 2097152 bytes)
May 24 06:49:13 Blade01 kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
May 24 06:49:13 Blade01 kernel: ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
May 24 06:49:13 Blade01 kernel: mce: CPU supports 9 MCE banks
May 24 06:49:13 Blade01 kernel: CPU0: Thermal monitoring enabled (TM1)
May 24 06:49:13 Blade01 kernel: process: using mwait in idle threads
May 24 06:49:13 Blade01 kernel: Last level iTLB entries: 4KB 512, 2MB 7, 4MB 7
May 24 06:49:13 Blade01 kernel: Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
May 24 06:49:13 Blade01 kernel: Freeing SMP alternatives memory: 28K (ffffffff820ce000 - ffffffff820d5000)
May 24 06:49:13 Blade01 kernel: ftrace: allocating 32428 entries in 127 pages
May 24 06:49:13 Blade01 kernel: smpboot: Max logical packages: 6
May 24 06:49:13 Blade01 kernel: smpboot: APIC(20) Converting physical 1 to logical package 0
May 24 06:49:13 Blade01 kernel: smpboot: APIC(10) Converting physical 0 to logical package 1
May 24 06:49:13 Blade01 kernel: DMAR-IR: This system BIOS has enabled interrupt remapping
on a chipset that contains an erratum making that
feature unstable. To maintain system stability
interrupt remapping is being disabled. Please
contact your BIOS vendor for an update
May 24 06:49:13 Blade01 kernel: Switched APIC routing to physical flat.
May 24 06:49:13 Blade01 kernel: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
May 24 06:49:13 Blade01 kernel: smpboot: CPU0: Intel(R) Xeon(R) CPU E5649 @ 2.53GHz (family: 0x6, model: 0x2c, stepping: 0x2)
May 24 06:49:13 Blade01 kernel: Performance Events: PEBS fmt1+, 16-deep LBR, Westmere events, Broken BIOS detected, complain to your hardware vendor.
May 24 06:49:13 Blade01 kernel: [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
May 24 06:49:13 Blade01 kernel: Intel PMU driver.
May 24 06:49:13 Blade01 kernel: core: CPUID marked event: 'bus cycles' unavailable
May 24 06:49:13 Blade01 kernel: x86: Booted up 2 nodes, 24 CPUs
May 24 06:49:13 Blade01 kernel: smpboot: Total of 24 processors activated (121605.17 BogoMIPS)
May 24 06:49:13 Blade01 kernel: devtmpfs: initialized
May 24 06:49:13 Blade01 kernel: Using 2GB memory block size for large-memory system
May 24 06:49:13 Blade01 kernel: clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
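
To check whether the reboots really happen on a fixed 24-hour schedule, the boot banners in syslog can be compared. The following is only a minimal sketch, not a Proxmox tool: it assumes classic syslog timestamps like the ones above (no year, so the year is passed in), and the helper names `boot_times` and `boot_intervals_hours` are made up for illustration.

```python
import re
from datetime import datetime

# Every kernel boot logs a "Linux version" banner, so it can serve as a boot marker.
BOOT_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel: Linux version "
)


def boot_times(lines, year=2016):
    """Return the timestamp of each boot banner found in the syslog lines.

    Assumption: timestamps have the form "May 24 06:49:13" without a year,
    so the caller supplies the year.
    """
    times = []
    for line in lines:
        match = BOOT_RE.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def boot_intervals_hours(times):
    """Return the gaps between consecutive boots, in hours."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
```

For example, `boot_intervals_hours(boot_times(open("/var/log/syslog")))` would list the hours between boots, assuming the log is at the usual Debian path and still covers more than one boot. Intervals that all sit very close to 24 would point to something scheduled rather than a random crash.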