Hello everyone,
Yesterday one of my Proxmox servers started randomly restarting itself every few hours. It had previously been running for a couple of weeks without any problems. I've checked the logs but don't see an obvious reason for the reboots. Maybe I'm missing something? Thank you in advance for any advice.
Info & Hardware:
- Proxmox 4.4-1/eb2d6f1e
- Dual Intel E5-2630 v4
- Supermicro X10DRL-i
- 128GB RAM
- 4 x 2TB Samsung 850 Pro + LSI 9271-8i
- Local LVM storage is used for VMs
- Supermicro IPMI event log shows no errors
- Supermicro IPMI reports PSU status as OK; power logs never show a drop in power
Code:
Apr 25 16:16:03 node1 rrdcached[2761]: started new journal /var/lib/rrdcached/journal/rrd.journal.1493151363.141442
Apr 25 16:16:03 node1 rrdcached[2761]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1493144163.141408
Apr 25 16:17:01 node1 CRON[29061]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 25 16:17:24 node1 pveproxy[25158]: worker exit
Apr 25 16:17:24 node1 pveproxy[37922]: worker 25158 finished
Apr 25 16:17:24 node1 pveproxy[37922]: starting 1 worker(s)
Apr 25 16:17:24 node1 pveproxy[37922]: worker 29242 started
Apr 25 16:18:31 node1 pvedaemon[7763]: <root@pam> successful auth for user 'root@pam'
Apr 25 16:28:57 node1 pvedaemon[2923]: worker exit
Apr 25 16:28:57 node1 pvedaemon[2920]: worker 2923 finished
Apr 25 16:28:57 node1 pvedaemon[2920]: starting 1 worker(s)
Apr 25 16:28:57 node1 pvedaemon[2920]: worker 34528 started
Apr 25 16:31:38 node1 systemd[1]: Starting Cleanup of Temporary Directories...
Apr 25 16:31:38 node1 systemd[1]: Started Cleanup of Temporary Directories.
Apr 25 16:33:32 node1 pvedaemon[34528]: <root@pam> successful auth for user 'root@pam'
Apr 25 16:43:53 node1 systemd-timesyncd[2366]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.012s/0.000s/-6ppm
Apr 25 16:46:03 node1 smartd[2710]: Device: /dev/bus/0 [megaraid_disk_14] [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 77 to 76
Apr 25 16:48:33 node1 pvedaemon[34528]: <root@pam> successful auth for user 'root@pam'
Apr 25 17:03:34 node1 pvedaemon[34528]: <root@pam> successful auth for user 'root@pam'
Apr 25 17:06:19 node1 pveproxy[29242]: worker exit
Apr 25 17:06:19 node1 pveproxy[37922]: worker 29242 finished
Apr 25 17:06:19 node1 pveproxy[37922]: starting 1 worker(s)
Apr 25 17:06:19 node1 pveproxy[37922]: worker 12249 started
Apr 25 17:06:36 node1 pvedaemon[2922]: worker exit
Apr 25 17:06:36 node1 pvedaemon[2920]: worker 2922 finished
Apr 25 17:06:36 node1 pvedaemon[2920]: starting 1 worker(s)
Apr 25 17:06:36 node1 pvedaemon[2920]: worker 12401 started
Apr 25 17:14:28 node1 pvestatd[2908]: got timeout
Apr 25 17:17:07 node1 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2751" x- start
Apr 25 17:17:07 node1 systemd-modules-load[1035]: Module 'fuse' is builtin
Apr 25 17:17:07 node1 systemd-modules-load[1035]: Inserted module 'vhost_net'
Apr 25 17:17:07 node1 systemd[1]: Mounted FUSE Control File System.
Apr 25 17:17:07 node1 systemd[1]: Started Apply Kernel Variables.
Apr 25 17:17:07 node1 systemd[1]: Started udev Coldplug all Devices.
Apr 25 17:17:07 node1 systemd[1]: Starting udev Wait for Complete Device Initialization...
Apr 25 17:17:07 node1 kernel: [ 0.000000] Initializing cgroup subsys cpuset
Apr 25 17:17:07 node1 kernel: [ 0.000000] Initializing cgroup subsys cpu
Apr 25 17:17:07 node1 kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Apr 25 17:17:07 node1 kernel: [ 0.000000] Linux version 4.4.35-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Fri Dec 9 11:09:55 CET 2016 ()
Apr 25 17:17:07 node1 kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.35-1-pve root=/dev/mapper/pve-root ro quiet
Apr 25 17:17:07 node1 kernel: [ 0.000000] KERNEL supported cpus:
Apr 25 17:17:07 node1 kernel: [ 0.000000] Intel GenuineIntel
Apr 25 17:17:07 node1 systemd[1]: Started Create Static Device Nodes in /dev.
Apr 25 17:17:07 node1 kernel: [ 0.000000] AMD AuthenticAMD
Apr 25 17:17:07 node1 kernel: [ 0.000000] Centaur CentaurHauls
Apr 25 17:17:07 node1 kernel: [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Apr 25 17:17:07 node1 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x01: 'x87 floating point registers'
Apr 25 17:17:07 node1 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x02: 'SSE registers'
Apr 25 17:17:07 node1 systemd[1]: Starting udev Kernel Device Manager...
Apr 25 17:17:07 node1 kernel: [ 0.000000] x86/fpu: Supporting XSAVE feature 0x04: 'AVX registers'
Apr 25 17:17:07 node1 kernel: [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
Apr 25 17:17:07 node1 kernel: [ 0.000000] x86/fpu: Using 'eager' FPU context switches.
Apr 25 17:17:07 node1 kernel: [ 0.000000] e820: BIOS-provided physical RAM map:
Apr 25 17:17:07 node1 systemd[1]: Started udev Kernel Device Manager.
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009afff] usable
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x000000000009b000-0x000000000009ffff] reserved
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000078f25fff] usable
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000078f26000-0x0000000079856fff] reserved
Apr 25 17:17:07 node1 systemd[1]: Starting LSB: Set preliminary keymap...
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000079857000-0x0000000079d4cfff] ACPI NVS
Apr 25 17:17:07 node1 systemd[1]: Starting LSB: Tune IDE hard disks...
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000079d4d000-0x000000008fffffff] reserved
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed44fff] reserved
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
Apr 25 17:17:07 node1 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000207fffffff] usable
Apr 25 17:17:07 node1 hdparm[1075]: Setting parameters of disc: (none).
Apr 25 17:17:07 node1 kernel: [ 0.000000] NX (Execute Disable) protection: active
Apr 25 17:17:07 node1 kernel: [ 0.000000] SMBIOS 3.0 present.
Apr 25 17:17:07 node1 kernel: [ 0.000000] DMI: Supermicro Super Server/X10DRL-i, BIOS 2.0 12/18/2015
Apr 25 17:17:07 node1 kernel: [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Apr 25 17:17:07 node1 kernel: [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
Apr 25 17:17:07 node1 kernel: [ 0.000000] e820: last_pfn = 0x2080000 max_arch_pfn = 0x400000000
Apr 25 17:17:07 node1 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
Apr 25 17:17:07 node1 kernel: [ 0.000000] MTRR default type: write-back
Apr 25 17:17:07 node1 kernel: [ 0.000000] MTRR fixed ranges enabled:
Apr 25 17:17:07 node1 kernel: [ 0.000000] 00000-9FFFF write-back
Apr 25 17:17:07 node1 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Apr 25 17:17:07 node1 kernel: [ 0.000000] A0000-BFFFF uncachable
Apr 25 17:17:07 node1 systemd[1]: Starting LVM2 PV scan on device 8:3...
Apr 25 17:17:07 node1 kernel: [ 0.000000] C0000-FFFFF write-protect
Apr 25 17:17:07 node1 kernel: [ 0.000000] MTRR variable ranges enabled:
Apr 25 17:17:07 node1 kernel: [ 0.000000] 0 base 000080000000 mask 3FFF80000000 uncachable
Apr 25 17:17:07 node1 kernel: [ 0.000000] 1 base 380000000000 mask 3F8000000000 uncachable
Apr 25 17:17:07 node1 systemd[1]: Started LVM2 PV scan on device 8:3.
Apr 25 17:17:07 node1 kernel: [ 0.000000] 2 disabled
Apr 25 17:17:07 node1 kernel: [ 0.000000] 3 disabled
Apr 25 17:17:07 node1 kernel: [ 0.000000] 4 disabled
Apr 25 17:17:07 node1 kernel: [ 0.000000] 5 disabled
Apr 25 17:17:07 node1 keyboard-setup[1074]: Setting preliminary keymap...done.
Apr 25 17:17:07 node1 kernel: [ 0.000000] 6 disabled
Apr 25 17:17:07 node1 kernel: [ 0.000000] 7 disabled
Apr 25 17:17:07 node1 systemd[1]: Started LSB: Set preliminary keymap.
Apr 25 17:17:07 node1 kernel: [ 0.000000] 8 disabled
Apr 25 17:17:07 node1 kernel: [ 0.000000] 9 disabled
Apr 25 17:17:07 node1 kernel: [ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT
Apr 25 17:17:07 node1 kernel: [ 0.000000] e820: last_pfn = 0x78f26 max_arch_pfn = 0x400000000
Apr 25 17:17:07 node1 kernel: [ 0.000000] found SMP MP-table at [mem 0x000fdbe0-0x000fdbef] mapped at [ffff8800000fdbe0]
Apr 25 17:17:07 node1 kernel: [ 0.000000] Scanning 1 areas for low memory corruption
Apr 25 17:17:07 node1 kernel: [ 0.000000] Base memory trampoline at [ffff880000095000] 95000 size 24576
Apr 25 17:17:07 node1 kernel: [ 0.000000] Using GB pages for direct mapping
Apr 25 17:17:07 node1 kernel: [ 0.000000] BRK [0x02220000, 0x02220fff] PGTABLE
Apr 25 17:17:07 node1 kernel: [ 0.000000] BRK [0x02221000, 0x02221fff] PGTABLE
Apr 25 17:17:07 node1 kernel: [ 0.000000] BRK [0x02222000, 0x02222fff] PGTABLE
Apr 25 17:17:07 node1 kernel: [ 0.000000] RAMDISK: [mem 0x34cbf000-0x36656fff]
Apr 25 17:17:07 node1 systemd[1]: Starting Remount Root and Kernel File Systems...
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: Early table checksum verification disabled
Apr 25 17:17:07 node1 systemd[1]: Started Remount Root and Kernel File Systems.
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: RSDP 0x00000000000F0580 000024 (v02 SUPERM)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: XSDT 0x00000000798BF0A8 0000CC (v01 01072009 AMI 00010013)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: FACP 0x00000000798EFDB8 00010C (v05 SUPERM SMCI--MB 01072009 AMI 00010013)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: DSDT 0x00000000798BF208 030BB0 (v02 SUPERM SMCI--MB 01072009 INTL 20091013)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: FACS 0x0000000079D4BF80 000040
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: APIC 0x00000000798EFEC8 000294 (v03 SUPERM SMCI--MB 01072009 AMI 00010013)
Apr 25 17:17:07 node1 systemd[1]: Started Various fixups to make systemd work better on Debian.
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: FPDT 0x00000000798F0160 000044 (v01 SUPERM SMCI--MB 01072009 AMI 00010013)
Apr 25 17:17:07 node1 systemd[1]: Starting Load/Save Random Seed...
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: FIDT 0x00000000798F01A8 00009C (v01 SUPERM SMCI--MB 01072009 AMI 00010013)
Apr 25 17:17:07 node1 systemd[1]: Starting Local File Systems (Pre).
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: SPMI 0x00000000798F0248 000040 (v05 SUPERM SMCI--MB 00000000 AMI. 00000000)
Apr 25 17:17:07 node1 systemd[1]: Reached target Local File Systems (Pre).
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: MCFG 0x00000000798F0288 00003C (v01 SUPERM SMCI--MB 01072009 MSFT 00000097)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: UEFI 0x00000000798F02C8 000042 (v01 SUPERM SMCI--MB 01072009 00000000)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: HPET 0x00000000798F0310 000038 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: MSCT 0x00000000798F0348 000090 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: SLIT 0x00000000798F03D8 000030 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
Apr 25 17:17:07 node1 systemd[1]: Started Load/Save Random Seed.
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: SRAT 0x00000000798F0408 001158 (v03 SUPERM SMCI--MB 00000001 INTL 20091013)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: WDDT 0x00000000798F1560 000040 (v01 SUPERM SMCI--MB 00000000 INTL 20091013)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: SSDT 0x00000000798F15A0 01700F (v02 SUPERM PmMgt 00000001 INTL 20120913)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: SSDT 0x00000000799085B0 00264C (v02 SUPERM SpsNm 00000002 INTL 20120913)
Apr 25 17:17:07 node1 systemd[1]: Started udev Wait for Complete Device Initialization.
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: SSDT 0x000000007990AC00 000064 (v02 SUPERM SpsNvs 00000002 INTL 20120913)
Apr 25 17:17:07 node1 systemd[1]: Starting Activation of LVM2 logical volumes...
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: PRAD 0x000000007990AC68 000102 (v02 SUPERM SMCI--MB 00000002 INTL 20120913)
Apr 25 17:17:07 node1 kernel: [ 0.000000] ACPI: DMAR 0x000000007990AD70 000130 (v01 SUPERM SMCI--MB 00000001 INTL 20091013)
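In the excerpt above, the last entry before the restart is the pvestatd timeout at 17:14:28, and rsyslogd starts again at 17:17:07 with no shutdown messages in between. For anyone who wants to check a longer syslog for the same pattern, here is a minimal sketch in Python. It is not part of the original logs. The function name `find_boots`, the default year, and the "exiting on signal" check for a clean shutdown are assumptions made for illustration, and the heuristic may need adjusting for other syslog setups. It also assumes an English locale for the month names.

```python
import re
import sys
from datetime import datetime

# Classic syslog prefix, e.g. "Apr 25 17:14:28 node1 pvestatd[2908]: got timeout"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")


def parse_time(stamp, year):
    # Syslog timestamps carry no year, so the caller has to supply one.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def find_boots(lines, year=2017):
    """Return one dict per rsyslogd start that follows an earlier log entry.

    Each dict holds the last message logged before the start, the gap in
    seconds, and a rough guess at whether the shutdown was clean.
    """
    boots = []
    prev = None  # (timestamp, message) of the previous parsed line
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue
        stamp, _host, rest = match.groups()
        when = parse_time(stamp, year)
        is_start = rest.startswith("rsyslogd:") and rest.rstrip().endswith("start")
        if is_start and prev is not None:
            prev_when, prev_rest = prev
            boots.append({
                "last_message": prev_rest,
                "last_time": prev_when,
                "boot_time": when,
                "gap_seconds": int((when - prev_when).total_seconds()),
                # Heuristic (assumption): rsyslogd normally logs
                # "exiting on signal" as it stops during a clean shutdown.
                "clean": "exiting on signal" in prev_rest,
            })
        prev = (when, rest)
    return boots


if __name__ == "__main__":
    # Usage: python find_boots.py /var/log/syslog
    with open(sys.argv[1], errors="replace") as handle:
        for boot in find_boots(handle):
            kind = "clean" if boot["clean"] else "UNCLEAN?"
            print(f'{boot["boot_time"]}  {kind}  gap={boot["gap_seconds"]}s  '
                  f'last: {boot["last_message"]}')
```

A restart flagged as unclean only means that nothing was logged about a shutdown, which would fit a hard reset or power loss but does not by itself identify the cause.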
Thanks again for any help you can offer.