Proxmox node halts randomly.

nyquist

Aug 7, 2013

Hi all,
Another Proxmox noob here. Over the past few days my host has been going into a "halt" state. There is no fixed timing: sometimes it halts after 18 hours, sometimes after 2 days. After rebooting the system and checking the logs, I've noticed that the last logged lines are always related to journal rotations. When this happens, the only way out is to press the reset button manually. I'm only hosting KVM guests.


I've found some threads from other users with similar issues, and their logs also end with journal rotation entries.
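The "journal rotation" lines all come from rrdcached, the daemon that caches the RRD statistics behind the Proxmox graphs, so they are not filesystem journal activity. Assuming each journal file name has the form rrd.journal.<unix seconds>.<microseconds> recording when that journal was started (an assumption based on the names in the log excerpt), the short Python sketch below decodes the names from the seven rotations in the excerpt. With those values every new journal starts one hour after the previous one and the journal being removed is two hours old, which looks like routine hourly housekeeping. The decoded times come out three hours ahead of the syslog timestamps (03:46:24 UTC for the 00:46:24 rotation), presumably the host's timezone offset. journal_start and summarize are hypothetical helper names.

```python
from datetime import datetime, timezone


def journal_start(filename: str) -> datetime:
    """Decode the creation time assumed to be embedded in an rrdcached journal name.

    Assumption: names look like 'rrd.journal.<unix seconds>.<microseconds>'.
    """
    seconds, micros = filename.split(".")[-2:]
    return datetime.fromtimestamp(int(seconds), tz=timezone.utc).replace(microsecond=int(micros))


# (started, removed) pairs copied from the 00:46 .. 06:46 rotations in the syslog excerpt.
ROTATIONS = [
    ("rrd.journal.1376279184.400364", "rrd.journal.1376271984.400472"),
    ("rrd.journal.1376282784.400572", "rrd.journal.1376275584.400502"),
    ("rrd.journal.1376286384.400474", "rrd.journal.1376279184.400364"),
    ("rrd.journal.1376289984.400500", "rrd.journal.1376282784.400572"),
    ("rrd.journal.1376293584.424462", "rrd.journal.1376286384.400474"),
    ("rrd.journal.1376297184.400686", "rrd.journal.1376289984.400500"),
    ("rrd.journal.1376300784.400659", "rrd.journal.1376293584.424462"),
]


def summarize(rotations):
    """Return (start time, whole seconds since previous start, age of removed journal)."""
    rows = []
    previous = None
    for started, removed in rotations:
        start = journal_start(started)
        gap = None if previous is None else round((start - previous).total_seconds())
        age = round((start - journal_start(removed)).total_seconds())
        rows.append((start, gap, age))
        previous = start
    return rows


if __name__ == "__main__":
    for start, gap, age in summarize(ROTATIONS):
        gap_text = "n/a" if gap is None else f"{gap} s"
        print(f"{start:%Y-%m-%d %H:%M:%S} UTC  since previous: {gap_text}  removed journal age: {age} s")
```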

Here is the relevant portion of my syslog. The last line before the hang is the rrdcached "removing old journal" entry at 06:46:24, and the first line after the reboot is the imklog entry at 19:07:29:

Aug 12 00:17:01 pmx3host /USR/SBIN/CRON[94720]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 12 00:46:24 pmx3host rrdcached[1817]: flushing old values
Aug 12 00:46:24 pmx3host rrdcached[1817]: rotating journals
Aug 12 00:46:24 pmx3host rrdcached[1817]: started new journal /var/lib/rrdcached/journal//rrd.journal.1376279184.400364
Aug 12 00:46:24 pmx3host rrdcached[1817]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1376271984.400472
Aug 12 00:49:53 pmx3host pvedaemon[80670]: <root@pam> successful auth for user 'crioboo@pve'
Aug 12 00:58:06 pmx3host pvedaemon[80656]: <root@pam> successful auth for user 'crioboo@pve'
Aug 12 01:17:01 pmx3host /USR/SBIN/CRON[98222]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 12 01:46:24 pmx3host rrdcached[1817]: flushing old values
Aug 12 01:46:24 pmx3host rrdcached[1817]: rotating journals
Aug 12 01:46:24 pmx3host rrdcached[1817]: started new journal /var/lib/rrdcached/journal//rrd.journal.1376282784.400572
Aug 12 01:46:24 pmx3host rrdcached[1817]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1376275584.400502
Aug 12 02:17:01 pmx3host /USR/SBIN/CRON[101315]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 12 02:41:38 pmx3host pvedaemon[80670]: <root@pam> successful auth for user 'crioboo@pve'
Aug 12 02:46:24 pmx3host rrdcached[1817]: flushing old values
Aug 12 02:46:24 pmx3host rrdcached[1817]: rotating journals
Aug 12 02:46:24 pmx3host rrdcached[1817]: started new journal /var/lib/rrdcached/journal//rrd.journal.1376286384.400474
Aug 12 02:46:24 pmx3host rrdcached[1817]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1376279184.400364
Aug 12 02:51:25 pmx3host pvedaemon[80670]: <crioboo@pve> starting task UPID:pmx3host:0001940D:011E1FB7:520877DD:qmstop:101:crioboo@pve:
Aug 12 02:51:25 pmx3host pvedaemon[103437]: stop VM 101: UPID:pmx3host:0001940D:011E1FB7:520877DD:qmstop:101:crioboo@pve:
Aug 12 02:51:25 pmx3host kernel: vmbr0: port 2(tap101i0) entering disabled state
Aug 12 02:51:25 pmx3host kernel: vmbr0: port 2(tap101i0) entering disabled state
Aug 12 02:51:25 pmx3host ntpd[1776]: Deleting interface #11 tap101i0, fe80::9834:6ff:feee:ff20#123, interface stats: received=0, sent=0, dropped=0, active_time=186600 secs
Aug 12 02:51:26 pmx3host pvedaemon[80670]: <crioboo@pve> end task UPID:pmx3host:0001940D:011E1FB7:520877DD:qmstop:101:crioboo@pve: OK
Aug 12 02:51:37 pmx3host pvedaemon[103455]: start VM 101: UPID:pmx3host:0001941F:011E247C:520877E9:qmstart:101:crioboo@pve:
Aug 12 02:51:37 pmx3host pvedaemon[81161]: <crioboo@pve> starting task UPID:pmx3host:0001941F:011E247C:520877E9:qmstart:101:crioboo@pve:
Aug 12 02:51:38 pmx3host kernel: device tap101i0 entered promiscuous mode
Aug 12 02:51:38 pmx3host kernel: vmbr0: port 2(tap101i0) entering forwarding state
Aug 12 02:51:38 pmx3host pvedaemon[81161]: <crioboo@pve> end task UPID:pmx3host:0001941F:011E247C:520877E9:qmstart:101:crioboo@pve: OK
Aug 12 02:51:48 pmx3host kernel: tap101i0: no IPv6 routers present
Aug 12 02:56:25 pmx3host ntpd[1776]: Listen normally on 13 tap101i0 fe80::ac1c:d3ff:fece:89ed UDP 123
Aug 12 03:17:01 pmx3host /USR/SBIN/CRON[104871]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 12 03:46:24 pmx3host rrdcached[1817]: flushing old values
Aug 12 03:46:24 pmx3host rrdcached[1817]: rotating journals
Aug 12 03:46:24 pmx3host rrdcached[1817]: started new journal /var/lib/rrdcached/journal//rrd.journal.1376289984.400500
Aug 12 03:46:24 pmx3host rrdcached[1817]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1376282784.400572
Aug 12 04:17:01 pmx3host /USR/SBIN/CRON[107908]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 12 04:46:24 pmx3host rrdcached[1817]: flushing old values
Aug 12 04:46:24 pmx3host rrdcached[1817]: rotating journals
Aug 12 04:46:24 pmx3host rrdcached[1817]: started new journal /var/lib/rrdcached/journal//rrd.journal.1376293584.424462
Aug 12 04:46:24 pmx3host rrdcached[1817]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1376286384.400474
Aug 12 05:17:01 pmx3host /USR/SBIN/CRON[110882]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 12 05:46:24 pmx3host rrdcached[1817]: flushing old values
Aug 12 05:46:24 pmx3host rrdcached[1817]: rotating journals
Aug 12 05:46:24 pmx3host rrdcached[1817]: started new journal /var/lib/rrdcached/journal//rrd.journal.1376297184.400686
Aug 12 05:46:24 pmx3host rrdcached[1817]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1376289984.400500
Aug 12 06:17:01 pmx3host /USR/SBIN/CRON[113834]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Aug 12 06:25:01 pmx3host /USR/SBIN/CRON[114264]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ))
Aug 12 06:25:02 pmx3host rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="1672" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
Aug 12 06:46:24 pmx3host rrdcached[1817]: flushing old values
Aug 12 06:46:24 pmx3host rrdcached[1817]: rotating journals
Aug 12 06:46:24 pmx3host rrdcached[1817]: started new journal /var/lib/rrdcached/journal//rrd.journal.1376300784.400659
Aug 12 06:46:24 pmx3host rrdcached[1817]: removing old journal /var/lib/rrdcached/journal//rrd.journal.1376293584.424462
Aug 12 19:07:29 pmx3host kernel: imklog 4.6.4, log source = /proc/kmsg started.
Aug 12 19:07:29 pmx3host rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="1835" x-info="http://www.rsyslog.com"] (re)start
Aug 12 19:07:29 pmx3host kernel: Initializing cgroup subsys cpuset
Aug 12 19:07:29 pmx3host kernel: Initializing cgroup subsys cpu
Aug 12 19:07:29 pmx3host kernel: Linux version 2.6.32-19-pve (root@maui) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Wed May 15 07:32:52 CEST 2013
Aug 12 19:07:29 pmx3host kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-2.6.32-19-pve root=UUID=ef09aef3-8fdd-4738-80df-f5eb4caec0a5 ro quiet
Aug 12 19:07:29 pmx3host kernel: KERNEL supported cpus:
Aug 12 19:07:29 pmx3host kernel: Intel GenuineIntel
Aug 12 19:07:29 pmx3host kernel: AMD AuthenticAMD
Aug 12 19:07:29 pmx3host kernel: Centaur CentaurHauls
Aug 12 19:07:29 pmx3host kernel: BIOS-provided physical RAM map:
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 0000000000100000 - 000000008c012000 (usable)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008c012000 - 000000008c0f0000 (ACPI NVS)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008c0f0000 - 000000008c4fb000 (ACPI data)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008c4fb000 - 000000008d8fb000 (ACPI NVS)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008d8fb000 - 000000008f602000 (ACPI data)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008f602000 - 000000008f64f000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008f64f000 - 000000008f6e4000 (ACPI data)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008f6e4000 - 000000008f6ee000 (ACPI NVS)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008f6ee000 - 000000008f6f1000 (ACPI data)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008f6f1000 - 000000008f7cf000 (ACPI NVS)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008f7cf000 - 000000008f800000 (ACPI data)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 000000008f800000 - 0000000090000000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 00000000a0000000 - 00000000b0000000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 00000000fc000000 - 00000000fd000000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 00000000fed1c000 - 00000000fed45000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 00000000ff800000 - 0000000100000000 (reserved)
Aug 12 19:07:29 pmx3host kernel: BIOS-e820: 0000000100000000 - 0000000c70000000 (usable)
Aug 12 19:07:29 pmx3host kernel: DMI 2.5 present.
Aug 12 19:07:29 pmx3host kernel: SMBIOS version 2.5 @ 0xF0440
Aug 12 19:07:29 pmx3host kernel: DMI: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.00.0060.090920111354 09/09/2011
Aug 12 19:07:29 pmx3host kernel: e820 update range: 0000000000000000 - 0000000000001000 (usable) ==> (reserved)
Aug 12 19:07:29 pmx3host kernel: e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
Aug 12 19:07:29 pmx3host kernel: last_pfn = 0xc70000 max_arch_pfn = 0x400000000
Aug 12 19:07:29 pmx3host kernel: MTRR default type: write-back
Aug 12 19:07:29 pmx3host kernel: MTRR fixed ranges enabled:
Aug 12 19:07:29 pmx3host kernel: 00000-9FFFF write-back
Aug 12 19:07:29 pmx3host kernel: A0000-BFFFF uncachable
Aug 12 19:07:29 pmx3host kernel: C0000-DFFFF write-through
Aug 12 19:07:29 pmx3host kernel: E0000-FFFFF write-protect
Aug 12 19:07:29 pmx3host kernel: MTRR variable ranges enabled:
Aug 12 19:07:29 pmx3host kernel: 0 base 00C0000000 mask FFC0000000 uncachable
Aug 12 19:07:29 pmx3host kernel: 1 base 00A0000000 mask FFE0000000 uncachable
Aug 12 19:07:29 pmx3host kernel: 2 base 0090000000 mask FFF0000000 uncachable
Aug 12 19:07:29 pmx3host kernel: 3 base 00B0000000 mask FFFF000000 write-combining
Aug 12 19:07:29 pmx3host kernel: 4 disabled
Aug 12 19:07:29 pmx3host kernel: 5 disabled
Aug 12 19:07:29 pmx3host kernel: 6 disabled
Aug 12 19:07:29 pmx3host kernel: 7 disabled
Aug 12 19:07:29 pmx3host kernel: x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Aug 12 19:07:29 pmx3host kernel: last_pfn = 0x8c012 max_arch_pfn = 0x400000000
Aug 12 19:07:29 pmx3host kernel: initial memory mapped : 0 - 20000000
Aug 12 19:07:29 pmx3host kernel: init_memory_mapping: 0000000000000000-000000008c012000
Aug 12 19:07:29 pmx3host kernel: 0000000000 - 008c000000 page 2M
Aug 12 19:07:29 pmx3host kernel: 008c000000 - 008c012000 page 4k
Aug 12 19:07:29 pmx3host kernel: kernel direct mapping tables up to 8c012000 @ 8000-d000
Aug 12 19:07:29 pmx3host kernel: init_memory_mapping: 0000000100000000-0000000c70000000
Aug 12 19:07:29 pmx3host kernel: 0100000000 - 0c70000000 page 2M
Aug 12 19:07:29 pmx3host kernel: kernel direct mapping tables up to c70000000 @ b000-3e000
Aug 12 19:07:29 pmx3host kernel: RAMDISK: 3703f000 - 37fef1f6
Aug 12 19:07:29 pmx3host kernel: ACPI: RSDP 00000000000f0410 00024 (v02 INTEL )
Aug 12 19:07:29 pmx3host kernel: ACPI: XSDT 000000008f7fd120 0009C (v01 INTEL S5520HC 00000000 01000013)
Aug 12 19:07:29 pmx3host kernel: ACPI: FACP 000000008f7fb000 000F4 (v04 INTEL S5520HC 00000000 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: DSDT 000000008f7f4000 06531 (v02 INTEL S5520HC 00000003 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: FACS 000000008f6f1000 00040
Aug 12 19:07:29 pmx3host kernel: ACPI: APIC 000000008f7f3000 001A8 (v02 INTEL S5520HC 00000000 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: MCFG 000000008f7f2000 0003C (v01 INTEL S5520HC 00000001 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: HPET 000000008f7f1000 00038 (v01 INTEL S5520HC 00000001 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: SLIT 000000008f7f0000 00030 (v01 INTEL S5520HC 00000001 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: SRAT 000000008f7ef000 00430 (v02 INTEL S5520HC 00000001 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: SPCR 000000008f7ee000 00050 (v01 INTEL S5520HC 00000000 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: WDDT 000000008f7ed000 00040 (v01 INTEL S5520HC 00000000 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: SSDT 000000008f7d2000 1AFC4 (v02 INTEL SSDT PM 00004000 INTL 20061109)
Aug 12 19:07:29 pmx3host kernel: ACPI: SSDT 000000008f7d1000 001D8 (v02 INTEL IPMI 00004000 INTL 20061109)
Aug 12 19:07:29 pmx3host kernel: ACPI: HEST 000000008f7d0000 000A8 (v01 INTEL S5520HC 00000001 INTL 00000001)
Aug 12 19:07:29 pmx3host kernel: ACPI: BERT 000000008f7cf000 00030 (v01 INTEL S5520HC 00000001 INTL 00000001)
Aug 12 19:07:29 pmx3host kernel: ACPI: ERST 000000008f6f0000 00230 (v01 INTEL S5520HC 00000001 INTL 00000001)
Aug 12 19:07:29 pmx3host kernel: ACPI: EINJ 000000008f6ef000 00130 (v01 INTEL S5520HC 00000001 INTL 00000001)
Aug 12 19:07:29 pmx3host kernel: ACPI: DMAR 000000008f6ee000 001C8 (v01 INTEL S5520HC 00000001 MSFT 0100000D)
Aug 12 19:07:29 pmx3host kernel: ACPI: Local APIC address 0xfee00000
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 0 -> Node 0
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 1 -> APIC 16 -> Node 1
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 2 -> Node 0
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 1 -> APIC 18 -> Node 1
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 4 -> Node 0
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 1 -> APIC 20 -> Node 1
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 6 -> Node 0
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 1 -> APIC 22 -> Node 1
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 1 -> Node 0
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 1 -> APIC 17 -> Node 1
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 3 -> Node 0
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 1 -> APIC 19 -> Node 1
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 5 -> Node 0
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 1 -> APIC 21 -> Node 1
Aug 12 19:07:29 pmx3host kernel: SRAT: PXM 0 -> APIC 7 -> Node 0

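To pull the relevant lines out of a long syslog automatically, here is a minimal Python sketch (find_halts and parse are hypothetical helper names) that reports, for each reboot, the last line written before it and how long the log was silent. It assumes the traditional "Mon DD HH:MM:SS host program: message" format shown above, a year supplied by the caller (these lines carry none), and that a new boot is recognised by the kernel "imklog ... started" or "Linux version" lines seen at 19:07:29. An rsyslog restart without a reboot also logs the imklog line, so treat each hit as a candidate. Keep in mind that lines still buffered in memory when the machine hangs never reach the disk, so the "last line" is only the last one that was actually written.

```python
import re
import sys
from datetime import datetime, timedelta

# Assumption: rsyslog's traditional format, e.g.
# "Aug 12 19:07:29 pmx3host kernel: Linux version 2.6.32-19-pve ..."
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<prog>[^:]+): (?P<msg>.*)$"
)
# Kernel messages that mark a fresh boot in the excerpt above.
BOOT_MARKERS = ("imklog", "Linux version")


def parse(line, year):
    """Return (timestamp, program, message), or None for lines that do not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Traditional syslog lines carry no year, so the caller has to supply it.
    stamp = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
    return stamp, m["prog"], m["msg"]


def find_halts(lines, year, window=timedelta(minutes=5)):
    """List (last line before boot, first boot line, silent gap) for each reboot.

    Boot markers closer than `window` to the previous marker are treated as part
    of the same boot, so one reboot is reported only once.
    """
    halts = []
    previous = None   # (timestamp, raw line) of the last parsed line
    last_boot = None  # timestamp of the last boot marker seen
    for raw in lines:
        raw = raw.rstrip("\n")
        parsed = parse(raw, year)
        if parsed is None:
            continue  # skip truncated or otherwise unparseable lines
        stamp, prog, msg = parsed
        if prog == "kernel" and any(marker in msg for marker in BOOT_MARKERS):
            new_boot = last_boot is None or stamp - last_boot > window
            if new_boot and previous is not None:
                halts.append((previous[1], raw, stamp - previous[0]))
            last_boot = stamp
        previous = (stamp, raw)
    return halts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_halts.py /var/log/syslog 2013
    with open(sys.argv[1], errors="replace") as log:
        for last, first, gap in find_halts(log, int(sys.argv[2])):
            print(f"silent for {gap}\n  last:  {last}\n  first: {first}")
```

Run against the rotated files as well (syslog.1 and so on) to compare several hangs; the reported gap for the excerpt above would be the 12 hours 21 minutes between 06:46:24 and 19:07:29.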
Here is the output of pveversion -v:

# pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-96
pve-kernel-2.6.32-19-pve: 2.6.32-96
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1
#


Any ideas about what could be happening?

Regards,
