Proxmox Crash (hung_task_timeout_secs for corosync)

Alex66955

New Member
Mar 3, 2016
Hey,

I have to evaluate Proxmox VE for my company, so I built a small test environment on our production server.

The setup:
Because of resource limitations I built a nested test environment: the Proxmox VE server runs as a VM on a QEMU host hypervisor. The Proxmox VE VM is given the host CPU with all flags.
I built a small cluster with an additional laptop for test cases.
About 4 LXC containers and 2 VMs are running.

The Problem:
Proxmox crashes every night (the laptop node is not affected).
  • The virtual machines are not responding
  • The Proxmox VE web interface is not responding
  • The SSH connection to the Proxmox VE server still works

More details:
  • Kernel Output
Code:
Apr  8 03:51:32 vp-proxmoxS2 systemd-timesyncd[1634]: interval/delta/delay/jitter/drift 2048s/+0.012s/0.056s/0.017s/+18ppm
Apr  8 03:55:01 vp-proxmoxS2 CRON[23033]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr  8 04:05:01 vp-proxmoxS2 CRON[23908]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.447233] INFO: task corosync:2281 blocked for more than 600 seconds.
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.449373]       Tainted: P           O    4.2.8-1-pve #1
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.449549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.449743] corosync        D ffff8800babf0000     0  2281      1 0x00000000
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.449751]  ffff8800ba647e78 0000000000000086 ffff880232b59b80 ffff8800babf0000
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.449757]  ffff8800babf0000 ffff8800ba648000 ffff8800ba647ee8 ffffffff821051e0
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.449761]  00000000000004e0 000055a3289c19d0 ffff8800ba647e98 ffffffff818069f7
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.449766] Call Trace:
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.451187]  [<ffffffff818069f7>] schedule+0x37/0x80
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.452381]  [<ffffffff8105f976>] kvm_async_pf_task_wait+0x1a6/0x230
Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.452747]  [<ffffffff810a66f0>] ? wake_up_q+0x70/0x70
  • On the host machine several "backup" cron jobs run at this time. Could they cause this issue?
  • Host sar output around the crash time 04:05:01 (one CPU at 0 % idle with %guest at about 99 %; kbdirty peaks at 04:05:01 and falls afterwards)
Code:
00:00:01        CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
03:55:01         15      0,35      0,00      1,35      2,10      0,00      0,00      0,01      1,29      0,00     94,91
04:05:01        all      0,55      0,00      2,85      1,52      0,00      0,00      0,02     11,64      0,00     83,42
04:05:01          0      0,27      0,00      4,78      2,59      0,00      0,00      0,06      0,36      0,00     91,94
04:05:01          1      2,02      0,00      3,65      1,79      0,00      0,00      0,03     16,05      0,00     76,46
04:05:01          2      1,32      0,00      4,07      2,21      0,00      0,00      0,03      4,65      0,00     87,72
04:05:01          3      0,00      0,00      0,49      0,00      0,00      0,00      0,02     99,49      0,00      0,00
04:05:01          4      1,06      0,00      5,79      2,52      0,00      0,00      0,04      0,53      0,00     90,06
04:05:01          5      0,94      0,00      5,70      2,22      0,00      0,00      0,03      2,91      0,00     88,21
04:05:01          6      0,18      0,00      2,52      1,78      0,00      0,00      0,02     21,93      0,00     73,55
04:05:01          7      0,56      0,00      4,93      3,09      0,00      0,00      0,03      4,90      0,00     86,50

00:00:01    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
03:55:01       181380  32724272     99,45   4877488  20447132  30088364     73,90  11686132  16496504    738576
04:05:01       179052  32726600     99,46   4974528  20621728  30086092     73,89  11779152  16406220   1074908
04:15:01       175808  32729844     99,47   5070740  20698664  30084200     73,89   9519196  18671356    808932
04:25:01       175564  32730088     99,47   5156956  20689968  30117036     73,97  11588312  16584712    755108
04:35:01       178180  32727472     99,46   5378360  20408392  30085616     73,89  11836064  16293196     63884
04:45:01       172920  32732732     99,47   5527116  20143312  30116620     73,97   9976676  17980968     96348
04:55:01      2139548  30766104     93,50   5556120  18294352  30080700     73,88   9869980  16258724       224
05:05:01      2031256  30874396     93,83   5575984  18394112  30059260     73,83   9962708  16273540       352
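
To pick the saturated core out of the host sar table above without reading it by eye, a minimal sketch like the following could be used. It assumes the sar output is available as text with comma decimal separators, as pasted above. The function name `busy_cpus` and the 5 % idle threshold are illustrative choices, not part of sysstat.

```python
def busy_cpus(sar_text, max_idle=5.0):
    """Return (time, cpu, guest_pct, idle_pct) for per-CPU rows at or below max_idle % idle."""
    rows = []
    for line in sar_text.splitlines():
        fields = line.split()
        # A per-CPU data row has a timestamp, a numeric CPU id and 10 percentage columns.
        if len(fields) != 12 or not fields[1].isdigit():
            continue  # skips headers, blank lines, memory rows and the "all" summary row
        values = [float(v.replace(",", ".")) for v in fields[2:]]
        guest, idle = values[7], values[9]  # %guest and %idle columns
        if idle <= max_idle:
            rows.append((fields[0], int(fields[1]), guest, idle))
    return rows


sample = """\
00:00:01        CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest    %gnice     %idle
04:05:01        all      0,55      0,00      2,85      1,52      0,00      0,00      0,02     11,64      0,00     83,42
04:05:01          2      1,32      0,00      4,07      2,21      0,00      0,00      0,03      4,65      0,00     87,72
04:05:01          3      0,00      0,00      0,49      0,00      0,00      0,00      0,02     99,49      0,00      0,00
"""

for row in busy_cpus(sample):
    print(row)
```

On the sample rows this reports only CPU 3, which spends the whole interval in guest time, i.e. running a vCPU of a VM on the host.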

Installed versions (pveversion -v):
Code:
proxmox-ve: 4.1-41 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-22 (running version: 4.1-22/aca130cf)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-36
qemu-server: 4.0-64
pve-firmware: 1.1-7
libpve-common-perl: 4.0-54
libpve-access-control: 4.0-13
libpve-storage-perl: 4.0-45
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-52
pve-firewall: 2.0-22
pve-ha-manager: 1.0-25
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
openvswitch-switch: 2.3.2-2
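
To line up the hung-task warning from the kernel output above with the host's cron schedule, it helps to know when the task must have stopped running. The sketch below extracts the warning from syslog text and subtracts the reported timeout. The function name `hung_tasks` is hypothetical, and the year 2016 is an assumption, since syslog timestamps carry no year.

```python
import re
from datetime import datetime, timedelta

# Matches the kernel's hung-task warning as it appears in the syslog excerpt.
HUNG_TASK = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel:.*"
    r"INFO: task (?P<task>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(log_text, year=2016):
    """Return (task, pid, reported_at, blocked_since_at_latest) per warning."""
    found = []
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if not match:
            continue
        # The year is supplied by the caller (assumption); month names are
        # parsed with the default C locale, e.g. "Apr".
        reported = datetime.strptime(f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S")
        blocked_since = reported - timedelta(seconds=int(match["secs"]))
        found.append((match["task"], int(match["pid"]), reported, blocked_since))
    return found


sample = (
    "Apr  8 04:06:39 vp-proxmoxS2 kernel: [71608.447233] "
    "INFO: task corosync:2281 blocked for more than 600 seconds."
)
for task, pid, reported, since in hung_tasks(sample):
    print(f"{task} (pid {pid}) reported {reported:%H:%M:%S}, "
          f"blocked since {since:%H:%M:%S} at the latest")
```

For the warning shown in the kernel output, corosync was therefore already blocked at 03:56:39, which narrows down which host cron jobs are worth checking.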

Attempted fixes:
  • vm.dirty_background_ratio = 5
  • vm.dirty_ratio = 10
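
The post does not say on which machine (host or Proxmox VE VM) these values were set, or whether they are active. A minimal sketch for checking the values the running kernel actually uses is shown below. It assumes a Linux system with `/proc/sys` mounted; the helper name `read_sysctls` is hypothetical.

```python
from pathlib import Path


def read_sysctls(names, root="/proc/sys"):
    """Map each dotted sysctl name to its current value, or None if it is absent."""
    values = {}
    for name in names:
        # vm.dirty_ratio lives at /proc/sys/vm/dirty_ratio, and so on.
        path = Path(root, *name.split("."))
        try:
            values[name] = path.read_text().strip()
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    wanted = [
        "vm.dirty_background_ratio",
        "vm.dirty_ratio",
        "kernel.hung_task_timeout_secs",
    ]
    for name, value in read_sysctls(wanted).items():
        print(f"{name} = {value}")
```

Running it on both the host and the Proxmox VE VM shows whether the two settings listed above took effect where they were intended.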
 

Attachments

  • ps_faxl_proxmox.txt (58.9 KB)
  • sar_output_host.txt (331.8 KB)
  • sar_output_proxmox.txt (200.1 KB)
  • syslog-crash_proxmox.txt (196.8 KB)