Hey,
I have to evaluate Proxmox VE for my company, and for that I built a small test environment on our production server.
The setup:
Because of resource limitations I built a nested test environment. The Proxmox VE server runs as a VM on a QEMU host hypervisor. The Proxmox VE VM gets the host CPU with all flags.
I built a small cluster with an additional laptop for test cases.
About 4 LXC containers and 2 VMs are running.
The problem:
Proxmox crashes every night (only on the nested node, not on the laptop node).
- The virtual machines are not responding
- The Proxmox VE web interface is not responding
- The SSH connection to the Proxmox VE server still works
More details:
- Kernel Output
Code:
Apr 8 03:51:32 vp-proxmoxS2 systemd-timesyncd[1634]: interval/delta/delay/jitter/drift 2048s/+0.012s/0.056s/0.017s/+18ppm
Apr 8 03:55:01 vp-proxmoxS2 CRON[23033]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 8 04:05:01 vp-proxmoxS2 CRON[23908]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.447233] INFO: task corosync:2281 blocked for more than 600 seconds.
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.449373] Tainted: P O 4.2.8-1-pve #1
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.449549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.449743] corosync D ffff8800babf0000 0 2281 1 0x00000000
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.449751] ffff8800ba647e78 0000000000000086 ffff880232b59b80 ffff8800babf0000
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.449757] ffff8800babf0000 ffff8800ba648000 ffff8800ba647ee8 ffffffff821051e0
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.449761] 00000000000004e0 000055a3289c19d0 ffff8800ba647e98 ffffffff818069f7
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.449766] Call Trace:
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.451187] [<ffffffff818069f7>] schedule+0x37/0x80
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.452381] [<ffffffff8105f976>] kvm_async_pf_task_wait+0x1a6/0x230
Apr 8 04:06:39 vp-proxmoxS2 kernel: [71608.452747] [<ffffffff810a66f0>] ? wake_up_q+0x70/0x70
- On the host machine, several "backup" cron jobs run at this time. Could they cause this issue?
- Host sar output around the crash time 04:05:01 (CPU 3 is at 0% idle with %guest at 99,49%, and kbdirty starts falling after the crash time)
Code:
00:00:01 CPU %usr %nice %sys %iowait %steal %irq %soft %guest %gnice %idle
03:55:01 15 0,35 0,00 1,35 2,10 0,00 0,00 0,01 1,29 0,00 94,91
04:05:01 all 0,55 0,00 2,85 1,52 0,00 0,00 0,02 11,64 0,00 83,42
04:05:01 0 0,27 0,00 4,78 2,59 0,00 0,00 0,06 0,36 0,00 91,94
04:05:01 1 2,02 0,00 3,65 1,79 0,00 0,00 0,03 16,05 0,00 76,46
04:05:01 2 1,32 0,00 4,07 2,21 0,00 0,00 0,03 4,65 0,00 87,72
04:05:01 3 0,00 0,00 0,49 0,00 0,00 0,00 0,02 99,49 0,00 0,00
04:05:01 4 1,06 0,00 5,79 2,52 0,00 0,00 0,04 0,53 0,00 90,06
04:05:01 5 0,94 0,00 5,70 2,22 0,00 0,00 0,03 2,91 0,00 88,21
04:05:01 6 0,18 0,00 2,52 1,78 0,00 0,00 0,02 21,93 0,00 73,55
04:05:01 7 0,56 0,00 4,93 3,09 0,00 0,00 0,03 4,90 0,00 86,50
00:00:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit %commit kbactive kbinact kbdirty
03:55:01 181380 32724272 99,45 4877488 20447132 30088364 73,90 11686132 16496504 738576
04:05:01 179052 32726600 99,46 4974528 20621728 30086092 73,89 11779152 16406220 1074908
04:15:01 175808 32729844 99,47 5070740 20698664 30084200 73,89 9519196 18671356 808932
04:25:01 175564 32730088 99,47 5156956 20689968 30117036 73,97 11588312 16584712 755108
04:35:01 178180 32727472 99,46 5378360 20408392 30085616 73,89 11836064 16293196 63884
04:45:01 172920 32732732 99,47 5527116 20143312 30116620 73,97 9976676 17980968 96348
04:55:01 2139548 30766104 93,50 5556120 18294352 30080700 73,88 9869980 16258724 224
05:05:01 2031256 30874396 93,83 5575984 18394112 30059260 73,83 9962708 16273540 352
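To spot rows like the CPU 3 line without reading the table by eye, the sar CPU output can be filtered with a small script. This is only a minimal sketch in Python: the function names and the thresholds (idle at most 1%, guest at least 90%) are assumptions chosen for illustration, and the sample rows are copied from the output above, which uses a comma as the decimal separator.

```python
from typing import Dict, List

# Column names copied from the sar CPU header row above.
CPU_COLUMNS = ["%usr", "%nice", "%sys", "%iowait", "%steal",
               "%irq", "%soft", "%guest", "%gnice", "%idle"]


def parse_cpu_rows(text: str) -> List[Dict[str, object]]:
    """Parse sar CPU rows that use a comma as the decimal separator."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Expect: timestamp, CPU id, then one value per column; skip the header.
        if len(fields) != 2 + len(CPU_COLUMNS) or fields[1] == "CPU":
            continue
        row: Dict[str, object] = {"time": fields[0], "cpu": fields[1]}
        for name, raw in zip(CPU_COLUMNS, fields[2:]):
            row[name] = float(raw.replace(",", "."))
        rows.append(row)
    return rows


def saturated_by_guest(rows, max_idle=1.0, min_guest=90.0):
    """Return per-CPU rows that are almost fully consumed by guest time.

    The thresholds are illustrative assumptions, not sar defaults.
    """
    return [r for r in rows
            if r["cpu"] != "all"
            and r["%idle"] <= max_idle
            and r["%guest"] >= min_guest]


# Sample rows taken from the sar output in this post.
SAMPLE = """\
00:00:01 CPU %usr %nice %sys %iowait %steal %irq %soft %guest %gnice %idle
04:05:01 all 0,55 0,00 2,85 1,52 0,00 0,00 0,02 11,64 0,00 83,42
04:05:01 2 1,32 0,00 4,07 2,21 0,00 0,00 0,03 4,65 0,00 87,72
04:05:01 3 0,00 0,00 0,49 0,00 0,00 0,00 0,02 99,49 0,00 0,00
"""

if __name__ == "__main__":
    for r in saturated_by_guest(parse_cpu_rows(SAMPLE)):
        print(r["time"], "CPU", r["cpu"], "guest", r["%guest"], "idle", r["%idle"])
```

Running the same filter over the sar data of several nights would show whether one host CPU is pinned by a guest every time the backup cron jobs run.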
- Package versions (output of pveversion -v)
Code:
proxmox-ve: 4.1-41 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-22 (running version: 4.1-22/aca130cf)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-36
qemu-server: 4.0-64
pve-firmware: 1.1-7
libpve-common-perl: 4.0-54
libpve-access-control: 4.0-13
libpve-storage-perl: 4.0-45
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-52
pve-firewall: 2.0-22
pve-ha-manager: 1.0-25
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
openvswitch-switch: 2.3.2-2
Solution attempts:
- vm.dirty_background_ratio = 5
- vm.dirty_ratio = 10
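To confirm that these two values are actually live on the machine where they were set, they can be read back from procfs. The following is a minimal sketch in Python, assuming a Linux system with /proc/sys mounted; the helper names `read_sysctl` and `check_expected` are made up for this example, and the expected values are simply the two listed above.

```python
from pathlib import Path
from typing import Dict

# The two settings listed above, with the values from this post.
EXPECTED = {"vm.dirty_background_ratio": 5, "vm.dirty_ratio": 10}


def read_sysctl(name: str, proc_sys: Path = Path("/proc/sys")) -> int:
    """Read an integer sysctl via procfs.

    Example: vm.dirty_ratio maps to /proc/sys/vm/dirty_ratio.
    """
    return int((proc_sys / name.replace(".", "/")).read_text().strip())


def check_expected(expected: Dict[str, int],
                   proc_sys: Path = Path("/proc/sys")) -> Dict[str, int]:
    """Return the settings whose live value differs from the expected one."""
    mismatches = {}
    for name, want in expected.items():
        got = read_sysctl(name, proc_sys)
        if got != want:
            mismatches[name] = got
    return mismatches


if __name__ == "__main__":
    # An empty dict means both values are applied on this machine.
    print(check_expected(EXPECTED) or "all values applied")
```

The script only reports on the machine it runs on, so in a nested setup like this one it would have to be run separately on the host and inside the Proxmox VE VM.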