Same problem started happening on another lxc node:
syslog just before crashing:
Feb 13 18:39:19 dx411-s07 kernel: [1164719.968823] Modules linked in: nfnetlink_queue act_police cls_basic sch_ingress sch_htb bluetooth dccp_diag dccp udp_diag nf_log_ipv6 xt_hl ip6t_rt dm_snapshot xt_recent...
I have only seen it happen when some LXC containers were heavily loaded (using 100% of their allocated CPU, with the oom-killer killing processes because they were out of RAM). I also run many KVM nodes on identical hardware, and they have had no problems.
Hi nseba
I tried "sysctl -w kernel.nmi_watchdog=0", but it made no difference. I also made sure the hardware watchdog is disabled.
I ended up keeping the CPU load on the node below 10% and moving the customers that used 100% of their CPU resources (which I believe caused the crash) to a single node...
Hi
I'm getting the following error:
can not fork user shell: Resource temporarily unavailable
when I try to run "su user" inside an LXC CentOS 6 container.
Any idea what can cause it?
For the past few days I have been getting these errors daily. Once they start, the whole node crashes within a few minutes. I cannot run any commands, even though I was logged in through both the iDRAC console and SSH. The only solution so far is a reboot, but this hurts my uptime a lot.
Message from syslogd@dx411-s09 at Feb 12...
I have only seen this happen on LXC nodes (taking them down completely).
In the logs, I only saw:
Feb 6 19:50:50 dx411-s11 kernel: [564033.729200] 0000000000000286 00000000b1327e45 ffff8829c95e3c90 ffffffff813f9523
Feb 6 19:50:50 dx411-s11 kernel: [564033.729208] ffff8829c95e3cc8...
A similar issue happened to me today after I reinstalled a node. Thanks ajhobbs for pointing me in the right direction.
Solution:
1. Remove the old node's offending SSH key:
ssh-keygen -f "/root/.ssh/known_hosts" -R 10.0.1.113 (or hostname)
2. Get the new key:
ssh 10.0.1.113
3. Run pvecm...
Somehow processes from this container are still running, even though it shows as stopped.
My very RAW solution:
<?php
$lxc_id=917;
$z=shell_exec("ps aux|awk '{print $2}'");
$f=explode("\n",$z);
foreach($f as $k){
$k=trim($k);
if($k<1) continue;
$a=shell_exec("grep lxc\/".$lxc_id."/...
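The idea behind the script above seems to be: walk over every PID and keep the ones whose cgroup still points at the stopped container. Here is a minimal Python sketch of that check. It assumes the truncated grep reads /proc/<pid>/cgroup and that the container's cgroup path contains "lxc/<id>"; both are assumptions, since the original command is cut off. The function names cgroup_belongs_to and leftover_pids are hypothetical.

```python
import os
import re


def cgroup_belongs_to(cgroup_text, lxc_id):
    """Return True if any cgroup line has the path component 'lxc/<lxc_id>'."""
    # Assumption: container cgroup paths look like ".../lxc/917" or ".../lxc/917/...".
    pattern = re.compile(r"/lxc/%d(/|$)" % int(lxc_id))
    return any(pattern.search(line) for line in cgroup_text.splitlines())


def leftover_pids(lxc_id, proc_root="/proc"):
    """List PIDs whose cgroup file points at the given container."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, name, "cgroup")) as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        if cgroup_belongs_to(text, lxc_id):
            pids.append(int(name))
    return sorted(pids)


if __name__ == "__main__":
    # Only prints the PIDs; killing them is left as a manual decision.
    print(leftover_pids(917))
```

The sketch only reports PIDs, so the list can be reviewed before anything is killed.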
Same issue, not related to the firewall. This is what I see in the logs:
Dec 29 01:24:39 dx411-s05 pmxcfs[2512]: [status] crit: cpg_send_message failed: 2
Dec 29 01:24:39 dx411-s05 pmxcfs[2512]: [status] crit: cpg_send_message failed: 2
Dec 29 01:24:39 dx411-s05 pmxcfs[2512]: [status] crit...
Currently, in syslog I see only:
Dec 23 05:05:39 dx411-s04 pmxcfs[2923]: [status] notice: received log
Dec 23 05:05:55 dx411-s04 pmxcfs[2923]: [status] notice: received log
Dec 23 05:06:03 dx411-s04 pmxcfs[2923]: [status] notice: received log
Dec 23 05:06:39 dx411-s04 pmxcfs[2923]: [status]...
I'm running the latest Proxmox version on a 6-node cluster.
Today and yesterday the cluster degraded, and on all the nodes I see these messages:
Dec 23 04:39:06 dx411-s06 kernel: [82074.269775] INFO: task pve-firewall:2668 blocked for more than 120 seconds.
Dec 23 04:39:06 dx411-s06 kernel...
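To see quickly which tasks are getting stuck across several nodes, the "blocked for more than N seconds" lines can be pulled out of syslog. A minimal Python sketch, assuming the message format shown above; the function name blocked_tasks and the /var/log/syslog path are assumptions of this example.

```python
import re
from collections import Counter

# Matches kernel hung-task warnings such as:
# "INFO: task pve-firewall:2668 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def blocked_tasks(lines):
    """Count hung-task warnings per (task name, pid)."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    # Assumed log location; adjust for the node being inspected.
    with open("/var/log/syslog", errors="replace") as fh:
        for (name, pid), n in blocked_tasks(fh).most_common():
            print(name, pid, n)
```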
I'm having the same issue:
http://prntscr.com/dm8m0u
Proxmox 4.4, lxc, unprivileged container, centos-7-default_20161207_amd64.tar.xz
For the filesystem rpm, the following command resolves it:
echo "%_netsharedpath /sys:/proc" >> /etc/rpm/macros.dist; yum -y update
But I'm still stuck with...
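One caveat with the echo command above: ">>" appends the macro again on every run. A minimal Python sketch of an idempotent version that adds the line only if it is missing; the function name ensure_macro_line is hypothetical, and the default path and macro are taken from the command above.

```python
def ensure_macro_line(path="/etc/rpm/macros.dist",
                      line="%_netsharedpath /sys:/proc"):
    """Append `line` to `path` unless an identical line is already present.

    Returns True if the file was changed, False otherwise.
    """
    try:
        with open(path) as fh:
            existing = fh.read()
    except FileNotFoundError:
        existing = ""
    if line in (entry.strip() for entry in existing.splitlines()):
        return False
    with open(path, "a") as fh:
        # Start on a fresh line if the file does not end with a newline.
        if existing and not existing.endswith("\n"):
            fh.write("\n")
        fh.write(line + "\n")
    return True
```

Calling ensure_macro_line() before "yum -y update" would then be safe to repeat, for example from a provisioning script.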