high load after processor failed

sir.jan

New Member
Jan 2, 2012
Germany
Hello,

I have a strange problem after the upgrade from 2.1 to 2.3.

In my setup there are two servers in a cluster. One of them
works without problems, but several times a day the other
server has a really high load (3-5) and all clients on this host
freeze.

The only way to get the server working again is to restart it.

This server has a RAID5 on SAS HDDs; since I added "elevator=deadline" to
the GRUB kernel command line, the problem shows itself less often.
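For reference, this is roughly how the scheduler option is set on a GRUB 2 system; a hedged sketch only, since the exact file and existing options depend on the host (the "quiet" flag here is just a placeholder for whatever is already configured):

```shell
# /etc/default/grub -- append elevator=deadline to the kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=deadline"

# Afterwards, regenerate the GRUB config and reboot for it to take effect:
#   update-grub
#   reboot
```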

In syslog I found these entries, and I think the server's problems started after that.

Code:
May  1 04:34:14 desokvm1 pvestatd[1934]: WARNING: closeing with write buffer at /usr/share/perl5/IO/Multiplex.pm line 913.
May  1 05:13:21 desokvm1 corosync[1565]:   [TOTEM ] A processor failed, forming new configuration.
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] CLM CONFIGURATION CHANGE
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] New Configuration:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.1) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.2) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Left:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Joined:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] CLM CONFIGURATION CHANGE
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] New Configuration:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.1) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.2) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Left:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Joined:
May  1 05:13:30 desokvm1 corosync[1565]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
May  1 05:13:32 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 10
May  1 05:13:33 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 20
May  1 05:13:34 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 30
May  1 05:13:35 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 40
May  1 05:13:36 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 50
May  1 05:13:37 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 60
May  1 05:13:38 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 70
May  1 05:13:39 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 80
May  1 05:13:40 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 90
May  1 05:13:41 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 100
May  1 05:13:41 desokvm1 pmxcfs[1434]: [dcdb] notice: cpg_send_message retried 100 times
May  1 05:13:41 desokvm1 pmxcfs[1434]: [status] crit: cpg_send_message failed: 6
May  1 05:13:42 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 10
May  1 05:13:43 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 20
May  1 05:13:44 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 30
May  1 05:13:45 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 40
May  1 05:13:46 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 50
May  1 05:13:47 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 60
May  1 05:13:48 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 70
May  1 05:13:49 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 80
May  1 05:13:50 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 90
May  1 05:13:51 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 100
May  1 05:13:51 desokvm1 pmxcfs[1434]: [dcdb] notice: cpg_send_message retried 100 times
May  1 05:13:51 desokvm1 pmxcfs[1434]: [status] crit: cpg_send_message failed: 6
May  1 05:13:52 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 10
May  1 05:13:53 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 20
May  1 05:13:54 desokvm1 corosync[1565]:   [CPG   ] chosen downlist: sender r(0) ip(10.0.3.1) ; members(old:2 left:0)
May  1 05:13:54 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 30
May  1 05:13:55 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 40
May  1 05:13:56 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 50
May  1 05:13:57 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 60
May  1 05:13:58 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 70
May  1 05:13:59 desokvm1 corosync[1565]:   [MAIN  ] Completed service synchronization, ready to provide service.
May  1 05:13:59 desokvm1 pmxcfs[1434]: [dcdb] notice: cpg_send_message retried 75 times

Does anyone know what I can do to solve the problem?
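In case it helps with narrowing this down: one simple way to see how often pmxcfs actually gives up (rather than just retrying) is to count the "crit" lines in syslog. A minimal sketch, using a hypothetical sample file that mirrors the excerpt above; on a real host you would point grep at /var/log/syslog instead:

```shell
# Hypothetical sample mirroring the syslog excerpt above
cat > /tmp/pmxcfs-sample.log <<'EOF'
May  1 05:13:41 desokvm1 pmxcfs[1434]: [status] crit: cpg_send_message failed: 6
May  1 05:13:51 desokvm1 pmxcfs[1434]: [status] crit: cpg_send_message failed: 6
May  1 05:13:59 desokvm1 corosync[1565]:   [MAIN  ] Completed service synchronization, ready to provide service.
EOF

# Count how often pmxcfs gave up after exhausting its retries
grep -c 'cpg_send_message failed' /tmp/pmxcfs-sample.log
```

Correlating the timestamps of these failures with the load spikes would show whether the cluster filesystem stalls and the high load are the same event.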
 
May 1 05:13:30 desokvm1 corosync[1565]: [TOTEM ] A processor joined or left the membership and a new membership was formed.

This message is always displayed when a server in the cluster reboots.

Do you have a particular VM on that server that has a high load and drags down the rest?
 
This message is always displayed when a server in the cluster reboots.

Do you have a particular VM on that server that has a high load and drags down the rest?

No, the VMs running on that server all look good.

And do you run the latest version?
Yes, I've installed the latest versions.

Code:
# pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-95
pve-kernel-2.6.32-19-pve: 2.6.32-95
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1