high load after processor failed

sir.jan

New Member
Jan 2, 2012
Germany
Hello,

I have had a strange problem since upgrading from 2.1 to 2.3.

In my setup there are two servers in a cluster. One of them
works without problems, but several times a day the other
server reaches a really high load (3-5) and all clients on this host
freeze.

The only solution to get the server working again is to do a restart.

This server has a RAID 5 on SAS HDDs. After I added "elevator=deadline" to
GRUB, the problem shows up less often.
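
To confirm that the "elevator=deadline" boot parameter actually took effect, the active scheduler can be read from sysfs, where the kernel marks it with square brackets. What follows is only a minimal Python sketch and not part of the original post: the function names are made up for illustration, and the set of schedulers listed depends on the running kernel.

```python
import glob

def active_scheduler(text):
    """Return the scheduler marked with [brackets], or None if none is marked."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def schedulers_by_device(pattern="/sys/block/*/queue/scheduler"):
    """Map each block device name to its active scheduler (Linux sysfs layout)."""
    result = {}
    for path in glob.glob(pattern):
        device = path.split("/")[3]  # /sys/block/<device>/queue/scheduler
        try:
            with open(path) as handle:
                result[device] = active_scheduler(handle.read())
        except OSError:
            continue  # device vanished or is not readable; skip it
    return result
```

Calling `schedulers_by_device()` on the affected host should report `deadline` for the disks behind the RAID controller if the GRUB change is in effect.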

In syslog I found these log entries, and I think the server ran into the problems right after them.

Code:
May  1 04:34:14 desokvm1 pvestatd[1934]: WARNING: closeing with write buffer at /usr/share/perl5/IO/Multiplex.pm line 913.
May  1 05:13:21 desokvm1 corosync[1565]:   [TOTEM ] A processor failed, forming new configuration.
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] CLM CONFIGURATION CHANGE
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] New Configuration:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.1) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.2) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Left:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Joined:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] CLM CONFIGURATION CHANGE
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] New Configuration:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.1) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] #011r(0) ip(10.0.3.2) 
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Left:
May  1 05:13:30 desokvm1 corosync[1565]:   [CLM   ] Members Joined:
May  1 05:13:30 desokvm1 corosync[1565]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
May  1 05:13:32 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 10
May  1 05:13:33 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 20
May  1 05:13:34 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 30
May  1 05:13:35 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 40
May  1 05:13:36 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 50
May  1 05:13:37 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 60
May  1 05:13:38 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 70
May  1 05:13:39 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 80
May  1 05:13:40 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 90
May  1 05:13:41 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 100
May  1 05:13:41 desokvm1 pmxcfs[1434]: [dcdb] notice: cpg_send_message retried 100 times
May  1 05:13:41 desokvm1 pmxcfs[1434]: [status] crit: cpg_send_message failed: 6
May  1 05:13:42 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 10
May  1 05:13:43 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 20
May  1 05:13:44 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 30
May  1 05:13:45 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 40
May  1 05:13:46 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 50
May  1 05:13:47 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 60
May  1 05:13:48 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 70
May  1 05:13:49 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 80
May  1 05:13:50 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 90
May  1 05:13:51 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 100
May  1 05:13:51 desokvm1 pmxcfs[1434]: [dcdb] notice: cpg_send_message retried 100 times
May  1 05:13:51 desokvm1 pmxcfs[1434]: [status] crit: cpg_send_message failed: 6
May  1 05:13:52 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 10
May  1 05:13:53 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 20
May  1 05:13:54 desokvm1 corosync[1565]:   [CPG   ] chosen downlist: sender r(0) ip(10.0.3.1) ; members(old:2 left:0)
May  1 05:13:54 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 30
May  1 05:13:55 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 40
May  1 05:13:56 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 50
May  1 05:13:57 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 60
May  1 05:13:58 desokvm1 pmxcfs[1434]: [status] notice: cpg_send_message retry 70
May  1 05:13:59 desokvm1 corosync[1565]:   [MAIN  ] Completed service synchronization, ready to provide service.
May  1 05:13:59 desokvm1 pmxcfs[1434]: [dcdb] notice: cpg_send_message retried 75 times
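
To check whether every freeze follows the same pattern as the excerpt above (a TOTEM membership change followed by pmxcfs send retries), a longer syslog can be summarized with a short script. This is a rough sketch under assumptions: it expects the classic syslog line layout shown above, and the function name and counter keys are invented for illustration.

```python
import re
from collections import Counter

# Matches the syslog prefix used in the excerpt, for example:
# "May  1 05:13:21 desokvm1 corosync[1565]:   [TOTEM ] ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<proc>[\w.-]+)\[\d+\]:\s+(?P<msg>.*)$"
)

def summarize(lines):
    """Collect corosync TOTEM events and count pmxcfs retry/failure messages."""
    events = []
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not in the expected syslog layout
        proc, msg = match["proc"], match["msg"]
        if proc == "corosync" and "[TOTEM ]" in msg:
            # Keep the timestamp and the text after the "[TOTEM ]" tag.
            events.append((match["stamp"], msg.split("]", 1)[1].strip()))
        elif proc == "pmxcfs" and "cpg_send_message failed" in msg:
            counts["cpg_send_failed"] += 1
        elif proc == "pmxcfs" and "cpg_send_message retry" in msg:
            counts["cpg_send_retry"] += 1
    return events, counts
```

Passing an open log file to `summarize()` returns the list of membership events with their timestamps, which can then be compared against the times at which the clients froze.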

Does anyone know what I can do to solve the problem?
 
May 1 05:13:30 desokvm1 corosync[1565]: [TOTEM ] A processor joined or left the membership and a new membership was formed.

This message is always displayed when a server in the cluster reboots.

Do you have a particular VM on that server that has a high load and drags down the rest?
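
One way to look for such a culprit: on Linux the load average counts tasks in uninterruptible sleep (state D, usually waiting on I/O) as well as runnable ones, so listing the D-state processes during a freeze can show what is stuck. The following is a minimal sketch with hypothetical function names. It only inspects the main thread of each process, so KVM vCPU or I/O threads under /proc/<pid>/task are not covered.

```python
import os

def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, command, state).

    The command sits in parentheses and may itself contain spaces or
    parentheses, so split on the last closing parenthesis.
    """
    head, tail = text.rsplit(")", 1)
    pid, command = head.split("(", 1)
    return int(pid), command, tail.split()[0]

def blocked_processes(proc_root="/proc"):
    """Return (pid, command) for processes in uninterruptible sleep (state D)."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited while it was being read
        if state == "D":
            blocked.append((pid, command))
    return blocked
```

Running `blocked_processes()` a few times while the load is high and comparing the results shows whether the same processes stay blocked.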
 
No, the servers that are running "look" good.

And do you run the latest version?
Yes, I've installed the latest versions.

Code:
# pveversion -v
pve-manager: 2.3-13 (pve-manager/2.3/7946f1f1)
running kernel: 2.6.32-19-pve
proxmox-ve-2.6.32: 2.3-95
pve-kernel-2.6.32-19-pve: 2.6.32-95
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-4
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-36
qemu-server: 2.3-20
pve-firmware: 1.0-21
libpve-common-perl: 1.0-49
libpve-access-control: 1.0-26
libpve-storage-perl: 2.3-7
vncterm: 1.0-4
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.4-10
ksm-control-daemon: 1.1-1
 
