High Availability Overhead

rpuglisi

Member
Sep 1, 2011
New Jersey
Hello,
I have two Intel servers running one container with this configuration:

pveversion -v
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-73
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-14-pve: 2.6.32-73
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-48
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-30
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-7
ksm-control-daemon: 1.1-1

and we are getting ready to go live with this platform.

I migrated the container to the other server and back, and afterwards noticed that the system load on the servers is 1.00 or greater when there is no real processing going on. After much investigation, it seems to be connected to HA. I turned HA off on the container and the system load went down to 0.00. Does container HA place that much of a load on the system? Is this normal? Thanks.
Rich
 
That must be some kind of misconfiguration in your setup. I have one CT under HA and the node running it currently shows this for load: 0.04 0.01 0.00
 

This is my config:

<?xml version="1.0"?>
<cluster config_version="13" name="proxprod">
<cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
<dlm plock_ownership="1" plock_rate_limit="0"/>
<gfs_controld plock_rate_limit="0"/>
<clusternodes>
<clusternode name="prox1" nodeid="1" votes="1">
<fence>
<method name="1">
<device action="reboot" name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="prox2" nodeid="2" votes="1">
<fence>
<method name="1">
<device action="reboot" name="ipmi2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="10.10.2.7" lanplus="1" login="root" name="ipmi1" passwd="password" power_wait="5"/>
<fencedevice agent="fence_ipmilan" ipaddr="10.10.2.8" lanplus="1" login="root" name="ipmi2" passwd="password" power_wait="5"/>
</fencedevices>
<rm>
<pvevm autostart="1" vmid="100"/>
</rm>
</cluster>

Pretty straightforward. After re-adding the container to HA, my server load with the one container seems to be normal:

uptime
09:37:40 up 2 days, 17:51, 1 user, load average: 0.00, 0.00, 0.00

Let me keep an eye on it.
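For anyone following along, the cluster and HA resource state can be checked at any time with the stock redhat-cluster tools (assuming they are on the node's PATH; the pvevm:100 service comes from the <rm> section of the config above):

```shell
# Show cluster membership and the state of HA-managed services.
# pvevm:100 should be listed as "started" on exactly one node.
if command -v clustat >/dev/null 2>&1; then
  clustat
  cman_tool status   # quorum state, expected votes, node count
else
  echo "cluster tools not installed on this host"
fi
```

If clustat shows the service flapping or stuck in "recovering" after a migration, that would line up with rgmanager keeping the node busy.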
 
Thanks Mir.

I'm still facing this issue. Strange. It's almost like something doesn't get cleaned up after the migration and/or it's looking for something.
 
To me it sounds like a process running in an endless loop. Probably a network-related service which cannot survive a migration - e.g. a service tightly associated with either a hardware or memory address.
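A sustained load of 1.00 on an otherwise idle box usually means exactly one task is either spinning or stuck in uninterruptible sleep (state D), since Linux counts both toward the load average. A quick way to spot either kind, using only standard procps tools (nothing Proxmox-specific):

```shell
# Top CPU consumers: a process spinning in an endless loop
# shows up here with a high %CPU.
ps -eo pid,stat,pcpu,comm --sort=-pcpu | head -n 6

# Tasks in uninterruptible sleep (state D): these inflate the
# load average without using any CPU, e.g. when blocked on I/O.
ps -eo pid,stat,comm | awk '$2 ~ /^D/'
```

If the second command keeps showing the same process after a migration, that is the one dragging the load up.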
 
Is the fencing working? I found that when the fencing was configured incorrectly, I was getting a lot of extra load on each machine. What servers are they? Mine were Dell servers using the IPMI function, and I had to take the lanplus="1" option out of the fence_ipmilan command.
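For what it's worth, a fence device can be queried out-of-band without actually fencing anything by asking the agent for the power status (using the ipmi1 address and credentials from the config posted above; -P corresponds to lanplus="1"):

```shell
# Query power status only; -o status does not reboot anything.
if command -v fence_ipmilan >/dev/null 2>&1; then
  fence_ipmilan -a 10.10.2.7 -l root -p password -P -o status
else
  echo "fence_ipmilan not installed on this host"
fi
```

If that hangs or errors with -P but works without it, the BMC doesn't speak IPMI v2.0 lanplus, which would explain having to drop the option.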
 

Very interesting. I believe it was working when I tested it about 7 months ago. Now we have finally gone into production with it. The Intel RMM (remote management function) is not working; I have found out that it cannot be on the same subnet as the IPMI BMC, so I will need to change that.

They are Intel-based white-box servers with IPMI. Why did you have to remove the lanplus="1" option?
 
