High Availability Overhead

rpuglisi

Member
Sep 1, 2011
New Jersey
Hello,
I have two Intel servers running one container with this configuration:

pveversion -v
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-73
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-12-pve: 2.6.32-68
pve-kernel-2.6.32-14-pve: 2.6.32-73
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-48
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-30
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-7
ksm-control-daemon: 1.1-1

and we are getting ready to go live with this platform.

I migrated the container to the other server and back, and afterwards noticed that the system load on the servers is 1.00 or greater when there is no real processing going on. After much investigation, it seems to be connected to HA. I turned HA off on the container and the system load went down to 0.00. Does container HA place that much of a load on the system? Is this normal? Thanks.
Rich
 
That must be some kind of misconfiguration in your setup. I have one CT under HA and the node running it currently shows this for load: 0.04 0.01 0.00
 

This is my config:

<?xml version="1.0"?>
<cluster config_version="13" name="proxprod">
<cman expected_votes="1" keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1"/>
<dlm plock_ownership="1" plock_rate_limit="0"/>
<gfs_controld plock_rate_limit="0"/>
<clusternodes>
<clusternode name="prox1" nodeid="1" votes="1">
<fence>
<method name="1">
<device action="reboot" name="ipmi1"/>
</method>
</fence>
</clusternode>
<clusternode name="prox2" nodeid="2" votes="1">
<fence>
<method name="1">
<device action="reboot" name="ipmi2"/>
</method>
</fence>
</clusternode>
</clusternodes>
<fencedevices>
<fencedevice agent="fence_ipmilan" ipaddr="10.10.2.7" lanplus="1" login="root" name="ipmi1" passwd="password" power_wait="5"/>
<fencedevice agent="fence_ipmilan" ipaddr="10.10.2.8" lanplus="1" login="root" name="ipmi2" passwd="password" power_wait="5"/>
</fencedevices>
<rm>
<pvevm autostart="1" vmid="100"/>
</rm>
</cluster>

Pretty straightforward. After re-adding the container to HA, my server load with the one container seems to be normal:

uptime
09:37:40 up 2 days, 17:51, 1 user, load average: 0.00, 0.00, 0.00

Let me keep an eye on it.
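For anyone following along, the cluster and HA resource state can be checked at any time with the stock redhat-cluster tools (assuming they are on the node's PATH; the pvevm:100 service comes from the <rm> section of the config above):

```shell
# Show cluster membership and the state of HA-managed services.
# pvevm:100 should be listed as "started" on exactly one node.
if command -v clustat >/dev/null 2>&1; then
  clustat
  cman_tool status   # quorum state, expected votes, node count
else
  echo "cluster tools not installed on this host"
fi
```

If clustat shows the service flapping or stuck in "recovering" after a migration, that would line up with rgmanager keeping the node busy.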
 
Thanks Mir.

I'm still facing this issue. Strange. It's almost like something doesn't get cleaned up after the migration and/or it's looking for something.
 
To me it sounds like a process running in an endless loop. Probably a network-related service which cannot survive a migration - e.g. a service tightly associated with either a hardware or memory address.
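A sustained load of 1.00 on an otherwise idle box usually means exactly one task is either spinning or stuck in uninterruptible sleep (state D), since Linux counts both toward the load average. A quick way to spot either kind, using only standard procps tools (nothing Proxmox-specific):

```shell
# Top CPU consumers: a process spinning in an endless loop
# shows up here with a high %CPU.
ps -eo pid,stat,pcpu,comm --sort=-pcpu | head -n 6

# Tasks in uninterruptible sleep (state D): these inflate the
# load average without using any CPU, e.g. when blocked on I/O.
ps -eo pid,stat,comm | awk '$2 ~ /^D/'
```

If the second command keeps showing the same process after a migration, that is the one dragging the load up.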
 
Is the fencing working? I found that when the fencing was configured incorrectly, I was getting a lot of extra load on each machine. What servers are they? Mine were Dell servers using the IPMI function, and I had to take the lanplus="1" option out of the fence_ipmilan command.
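For what it's worth, a fence device can be queried out-of-band without actually fencing anything by asking the agent for the power status (using the ipmi1 address and credentials from the config posted above; -P corresponds to lanplus="1"):

```shell
# Query power status only; -o status does not reboot anything.
if command -v fence_ipmilan >/dev/null 2>&1; then
  fence_ipmilan -a 10.10.2.7 -l root -p password -P -o status
else
  echo "fence_ipmilan not installed on this host"
fi
```

If that hangs or errors with -P but works without it, the BMC doesn't speak IPMI v2.0 lanplus, which would explain having to drop the option.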
 

Very interesting. I believe it was working when I tested it about 7 months ago. Now we have finally gone into production with it. The Intel RMM (remote management function) is not working; I have found out that it cannot be on the same subnet as the IPMI BMC, so I will need to change that.

They are Intel-based white-box servers with IPMI. Why did you have to remove the lanplus="1" option?
 
