Hi all,
I am in the testing stage with Proxmox, running 3 servers in a cluster with HA.
It was working fine for a few days, but then I made some changes to my switches.
Now I can no longer use HA with a test VM; for example, a migration fails with:
"Executing HA migrate for VM 100 to node proxmox1
Trying to migrate pvevm:100 to proxmox1...Could not connect to resource group manager
TASK ERROR: command 'clusvcadm -M pvevm:100 -m proxmox1' failed: exit code 1"
rgmanager is running on each server.
What is odd is that when I run clustat, the HA service for the VM does not show up.
I tried rebooting each server, one by one, but that did not help.
Once I remove the test VM from HA management, everything works fine, including migration.
Here is some information from proxmox1; the other two nodes are identical.
root@proxmox1:~# pvecm status
Version: 6.2.0
Config Version: 8
Cluster Name: xx-Srv-Cluster
Cluster Id: 28852
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: proxmox1
Node ID: 1
Multicast addresses: 239.192.112.37
Node addresses: 10.180.1.100
root@proxmox1:~# clustat
Cluster Status for xx-Srv-Cluster @ Tue Dec 4 19:28:26 2012
Member Status: Quorate
 Member Name                        ID   Status
 ------ ----                        ---- ------
 proxmox1                              1 Online, Local
 proxmox2                              2 Online
 proxmox3                              3 Online
root@proxmox1:~# clustat -x
<?xml version="1.0"?>
<clustat version="4.1.1">
  <cluster name="xx-Srv-Cluster" id="28852" generation="76"/>
  <quorum quorate="1" groupmember="0"/>
  <nodes>
    <node name="proxmox1" state="1" local="1" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000001"/>
    <node name="proxmox2" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000002"/>
    <node name="proxmox3" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000003"/>
  </nodes>
</clustat>
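One thing that stands out in the clustat -x output above: every node reports rgmanager="0", which as far as I understand means clustat sees no node as a joined member of the resource group manager, even though the rgmanager processes are alive. A quick sketch (plain Python, just parsing the XML pasted above) to pull that flag out:

```python
import xml.etree.ElementTree as ET

# The clustat -x output from above, trimmed to the relevant elements.
CLUSTAT_XML = """<?xml version="1.0"?>
<clustat version="4.1.1">
  <cluster name="xx-Srv-Cluster" id="28852" generation="76"/>
  <quorum quorate="1" groupmember="0"/>
  <nodes>
    <node name="proxmox1" state="1" local="1" rgmanager="0" nodeid="0x00000001"/>
    <node name="proxmox2" state="1" local="0" rgmanager="0" nodeid="0x00000002"/>
    <node name="proxmox3" state="1" local="0" rgmanager="0" nodeid="0x00000003"/>
  </nodes>
</clustat>"""

root = ET.fromstring(CLUSTAT_XML)
# rgmanager="1" would mean the node has joined the resource group manager;
# here all three nodes report 0 despite the rgmanager daemons running.
missing = [n.get("name") for n in root.iter("node") if n.get("rgmanager") != "1"]
print(missing)  # → ['proxmox1', 'proxmox2', 'proxmox3']
```

So the symptom is consistent on all three nodes, which matches the "Could not connect to resource group manager" error from the migrate task.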
root@proxmox1:~# fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2 3
root@proxmox1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="8" name="xx-Srv-Cluster">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.100" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.101" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.102" lanplus="1" login="ADMIN" name="ipmi3" passwd="xxx" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm/>
</cluster>
root@proxmox1:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1
When I try to use /etc/init.d/rgmanager, it just hangs, unless I kill rgmanager first and then start it fresh.
root@proxmox1:~# /etc/init.d/rgmanager status
rgmanager (pid 67073 67071) is running...
root@proxmox1:~# ps auxwww| grep rgmanager
root 67071 0.0 0.0 32320 5716 ? S<Ls 19:08 0:00 rgmanager
root 67073 0.0 0.0 41096 1780 ? S<l 19:08 0:00 rgmanager
root@proxmox1:~# /etc/init.d/rgmanager restart
Stopping Cluster Service Manager: <<<<<<<<<<<<<<--- hangs
Please let me know if anyone needs more information to help, or take a stab at what I should try.
The logs in /var/log give no clue at all. It is as if rgmanager is running but not doing a darn thing.
I realize running HA requires a redundant and stable network, and since this is the test stage,
now is exactly the time to mess around with it. And yes, my network is redundant except for
multipath, which I have installed but not yet configured. Otherwise: fencing is working, two
switches, multicast tested on its own VLAN with a dedicated NIC, EqualLogic iSCSI with two
controllers in the chassis, a minimum of 3 servers, dual power supplies in each server, etc.
I am at the point where I need to post here for advice. It seems a couple of others have run into a similar problem.
See this post: http://forum.proxmox.com/threads/9962-rgmanager-running-per-cli-but-not-pve?p=55904#post55904
thanks,
matt