rgmanager is running but does not show in clustat or PVE -> Datacenter -> Summary

fcmatt

Dec 5, 2012
Hi all,

I am in the test stage of using Proxmox.
I am running 3 servers in a cluster with HA.
It was working fine for a few days, but then I made some changes to my switches.
Now I can no longer use HA with a test VM. For example, a migration fails with:

"Executing HA migrate for VM 100 to node proxmox1
Trying to migrate pvevm:100 to proxmox1...Could not connect to resource group manager
TASK ERROR: command 'clusvcadm -M pvevm:100 -m proxmox1' failed: exit code 1"

rgmanager is running on each server.
What is odd is that when I run clustat, rgmanager does not show up.

I tried rebooting each server, one by one, but that did not help.
I have removed the test VM from HA management, and everything works fine, including migration.

Here is some information from server1 which is identical to the others.

root@proxmox1:~# pvecm status
Version: 6.2.0
Config Version: 8
Cluster Name: xx-Srv-Cluster
Cluster Id: 28852
Cluster Member: Yes
Cluster Generation: 76
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0
Node name: proxmox1
Node ID: 1
Multicast addresses: 239.192.112.37
Node addresses: 10.180.1.100
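
A minimal sketch of how the quorum figures above can be checked programmatically, assuming the `pvecm status` output has been captured as text. The helper names (`parse_status`, `majority`, `has_quorum`) are hypothetical, and the majority rule shown is the usual "more than half of the expected votes" convention, which matches the Quorum: 2 reported here for 3 expected votes.

```python
# Trimmed copy of the `pvecm status` output from the post.
SAMPLE_STATUS = """Version: 6.2.0
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Flags:
Node name: proxmox1
"""


def parse_status(text):
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def majority(expected_votes):
    """Smallest vote count that is more than half of the expected votes."""
    return expected_votes // 2 + 1


def has_quorum(fields):
    """True when the votes currently present reach the reported quorum."""
    return int(fields["Total votes"]) >= int(fields["Quorum"])


if __name__ == "__main__":
    status = parse_status(SAMPLE_STATUS)
    print("quorate:", has_quorum(status))
    print("majority of expected votes:", majority(int(status["Expected votes"])))
```

With the values above the cluster is quorate, which suggests the membership layer is healthy and the problem is specific to rgmanager.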


root@proxmox1:~# clustat
Cluster Status for xx-Srv-Cluster @ Tue Dec 4 19:28:26 2012
Member Status: Quorate


Member Name      ID   Status
------ ----      ---- ------
proxmox1         1    Online, Local
proxmox2         2    Online
proxmox3         3    Online

root@proxmox1:~# clustat -x
<?xml version="1.0"?>
<clustat version="4.1.1">
  <cluster name="xx-Srv-Cluster" id="28852" generation="76"/>
  <quorum quorate="1" groupmember="0"/>
  <nodes>
    <node name="proxmox1" state="1" local="1" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000001"/>
    <node name="proxmox2" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000002"/>
    <node name="proxmox3" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000003"/>
  </nodes>
</clustat>
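
The symptom is visible in the XML above: every node reports rgmanager="0" even though the daemon process is running. A minimal sketch, assuming the `clustat -x` output has been saved as a string, of how to list the nodes where clustat does not see rgmanager. The function name `nodes_without_rgmanager` is hypothetical; only the attribute names come from the output in this post.

```python
import xml.etree.ElementTree as ET

# Copy of the `clustat -x` output from the post.
SAMPLE_CLUSTAT = """<?xml version="1.0"?>
<clustat version="4.1.1">
  <cluster name="xx-Srv-Cluster" id="28852" generation="76"/>
  <quorum quorate="1" groupmember="0"/>
  <nodes>
    <node name="proxmox1" state="1" local="1" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000001"/>
    <node name="proxmox2" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000002"/>
    <node name="proxmox3" state="1" local="0" estranged="0" rgmanager="0" rgmanager_master="0" qdisk="0" nodeid="0x00000003"/>
  </nodes>
</clustat>"""


def nodes_without_rgmanager(clustat_xml):
    """Return the names of nodes whose rgmanager attribute is not "1"."""
    root = ET.fromstring(clustat_xml)
    return [
        node.get("name")
        for node in root.iter("node")
        if node.get("rgmanager") != "1"
    ]


if __name__ == "__main__":
    for name in nodes_without_rgmanager(SAMPLE_CLUSTAT):
        print(name, "is online but clustat does not see rgmanager")
```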



root@proxmox1:~# fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 2
wait state none
members 1 2 3





root@proxmox1:~# cat /etc/pve/cluster.conf
<?xml version="1.0"?>
<cluster config_version="8" name="xx-Srv-Cluster">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.100" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.101" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.102" lanplus="1" login="ADMIN" name="ipmi3" passwd="xxx" power_wait="5"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="proxmox1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="proxmox3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ipmi3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm/>
</cluster>






root@proxmox1:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-16-pve: 2.6.32-82
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1


When I try to restart rgmanager via /etc/init.d/rgmanager, it just hangs, unless I kill rgmanager and then start it fresh.
root@proxmox1:~# /etc/init.d/rgmanager status
rgmanager (pid 67073 67071) is running...
root@proxmox1:~# ps auxwww| grep rgmanager
root 67071 0.0 0.0 32320 5716 ? S<Ls 19:08 0:00 rgmanager
root 67073 0.0 0.0 41096 1780 ? S<l 19:08 0:00 rgmanager
root@proxmox1:~# /etc/init.d/rgmanager restart
Stopping Cluster Service Manager: <<<<<<<<<<<<<<--- hangs
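
A minimal sketch of one way to detect a hang like this without blocking the terminal, assuming Python is available on the node. The helper name `run_with_timeout` and the 60-second limit in the usage comment are assumptions, not values from this thread. Note that on timeout only the launched command (here the init script) is killed; the rgmanager daemon itself is left as it was.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it did not finish in time."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return None
    return result.returncode


if __name__ == "__main__":
    # Stand-in for a hanging command, so the sketch runs anywhere.
    # On a cluster node one might instead pass, for example:
    #   ["/etc/init.d/rgmanager", "restart"] with a limit such as 60 seconds.
    hanging = [sys.executable, "-c", "import time; time.sleep(5)"]
    code = run_with_timeout(hanging, 0.5)
    print("timed out" if code is None else "exit code %d" % code)
```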






Please tell me if you need more information to help, or if you want to take a stab at what I should try.
My logs in /var/log are just not giving any clue. It is as if rgmanager is running but not doing a darn thing.

I realize running HA requires a redundant and stable network, but it would be unwise of me not to
experiment at this stage. And yes, my network is redundant, except for multipath, which I installed
but have not configured yet. Otherwise: fencing is working, two switches, multicast tested, EqualLogic
iSCSI with two cards in the chassis, a minimum of 3 servers, multicast on its own VLAN with a dedicated
NIC, dual power supplies in each server, etc.

I am at the point where I need to post here for advice. It seems a couple of others have run into a similar problem.
See this post: http://forum.proxmox.com/threads/9962-rgmanager-running-per-cli-but-not-pve?p=55904#post55904

thanks,
matt
 
Otherwise fencing is working

I had issues using IPMI for fencing while having lanplus="1" in my agent config. I'm running Dell PowerEdge 1950s and 860s; as soon as I removed it, everything worked fine. I've since moved over to a separate physical fence device, as I have more control with it.
 
I did not find any glaring hint in rgmanager.log, either while trying to make rgmanager work again or before/after I changed my switches.
I thought running it in the foreground in debug mode might teach me something useful, but I got no output. I am going through the logs as
I type this, but I cannot find anything interesting to share.

---

As for fencing, you just helped me greatly. I must have set up this system 3-4 times now using Proxmox 2.1 and 2.2 as I learned.
Over time I readjusted my network to suit my needs, and I failed to change the IPs in my cluster.conf file to my IPMI interfaces' addresses.
I tested it about a week ago, but you made me test again and I found my simple mistake.

root@proxmox1:/var/log/cluster# fence_ipmilan -l ADMIN -p xxxx -a 10.180.0.112 -o status -v
Getting status of IPMI:10.180.0.112...Spawning: '/usr/bin/ipmitool -I lan -H '10.180.0.112' -U 'ADMIN' -P '[set]' -v chassis power status'...
Chassis power = On
Done

---------- The IP differs from the one in my cluster.conf, which I failed to update two days ago.

Now I have once again verified that I have it set up right.

root@proxmox1:/var/log/cluster# fence_node proxmox3 -vv
fence proxmox3 dev 0.0 agent fence_ipmilan result: success
agent args: nodename=proxmox3 agent=fence_ipmilan ipaddr=10.180.0.112 lanplus=1 login=ADMIN passwd=xxxxxx power_wait=5
fence proxmox3 success
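
The mistake described above (fence device addresses in cluster.conf that no longer match the real IPMI interfaces) can be caught with a small comparison. This is a minimal sketch, assuming the administrator keeps a hand-maintained mapping of device name to current IPMI address. The function names and the `actual` mapping are hypothetical; the addresses are the ones shown in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the fence device section of the original cluster.conf.
SAMPLE_CONF = """<?xml version="1.0"?>
<cluster config_version="8" name="xx-Srv-Cluster">
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.100" lanplus="1" login="ADMIN" name="ipmi1" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.101" lanplus="1" login="ADMIN" name="ipmi2" passwd="xxx" power_wait="5"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.180.0.102" lanplus="1" login="ADMIN" name="ipmi3" passwd="xxx" power_wait="5"/>
  </fencedevices>
</cluster>"""


def fence_device_ips(cluster_conf_xml):
    """Map each fencedevice name to the ipaddr configured for it."""
    root = ET.fromstring(cluster_conf_xml)
    return {dev.get("name"): dev.get("ipaddr") for dev in root.iter("fencedevice")}


def stale_entries(configured, actual):
    """Return {name: (configured_ip, actual_ip)} for every mismatch."""
    return {
        name: (ip, actual[name])
        for name, ip in configured.items()
        if name in actual and actual[name] != ip
    }


if __name__ == "__main__":
    # Hand-maintained record of where the IPMI interface really lives now
    # (assumption: only ipmi3 is listed, matching the fence_node output above).
    actual = {"ipmi3": "10.180.0.112"}
    for name, (old, new) in stale_entries(fence_device_ips(SAMPLE_CONF), actual).items():
        print("%s: cluster.conf has %s, interface is at %s" % (name, old, new))
```

A comparison like this only confirms the addresses; running fence_node against each node, as above, is still the real test.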

root@proxmox3:~# uptime
10:57:13 up 6 min, 1 user, load average: 0.00, 0.08, 0.06

So thank you for mentioning it. I am running monster Supermicro boxes. I forget the model name, but they are Xeon 4-CPU, 24-core, 192 GB RAM machines, with the IPMI
card that you insert into the motherboard and connect by cable to a NIC interface installed at the back of the box.




--------------------------------


I have solved the problem in a not very satisfactory way. I rebooted all 3 servers at the same time and they came up correctly.
Rebooting one by one did not help. It was this post I found online which gave me the idea.

http://www.spinics.net/lists/cluster/msg20335.html

I don't know what to tell you... but I am back up and running properly.
 
I realize this is a moderated forum (which, by the way, is hard to tell when you first post, because the page that tells you this only shows for a few seconds), but my last post to this thread, describing what I did to fix it, is not here.

Why is this? Did I not post my last message properly? Was there something about my last post that caused it not to be posted?

thanks,
fcmatt
 
FYI, the lanplus setting won't work on Dell servers unless you go into the DRAC / IPMI settings page and enable it over LAN.

Did you restart pvestatd?
 
