Hi everyone. First things first: I found Proxmox 3 weeks ago and so far it gets a big thumbs up from me.
But I have some issues with the HA side of things, namely:
1. On a physical machine restart I have to stop cman, then cron, then restart pve-cluster, then start cron and cman again before rgmanager will start (the exact sequence I run is sketched after the error output below). Not a big issue, but I'm wondering why, and I also suspect it might have something to do with issue 2.
Error when starting rgmanager straight after a restart, if I don't do the above:
Starting Cluster Service Manager: [FAILED]
TASK ERROR: command '/etc/init.d/rgmanager start' failed: exit code 1
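For reference, this is roughly the workaround sequence I run after a reboot to get rgmanager going. It's just a sketch of the init scripts as I use them on PVE 2.x; the ordering is simply what I found works, not something from the docs:
------------------------------------------------------------
# stop the cluster manager and cron first
/etc/init.d/cman stop
/etc/init.d/cron stop
# restart the pve cluster filesystem service
/etc/init.d/pve-cluster restart
# bring cron and cman back, after which rgmanager starts cleanly
/etc/init.d/cron start
/etc/init.d/cman start
/etc/init.d/rgmanager start
------------------------------------------------------------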
2. HA works fine in general and rgmanager moves the existing KVM guests with no issue. But if I create a VM and then add it to be HA managed, it just does nothing and gives 255, 254 and 250 errors on migration, HA migration and VM start (the underlying commands are sketched after the error output below). If I stop the VM after it's been added to HA, it will not restart. This leaves me two options: (a) take the machine back out of the HA cluster.conf, after which it starts without issue and can be migrated (offline and online), or (b) shut down the whole HA cluster (all VMs and nodes), start everything again, and go through the stop/start cycle from item 1; then everything works as expected, i.e. the machine is HA managed and starts and stops as expected.
Error when trying to start the newly added VM:
Apr 23 17:48:39 VMS-BC-AM-003 pvedaemon[2462]: <root@pam> end task UPID:VMS-BC-AM-003:00003B95:0002CBD0:4F9587E6:hastart:106:root@pam: command 'clusvcadm -e pvevm:106 -m VMS-BC-AM-003' failed: exit code 254
Executing HA migrate for VM 106 to node VMS-BC-AM-001
Trying to migrate pvevm:106 to VMS-BC-AM-001...Temporary failure; try again
TASK ERROR: command 'clusvcadm -M pvevm:106 -m VMS-BC-AM-001' failed: exit code 250
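As far as I can tell, the GUI actions above boil down to the rgmanager commands shown in the errors. This is a sketch of what I believe is being run (clustat is just what I use to check the service state afterwards; the node names and VMID 106 are from my setup):
------------------------------------------------------------
# enable (start) the HA-managed VM on a specific node
clusvcadm -e pvevm:106 -m VMS-BC-AM-003
# live-migrate the HA service to another node
clusvcadm -M pvevm:106 -m VMS-BC-AM-001
# check the resulting service state across the cluster
clustat
------------------------------------------------------------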
2 x OpenFiler servers in a cluster
iSCSI target, single LUN -> LVM on top
3 x servers as nodes (2 x G5 + 1 x Dell SC1435, all fenced, see cluster.conf)
Latest Proxmox, updated yesterday
Network config for server 1 (all servers are the same bar the IPs):
------------------------------------------------------------
# Network interface settings
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 10.0.0.30
    netmask 255.255.255.0

auto eth0.8
iface eth0.8 inet static
    address 10.0.2.31
    netmask 255.255.255.0

auto eth0.10
iface eth0.10 inet manual

auto eth0.7
iface eth0.7 inet manual

auto eth1
iface eth1 inet static
    address 10.0.0.31
    netmask 255.255.255.0

auto eth1.8
iface eth1.8 inet static
    address 10.0.2.32
    netmask 255.255.255.0

auto eth1.10
iface eth1.10 inet manual

auto eth1.7
iface eth1.7 inet manual

auto vmbr0
iface vmbr0 inet static
    address 10.1.0.30
    netmask 255.255.255.0
    gateway 10.1.0.1
    bridge_ports eth0.10
    bridge_stp off
    bridge_fd 0

auto vmbr1
iface vmbr1 inet static
    address 10.1.0.31
    netmask 255.255.255.0
    bridge_ports eth1.10
    bridge_stp off
    bridge_fd 0

auto vmbr0.7
iface vmbr0.7 inet manual
    bridge_ports eth0.7
    bridge_stp off
    bridge_fd 0

auto vmbr1.7
iface vmbr1.7 inet manual
    bridge_ports eth1.7
    bridge_stp off
    bridge_fd 0
--------------------------------------------------
Cluster.conf
<?xml version="1.0"?>
<cluster config_version="48" name="bollicomp">
  <logging debug="on"/>
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ilo" ipaddr="10.0.2.30" login="root" name="node1" passwd="xxxx"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.0.2.40" lanplus="1" login="root" name="node2" passwd="xxxx" power_wait="5"/>
    <fencedevice agent="fence_ilo" ipaddr="10.0.2.50" login="root" name="node3" passwd="xxxx"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="VMS-BC-AM-001" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="node1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="VMS-BC-AM-002" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="node2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="VMS-BC-AM-003" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="node3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="TestIP" recovery="relocate">
      <ip address="10.1.0.60"/>
    </service>
    <pvevm autostart="1" vmid="102"/>
    <pvevm autostart="1" vmid="105"/>
    <pvevm autostart="1" vmid="103"/>
    <pvevm autostart="1" vmid="104"/>
    <pvevm autostart="1" vmid="106"/>
  </rm>
</cluster>
-------------------------------------
The error is easily recreatable, e.g. every time I add a machine to HA (the way I add it is sketched below).
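For completeness, this is roughly how I add a VM to HA. I do it through the GUI's HA tab, and my understanding of the PVE 2.x workflow is that it is equivalent to something like the following; the VMID is just an example from my setup and the activation step is my understanding of the process, not gospel:
------------------------------------------------------------
# work on a copy of the active config, as is the PVE 2.x convention
cp /etc/pve/cluster.conf /etc/pve/cluster.conf.new
# in cluster.conf.new: add <pvevm autostart="1" vmid="106"/> inside <rm>
# and increment config_version, then activate the new config from the
# GUI (Datacenter -> HA -> Activate)
------------------------------------------------------------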
Any ideas? Is this maybe a known bug?
Regards
Dave Webster.
P.S. My Linux skills are not perfect, so this might simply be a lack of skill on my part; please bear that in mind. If you need any further info, please let me know.
