Hi all,

I have 3 HP DL 165 G7 servers which I'm trying to use with Proxmox. If we can get this working, the plan is to get at least a community license, but for this proof of concept we are still running the free open source edition.

The problem:

We are trying to set up fencing and it is somewhat working. If I reboot a node, the VMs fail over to the other nodes. The problem comes when I pull the power cord directly from a server. This does NOT result in the VMs migrating to another host. They stay on the failed host, and over time the web interface shows the powered-off icon for them. If I at any point put power back to the node, the VMs start to migrate while the server is still in POST.

Another issue:

These servers have an ILO100 BMC controller, and you access the BMC by setting an IPMI IP in the BIOS. You also have to choose shared NICs, which means you can reach the IPMI web interface on all NICs in the server. Currently my IPs are as follows:

proxmox00 10.10.99.20 - BMC IP 10.10.99.30
proxmox01 10.10.99.21 - BMC IP 10.10.99.31
proxmox02 10.10.99.22 - BMC IP 10.10.99.32

We were able to both ping the BMC IPs and access the web interface until a few days ago. All of a sudden we couldn't ping OR access the web interface. Further troubleshooting has shown that we ARE ABLE to ping the BMC IP during POST. The ping and HTTP access stop as soon as Proxmox starts the CMAN process, that is, when it prints "Starting CMAN.... OK".

Here is my cluster conf:

root@proxmox00:~# cat /etc/pve/cluster.conf
When we try fence_node with -vv it fails with "error from agent", and using fence_ipmilan directly fails with "Failed to connect after 20 seconds" (the commands and their full output are in the code blocks at the end of this post).

But the fence commands are being issued while the BMC IP isn't responding, so it COULD be that they fail simply because there is no connection to the BMC controller in general. Still, I'm confused and not sure this is right, because my VMs do migrate when I reboot a Proxmox node, which should indicate that fencing is working...

How did we set this up? Well, you have the cluster conf already.

All nodes are fully upgraded. Of course we don't get the stable updates, since the servers are not licensed yet.
ipmitool is installed on all servers.
redhat-cluster-pve has FENCE_JOIN set to yes.
fence_tool join has been run on all servers.
fence_tool ls shows the output in the last code block of this post.
ANY input would be GREATLY appreciated, we have been working on this for perhaps a week now... THANKS

Casper

EDIT:
The servers are also fully firmware updated. If you want to look the servers up, one of them has the following serial number: CZJ2120JSW
Code:
root@proxmox00:~#
Code:
fence_node proxmox00 -vv
Code:
root@proxmox00:~# fence_node proxmox01 -vv
fence proxmox01 dev 0.0 agent fence_ipmilan result: error from agent
agent args: nodename=proxmox01 agent=fence_ipmilan ipaddr=10.10.99.21 lanplus=1 auth=password login=admin passwd=XXXXX power_wait=5 method=cycle
fence proxmox01 failed
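One thing that can be checked mechanically from the output above is which address the agent is actually handed. Below is a minimal sketch in Python (not part of Proxmox; the function names are made up for illustration, and the address table is simply the one listed earlier in this post) that parses the "agent args" line and compares ipaddr against the address plan:

```python
# Address plan copied from this post: node -> (host IP, BMC IP).
ADDRESS_PLAN = {
    "proxmox00": ("10.10.99.20", "10.10.99.30"),
    "proxmox01": ("10.10.99.21", "10.10.99.31"),
    "proxmox02": ("10.10.99.22", "10.10.99.32"),
}


def parse_agent_args(line):
    """Turn an 'agent args: key=value key=value ...' line into a dict."""
    _, _, rest = line.partition("agent args:")
    return dict(tok.split("=", 1) for tok in rest.split() if "=" in tok)


def check_fence_target(line):
    """Classify the ipaddr given to the agent (hypothetical helper).

    Returns 'bmc-ip' if it matches the node's BMC address,
    'host-ip' if it matches the node's own address, else 'unknown'.
    """
    args = parse_agent_args(line)
    host_ip, bmc_ip = ADDRESS_PLAN[args["nodename"]]
    if args["ipaddr"] == bmc_ip:
        return "bmc-ip"
    if args["ipaddr"] == host_ip:
        return "host-ip"
    return "unknown"


# The 'agent args' line from the fence_node output in this post.
LINE = ("agent args: nodename=proxmox01 agent=fence_ipmilan "
        "ipaddr=10.10.99.21 lanplus=1 auth=password login=admin "
        "passwd=XXXXX power_wait=5 method=cycle")

if __name__ == "__main__":
    print(check_fence_target(LINE))
```

For the line above this reports "host-ip": ipaddr=10.10.99.21 is the host address listed for proxmox01, not its BMC address 10.10.99.31. Whether that explains the failure is a separate question, but it seems worth ruling out in cluster.conf.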
Code:
fence_ipmilan
Code:
root@proxmox00:~# fence_ipmilan -l admin -p XXXXXXXX -P -a 10.10.99.32 -T 4 -o off -v
Powering off machine @ IPMI:10.10.99.32...Spawning: '/usr/bin/ipmitool -I lanplus -H '10.10.99.32' -U 'admin' -P '[set]' -v chassis power status'...
ipmilan: Failed to connect after 20 seconds
Failed
root@proxmox00:~#
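To separate fence agent problems from plain IPMI connectivity, the same status query the agent spawns can be run by hand against each BMC. This is a rough sketch, assuming ipmitool lives at /usr/bin/ipmitool as in the output above and that the BMC addresses are the ones listed in this post. The helper names are hypothetical, and the password is passed on the command line only for brevity.

```python
import subprocess

# BMC addresses as listed in this post.
BMC_IPS = {
    "proxmox00": "10.10.99.30",
    "proxmox01": "10.10.99.31",
    "proxmox02": "10.10.99.32",
}


def ipmi_status_cmd(bmc_ip, user, password):
    """Build the 'chassis power status' query that fence_ipmilan spawned."""
    return ["/usr/bin/ipmitool", "-I", "lanplus", "-H", bmc_ip,
            "-U", user, "-P", password, "chassis", "power", "status"]


def bmc_power_status(bmc_ip, user, password, timeout=30):
    """Return (ok, text); ok is False if ipmitool fails, is missing, or times out."""
    try:
        proc = subprocess.run(ipmi_status_cmd(bmc_ip, user, password),
                              capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return proc.returncode == 0, (proc.stdout or proc.stderr).strip()


def check_all(user, password):
    """Query every BMC in the table and collect the results per node."""
    return {node: bmc_power_status(ip, user, password)
            for node, ip in BMC_IPS.items()}
```

Running check_all once with cman stopped and once with it running should show whether the BMCs really become unreachable only after CMAN starts, independently of the fence agent.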
Code:
root@proxmox00:~# fence_tool ls
fence domain
member count 3
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3
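For comparing this output across the three nodes, here is a small sketch in Python that turns the fence_tool ls text into a dict and checks that every node has joined the fence domain with no pending victims. The field names are taken from the output above; the function names and the "healthy" criteria are my own assumptions, not anything defined by the cluster tools.

```python
# Field labels as they appear in the fence_tool ls output in this post.
KEYS = ("member count", "victim count", "victim now",
        "master nodeid", "wait state", "members")


def parse_fence_tool_ls(text):
    """Map each known field label to its value (as a string)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        for key in KEYS:
            if line.startswith(key + " "):
                info[key] = line[len(key):].strip()
                break
    return info


def fence_domain_healthy(info, expected_members):
    """Assumed criteria: all nodes joined, no victims, nothing being waited on."""
    return (int(info["member count"]) == expected_members
            and int(info["victim count"]) == 0
            and info["wait state"] == "none")


SAMPLE = """fence domain
member count 3
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3
"""

if __name__ == "__main__":
    print(fence_domain_healthy(parse_fence_tool_ls(SAMPLE), expected_members=3))
```

By these criteria the fence domain above looks fine, which fits with the fence agent call itself being what fails rather than the domain membership.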