- Hello
We're planning to migrate part of our production environment to Proxmox if it posible. I have set up a testing env this week with 3 Proxmox nodes: 2 physical and 1 virtual and 1 basic NFS Debian Server, but I could not make HA work in any way.
When I plug off the network cable or power some node off just show the node unreachable, but the VM allocated inside does not automove to another node avalaible
. I have encountered some weird behaviour with HA-manager:
Current conf:
--- Cluster status
-- HA-Manager weird issue (old timestamp - dead?)
I set up watchdog module file etc/modprobe.d/ipmi_watchdog.conf with
options ipmi_watchdog action=power_cycle panic_wdt_timeout=10
After I disconnect the network cable, syslog shows this.
Dec 14 16:17:38 pmtest1 pve-ha-crm[1073]: successfully acquired lock 'ha_manager_lock'
Dec 14 16:17:38 pmtest1 pve-ha-crm[1073]: ERROR: unable to open watchdog socket - No such file or directory
Dec 14 16:17:38 pmtest1 pve-ha-crm[1073]: server received shutdown request
Dec 14 16:17:38 pmtest1 pve-ha-crm[1073]: server stopped
Dec 14 16:17:38 pmtest1 systemd[1]: pve-ha-crm.service: Main process exited, code=exited, status=255/n/a
Dec 14 16:17:39 pmtest1 pve-ha-lrm[1083]: successfully acquired lock 'ha_agent_pmtest1_lock'
Dec 14 16:17:39 pmtest1 pve-ha-lrm[1083]: ERROR: unable to open watchdog socket - No such file or directory
Dec 14 16:17:39 pmtest1 pve-ha-lrm[1083]: restart LRM, freeze all services
Dec 14 16:17:39 pmtest1 pve-ha-lrm[1083]: server stopped
Dec 14 16:17:39 pmtest1 systemd[1]: pve-ha-lrm.service: Main process exited, code=exited, status=255/n/a
Dec 14 16:17:39 pmtest1 systemd[1]: pve-ha-crm.service: Unit entered failed state.
Dec 14 16:17:39 pmtest1 systemd[1]: pve-ha-crm.service: Failed with result 'exit-code'.
Dec 14 16:17:39 pmtest1 systemd[1]: pve-ha-lrm.service: Unit entered failed state.
Dec 14 16:17:39 pmtest1 systemd[1]: pve-ha-lrm.service: Failed with result 'exit-code'
Could anybody help me? I dont know what else to do.
Thanks.