HA Problems

ChrisJM

Well-Known Member
Mar 12, 2018
Hello,

I have added a new node to a Proxmox Ceph cluster and added it to the HA group. When I try to move a VM to the new node, nothing happens or I get an error. If I remove the VM from the HA group, it migrates, but when I add the VM back to the HA group it says fencing, and then the VM moves back to the original node.

The status for the node on the LRM is saying idle.

Is there any reason for this or how to fix it?
 
Hi!

can you please post the output of:
Code:
# from a few nodes
systemctl status pve-ha-crm pve-ha-lrm watchdog-mux
ha-manager status
# for the next one node is enough
ha-manager config

After you created a group and added a VM/CT to it?
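
If it helps with collecting this from several nodes, here is a minimal sketch. It only prints one command per node so you can review and run them yourself. The function name, the node names and root SSH access between nodes are assumptions, not something from your setup. `--no-pager` keeps systemctl from paging long output.

```shell
# Hypothetical helper: print one status command per node instead of running it,
# so each command can be checked before it is pasted into a shell.
print_status_cmds() {
    for node in "$@"; do
        # --no-pager makes systemctl write the full output without a pager.
        printf 'ssh root@%s systemctl status --no-pager pve-ha-crm pve-ha-lrm watchdog-mux\n' "$node"
    done
}

# Placeholder node names; replace them with the names of your cluster nodes.
print_status_cmds node1 node2 node3
```

Running the printed commands one by one keeps the output of each node separate, which makes it easier to post.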
 
Here is the output

Something does seem very wrong, and I can't see anything in the logs.

Code:
root@INETC1434:~# systemctl status pve-ha-crm pve-ha-lrm watchdog-mux
● pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
   Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-01-29 13:29:54 GMT; 2min 49s ago
  Process: 2638 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
 Main PID: 2665 (pve-ha-crm)
    Tasks: 1 (limit: 4915)
   Memory: 80.0M
      CPU: 608ms
   CGroup: /system.slice/pve-ha-crm.service
           └─2665 pve-ha-crm

Jan 29 13:29:53 INETC1434 systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Jan 29 13:29:54 INETC1434 pve-ha-crm[2665]: starting server
Jan 29 13:29:54 INETC1434 pve-ha-crm[2665]: status change startup => wait_for_quorum
Jan 29 13:29:54 INETC1434 systemd[1]: Started PVE Cluster Ressource Manager Daemon.
Jan 29 13:29:59 INETC1434 pve-ha-crm[2665]: status change wait_for_quorum => slave

● pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
   Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-01-29 13:29:54 GMT; 2min 48s ago
  Process: 2667 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
 Main PID: 2709 (pve-ha-lrm)
    Tasks: 1 (limit: 4915)
   Memory: 79.7M
      CPU: 589ms
   CGroup: /system.slice/pve-ha-lrm.service
           └─2709 pve-ha-lrm

Jan 29 13:29:54 INETC1434 systemd[1]: Starting PVE Local HA Ressource Manager Daemon...
Jan 29 13:29:54 INETC1434 pve-ha-lrm[2709]: starting server
Jan 29 13:29:54 INETC1434 pve-ha-lrm[2709]: status change startup => wait_for_agent_lock
Jan 29 13:29:54 INETC1434 systemd[1]: Started PVE Local HA Ressource Manager Daemon.

● watchdog-mux.service - Proxmox VE watchdog multiplexer
   Loaded: loaded (/lib/systemd/system/watchdog-mux.service; static; vendor preset: enabled)
   Active: active (running) since Tue 2019-01-29 13:29:40 GMT; 3min 2s ago
 Main PID: 976 (watchdog-mux)
    Tasks: 1 (limit: 4915)
   Memory: 528.0K
      CPU: 6ms
   CGroup: /system.slice/watchdog-mux.service
           └─976 /usr/sbin/watchdog-mux

Jan 29 13:29:40 INETC1434 systemd[1]: Started Proxmox VE watchdog multiplexer.
Jan 29 13:29:40 INETC1434 watchdog-mux[976]: Watchdog driver 'Software Watchdog', version 0
root@INETC1434:~#
root@INETC1434:~# ha-manager status
quorum OK
master INETC1536 (active, Tue Jan 29 13:36:09 2019)
lrm INETC1209 (active, Tue Jan 29 13:36:09 2019)
lrm INETC1242 (active, Tue Jan 29 13:36:10 2019)
lrm INETC1434 (idle, Tue Jan 29 13:36:15 2019)
lrm INETC1536 (active, Tue Jan 29 13:36:17 2019)
service vm:100 (INETC1209, started)
service vm:1075 (INETC1536, started)
service vm:1485 (INETC1536, started)
service vm:1488 (INETC1242, started)
service vm:1641 (INETC1209, started)
service vm:1723 (INETC1536, started)
service vm:1724 (INETC1242, started)
 
I did not mean that the systemctl command should be run several times on a single node, but once on a few different nodes. The output of "ha-manager config" is also missing, and since you talked about groups, the output of "ha-manager groupconfig" would be nice too.

The idle state, as of now, simply comes from the fact that INETC1434 has no enabled service configured, so its Local Resource Manager stays idle. You say a migration fails; can you get me some logs from that time?
The ones from the master (currently INETC1536), the migration source and the target node would be great.
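
To see at a glance which nodes have an LRM that is not active, something like the following could be used. It is only a sketch: the function name is made up, and it assumes the `lrm <node> (<state>, <timestamp>)` line layout shown in the status output above.

```shell
# Hypothetical helper: read `ha-manager status` text on stdin and print the
# names of nodes whose LRM state is anything other than "active".
# Assumes lines shaped like: lrm <node> (<state>, <timestamp>)
non_active_lrms() {
    awk '$1 == "lrm" {
        state = $3
        gsub(/[(,]/, "", state)   # turn "(idle," into "idle"
        if (state != "active") print $2
    }'
}

# On a cluster node this would be used as:
#   ha-manager status | non_active_lrms
```

An idle LRM on a node without HA services is expected, so a name in this list is not an error by itself.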
 
The node INETC1434 just rebooted by itself when I added a dummy VM to the HA group.

Code:
root@INETC1434:~# ha-manager groupconfig
group: HA
        nodes INETC1242,INETC1536,INETC1209,INETC1434
        nofailback 0
        restricted 0

When I now try to move a VM back to INETC1434, all I get is the following, and nothing happens.

Code:
Requesting HA migration for VM 100 to node INETC1434
TASK OK
 
