HA groups fail to migrate after upgrading from 8 to 9

sbarmen

Hello all, I have upgraded my cluster to 9 (currently on 9.1.1) and find that my HA groups were not migrated during the upgrade. I get the following error when I try to make changes.

[screenshot of the error: "ha groups have not been migrated yet (500)"]

I believe this was caused by my removal of some old nodes from the cluster a while back. When the cluster was first installed I had two nodes, thinkserver and pvrserver. They are gone now and I only have px0-rv, px1-rv and px2-rv.

I think I followed the guide properly when removing the old nodes, but they are still causing problems. Note that the current nodes run Ceph; the two old nodes were never part of the Ceph cluster. Any pointers for me?

Code:
root@px2-rv:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         4          1 px0-rv
         5          1 px1-rv
         6          1 px2-rv (local)

Code:
root@px2-rv:/etc/pve/nodes# ls
px0-rv  px1-rv  px2-rv
root@px2-rv:/etc/pve/nodes# pvecm delnode pvrserver
Node/IP: pvrserver is not a known host of the cluster.
root@px2-rv:/etc/pve/nodes# pvecm delnode thinkserver
Node/IP: thinkserver is not a known host of the cluster.
root@px2-rv:/etc/pve/nodes#

I also see errors from the pve-ha-crm service in journalctl:

Code:
Nov 29 10:50:46 px2-rv pve-ha-crm[1874]: ha groups migration: node 'pvrserver' is in state 'gone'
Nov 29 10:50:46 px2-rv pve-ha-crm[1874]: abort ha groups migration: node 'pvrserver' is not online
Nov 29 10:50:46 px2-rv pve-ha-crm[1874]: ha groups migration failed
Nov 29 10:50:46 px2-rv pve-ha-crm[1874]: retry ha groups migration in 6 rounds (~ 60 seconds)
Nov 29 10:50:56 px2-rv pve-ha-crm[1874]: unable to read file '/etc/pve/nodes/pvrserver/lrm_status'
Nov 29 10:50:56 px2-rv pve-ha-crm[1874]: unable to read file '/etc/pve/nodes/thinkserver/lrm_status'
Nov 29 10:51:06 px2-rv pve-ha-crm[1874]: unable to read file '/etc/pve/nodes/pvrserver/lrm_status'
Nov 29 10:51:06 px2-rv pve-ha-crm[1874]: unable to read file '/etc/pve/nodes/thinkserver/lrm_status'
Nov 29 10:51:16 px2-rv pve-ha-crm[1874]: unable to read file '/etc/pve/nodes/pvrserver/lrm_status'
Nov 29 10:51:16 px2-rv pve-ha-crm[1874]: unable to read file '/etc/pve/nodes/thinkserver/lrm_status'


Bash:
root@px2-rv:~# ha-manager status -v
unable to read file '/etc/pve/nodes/pvrserver/lrm_status'
unable to read file '/etc/pve/nodes/thinkserver/lrm_status'
quorum OK
master px2-rv (active, Sat Nov 29 10:32:06 2025)
lrm pvrserver (unable to read lrm status)
lrm px0-rv (active, Sat Nov 29 10:32:06 2025)
lrm px1-rv (active, Sat Nov 29 10:32:05 2025)
lrm px2-rv (active, Sat Nov 29 10:32:11 2025)
lrm thinkserver (unable to read lrm status)
service ct:101 (px0-rv, started)
service ct:115 (px0-rv, started)
service vm:103 (px1-rv, started)
service vm:104 (px1-rv, started)
service vm:106 (px2-rv, started)
service vm:107 (px2-rv, started)
service vm:108 (px2-rv, started)
service vm:111 (px0-rv, started)
service vm:113 (px1-rv, started)
service vm:300 (px2-rv, started)
service vm:305 (px2-rv, disabled)
service vm:306 (px1-rv, started)
service vm:307 (px2-rv, started)
full cluster state:
unable to read file '/etc/pve/nodes/pvrserver/lrm_status'
unable to read file '/etc/pve/nodes/thinkserver/lrm_status'
{
   "lrm_status" : {
      "pvrserver" : {
         "mode" : "unknown"
      },
      "px0-rv" : {
         "mode" : "active",
         "results" : {
            "UID-REDACTED-001" : {
               "exit_code" : 0,
               "sid" : "ct:115",
               "state" : "started"
            },
            "UID-REDACTED-002" : {
               "exit_code" : 0,
               "sid" : "vm:111",
               "state" : "started"
            },
            "UID-REDACTED-003" : {
               "exit_code" : 0,
               "sid" : "ct:101",
               "state" : "started"
            }
         },
         "state" : "active",
         "timestamp" : 1764408726
      },
      "px1-rv" : {
         "mode" : "active",
         "results" : {
            "UID-REDACTED-004" : {
               "exit_code" : 0,
               "sid" : "vm:103",
               "state" : "started"
            },
            "UID-REDACTED-005" : {
               "exit_code" : 0,
               "sid" : "vm:104",
               "state" : "started"
            },
            "UID-REDACTED-006" : {
               "exit_code" : 0,
               "sid" : "vm:113",
               "state" : "started"
            },
            "UID-REDACTED-007" : {
               "exit_code" : 0,
               "sid" : "vm:306",
               "state" : "started"
            }
         },
         "state" : "active",
         "timestamp" : 1764408725
      },
      "px2-rv" : {
         "mode" : "active",
         "results" : {
            "UID-REDACTED-008" : {
               "exit_code" : 0,
               "sid" : "vm:307",
               "state" : "started"
            },
            "UID-REDACTED-009" : {
               "exit_code" : 0,
               "sid" : "vm:107",
               "state" : "started"
            },
            "UID-REDACTED-010" : {
               "exit_code" : 0,
               "sid" : "vm:305",
               "state" : "stopped"
            },
            "UID-REDACTED-011" : {
               "exit_code" : 0,
               "sid" : "vm:300",
               "state" : "started"
            },
            "UID-REDACTED-012" : {
               "exit_code" : 0,
               "sid" : "vm:108",
               "state" : "started"
            },
            "UID-REDACTED-013" : {
               "exit_code" : 0,
               "sid" : "vm:106",
               "state" : "started"
            }
         },
         "state" : "active",
         "timestamp" : 1764408731
      },
      "thinkserver" : {
         "mode" : "unknown"
      }
   },
   "manager_status" : {
      "master_node" : "px2-rv",
      "node_status" : {
         "pvrserver" : "gone",
         "px0-rv" : "online",
         "px1-rv" : "online",
         "px2-rv" : "online",
         "thinkserver" : "gone"
      },
      "service_status" : {
         "ct:101" : {
            "node" : "px0-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-003"
         },
         "ct:115" : {
            "node" : "px0-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-001"
         },
         "vm:103" : {
            "node" : "px1-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-004"
         },
         "vm:104" : {
            "node" : "px1-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-005"
         },
         "vm:106" : {
            "node" : "px2-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-013"
         },
         "vm:107" : {
            "node" : "px2-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-009"
         },
         "vm:108" : {
            "node" : "px2-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-012"
         },
         "vm:111" : {
            "node" : "px0-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-002"
         },
         "vm:113" : {
            "node" : "px1-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-006"
         },
         "vm:300" : {
            "node" : "px2-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-011"
         },
         "vm:305" : {
            "node" : "px2-rv",
            "state" : "stopped",
            "uid" : "UID-REDACTED-010"
         },
         "vm:306" : {
            "node" : "px1-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-007"
         },
         "vm:307" : {
            "node" : "px2-rv",
            "running" : 1,
            "state" : "started",
            "uid" : "UID-REDACTED-008"
         }
      },
      "timestamp" : 1764408726
   },
   "quorum" : {
      "node" : "px2-rv",
      "quorate" : "1"
   }
}
 
Hi!

The HA group to HA node affinity rules migration is attempted every 6 HA rounds, i.e. around every minute, with an HA round lasting ~10 seconds. To be safe, it will only do the migration if the cluster is quorate, all nodes are online, the LRMs are active or idle, and all nodes have been upgraded to 9.0.0+.
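If you want to check these preconditions by hand, something like this should do (a sketch using the standard CLI tools; run the pveversion check on every node):

Bash:
pvecm status | grep -i quorate   # cluster must be quorate
ha-manager status                # all nodes online, all LRMs active or idle
pveversion                       # must report pve-manager 9.0.0 or newer on each node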

The HA Manager will mark a node as 'gone' if its status is unknown (e.g. not online, deleted) and will delete it from the HA Manager status after around an hour in the 'gone' state. Has the HA Manager caught up to that yet? The log message should be: deleting gone node '<nodename>', not a cluster member anymore.
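You can watch for that in the CRM journal, e.g. (sketch):

Bash:
journalctl -fu pve-ha-crm | grep -Ei "gone node|ha groups migration"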

Does either /etc/pve/nodes/pvrserver/ or /etc/pve/nodes/thinkserver/ still exist?
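A plain ls is enough to check; if the directories are still there and those nodes are definitely gone for good, removing them is an option (sketch; keep a copy somewhere first to be safe):

Bash:
ls -l /etc/pve/nodes/
# only if the stale directories are still listed and the nodes will never return:
cp -r /etc/pve/nodes/pvrserver /root/pvrserver.bak
rm -r /etc/pve/nodes/pvrserver /etc/pve/nodes/thinkserver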
 
Hello @dakralex, the nodes thinkserver and pvrserver were deleted many months ago and were never part of the v9 upgrade. When I started troubleshooting this problem I found that /etc/pve/nodes did in fact still contain thinkserver and pvrserver folders, but I have since deleted them.

And now, while looking into your reply, it seems the cluster has fixed itself. It was probably remedied by deleting the folders and giving it some time.
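For anyone finding this thread later, a quick way to sanity-check that the migration went through (sketch):

Bash:
ls -l /etc/pve/rules.cfg   # the new rules file should exist after the migration
ha-manager status          # no more 'gone' nodes or unreadable lrm_status entries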

Case closed, thanks for replying!
 
Hello,

I have the "same" problem after upgrading the 3 nodes cluster from 8.4.14 to 9.1.2 in-place the HA configuration was not migrated.

I have the files /etc/pve/groups.cfg and /etc/pve/resources.cfg but not the new file /etc/pve/rules.cfg, and when I try to modify the HA configuration I get the message "ha groups have not been migrated yet (500)".

How can I migrate the HA configuration after the upgrade?

I removed the 'groups.cfg' and 'resources.cfg' files and reconfigured HA by hand, but I only have a few VMs.

Best regards.
Francis
 
I have the files /etc/pve/groups.cfg and /etc/pve/resources.cfg but not the new file /etc/pve/rules.cfg, and when I try to modify the HA configuration I get the message "ha groups have not been migrated yet (500)".

How can I migrate the HA configuration after the upgrade?
See my reply above; the syslog on the current HA Manager (master) node should show a regular error that the HA groups couldn't be migrated, along with the reason why. This should point you in the direction of what is missing (e.g. maintenance mode, not quorate, non-existent nodes, etc.). All nodes must have at least pve-manager 9.0.0 installed, the cluster must be quorate, and the LRMs must be active or idle.
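Concretely, something like this (sketch; run the journalctl part on whichever node the first command names as master):

Bash:
ha-manager status | head -n 2   # the 'master' line names the current HA Manager node
journalctl -u pve-ha-crm --since "1 hour ago" | grep -i "ha groups migration"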

I removed the 'groups.cfg' and 'resources.cfg' files and reconfigured HA by hand, but I only have a few VMs.
But as soon as you remove the groups.cfg, there is nothing to migrate anymore since there is no groups.cfg left.
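So if you still want the automatic migration to happen, groups.cfg has to be in place; in general, take a copy before deleting any of the HA config files, e.g. (sketch):

Bash:
cp /etc/pve/groups.cfg /root/groups.cfg.bak
cp /etc/pve/resources.cfg /root/resources.cfg.bak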
 
Hello dakralex,

I removed the files and reconfigured HA after the upgrade.

I found the problem... one node was not upgraded correctly :(
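For anyone else hitting this, comparing versions across the nodes finds it quickly (a sketch; node1..node3 are placeholder names, and it assumes root SSH between the nodes):

Bash:
for n in node1 node2 node3; do echo "== $n =="; ssh root@$n pveversion; done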

Thank you.

Best regards.
Francis