Cannot create new HA, old node still in config.

kroem · Apr 9, 2019

I'm trying to add some HA config to my cluster, but the HA tab is locked due to an error saying:

"unable to read file '/etc/pve/nodes/dog/lrm_status' (500)"

"dog" is an old node which I have since removed. Any idea where I need to remove some config in order for PVE to not look for this config file?

Chris · Apr 9, 2019

Hi,
it seems that you did not remove the node correctly. Can you post your '/etc/pve/corosync.conf'? You probably have to remove the node entry there if it is present, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration as guide on how to edit this file correctly.

kroem · Apr 9, 2019

Chris said:
Hi,
it seems that you did not remove the node correctly. Can you post your '/etc/pve/corosync.conf'? You probably have to remove the node entry there if it is present, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration as guide on how to edit this file correctly.

Maybe, but the node is not there

Code:

root@cat:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Oh yes, also, I have a rpi as a monitor node (only 2 pve hosts). Or, I HAD, now that's removed from /etc/corosync/corosync.conf (?). Does pve overwrite that file?

EDIT:

Code:

root@cat:~# pvecm status
Quorum information
------------------
Date:             Tue Apr  9 11:21:48 2019
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          3/8868
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 10.1.1.132
0x00000001          1 10.1.1.134 (local)
0x00000004          1 10.1.1.136
root@cat:~# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-04-08 12:45:06 CEST; 22h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 11537 (corosync)
    Tasks: 2 (limit: 629145)
   Memory: 43.7M
      CPU: 11min 49.376s
   CGroup: /system.slice/corosync.service
           └─11537 /usr/sbin/corosync -f

Apr 09 10:04:01 cat corosync[11537]: warning [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]:  [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]: warning [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]:  [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]:  [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]: warning [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]: notice  [QUORUM] Members[3]: 3 1 4
Apr 09 10:04:01 cat corosync[11537]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Apr 09 10:04:01 cat corosync[11537]:  [QUORUM] Members[3]: 3 1 4
Apr 09 10:04:01 cat corosync[11537]:  [MAIN  ] Completed service synchronization, ready to provide service.
root@cat:~#

from the rpi

Code:

root@rpi-pve:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
  node {
    name: rpi-pve
    nodeid: 4
    quorum_votes: 1
    ring0_addr: rpi-pve
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Chris · Apr 9, 2019

You should not edit the file /etc/corosync/corosync.conf as changes to that file will not be propagated to /etc/pve/corosync.conf, while in the other way it will.
What's the output of `ha-manager status`? Any reference to the old node named 'dog' in `grep -r dog /etc/pve/ha/*`?

kroem · Apr 9, 2019

Chris said:
You should not edit the file /etc/corosync/corosync.conf as changes to that file will not be propagated to /etc/pve/corosync.conf, while in the other way it will.

Ok, was just following the wiki: https://pve.proxmox.com/wiki/Raspberry_Pi_as_third_node (I did think it was a little bit strange)

Chris said:
What's the output of `ha-manager status`? Any reference to the old node named 'dog' in `grep -r dog /etc/pve/ha/*`?

There's a log entry(?)

root@cat:~# grep -r dog /etc/pve/ha/*
/etc/pve/ha/manager_status:{"master_node":"cat","timestamp":1526833643,"node_status":{"dog":"online","bug":"online","cat":"online"},"service_status":{}}

Chris · Apr 9, 2019

Well, on the raspi you have no other option then to edit the /etc/corosync/corosync.conf, so this is correct. It seems that your pve-ha-crm still thinks 'dog' is online. Any errors in `journalctl -u pve-ha-crm` (needs to be executed on your master node), maybe try to restart the service by running `systemctl restart pve-ha-crm.service` (again on the master node).

kroem · Apr 10, 2019

Chris said:
Well, on the raspi you have no other option then to edit the /etc/corosync/corosync.conf, so this is correct.

OK, but on the pve nodes should I edit the /etc/pve/corosync files? If so, the wiki should be update

Chris said:
Any errors in `journalctl -u pve-ha-crm` (needs to be executed on your master node), maybe try to restart the service by running `systemctl restart pve-ha-crm.service` (again on the master node).

No errors. Both nodes have been reloaded multiple times, but this is what the log says if I restart the service:

Code:

root@cat:~# journalctl -u pve-ha-crm
-- Logs begin at Mon 2019-04-08 12:44:31 CEST, end at Wed 2019-04-10 00:03:31 CEST. --
Apr 08 12:45:07 cat systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Apr 08 12:45:08 cat pve-ha-crm[12811]: starting server
Apr 08 12:45:08 cat pve-ha-crm[12811]: status change startup => wait_for_quorum
Apr 08 12:45:08 cat systemd[1]: Started PVE Cluster Ressource Manager Daemon.
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~#
root@cat:~# systemctl restart pve-ha-crm.service
root@cat:~# journalctl -u pve-ha-crm
-- Logs begin at Mon 2019-04-08 12:44:31 CEST, end at Wed 2019-04-10 00:04:01 CEST. --
Apr 08 12:45:07 cat systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Apr 08 12:45:08 cat pve-ha-crm[12811]: starting server
Apr 08 12:45:08 cat pve-ha-crm[12811]: status change startup => wait_for_quorum
Apr 08 12:45:08 cat systemd[1]: Started PVE Cluster Ressource Manager Daemon.
Apr 10 00:03:57 cat systemd[1]: Stopping PVE Cluster Ressource Manager Daemon...
Apr 10 00:03:58 cat pve-ha-crm[12811]: received signal TERM
Apr 10 00:03:58 cat pve-ha-crm[12811]: server received shutdown request
Apr 10 00:03:58 cat pve-ha-crm[12811]: server stopped
Apr 10 00:03:59 cat systemd[1]: Stopped PVE Cluster Ressource Manager Daemon.
Apr 10 00:03:59 cat systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Apr 10 00:03:59 cat pve-ha-crm[972861]: starting server
Apr 10 00:03:59 cat pve-ha-crm[972861]: status change startup => wait_for_quorum
Apr 10 00:03:59 cat systemd[1]: Started PVE Cluster Ressource Manager Daemon.

No errors, right? (I'm not sure - but it does have quorum)

Chris · Apr 10, 2019

kroem said:
OK, but on the pve nodes should I edit the /etc/pve/corosync files? If so, the wiki should be update

Sorry my mistake, for the raspi setup this is intended, the wiki is fine. What is your current output of `pve-ha-crm status`, does the problem persist?

kroem · Apr 10, 2019

Chris said:
Sorry my mistake, for the raspi setup this is intended, the wiki is fine. What is your current output of `pve-ha-crm status`, does the problem persist?

Ok, thanks.

The status is "running", quorum is fine, but I still have that error message

Code:

root@cat:~# pve-ha-crm status       
running
root@cat:~# ssh bug pve-ha-crm status
running

Chris · Apr 10, 2019

I just noticed that you did not post the '/etc/corosync/corosync.conf' from the PVE nodes, are they the same as '/etc/pve/corosync.conf'? Did you increase the version number while editing the corosync.conf? How exactly did you remove the raspi? From the pvecm status output it seems the cluster still thinks the raspi is part of it.

kroem · Apr 10, 2019

Chris said:
I just noticed that you did not post the '/etc/corosync/corosync.conf' from the PVE nodes, are they the same as '/etc/pve/corosync.conf'? Did you increase the version number while editing the corosync.conf? How exactly did you remove the raspi? From the pvecm status output it seems the cluster still thinks the raspi is part of it.

Sorry, they are below. And to confirm - I have not remove the raspi, I have removed a node called "dog" (long time ago). I did not increment the config version when editing the corosync.conf files from the raspi (from wiki). Should I do that, and send the /etc/corosync/corosync.conf file to the pve nodes? Will that file populate /etc/pve/corosync.conf?

Still, feel like the problem might be somewhere else, as "dog" is not listed anywhere in these files.

Code:

root@cat:~# cat /etc/corosync/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: bug
nodeid: 3
quorum_votes: 1
ring0_addr: bug
}
node {
name: cat
nodeid: 1
quorum_votes: 1
ring0_addr: cat
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: kroem
config_version: 6
interface {
bindnetaddr: 10.1.1.134
ringnumber: 0
}
ip_version: ipv4
secauth: on
version: 2
}

root@cat:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: bug
nodeid: 3
quorum_votes: 1
ring0_addr: bug
}
node {
name: cat
nodeid: 1
quorum_votes: 1
ring0_addr: cat
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: kroem
config_version: 6
interface {
bindnetaddr: 10.1.1.134
ringnumber: 0
}
ip_version: ipv4
secauth: on
version: 2
}

Code:

root@cat:~# ssh bug cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}


root@cat:~# ssh bug cat /etc/pve/corosync.conf            
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Code:

root@cat:~# ssh rpi-pve cat /etc/corosync/corosync.conf  
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
  node {
    name: rpi-pve
    nodeid: 4
    quorum_votes: 1
    ring0_addr: rpi-pve
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Big thanks for your help ...

Chris · Apr 10, 2019

Okay so the problem seems to be in the /etc/pve/ha/manager_status. So please try the following procedure to solve the issue https://forum.proxmox.com/threads/ha-vm-always-starting-status.44468/page-2#post-244511

kroem · Apr 10, 2019

Chris said:
Okay so the problem seems to be in the /etc/pve/ha/manager_status. So please try the following procedure to solve the issue https://forum.proxmox.com/threads/ha-vm-always-starting-status.44468/page-2#post-244511

AH! It worked ! Thanks you so very very much!

Chris · Apr 10, 2019

Thanks go to @t.lamprecht for his guide. Glad it worked.

Search

Search

Cannot create new HA, old node still in config.

kroem

Well-Known Member

Chris

Proxmox Staff Member

kroem

Well-Known Member

Chris

Proxmox Staff Member

kroem

Well-Known Member

Chris

Proxmox Staff Member

kroem

Well-Known Member

Chris

Proxmox Staff Member

kroem

Well-Known Member

Chris

Proxmox Staff Member

kroem

Well-Known Member

Chris

Proxmox Staff Member

kroem

Well-Known Member

Chris

Proxmox Staff Member

We value your privacy