Cannot create new HA, old node still in config.

kroem

Well-Known Member
I'm trying to add some HA config to my cluster, but the HA tab is locked due to an error saying:

"unable to read file '/etc/pve/nodes/dog/lrm_status' (500)"

"dog" is an old node which I have since removed. Any idea where I need to remove some config in order for PVE to not look for this config file?
 
Hi,
it seems that you did not remove the node correctly. Can you post your '/etc/pve/corosync.conf'? You probably have to remove the node entry there if it is still present; see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_configuration as a guide on how to edit this file correctly.
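In short, a rough sketch of the procedure described in the linked guide (the editor is just an example; which entry to remove depends on your setup):

Code:
# work on a copy so that a half-edited file never gets activated
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new   # remove the stale node { ... } entry and increment config_version
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf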

Maybe, but the node is not there

Code:
root@cat:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Oh yes, also: I have an rpi as a monitor node (only 2 PVE hosts). Or rather, I HAD; now that's removed from /etc/corosync/corosync.conf (?). Does PVE overwrite that file?

EDIT:

Code:
root@cat:~# pvecm status
Quorum information
------------------
Date:             Tue Apr  9 11:21:48 2019
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000001
Ring ID:          3/8868
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2  
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000003          1 10.1.1.132
0x00000001          1 10.1.1.134 (local)
0x00000004          1 10.1.1.136
root@cat:~# service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-04-08 12:45:06 CEST; 22h ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
 Main PID: 11537 (corosync)
    Tasks: 2 (limit: 629145)
   Memory: 43.7M
      CPU: 11min 49.376s
   CGroup: /system.slice/corosync.service
           └─11537 /usr/sbin/corosync -f

Apr 09 10:04:01 cat corosync[11537]: warning [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]:  [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]: warning [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]:  [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]:  [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]: warning [CPG   ] downlist left_list: 0 received
Apr 09 10:04:01 cat corosync[11537]: notice  [QUORUM] Members[3]: 3 1 4
Apr 09 10:04:01 cat corosync[11537]: notice  [MAIN  ] Completed service synchronization, ready to provide service.
Apr 09 10:04:01 cat corosync[11537]:  [QUORUM] Members[3]: 3 1 4
Apr 09 10:04:01 cat corosync[11537]:  [MAIN  ] Completed service synchronization, ready to provide service.
root@cat:~#

from the rpi

Code:
root@rpi-pve:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
  node {
    name: rpi-pve
    nodeid: 4
    quorum_votes: 1
    ring0_addr: rpi-pve
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
 
You should not edit the file /etc/corosync/corosync.conf, as changes to that file will not be propagated to /etc/pve/corosync.conf, whereas changes to /etc/pve/corosync.conf are propagated to /etc/corosync/corosync.conf.
What's the output of `ha-manager status`? Any reference to the old node named 'dog' in `grep -r dog /etc/pve/ha/*`?
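If you want to double-check that this propagation works on your PVE nodes, a quick sanity check (purely optional) is to compare the two files; on a healthy node they should normally be identical:

Code:
# on a PVE node: the cluster-managed file and the local copy should match
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf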
 
You should not edit the file /etc/corosync/corosync.conf, as changes to that file will not be propagated to /etc/pve/corosync.conf, whereas changes to /etc/pve/corosync.conf are propagated to /etc/corosync/corosync.conf.
Ok, I was just following the wiki: https://pve.proxmox.com/wiki/Raspberry_Pi_as_third_node (I did think it was a little bit strange)

What's the output of `ha-manager status`? Any reference to the old node named 'dog' in `grep -r dog /etc/pve/ha/*`?

There's an entry(?):

Code:
root@cat:~# grep -r dog /etc/pve/ha/*
/etc/pve/ha/manager_status:{"master_node":"cat","timestamp":1526833643,"node_status":{"dog":"online","bug":"online","cat":"online"},"service_status":{}}
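The same manager_status line, just reindented for readability (e.g. with `python3 -m json.tool /etc/pve/ha/manager_status`, if python3 is available), so the stale "dog" entry in node_status is easier to spot:

Code:
{
    "master_node": "cat",
    "timestamp": 1526833643,
    "node_status": {
        "dog": "online",
        "bug": "online",
        "cat": "online"
    },
    "service_status": {}
}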
 
Well, on the raspi you have no other option than to edit /etc/corosync/corosync.conf directly, so this is correct. It seems that your pve-ha-crm still thinks 'dog' is online. Are there any errors in `journalctl -u pve-ha-crm` (this needs to be executed on your master node)? Maybe also try to restart the service by running `systemctl restart pve-ha-crm.service` (again on the master node).
 
Well, on the raspi you have no other option than to edit /etc/corosync/corosync.conf directly, so this is correct.
OK, but on the PVE nodes, should I edit the /etc/pve/corosync.conf file? If so, the wiki should be updated :)

Are there any errors in `journalctl -u pve-ha-crm` (this needs to be executed on your master node)? Maybe also try to restart the service by running `systemctl restart pve-ha-crm.service` (again on the master node).


No errors. Both nodes have been reloaded multiple times, but this is what the log says if I restart the service:
Code:
root@cat:~# journalctl -u pve-ha-crm
-- Logs begin at Mon 2019-04-08 12:44:31 CEST, end at Wed 2019-04-10 00:03:31 CEST. --
Apr 08 12:45:07 cat systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Apr 08 12:45:08 cat pve-ha-crm[12811]: starting server
Apr 08 12:45:08 cat pve-ha-crm[12811]: status change startup => wait_for_quorum
Apr 08 12:45:08 cat systemd[1]: Started PVE Cluster Ressource Manager Daemon.
root@cat:~#
root@cat:~# systemctl restart pve-ha-crm.service
root@cat:~# journalctl -u pve-ha-crm
-- Logs begin at Mon 2019-04-08 12:44:31 CEST, end at Wed 2019-04-10 00:04:01 CEST. --
Apr 08 12:45:07 cat systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Apr 08 12:45:08 cat pve-ha-crm[12811]: starting server
Apr 08 12:45:08 cat pve-ha-crm[12811]: status change startup => wait_for_quorum
Apr 08 12:45:08 cat systemd[1]: Started PVE Cluster Ressource Manager Daemon.
Apr 10 00:03:57 cat systemd[1]: Stopping PVE Cluster Ressource Manager Daemon...
Apr 10 00:03:58 cat pve-ha-crm[12811]: received signal TERM
Apr 10 00:03:58 cat pve-ha-crm[12811]: server received shutdown request
Apr 10 00:03:58 cat pve-ha-crm[12811]: server stopped
Apr 10 00:03:59 cat systemd[1]: Stopped PVE Cluster Ressource Manager Daemon.
Apr 10 00:03:59 cat systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Apr 10 00:03:59 cat pve-ha-crm[972861]: starting server
Apr 10 00:03:59 cat pve-ha-crm[972861]: status change startup => wait_for_quorum
Apr 10 00:03:59 cat systemd[1]: Started PVE Cluster Ressource Manager Daemon.

No errors, right? (I'm not sure - but it does have quorum)
 
OK, but on the pve nodes should I edit the /etc/pve/corosync files? If so, the wiki should be update :)
Sorry, my mistake: for the raspi setup this is intended, the wiki is fine. What is the current output of `pve-ha-crm status`? Does the problem persist?
 
Sorry, my mistake: for the raspi setup this is intended, the wiki is fine. What is the current output of `pve-ha-crm status`? Does the problem persist?
Ok, thanks.

The status is "running", quorum is fine, but I still have that error message

Code:
root@cat:~# pve-ha-crm status       
running
root@cat:~# ssh bug pve-ha-crm status
running
 
I just noticed that you did not post the '/etc/corosync/corosync.conf' from the PVE nodes. Are they the same as '/etc/pve/corosync.conf'? Did you increase the version number while editing the corosync.conf? How exactly did you remove the raspi? From the pvecm status output it seems the cluster still thinks the raspi is part of it.
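For reference, the version in question is the config_version value in the totem section of the files you posted; it has to be increased with every edit, otherwise the change does not get applied and propagated. A quick way to compare it (just an illustration):

Code:
# the value should match on all nodes and must go up with every edit
grep config_version /etc/pve/corosync.conf /etc/corosync/corosync.conf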
 
I just noticed that you did not post the '/etc/corosync/corosync.conf' from the PVE nodes. Are they the same as '/etc/pve/corosync.conf'? Did you increase the version number while editing the corosync.conf? How exactly did you remove the raspi? From the pvecm status output it seems the cluster still thinks the raspi is part of it.

Sorry, they are below. And to confirm: I have not removed the raspi, I removed a node called "dog" (a long time ago). I did not increment the config version when editing the corosync.conf for the raspi setup (following the wiki). Should I do that, and send the /etc/corosync/corosync.conf file to the PVE nodes? Will that file populate /etc/pve/corosync.conf?

Still, I feel like the problem might be somewhere else, as "dog" is not listed anywhere in these files.

Code:
root@cat:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}


root@cat:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Code:
root@cat:~# ssh bug cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}


root@cat:~# ssh bug cat /etc/pve/corosync.conf            
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Code:
root@cat:~# ssh rpi-pve cat /etc/corosync/corosync.conf  
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bug
    nodeid: 3
    quorum_votes: 1
    ring0_addr: bug
  }
  node {
    name: cat
    nodeid: 1
    quorum_votes: 1
    ring0_addr: cat
  }
  node {
    name: rpi-pve
    nodeid: 4
    quorum_votes: 1
    ring0_addr: rpi-pve
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: kroem
  config_version: 6
  interface {
    bindnetaddr: 10.1.1.134
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

Big thanks for your help ...
 
