Proxmox cluster discovery service adding hosts by itself

Imad Daou

Dear Proxmox community,

I have a critical and strange situation. I already have a cluster of 2 hosts (foghorn and leghorn), and I was planning to create a separate new cluster with 2 additional new hosts. I installed a Debian base on the new hosts (Rocky and Bullwinkle), then installed Proxmox on top, and left them running since I was planning to create the new cluster this week and add them to it. Today, I found out that the older cluster (claghorn) somehow discovered the new hosts that have Proxmox installed and tried to join them to itself.

Looking at the attached image, you will see that both Rocky and Bullwinkle, the new hosts, appear to have become part of the claghorn cluster by themselves. Those are brand-new host names that were never used before. Catting the .members file on foghorn (below) shows that the newly installed Proxmox hosts are indeed listed there!

root@foghorn:~# cat /etc/pve/.members
{
  "nodename": "foghorn",
  "version": 24,
  "cluster": { "name": "claghorn", "version": 4, "nodes": 4, "quorate": 1 },
  "nodelist": {
    "foghorn": { "id": 1, "online": 1, "ip": "10.20.35.14"},
    "leghorn": { "id": 2, "online": 1, "ip": "10.20.35.15"},
    "bullwinkle": { "id": 4, "online": 0, "ip": "10.20.35.18"},
    "rocky": { "id": 3, "online": 0, "ip": "10.20.35.17"}
  }
}
root@foghorn:~#

There must be a service somewhere trying to add the new Rocky and Bullwinkle hosts to the claghorn cluster. How can I disable this service and remove those hosts from the claghorn cluster?

I have attached a txt file with more information about the claghorn cluster.

I have not run the cluster join command on Rocky and Bullwinkle yet.

root@rocky:~# pvecm status
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

root@bullwinkle:~# pvecm status
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

Any help will be highly appreciated. Thank you all!
 

Attachments

  • Screenshot at 2020-09-01 16-48-19.png
  • claghorn-cluster-nodes.txt
How did you set up the new nodes? From the ISO?
Does the file '/etc/corosync/corosync.conf' exist on the new nodes?
 
Hi @Stoiko Ivanov,

Thank you for your time.

For your first question:
I did not use the Proxmox ISO with the new hosts. I used the Debian netinstall image and then installed Proxmox on top of it, since I wanted to use btrfs on the root partition.

For your second question:
No. Here is the output of pvecm nodes on both new hosts. I haven't joined those 2 hosts to any cluster yet. Is there some neighbor-discovery service running somewhere in the claghorn cluster (claghorn is the cluster running the old hosts leghorn and foghorn)?

debianadmin@rocky:~$ sudo pvecm nodes
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

debianadmin@bullwinkle:~$ sudo pvecm nodes
[sudo] password for debianadmin:
Error: Corosync config '/etc/pve/corosync.conf' does not exist - is this node part of a cluster?

Your time is highly appreciated; I look forward to your response at your earliest convenience.

Imad
 
Is corosync running on the new nodes (rocky, bullwinkle)?

just to be sure - please post the output of:
Code:
stat /etc/corosync/corosync.conf

(the difference between /etc/pve/corosync.conf and /etc/corosync/corosync.conf is significant here)
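Roughly speaking (assuming a standard PVE setup): /etc/pve/corosync.conf is the cluster-wide copy kept in pmxcfs, while /etc/corosync/corosync.conf is the local file that the corosync daemon actually reads at startup; on a healthy, joined node the two should match. A quick way to check both at once could look like this:

Code:
# cluster-wide copy in pmxcfs (only exists once the node is part of a cluster)
stat /etc/pve/corosync.conf
# local copy read by the corosync daemon at startup
stat /etc/corosync/corosync.conf
# on a joined, healthy node the two files should have identical content
diff /etc/pve/corosync.conf /etc/corosync/corosync.conf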
 
@Stoiko Ivanov

Here is the output as requested.

# Bullwinkle
debianadmin@bullwinkle:~$ sudo stat /etc/corosync/corosync.conf
[sudo] password for debianadmin:
stat: cannot stat '/etc/corosync/corosync.conf': No such file or directory

debianadmin@bullwinkle:~$ ll /etc/corosync/corosync.conf
ls: cannot access '/etc/corosync/corosync.conf': No such file or directory

# Rocky
debianadmin@rocky:~$ sudo stat /etc/corosync/corosync.conf
[sudo] password for debianadmin:
stat: cannot stat '/etc/corosync/corosync.conf': No such file or directory

debianadmin@rocky:~$ sudo ll /etc/corosync/corosync.conf
sudo: ll: command not found
debianadmin@rocky:~$ sudo ls /etc/corosync/corosync.conf
ls: cannot access '/etc/corosync/corosync.conf': No such file or directory
debianadmin@rocky:~$

Here is the output from the old hosts:

# Leghorn
root@leghorn:~# stat /etc/corosync/corosync.conf
File: /etc/corosync/corosync.conf
Size: 614 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 525536 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-08-19 16:23:19.874056291 -0400
Modify: 2020-08-19 16:23:19.854055521 -0400
Change: 2020-08-19 16:23:19.854055521 -0400
Birth: -
root@leghorn:~#

# Foghorn
root@foghorn:~# stat /etc/corosync/corosync.conf
File: /etc/corosync/corosync.conf
Size: 614 Blocks: 8 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 787863 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-08-19 16:23:19.874063716 -0400
Modify: 2020-08-19 16:23:19.854063515 -0400
Change: 2020-08-19 16:23:19.854063515 -0400
Birth: -
root@foghorn:~#



Thank you!
 
Hi @Stoiko Ivanov

Is it okay to issue the pvecm delnode command for rocky and bullwinkle on leghorn or foghorn and just remove them from the claghorn cluster? Does it work this way without affecting the claghorn cluster? And lastly, if removing them from the cluster works fine, can we still join them to a new cluster later with no issues?
 
stat: cannot stat '/etc/corosync/corosync.conf': No such file or directory
This is strange. Is corosync running on the new nodes? If it is not, it is strange how they ended up in the cluster... and if it is, it is strange how it starts without a config file...
 
Hi @Stoiko Ivanov

No sir, it's not running

debianadmin@bullwinkle:~$ systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Tue 2020-09-01 16:30:47 EDT; 1 day 20h ago
└─ ConditionPathExists=/etc/corosync/corosync.conf was not met
Docs: man:corosync
man:corosync.conf
man:corosync_overview

debianadmin@rocky:~$ systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Fri 2020-08-28 16:22:26 EDT; 5 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
debianadmin@rocky:~$

Could it be one of the packages we installed on the new hosts before installing Proxmox (listed below)? I usually include these packages by default in case the system needs them. I would highly appreciate your input on these utilities and whether it is okay to include them on Proxmox.

# Utilities packages

sudo apt update; sudo apt install aptitude debian-goodies make module-assistant wajig software-properties-common dirmngr psmisc lsof apt-transport-https curl snmp build-essential unzip zip bzip2 lzop apt-listchanges lshw sysstat htop ssh vim-nox debconf-utils lsb-release lsb-base man-db git procps apt-listchanges daemon autoconf automake libtool smartmontools openssl binutils sudo arj nomarch libgeoip-dev flex bison debhelper ssl-cert libc6 libpcre3-dev libexpat1 libssl-dev libpcre3 glibc-doc zlib1g-dev zlib1g sosreport

# Install Security Audit lynis, root kit, and Anti-virus
sudo apt install chkrootkit rkhunter lynis iptables

# Network tools
sudo apt update; sudo apt install iftop openssh-server ifstat dnsutils bridge-utils ifupdown ifenslave net-tools iperf nmap wget mtr apt-transport-https vnstat iptraf hping3 dstat slurm bmon nmon tcpdump
 
We have since trimmed the package list above to the following:

# System, Network, and Security Packages
sudo apt update; sudo apt install aptitude debian-goodies wajig software-properties-common psmisc lsof apt-transport-https curl snmp build-essential unzip zip bzip2 lzop apt-listchanges lshw sysstat htop ssh vim-nox debconf-utils lsb-release lsb-base man-db git procps apt-listchanges binutils sudo debhelper zlib1g-dev zlib1g sosreport iftop ifstat dnsutils bridge-utils ifupdown ifenslave net-tools iperf nmap wget mtr vnstat iptraf dstat slurm bmon nmon tcpdump chkrootkit rkhunter lynis iptables

Thank you for your time.
 
Please post /etc/corosync/corosync.conf from leghorn and foghorn, and /etc/pve/corosync.conf (it is the same on both nodes).

Did you, or maybe someone in your organization, manually edit the corosync config?

As for the tools: from a quick glance they should be OK (and should not interfere with cluster traffic).
 
Hi @Stoiko Ivanov

Please find them below. We never edited these files manually.

root@leghorn:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bullwinkle
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.20.35.18
  }
  node {
    name: foghorn
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.20.35.14
  }
  node {
    name: leghorn
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.20.35.15
  }
  node {
    name: rocky
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.20.35.17
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: claghorn
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}

root@leghorn:~#

root@foghorn:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: bullwinkle
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.20.35.18
  }
  node {
    name: foghorn
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.20.35.14
  }
  node {
    name: leghorn
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.20.35.15
  }
  node {
    name: rocky
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.20.35.17
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: claghorn
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  secauth: on
  version: 2
}

root@foghorn:~#

Thank you, sir!
 
This at least explains why your cluster thinks that bullwinkle and rocky are part of it: somehow they got added to the configuration on the claghorn cluster.

Corosync by itself (as well as the complete PVE stack) does not autodiscover or add nodes without explicit configuration by the admin, so this is quite odd.

I would remove the nodes from the corosync configs on foghorn and leghorn. Check the reference documentation on how to edit the corosync config in a safe and sensible manner:
https://pve.proxmox.com/pve-docs/chapter-pvecm.html

Afterwards you can create a new cluster with bullwinkle and rocky; however, make sure to select a different cluster name.
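For illustration, a minimal sketch of that edit procedure on one quorate claghorn node (e.g. foghorn) might look like the following; the authoritative steps are in the pvecm chapter linked above, and the editor and backup file names here are just placeholders:

Code:
# work on a copy of the cluster-wide config kept in pmxcfs
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# in the copy: delete the node { ... } blocks for rocky and bullwinkle
# and increment config_version in the totem section (here: 4 -> 5)
nano /etc/pve/corosync.conf.new
# keep a backup, then activate the edited config; pmxcfs propagates it
# to /etc/corosync/corosync.conf on every node and corosync picks it up
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.bak
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf

Alternatively, pvecm delnode rocky (and the same for bullwinkle), run on a quorate node, should also remove the offline entries, which would answer the delnode question from earlier; either way, double-check against the linked documentation before touching a production cluster.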
 
Dear @Stoiko Ivanov,

We got it all sorted out by reinstalling Rocky and Bullwinkle with different names and assigning them to a new cluster right away. But we still don't know how the old names wound up in the claghorn cluster by themselves :0)

The only thing I can think of that was unique about these 2 hosts (Rocky and Bullwinkle) is that they were left running through the weekend without being assigned to any cluster, and somehow the claghorn cluster sensed them. That tells me there must be a service doing that.

I guess the lesson here is not to leave any single host running over the weekend without assigning it to a running cluster or a new cluster :0)

I don't know what to say. To confirm the weirdness of this situation, we would need a test cluster with 2 machines, then install 2 more hosts on the same subnet, leave them running for 2 or 3 days without joining them to any cluster, and see what happens.

Overall, I want to thank you guys for your hard work and great support. I hope in the future I can convince my company to move forward with a support subscription.
 
We got it all sorted out by reinstalling Rocky and Bullwinkle with different names and assigning them to a new cluster right away. But we still don't know how the old names wound up in the claghorn cluster by themselves :0)
Glad the immediate issue was resolved!

That tells me there must be a service doing that.
As said, PVE does not provide such a service. The cluster stack works on top of corosync, and corosync gets its configuration from the config file
(PVE's tools are used for editing that config, but this happens only through manual actions by the administrator).
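If you ever want to dig into how entries like this appeared, one option (just a sketch; pve-cluster and corosync are the standard service names on a PVE node) is to look at the journals of both services around the modification time you already saw in the stat output (2020-08-19):

Code:
# pve-cluster (pmxcfs) distributes /etc/pve/corosync.conf; corosync reads the local copy
systemctl status pve-cluster corosync
# look for membership or configuration changes around the corosync.conf mtime
journalctl -u pve-cluster -u corosync --since "2020-08-19" --until "2020-08-20" | grep -iE "member|config|join"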

I guess the lesson here is not to leave any single host running over the weekend without assigning it to a running cluster or a new cluster :0)
We have quite a few clusters running in our network here, and quite a few single nodes in the same network segments, and have not observed such spontaneous auto-joining.

Overall, I want to thank you guys for your hard work and great support. I hope in the future I can convince my company to move forward with a support subscription.
Always nice to hear! Thanks!
 
