Not all nodes showing up in web of all nodes

1nerdyguy

Just added a new machine to our cluster, but it's not showing up in the web gui for all nodes. If I go to say node C or D, all 6 show up, but if I go to node A, only A-E show up and no F.

I ssh'd into some of the nodes and noticed that their /etc/hosts doesn't list all the nodes either. What can I do to fix this?

Long story: I added 3 new nodes to my cluster as an upgrade, and will be killing off the 3 oldest soon. In the meantime I have all 6 running, but not all 6 show up to each other.
 
Hi,

Please check whether corosync.conf is the same on all nodes.

It is located at /etc/pve/corosync.conf.
Also check pvecm status on all nodes.
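
A minimal sketch for comparing the file across nodes, assuming root SSH access between the nodes. The helper name configs_match and the node names nodeA, nodeB and nodeC are placeholders for illustration, not part of Proxmox.

```shell
# Hypothetical helper: reads "<node> <checksum>" lines on stdin,
# prints every node whose checksum differs from the first line,
# and returns non-zero if any mismatch was found.
configs_match() {
    awk 'NR == 1 { ref = $2; next }
         $2 != ref { print "MISMATCH: " $1; bad = 1 }
         END { exit bad + 0 }'
}

# Example usage (node names are placeholders; assumes root SSH access):
# for n in nodeA nodeB nodeC; do
#     echo "$n $(ssh "root@$n" md5sum /etc/pve/corosync.conf | cut -d' ' -f1)"
# done | configs_match
```

No output and a zero exit status would mean every node reported the same checksum as the first one.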
 
Same issue.

The last node added does not show up in the web interface; however, it does show up in corosync.conf and pvecm status.
 
Same issue here: the last node added is not visible in the web interface. pvecm status and corosync.conf are OK and the same on all nodes.
 
Please check whether the pvestatd service is running on all nodes:

# systemctl status pvestatd.service

And verify that all configured storages are accessible:

# pvesm status
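
To check several services in one go on a node, something like the following might help. It is only a sketch: inactive_services is a made-up helper name, and the SYSTEMCTL variable exists purely so the function can be tried out without a real systemd.

```shell
# Hypothetical helper: prints "<service>: <state>" for every service
# that "systemctl is-active" does not report as active.
# SYSTEMCTL may be overridden for testing; it defaults to systemctl.
inactive_services() {
    for s in "$@"; do
        # is-active exits non-zero for inactive units, so ignore the status
        state=$(${SYSTEMCTL:-systemctl} is-active "$s" 2>/dev/null) || true
        [ "$state" = "active" ] || echo "$s: ${state:-unknown}"
    done
}

# Example usage on a node:
# inactive_services pvestatd pveproxy spiceproxy pve-cluster corosync
```

An empty result would mean all listed services are reported as active.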
 
Sorry for digging up this old thread, but we experience the exact same situation.

The cluster consists of 4 nodes.
Node 1 only shows cluster members 1, 2 and 3
All the other nodes show all cluster members (1, 2, 3 and 4).

As you might have guessed, node 4 was added last.

The problem is limited to the graphical web user interface,
pvesm status,
systemctl status pvestatd.service,
pvecm status and corosync.conf are all in perfect shape.

I also had a look at "pvesh get cluster/config/nodes", which allegedly feeds the
web interface. Even on node 1 it shows all the other cluster members. It's JUST
the web interface that won't display node 4, to the point where I can't create a
container on node 4 when I'm connected to the web UI on node 1 (the drop-down box
is missing node 4), although I can do this from any other node.

The latest updates are installed (paid update channel) as of this writing;
I have not rebooted the member yet.
 
There are 3 possible causes (or a combination of these 3):

1) check SSL
==========

First check whether the SSL certificates are OK. From a good node, run:

grep pve-ssl /var/log/syslog
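
Besides grepping the log, the expiry of a node certificate can be checked directly with openssl. This is a sketch under assumptions: cert_still_valid is a made-up helper name, and /etc/pve/local/pve-ssl.pem is assumed to be where the node certificate lives on your installation.

```shell
# Hypothetical helper: succeeds if the certificate file can be read
# and has not expired yet (-checkend 0 means "expires within 0 seconds").
cert_still_valid() {
    openssl x509 -checkend 0 -noout -in "$1" >/dev/null 2>&1
}

# Example usage (path is an assumption about the default location):
# cert_still_valid /etc/pve/local/pve-ssl.pem || echo "certificate expired or unreadable"
```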

If the certificates are OK, you can restart the services. This can be done on all nodes at the same time (at least I can):

2) restarting services
================

# stopping services
for s in pveproxy spiceproxy pvestatd; do /etc/init.d/$s stop; done

# stopping cluster
/etc/init.d/pve-cluster stop
# or (depending on version)
systemctl stop pve-cluster

# stopping corosync
service corosync stop
# check if corosync really stopped
ps uxaw | grep corosync
# if not, just kill it
killall -9 corosync

# restarting cluster
service corosync start
/etc/init.d/pve-cluster start
# or (depending on version)
systemctl start pve-cluster

# check if everything is ok
pvecm status
pvecm nodes

# start services
for s in pvestatd spiceproxy pveproxy; do /etc/init.d/$s start; done
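
On systemd-based versions the same sequence could be wrapped in one function. This is a sketch only: restart_cluster_stack and the RUN variable are invented for illustration, and the forced kill of a hanging corosync is left out. Setting RUN=echo prints the commands instead of executing them, which makes it possible to review the order first.

```shell
# Hypothetical wrapper around the steps above, in the same order.
# RUN is empty by default (commands are executed); RUN=echo is a dry run.
restart_cluster_stack() {
    ${RUN:-} systemctl stop pveproxy spiceproxy pvestatd
    ${RUN:-} systemctl stop pve-cluster
    ${RUN:-} systemctl stop corosync
    ${RUN:-} systemctl start corosync
    ${RUN:-} systemctl start pve-cluster
    ${RUN:-} pvecm status
    ${RUN:-} systemctl start pvestatd spiceproxy pveproxy
}

# Example usage:
# RUN=echo; restart_cluster_stack   # dry run, only prints the commands
# RUN=;     restart_cluster_stack   # really restart (as root)
```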

3) sqlite
======

If neither 1 nor 2 works, it might be a problem with the sqlite database. Even though /etc/pve contains the correct information, it is not the "real" configuration. The real configuration is stored in a sqlite database at:

/var/lib/pve-cluster/config.db*

Usually it consists of 3 files:

config.db
config.db-shm
config.db-wal (Write-Ahead-Log)

You can take the sqlite database from a good node and put it on the problem node (the one not in the cluster):

step 1: stop pve-cluster and corosync on both nodes so the sqlite-database is not in use. Then you will only see 1 file (config.db)
step 2: from the good node, copy the database to the other node: rsync -az /var/lib/pve-cluster/config.db root@<hostname_not_in_cluster>:/var/lib/pve-cluster/
step 3: on both nodes start corosync and pve-cluster.
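
Before overwriting config.db in step 2, it seems prudent to keep a copy of the old file on the target node. The backup step is my own addition, not part of the procedure above, and backup_config_db is a made-up helper name. Following step 1, the sketch treats leftover -wal or -shm files as a sign that the database is still in use and refuses to continue.

```shell
# Hypothetical helper: copies config.db to a timestamped backup and
# prints the backup path. The directory defaults to /var/lib/pve-cluster.
backup_config_db() {
    dir=${1:-/var/lib/pve-cluster}
    # Per step 1, only config.db should remain once pve-cluster is stopped
    if [ -e "$dir/config.db-wal" ] || [ -e "$dir/config.db-shm" ]; then
        echo "database still in use: stop pve-cluster first" >&2
        return 1
    fi
    backup="$dir/config.db.bak-$(date +%Y%m%d-%H%M%S)"
    cp -p "$dir/config.db" "$backup" && echo "$backup"
}

# Example usage on the target node, after step 1 and before step 2:
# backup_config_db
```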

Regards, Gijsbert
 
Thank you for this! You saved me a lot of pain!
 
