[SOLVED] Yes i changed hostnames.... almost fixed. One question.

fuzzyduck · Nov 19, 2023

I bought a tiny SBC to use as a qdevice, so I thought, heck, if I'm going for it I might as well set this up right. I haven't tried the qdevice yet, but I changed the hostnames (yes, stupid, I learned) on the two nodes. I'm now left with a working but weird-looking setup:

What I did so far (Proxmox 8.04 on both boxes):
1. Changed hostnames on both nodes, in /etc/hosts and /etc/hostname.
2. Rebooted node2 only first, which broke the cluster (obvious in hindsight) with all sorts of web GUI problems, and I panicked. SSH still worked.
3. Changed /etc/hosts and /etc/hostname back on both nodes and rebooted node2 again.
4. Cluster came back online and stuff worked, including green ticks saying all is OK. Node1 has not been rebooted yet.
5. BUT while the SSH prompt of node2 was correct like I'm used to, the prompt of node1 looked weird: root@192:~# I checked both hosts files and, as far as I know, I never made a typo. It almost feels like a file is being parsed incorrectly?
6. To keep calm, I decided to postpone the reboot of node1.
7. Now, 2 days later, I suddenly see a ghost node pop up in my datacenter when I log in via node1, and the green OK ticks on the 2 nodes are gone. Or was it always there and I didn't see it? Node1 is my default login, so I think it's new.

Screenshot 2023-11-19 at 10-48-12 192 - Proxmox Virtual Environment.png

Logging in through node2 gives me the correct stats:

Screenshot 2023-11-19 at 12-03-13 pve2 - Proxmox Virtual Environment.png

Remember, I have not rebooted node1 yet, but now I'm even more scared to do so.

The question: should I reboot node1, or should I double-check something first?

Code:

root@192:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.2 pve.local pve

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

root@192:~# cat /etc/hostname
pve

root@192:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster
Config Version:   6
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Sun Nov 19 11:08:14 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.328
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.2 (local)
0x00000002          1 192.168.1.3

tempacc375924 · Nov 19, 2023


On your node1 (pve -> 192.168.1.2) run:
# hostnamectl set-hostname pve

Then check the leftovers in the nodes directory:

Code:

# cd /etc/pve/nodes
# ls -la

You will likely find the weird 192 directory there, which you can delete:

# rm -rf 192

After that, reload the GUI in the browser.

Before doing more reboots, do you mind showing the content of corosync.conf:
# cat /etc/pve/corosync.conf
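
The check for leftover node directories can also be done without deleting anything first. This is a rough sketch, assuming that every legitimate directory under /etc/pve/nodes has a matching `name:` entry in the nodelist of corosync.conf. The function name `list_stale_node_dirs` is hypothetical, and it only prints candidates.

```shell
#!/bin/sh
# list_stale_node_dirs NODES_DIR COROSYNC_CONF
# Print every directory under NODES_DIR whose name has no matching
# "name:" entry in COROSYNC_CONF. Hypothetical, read-only helper.
list_stale_node_dirs() {
    nodes_dir=$1
    conf=$2
    for dir in "$nodes_dir"/*/; do
        [ -d "$dir" ] || continue            # glob did not match anything
        node=$(basename "$dir")
        # Node entries use the key "name:"; "cluster_name:" is a different first field.
        if ! awk -v n="$node" '$1 == "name:" && $2 == n { found = 1 } END { exit !found }' "$conf"; then
            printf '%s\n' "$node"
        fi
    done
}
```

On a node this would be run as `list_stale_node_dirs /etc/pve/nodes /etc/pve/corosync.conf`. Anything it prints is only a candidate for cleanup, so look inside the directory before removing it.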

fuzzyduck · Nov 19, 2023


Thank you so much for taking the time to read and respond to my question. Much appreciated! Especially because I did something that shouldn't be done in the first place, lol.

So before I go and maybe break more by setting the hostname once again, as in your command, this is the output of some commands I found online:

root@192:/# hostname
192.168.1.2

root@192:/# hostnamectl
Static hostname: pve
Transient hostname: 192.168.1.2
...etc...

root@192:/# cat /proc/sys/kernel/hostname
192.168.1.2

root@192:/# cat /etc/hostname
pve
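
The outputs above show the static hostname (`pve`) and the running, transient one (`192.168.1.2`) disagreeing. Bash's `\h` prompt escape prints the hostname only up to the first dot, which would explain the `root@192` prompt. A small sketch of that comparison follows; the helper names `short_name` and `hostname_state` are made up for illustration.

```shell
#!/bin/sh
# short_name HOSTNAME
# Print the part before the first dot, which is what bash's \h shows.
short_name() {
    printf '%s\n' "${1%%.*}"
}

# hostname_state STATIC TRANSIENT
# Report whether the configured (static) and running (transient) names agree.
hostname_state() {
    if [ "$1" = "$2" ]; then
        echo "ok: static and transient hostname are both '$1'"
    else
        echo "mismatch: static='$1' transient='$2' (prompt shows '$(short_name "$2")')"
    fi
}
```

On a systemd host the two values can be fed in with `hostname_state "$(hostnamectl --static)" "$(hostnamectl --transient)"`.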

I DO see the weird 192 directory in /etc/pve/nodes, which might indeed be safe to delete.

Contents of corosync.conf. No weird node called 192 there...

Code:

root@192:/# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.2
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.3
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-cluster
  config_version: 6
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

I haven't taken any steps yet to fix this.

tempacc375924 · Nov 19, 2023

fuzzyduck said:
Thank you so much for taking the time to read and respond to my question. Much appreciated! Especially because I did something that shouldn't be done in the first place, lol.

No worries.

fuzzyduck said:
root@192:/# hostname

Being on a host 192 makes this feel like some sort of a Bond movie. ;)

fuzzyduck said:
...
root@192:/# cat /etc/hostname
pve

The hostname outputs are also entertaining. I believe you would be safe just rebooting after you get it back, before which...

fuzzyduck said:
I DO see the weird 192 directory in /etc/pve/nodes, which might indeed be safe to delete.

Just do it.

It really only fixes the zombie in the GUI, but why not do it before reboot.

fuzzyduck said:

Contents of corosync. No weird node called 192 there...

So this is good: you have the right IPs there, the aliases are fine, and it even states you have quorum. Mind you, with only two votes, while either of the two nodes is rebooting you by definition have no quorum, so you need to wait for the other node to come back up before you see a happy cluster again.
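
The arithmetic behind that remark, as a small sketch: quorum is a strict majority of the expected votes, which matches the `Expected votes: 2` / `Quorum: 2` lines in the `pvecm status` output earlier in the thread. The function names are made up for illustration.

```shell
#!/bin/sh
# quorum_for VOTES
# Smallest strict majority of VOTES (integer division).
quorum_for() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures VOTES
# How many votes can be lost while the cluster stays quorate.
tolerated_failures() {
    echo $(( $1 - $(quorum_for "$1") ))
}
```

With 2 votes the quorum is 2, so no vote can be lost. A qdevice that adds a third vote keeps the quorum at 2, which leaves room for one node to be down.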

fuzzyduck said:
I haven't taken any steps yet to fix this.

Time to reboot. Report the aftermath. Afterwards I would go click around in the GUI of the zombie node and try to access the shell of the other node, the VMs there, etc.

fuzzyduck · Nov 19, 2023

Right now it's evening and the wife is watching TV through the server, so I'll report back tomorrow!

What I DID do was remove the /etc/pve/nodes/192 dir, which existed on both nodes (while I only see the error when logging in through node1).

service pveproxy restart did not resolve the ghost node yet, though.

EDIT: I just found out that I cannot enter LXCs or VMs via web console or terminal.

Code:

root@192:~# pct enter 102
Configuration file 'nodes/192/lxc/102.conf' does not exist
root@192:~#
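
The error suggests `pct` looked for the container config under the directory named after the node's current (wrong) name. To see where a guest's config really lives, something like the sketch below could help. It assumes the usual `nodes/<node>/lxc/` and `nodes/<node>/qemu-server/` layout under /etc/pve; the helper name `find_guest_conf` is hypothetical.

```shell
#!/bin/sh
# find_guest_conf NODES_DIR VMID
# Print the path of every container or VM config named VMID.conf
# below NODES_DIR. Hypothetical, read-only helper.
find_guest_conf() {
    for f in "$1"/*/lxc/"$2".conf "$1"/*/qemu-server/"$2".conf; do
        [ -f "$f" ] && printf '%s\n' "$f"   # unmatched globs stay literal and are skipped
    done
    return 0
}
```

For the container above this would be `find_guest_conf /etc/pve/nodes 102`. If the result sits under a different node name than the one in the error message, the hostname is the thing to fix, not the config.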

The 2-node cluster is what I was trying to fix in the first place by adding a qdevice, but I ended up here on the forum, LOL.

Thanks so far.

fuzzyduck · Nov 20, 2023


The aftermath is pretty hopeful!
Booted without errors to the GUI. The zombie node is gone, BUT the /etc/pve/nodes/192 zombie node dir got recreated. I deleted it once more and am monitoring the situation. So maybe a file somewhere gets parsed every now and then, making the cluster think there is another node?

At least I have access to the LXCs and VMs again, because my prompt is as it should be.

Code:

root@pve:~#

I'm keeping this topic unsolved for a couple more days to monitor.

tempacc375924 · Nov 20, 2023

fuzzyduck said:
What I DID do was remove the /etc/pve/nodes/192 dir, which existed on both nodes (while I only see the error when logging in through node1).

/etc/pve/... is a mounted filesystem shared across the cluster nodes, so what you saw on both nodes is one and the same directory.
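
To confirm that /etc/pve is a mount and not a plain local directory, the mount table can be checked. A minimal sketch, assuming a `/proc/mounts`-style file; on a Proxmox node the type is expected to be `fuse`, since the cluster filesystem (pmxcfs) is FUSE-based. The helper name `mount_fstype` is made up.

```shell
#!/bin/sh
# mount_fstype MOUNTPOINT [MOUNTS_FILE]
# Print the filesystem type recorded for MOUNTPOINT in a
# /proc/mounts-style file (default: /proc/mounts). Prints nothing
# if MOUNTPOINT is not a mount point.
mount_fstype() {
    awk -v mp="$1" '$2 == mp { print $3; exit }' "${2:-/proc/mounts}"
}
```

Usage on a node would be `mount_fstype /etc/pve`. Empty output would mean the directory is not mounted at all, for example when the pve-cluster service is not running.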

tempacc375924 · Nov 20, 2023

fuzzyduck said:
The aftermath is pretty hopeful!
Booted without errors to the GUI. The zombie node is gone, BUT the /etc/pve/nodes/192 zombie node dir got recreated. I deleted it once more and am monitoring the situation. So maybe a file somewhere gets parsed every now and then, making the cluster think there is another node?

In your case (renaming back and forth), it might actually have been best to reboot both nodes at the same time.

The interesting part is that you should be able to run corosync purely on IPs, so your names could be anything really.

fuzzyduck said:
At least I have access to the LXCs and VMs again, because my prompt is as it should be.

Code:

root@pve:~#

Ok good!

fuzzyduck said:
I'm keeping this topic unsolved for a couple more days to monitor.

Keep us posted. Also, if the 192 directory gets recreated, have a look at what's in the corosync files at that very time.

tempacc375924 · Nov 20, 2023

fuzzyduck said:
The 2-node cluster is what I was trying to fix in the first place by adding a qdevice, but I ended up here on the forum, LOL.

One more thing ... how is the qdevice working for you so far?

SBC device is ... ?

fuzzyduck · Nov 20, 2023

tempacc375924 said:
One more thing ... how is the qdevice working for you so far? SBC device is ... ?

Haha, thanks for asking. It's a Single Board Computer. In my case:

http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-Zero-LTS.html

10 euros/dollars second hand, with NIC.

I wanted one of my 4 routers/access points to be the qdevice, since they run OpenWrt firmware (custom Linux), which opens them up for more than just routing (you get an SSH prompt). But since I cannot compile on it (no know-how), I went this route, which has plain vanilla Debian on it.

I already have the latest corosync-qnetd installed, but ran into setup issues. That's another story, and not for this topic I guess.

tempacc375924 · Nov 20, 2023

fuzzyduck said:
Haha, thanks for asking. It's a Single Board Computer. In my case:

http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-Zero-LTS.html

10 euros/dollars second hand, with NIC.

That's a Cortex-A7; it sounds like a great candidate for just that.

fuzzyduck said:
I wanted one of my 4 routers/access points to be the qdevice, since they run OpenWrt firmware (custom Linux), which opens them up for more than just routing (you get an SSH prompt). But since I cannot compile on it (no know-how), I went this route, which has plain vanilla Debian on it.

You could even run it outside of the network (if you already run a VM or CT out there); it's the one thing that does NOT require low latency to the rest of the cluster.

fuzzyduck said:
I already have the latest corosync-qnetd installed, but ran into setup issues. That's another story, and not for this topic I guess.

Fair enough, feel free to open a new one.

fuzzyduck · Nov 21, 2023

After the reboot of node1 and monitoring the 2-node cluster for a day, I didn't see the ghost 192 dir return. Even the qdevice setup is now working and I have 3 votes now for HA. Happy days.

Thank you tempacc375924 for helping me out and kicking me forward where I had stalled.
