Cluster Generation

d3dl3g · Sep 10, 2023

Hi all,
n00b here. I'm following the wiki (https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_create_cluster).
The 2nd node of the cluster always ends up showing a red cross,
and it also shows "/etc/pve/nodes/proxmox/pve-ssl.pem' does not exist (500)"

Any help or advice warmly received. You may have to ELI5.

Leonardo Mercadante · Sep 11, 2023

Hello!

When I was starting out I got the same problem, so just check the fingerprint and the addresses in the join information, check the user (root@pam)
and the root password, and I'm sure it will work properly.

jsterr · Sep 11, 2023

Also make sure that you have set a time server. I had this issue once because no time server was configured (or the public defaults were blocked).

d3dl3g · Sep 11, 2023

@jsterr I set matching NTP servers on "pve2" and "pve3".

---------------------------------------------------------------

@Leonardo Mercadante I attempted to recreate the cluster. On pve2 I used "Create cluster" > "Join Info" > "Copy Code",
then on pve3 "Join Cluster" > "Paste Code"... After that I can no longer log in to the pve3 web UI (I believe this to be normal behaviour because pve2 is controlling?)

This results in the same behaviour as before.
The cluster appears to be created; however, pve3 shows a red cross and the ssl.pem error.

d3dl3g · Sep 12, 2023

OK, so looking back at the logs (the task log that comes up at the bottom of the Proxmox web UI):


Please enter superuser (root) password for '192.168.0.6': ********
Establishing API connection with host '192.168.0.6'
The authenticity of host '192.168.0.6' can't be established.
X509 SHA256 key fingerprint is 62:B9:48:03:FD:71:26:48:C0:A9:EC:CF:7D:DF:E6:3F:8C:70:2xx:Xxx:Xxx:Xx:xx:xx:xx:.
Are you sure you want to continue connecting (yes/no)? Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '192.168.0.18'
Request addition of this node
An error occurred on the cluster node: cluster not ready - no quorum?

TASK ERROR: Cluster join aborted!
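
For anyone else reading the "no quorum?" line: a minimal sketch of the majority rule that votequorum applies, assuming the defaults seen in this thread (one vote per node, no `two_node` option, no QDevice). The function names `votes_needed` and `is_quorate` are made up for illustration and are not part of any Proxmox or corosync tool.

```python
def votes_needed(expected_votes: int) -> int:
    """Strict majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True when the visible votes reach a strict majority."""
    return votes_present >= votes_needed(expected_votes)


# Two configured nodes, only one of them visible: no quorum.
print(votes_needed(2), is_quorate(1, 2))  # 2 False
# Three expected votes, one missing: still quorate.
print(votes_needed(3), is_quorate(2, 3))  # 2 True
```

Under these assumptions a two-node cluster needs both nodes to see each other, so a node that already lists a second, unreachable node in its config would refuse changes such as a join.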

jsterr · Sep 12, 2023

Please post your /etc/hosts file and your /etc/pve/corosync.conf file, for both nodes if possible. Login should be possible from all nodes, as long as you have quorum.

d3dl3g · Sep 12, 2023

@jsterr as requested

PVE 2
Hosts

Bash:

  GNU nano 7.2                                           /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.0.6 pve2.home pve2

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Corosync

Bash:

  GNU nano 7.2                                     /etc/pve/corosync.conf *
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.6
  }
  node {
    name: pve3
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.18
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Home
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

-------------------------------------------------------------------------------------

PVE 3
Hosts

Bash:

  GNU nano 7.2                                           /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.0.18 pve3.home pve3

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Corosync

Bash:

  GNU nano 7.2                                     /etc/pve/corosync.conf *
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.6
  }
  node {
    name: pve3
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.18
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Home
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
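
When comparing files like these by eye it is easy to miss a mismatch, so here is a rough sketch that pulls the node entries out of a corosync.conf and checks them against an /etc/hosts file. It assumes the simple layout shown above (one `node {` block per node, `key: value` lines); the helper names and the trimmed sample text are illustrative only. Since the `ring0_addr` entries here are IP addresses, corosync itself does not rely on these names, so a missing name is only something to be aware of, not necessarily a fault.

```python
COROSYNC = """
nodelist {
  node {
    name: pve2
    nodeid: 1
    ring0_addr: 192.168.0.6
  }
  node {
    name: pve3
    nodeid: 2
    ring0_addr: 192.168.0.18
  }
}
totem {
  interface {
    linknumber: 0
  }
}
"""

HOSTS_PVE2 = """
127.0.0.1 localhost.localdomain localhost
192.168.0.6 pve2.home pve2
# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
"""


def parse_nodes(conf_text: str) -> list[dict[str, str]]:
    """Collect the key/value pairs of every 'node { ... }' block."""
    nodes, current = [], None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line == "node {":
            current = {}
        elif line == "}" and current is not None:
            nodes.append(current)
            current = None
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return nodes


def parse_hosts(hosts_text: str) -> dict[str, str]:
    """Map every hostname/alias in a hosts file to its first address."""
    mapping = {}
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)
    return mapping


def names_not_matching(nodes, hosts):
    """Node names whose hosts entry is missing or differs from ring0_addr."""
    return [n["name"] for n in nodes if hosts.get(n["name"]) != n.get("ring0_addr")]


nodes = parse_nodes(COROSYNC)
print([n["name"] for n in nodes])                          # ['pve2', 'pve3']
print(names_not_matching(nodes, parse_hosts(HOSTS_PVE2)))  # ['pve3']
```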

d3dl3g · Sep 12, 2023

FTR, web login to the "master" works fine.

Web login to the "slave" fails after cluster creation.

SSH to both works absolutely fine.

jsterr · Sep 13, 2023

What happens if you shut down node1? Can you log in via the UI on node2?

d3dl3g · Sep 13, 2023

No login via the web UI to node2, with node1 up or down.

Ping and SSH to node2 are flawless.

edit: @jsterr

d3dl3g · Sep 24, 2023

So I've had some movement....

I rebuilt and updated my test nodes (pve2 and pve3).

Managed to get them clustered....

My problem is getting my main node, pve1, into the cluster. I THINK it may be a version issue. Having said that, I tried having it as the primary and had to follow https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node to get it to stand on its own two feet again after the cluster join failures.

Now... when joining pve1 to pve2 I can get to the pve2 shell, but it shows the node as offline.

ALL 3 nodes are Proxmox 8.0.4. Is this what is meant by "same version"?
If I look at the shell:
pve1 = "Linux pve 6.2.16-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-10 (2023-08-18T11:42Z) x86_64"
pve2 = "Linux pve2 6.2.16-14-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-14 (2023-09-19T08:17Z) x86_64"
pve3 = "Linux pve3 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64"
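
Note that these banner lines report the running kernel of each node rather than the Proxmox VE package version (`pveversion -v` lists the package versions). A small sketch for pulling the kernel release out of such lines so the three nodes can be compared side by side; the regular expression assumes the `X.Y.Z-N-pve` naming seen above, and `kernel_release` is a made-up helper name.

```python
import re

BANNERS = {
    "pve1": "Linux pve 6.2.16-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-10 (2023-08-18T11:42Z) x86_64",
    "pve2": "Linux pve2 6.2.16-14-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-14 (2023-09-19T08:17Z) x86_64",
    "pve3": "Linux pve3 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64",
}


def kernel_release(banner: str) -> tuple[int, ...]:
    """Return (major, minor, patch, abi) parsed from an 'X.Y.Z-N-pve' release."""
    match = re.search(r"\b(\d+)\.(\d+)\.(\d+)-(\d+)-pve\b", banner)
    if match is None:
        raise ValueError(f"no pve kernel release found in: {banner!r}")
    return tuple(int(part) for part in match.groups())


releases = {node: kernel_release(line) for node, line in BANNERS.items()}
oldest_first = sorted(releases, key=releases.get)
print(oldest_first)                     # ['pve3', 'pve1', 'pve2']
print(len(set(releases.values())) > 1)  # True: the kernels differ
```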

There are some differences in the Proxmox system disk formats, but I don't know if that matters much. pve1 is ext4 IIRC, and pve2 and pve3 are btrfs RAID1.

Any thoughts and advice still warmly received.
@jsterr

d3dl3g · Sep 24, 2023

Wait... does pve1 need to be devoid of CTs/VMs?

d3dl3g · Nov 6, 2023

[Solved]

The issue lay with my HP ProCurve 1810 switch. After a LOT of playing around with hardware and countless rebuilds of Proxmox on my 2x bare-metal systems, I moved them onto a much newer (cheaper) Netgear managed 8-port switch and it just worked.

Once I had the cluster created correctly, I moved "pve1" back to its home on the HP switch. Needless to say, pve1 & 2 could no longer see each other... I had an inkling it was "IGMP Snooping" related, as this was kinda the ONLY difference in the way my traffic was monitored/modified.
The 1810 doesn't expressly have IGMP Snooping as a toggleable option. However, it has "Storm Control" and "Auto DoS" options.
I learned that if I turned off "Auto DoS", both my nodes turned to a green tick and the cluster works as expected.

There have historically been posts on here that say "Absolutely not, can't use an 1810, buy new HW" or words to the same effect, but I was convinced that with an enterprise switch, even one from so many years gone by, it should be possible to "just let stuff through".

I'm not 100% sure why there's so much of that type of traffic traversing between nodes that it triggers the switch's internal mechanisms to drop traffic. I am also not 100% sure if it's a factory default or something I set a long time ago with the best intentions.
For me, the issue is fixed, and it works as expected.

I've also spun up a QDevice on a surplus-to-requirements RPi0W and that works as expected (yes, even over WiFi), so I've had quite a productive weekend.
