Cluster Generation

Hi all,
Newbie here. I followed the wiki (https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_create_cluster),
but the 2nd node of the cluster always ends up showing a red cross.
It also shows "'/etc/pve/nodes/proxmox/pve-ssl.pem' does not exist (500)".

Any help or advice warmly received. You may have to ELI5.
Hello!

When I was starting out I had the same problem. Check the fingerprint and the addresses in the join information, then check the user (root@pve)
and the root password, and I'm sure it will work properly.
 
Also make sure that you have a time server configured. I had this issue once because no time server was configured (or the public defaults were blocked).
 
@jsterr I set the NTP server to the same servers on "pve2" and "pve3".

---------------------------------------------------------------

@Leonardo Mercadante I tried recreating the cluster: on pve2 I used "Create cluster" > "Join Info" > "Copy Code",
then on pve3 "Join Cluster" > "Paste Code". After that I can no longer log in to the pve3 web UI (I believe this is normal behaviour because pve2 is in control?).

This results in the same behaviour as before.
The cluster appears to be created, but pve3 shows a red cross and the ssl.pem error.

OK, so looking back at the logs (the task log that comes up at the bottom of the Proxmox web UI):

Please enter superuser (root) password for '192.168.0.6': ********
Establishing API connection with host '192.168.0.6'
The authenticity of host '192.168.0.6' can't be established.
X509 SHA256 key fingerprint is 62:B9:48:03:FD:71:26:48:C0:A9:EC:CF:7D:DF:E6:3F:8C:70:2xx:Xxx:Xxx:Xx:xx:xx:xx:.
Are you sure you want to continue connecting (yes/no)?
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '192.168.0.18'
Request addition of this node
An error occurred on the cluster node: cluster not ready - no quorum?
TASK ERROR: Cluster join aborted!
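
For anyone puzzled by the "no quorum?" message: corosync's votequorum only treats a partition as quorate when it holds a strict majority of the expected votes. Below is a minimal Python sketch of that arithmetic, assuming one vote per node and no special two-node option. The function names are made up for illustration and are not part of any Proxmox tool.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes that can see each other reach the threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Assumption: a two-node cluster where each node carries one vote.
expected = 2
print(quorum_threshold(expected))   # votes needed for quorum
print(is_quorate(1, expected))      # one node that cannot see its peer
print(is_quorate(2, expected))      # both nodes see each other
```

In a two-node cluster, a node that cannot reach its peer holds 1 of 2 votes and is therefore not quorate. If the existing node already lists two nodes from an earlier join attempt, that is one possible explanation for the join being refused.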
 
Please post your /etc/hosts and /etc/pve/corosync.conf files, for both nodes if possible. Login should be possible from all nodes as long as you have quorum.
 
@jsterr as requested

PVE 2
Hosts
Bash:
127.0.0.1 localhost.localdomain localhost
192.168.0.6 pve2.home pve2

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Corosync
Bash:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.6
  }
  node {
    name: pve3
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.18
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Home
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
-------------------------------------------------------------------------------------

PVE 3
Hosts
Bash:
127.0.0.1 localhost.localdomain localhost
192.168.0.18 pve3.home pve3

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Corosync
Bash:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.6
  }
  node {
    name: pve3
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.18
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Home
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
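
Both nodes posted the same nodelist, and that can be checked mechanically against the addresses in /etc/hosts. Below is a rough Python sketch, assuming the nodelist and the host entries from the posts above are pasted in by hand. `parse_nodes` and `mismatched_nodes` are hypothetical helpers, not part of any Proxmox or corosync tooling, and the parser only handles simple flat `node { ... }` blocks.

```python
import re

# Nodelist as posted for both pve2 and pve3.
COROSYNC_CONF = """\
nodelist {
  node {
    name: pve2
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.6
  }
  node {
    name: pve3
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.18
  }
}
"""

# Name-to-address pairs combined from the two posted /etc/hosts files.
HOSTS = {"pve2": "192.168.0.6", "pve3": "192.168.0.18"}


def parse_nodes(conf_text):
    """Return one dict of key/value strings per `node { ... }` block."""
    nodes = []
    for block in re.findall(r"\bnode\s*\{([^{}]*)\}", conf_text):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        nodes.append(fields)
    return nodes


def mismatched_nodes(nodes, hosts):
    """Names whose ring0_addr differs from (or is missing in) hosts."""
    return [n["name"] for n in nodes if hosts.get(n["name"]) != n["ring0_addr"]]


nodes = parse_nodes(COROSYNC_CONF)
print([n["name"] for n in nodes])
print(mismatched_nodes(nodes, HOSTS))
```

An empty mismatch list here only says the files agree with each other. It says nothing about whether the nodes can actually reach one another over the network.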
 
FTR, Weblogin to "master" works fine.

Weblogin to "slave" fails after cluster creation.

SSH to both works absolutely fine :)

 
What happens if you shut down node1? Can you log in via the UI on node2?
 
No login via webui to node2 with node1 up or down.

Ping and SSH to node2 is flawless

edit: @jsterr
 

So I've had some movement...

I rebuilt and updated my test nodes (pve2 and pve3) and managed to get them clustered.

My problem now is getting my main node, pve1, into the cluster. I THINK it may be a version issue. Having said that, I also tried making it the primary, and after the cluster join failures I had to follow https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node to get it to stand on its own two feet again.

Now, when joining pve1 to pve2, I can get to the pve2 shell, but it shows the node as offline.

ALL 3 nodes are on Proxmox 8.0.4. Is this what is meant by "same version"?
If I look at the shell on each node:
pve1 = "Linux pve 6.2.16-10-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-10 (2023-08-18T11:42Z) x86_64"
pve2 = "Linux pve2 6.2.16-14-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-14 (2023-09-19T08:17Z) x86_64"
pve3 = "Linux pve3 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64"

The Proxmox system disks also use different filesystems, but I don't know if that matters much: pve1 is ext4 IIRC, and pve2 and pve3 are btrfs RAID1.

Any thoughts and advice still warmly received.
@jsterr
 
[Solved]

The issue lay with my HP ProCurve 1810 switch. After a LOT of playing around with hardware and countless rebuilds of Proxmox on my 2x bare-metal systems, I moved them onto a much newer (and cheaper) Netgear managed 8-port switch and it just worked.

Once I had the cluster created correctly, I moved "pve1" back to its home behind the HP switch. Needless to say, pve1 and pve2 could no longer see each other. I had an inkling it was "IGMP Snooping" related, as this was pretty much the ONLY difference in the way my traffic was monitored/modified.
The 1810 doesn't expose IGMP Snooping as a toggleable option. However, it has "Storm Control" and "Auto DoS" options.
I learned that if I turned off "Auto DoS", both my nodes turned to a green tick and the cluster worked as expected.

There have historically been posts on here that say "Absolutely not, can't use an 1810, buy new hardware" or words to that effect, but I was convinced that with an enterprise switch, even one from so many years ago, it should be possible to "just let stuff through".

I'm not 100% sure why there is so much of that type of traffic traversing between the nodes that it triggers the switch's internal mechanisms to drop traffic. I'm also not 100% sure whether it's a factory default or something I set a long time ago with the best intentions.
For me, the issue is fixed, and it works as expected.

I've also spun up a QDevice on a surplus-to-requirements RPi0W and that works as expected (yes, even over Wi-Fi), so I've had quite a productive weekend.
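
As a rough illustration of why a QDevice is worth having on a small cluster, here is a short Python sketch of the majority arithmetic. It assumes the QDevice contributes exactly one extra vote to a two-node cluster where each node has one vote. That is an assumption for the example, not something read from this setup, and the helper name is hypothetical.

```python
def tolerable_vote_loss(expected_votes: int) -> int:
    """Votes that can disappear while a strict majority still remains."""
    quorum = expected_votes // 2 + 1
    return expected_votes - quorum


# Assumption: two nodes with one vote each.
two_nodes = tolerable_vote_loss(2)

# Assumption: the same two nodes plus one vote from a QDevice.
with_qdevice = tolerable_vote_loss(3)

print(two_nodes, with_qdevice)
```

Under those assumptions the plain two-node cluster cannot lose any vote without losing quorum, while the cluster with a QDevice can lose one.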
 
