[SOLVED] Cluster creation problems

EmilS

New Member
Apr 15, 2019
Good morning all,

I spent all of last week trying to create a cluster with the new version of Proxmox, without success. I was running a test cluster on version 5 with no problems. After reinstalling the two nodes with a fresh Debian Buster and Proxmox 6, I get the following error every time I try to add the second node to the cluster.

Code:
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1566200142.sql.gz'
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
waiting for quorum...OK
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
(re)generate node files
generate new node certificate
unable to create directory '/etc/pve/nodes' - Permission denied

Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-3
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-63
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1

Code:
Quorum information
------------------
Date:             Mon Aug 19 09:47:34 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1/315984
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.***.***.*** (local)
0x00000002          1 172.***.***.***


Code:
Quorum information
------------------
Date:             Mon Aug 19 09:50:08 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1/316016
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.**..***..**
0x00000002          1 172.**.**:**(local)

Multicast is working correctly, and version 5 ran without problems. Any ideas? I have already searched the forum and tried all the solutions I found there.
In the GUI there is a "tls_process_server_certificate: certificate verify failed (596)" error.
Thank you in advance!

Best regards
Emil
 
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
It seems pmxcfs is not running on one of your nodes:
* check the status of pve-cluster `systemctl status -l pve-cluster`
* restart the service

hope this helps!
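
The two checks above could look roughly like this. It is only a sketch, assuming a systemd-based Proxmox VE node where pmxcfs is started by the `pve-cluster` unit and mounted at `/etc/pve`. The helper name `pmxcfs_hint` is made up for illustration and is not part of Proxmox.

```shell
#!/bin/sh
# Hypothetical helper: turn the one-word state printed by
# `systemctl is-active pve-cluster` into a hint about what to do next.
pmxcfs_hint() {
    case "$1" in
        active) echo "pve-cluster is active" ;;
        *)      echo "pve-cluster is '$1': restart it" ;;
    esac
}

# Typical use on the affected node, as root:
#   pmxcfs_hint "$(systemctl is-active pve-cluster)"
#   systemctl status -l pve-cluster     # full status with unabbreviated log lines
#   systemctl restart pve-cluster       # restart pmxcfs
#   mountpoint /etc/pve                 # check that the cluster filesystem is mounted again
```

If the service is reported as active but `/etc/pve` is still not usable, restarting it and retrying the join from a fresh shell is probably the next thing to try.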
 
Thank you for your fast answer Stoiko :)

I just tried again after seeing this error. This time the same error appears, but without the shell-init error.
The cluster service is running with no errors:

Code:
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1566205076.sql.gz'
waiting for quorum...OK
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
(re)generate node files
generate new node certificate
unable to create directory '/etc/pve/nodes' - Permission denied

The syslog also shows an error:
Code:
 pve-ha-lrm[1035]: unable to write lrm status file - unable to open file '/etc/pve/nodes/node2/lrm_status.tmp.1035' - No such file or directory
Maybe the file does not exist because of the "Permission denied" error?
 
* hmm - check the journal for messages from `corosync` and `pmxcfs` - maybe you lose quorum after joining?
* you need to run the cluster join as root!
hope this helps!
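
A rough sketch of those checks, assuming the quorum output posted earlier comes from `pvecm status` and that the relevant systemd units are `corosync` and `pve-cluster`. The helper names `is_root` and `is_quorate` are invented here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given UID (default: the current user) is root.
is_root() {
    [ "${1:-$(id -u)}" -eq 0 ]
}

# Hypothetical helper: read quorum status text on stdin and succeed only
# if it contains a line of the form "Quorate:          Yes".
is_quorate() {
    awk '$1 == "Quorate:" { found = ($2 == "Yes") } END { exit !found }'
}

# Typical use on the joining node:
#   is_root || echo "run the cluster join as root"
#   journalctl -b -u corosync -u pve-cluster    # messages from corosync and pmxcfs since boot
#   pvecm status | is_quorate || echo "quorum lost after joining"
```

Running the quorum check a few times right after the join should show whether the node drops out of the quorate partition while `/etc/pve` is being written.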
 
I don't know why, but I tried it with a different system as node 2 and now it works. I will investigate further why it does not work on the previous system; maybe it is a network hardware issue. Many thanks for your help, Stoiko.