[SOLVED] Cluster creation problems

EmilS

Apr 15, 2019
Good morning all,

I spent all of last week trying to create a cluster with the new Proxmox version, without success. I had been running a test cluster on version 5 with no problems. After reinstalling the two nodes with a fresh Debian Buster and Proxmox 6, I get the following error every time I try to add the second node to the cluster.

Code:
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1566200142.sql.gz'
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
waiting for quorum...OK
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
(re)generate node files
generate new node certificate
unable to create directory '/etc/pve/nodes' - Permission denied

Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-3
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-63
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1

Code:
Quorum information
------------------
Date:             Mon Aug 19 09:47:34 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1/315984
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.***.***.*** (local)
0x00000002          1 172.***.***.***


Code:
Quorum information
------------------
Date:             Mon Aug 19 09:50:08 2019
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          1/316016
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.**..***..**
0x00000002          1 172.**.**:**(local)

Multicast is working correctly, and version 5 worked without problems. Any ideas? I have already searched the forum and tried all the solutions there.
In the GUI there is a "tls_process_server_certificate: certificate verify failed (596)" error.
Thank you in advance!

Best regards,
Emil
 
> shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected

It seems pmxcfs is not running on one of your nodes:
* Check the status of pve-cluster: `systemctl status -l pve-cluster`
* Restart the service: `systemctl restart pve-cluster`
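"Transport endpoint is not connected" is what a process typically gets when it touches a FUSE mount whose daemon has gone away, and pmxcfs provides /etc/pve as a FUSE mount. A quick sanity check can therefore look for /etc/pve in /proc/mounts and try to stat it. The following is a minimal Python sketch under those assumptions; the function names are hypothetical and this is not a Proxmox tool, so `systemctl status -l pve-cluster` remains the authoritative check.

```python
"""Sketch: does the pmxcfs FUSE mount at /etc/pve look healthy?

Assumption: pmxcfs mounts /etc/pve as a FUSE filesystem, so it appears in
/proc/mounts with a filesystem type starting with "fuse".
Function names are hypothetical, chosen for illustration only.
"""
import os


def pve_fuse_mounted(mounts_text: str, mountpoint: str = "/etc/pve") -> bool:
    """Return True if `mountpoint` appears in /proc/mounts-style text as a FUSE mount."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == mountpoint:
            return fields[2].startswith("fuse")
    return False


def mountpoint_accessible(path: str = "/etc/pve") -> bool:
    """Return False if stat() fails; a stale FUSE mount typically fails
    with ENOTCONN ("Transport endpoint is not connected")."""
    try:
        os.stat(path)
        return True
    except OSError:
        return False


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print("FUSE mount present:", pve_fuse_mounted(f.read()))
    print("mount point accessible:", mountpoint_accessible())
```

If the mount is listed but not accessible, the pmxcfs process behind it has most likely died, which matches the suggestion to restart pve-cluster.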

Hope this helps!
 
Thank you for your fast answer, Stoiko :)

I just tried again after seeing that error. This time the same error appears, but without the shell-init error.
The cluster service is running with no errors:

Code:
Login succeeded.
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1566205076.sql.gz'
waiting for quorum...OK
can't create shared ssh key database '/etc/pve/priv/authorized_keys'
(re)generate node files
generate new node certificate
unable to create directory '/etc/pve/nodes' - Permission denied

The syslog also shows an error:
Code:
 pve-ha-lrm[1035]: unable to write lrm status file - unable to open file '/etc/pve/nodes/node2/lrm_status.tmp.1035' - No such file or directory
Maybe the file does not exist because of the "Permission denied" error?
 
* Hmm, check the journal for messages from `corosync` and `pmxcfs` (e.g. `journalctl -u corosync -u pve-cluster`); maybe you lose quorum after joining?
* You need to run the cluster join as root!
Hope this helps!
 
I don't know why, but I tried it with another system as node 2 and now it works. I will investigate further why it does not work on the previous system; maybe it's a network hardware issue. Many thanks for your help, Stoiko.
 
