The cluster does not start and /etc/pve reports "Transport endpoint is not connected"


Shazams

Guest
Hello everyone.
I have Proxmox 6.1 with the latest updates.
There was a node in the cluster, which I removed with pvecm delnode nodename.
When I tried to add it back with pvecm add nodename, it complained about various leftover cluster files. I deleted those files with the script from https://gist.github.com/ianchen06/73acc392c72d6680099b7efac1351f56#gistcomment-3054405
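Roughly, the commands in question were the following (the node name and IP below are placeholders, not the real ones; delnode is run on a node that stays in the cluster, add on the node being re-joined):

Code:
# on a node that remains in the cluster:
pvecm delnode nodename
# on the node being added back, pointing at an existing cluster member:
pvecm add 192.0.2.10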
When trying to add the node, there was a warning:
* this host already contains virtual guests

WARNING: detected error but forced to continue!

I ignored it, because there were no guest files there and the node had been empty from the start.
After adding, the join task hung waiting for quorum, and I cancelled it with Ctrl+C.
When I then tried to open the web interface, everything hung. I restarted the cluster service on the master node and it did not start. I also could not access the /etc/pve directory.

systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-05-22 03:29:54 MSK; 7s ago
Process: 23490 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)

May 22 03:29:53 VT-SupportMachine1 pmxcfs[23490]: [main] crit: fuse_mount error: Transport endpoint is not connected
May 22 03:29:53 VT-SupportMachine1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
May 22 03:29:53 VT-SupportMachine1 pmxcfs[23490]: [main] notice: exit proxmox configuration filesystem (-1)
May 22 03:29:53 VT-SupportMachine1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
May 22 03:29:53 VT-SupportMachine1 systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: Stopped The Proxmox VE cluster filesystem.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: pve-cluster.service: Start request repeated too quickly.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
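Side note, not part of the original output: when /etc/pve is stuck with "Transport endpoint is not connected", the stale FUSE mount usually has to be cleared before pmxcfs can start again. A minimal sketch, assuming nothing else is holding the mount:

Code:
# lazily unmount the stale pmxcfs FUSE mount, then let the service remount it
umount -l /etc/pve
systemctl restart pve-cluster
systemctl status pve-cluster --no-pager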

journalctl -xe
May 22 03:31:00 VT-SupportMachine1 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: A start job for unit pvesr.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has begun execution.
--
-- The job identifier is 2589688.
May 22 03:31:01 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pve-firewall[1396]: status update error: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: ipcc_send_rec[1] failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: ipcc_send_rec[2] failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: ipcc_send_rec[3] failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: Unable to load access control list: Connection refused
May 22 03:31:01 VT-SupportMachine1 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit pvesr.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 111.
May 22 03:31:01 VT-SupportMachine1 systemd[1]: pvesr.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvesr.service has entered the 'failed' state with result 'exit-code'.
May 22 03:31:01 VT-SupportMachine1 systemd[1]: Failed to start Proxmox VE replication runner.
-- Subject: A start job for unit pvesr.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has finished with a failure.
--
-- The job identifier is 2589688 and the job result is failed.
May 22 03:31:01 VT-SupportMachine1 cron[1364]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
May 22 03:31:06 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:11 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:11 VT-SupportMachine1 pve-firewall[1396]: status update error: Connection refused
May 22 03:31:16 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:21 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:21 VT-SupportMachine1 pve-firewall[1396]: status update error: Connection refused
May 22 03:31:26 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused

On the master node,
/var/lib/pve-cluster/config.db
still contains all the data. I did not touch the database itself, I only made a backup of it.
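A minimal sketch of how that backup and a quick sanity check can be done (assuming the sqlite3 client is installed; 'tree' is the table where pmxcfs keeps its files):

Code:
# stop the service so the database is not written to while copying
systemctl stop pve-cluster
cp /var/lib/pve-cluster/config.db /root/config.db.bak
# quick look at what the cluster filesystem still contains
sqlite3 /var/lib/pve-cluster/config.db "SELECT name FROM tree LIMIT 20;"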

When I try to open the master node's console from another working node, I get an error:
Connection failed (Error 500: hostname lookup 'VT-SupportMachine1' failed - failed to get address info for: VT-SupportMachine1: Name or service not known)
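That error is plain name resolution: the other node cannot resolve the master's hostname. A quick check (the IP below is a placeholder for the master's real address):

Code:
# on the node showing the error
getent hosts VT-SupportMachine1
# if nothing is returned, add an entry to /etc/hosts, for example:
# 192.0.2.10   VT-SupportMachine1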

How can I get the master node working correctly again? The other nodes are working fine (their cluster service restarts without issues).
 
qm list -full
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
 
pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-8
pve-kernel-5.3: 6.1-6
pve-kernel-4.15: 5.4-13
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-1-pve: 5.3.18-1
pve-kernel-4.15.18-25-pve: 4.15.18-53
pve-kernel-4.4.134-1-pve: 4.4.134-112
pve-kernel-4.4.128-1-pve: 4.4.128-111
pve-kernel-4.4.114-1-pve: 4.4.114-108
pve-kernel-4.4.98-6-pve: 4.4.98-107
pve-kernel-4.4.98-2-pve: 4.4.98-101
pve-kernel-4.4.95-1-pve: 4.4.95-99
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.4.79-1-pve: 4.4.79-95
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.62-1-pve: 4.4.62-88
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.44-1-pve: 4.4.44-84
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.35-2-pve: 4.4.35-79
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-7
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
I rebooted the node and it dropped out of the cluster.
Another one did too.
When I try to add it to the existing cluster, I get:

* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests
Check if node may join a cluster failed!


pvecm status
Cluster information
-------------------
Name: VT-Cluster
Config Version: 7
Transport: knet
Secure auth: on

Cannot initialize CMAP service
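"Cannot initialize CMAP service" generally means pvecm cannot reach corosync at all; a quick way to check the daemon (suggested commands, not output from this thread):

Code:
systemctl status corosync --no-pager
journalctl -u corosync -b --no-pager | tail -n 50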



What should I do?
 
On the separated node, do:

Code:
systemctl stop corosync
systemctl stop pve-cluster
pmxcfs -l
rm /etc/pve/corosync.conf
rm /etc/corosync/authkey
killall pmxcfs
systemctl start pve-cluster
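Once pve-cluster comes back up, a quick check that /etc/pve is mounted and readable again could look like this (not part of the original reply):

Code:
systemctl status pve-cluster --no-pager
mountpoint /etc/pve && ls /etc/pve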

Also, the line 'ipcc_send_rec[2] failed: Connection refused' in your journal usually suggests a misconfiguration in the /etc/hosts file. If there are any lines starting with 127.0.1.1 in the /etc/hosts files on your nodes, please delete or comment out those lines.
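A sketch of what that check and fix could look like on each node (the sed backup suffix is just a precaution):

Code:
# look for the problematic loopback mapping
grep -n '^127\.0\.1\.1' /etc/hosts
# comment it out, keeping a backup of the original file
sed -i.bak 's/^127\.0\.1\.1/#&/' /etc/hosts
# the hostname should now resolve to the node's real address
hostname --ip-address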
 
