Shazams
Guest
Hello everyone.
I have Proxmox 6.1 with the latest updates.
There was a node in the cluster, but I deleted it with pvecm delnode nodename.
When I tried to add it back with pvecm add nodename, it complained about various leftover cluster files. I deleted those files with the script from https://gist.github.com/ianchen06/73acc392c72d6680099b7efac1351f56#gistcomment-3054405
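For reference, that gist essentially runs the standard "separate a node without reinstalling" cleanup from the Proxmox docs; a rough sketch of the steps (assuming no HA is configured on the node):

systemctl stop pve-cluster corosync
pmxcfs -l                     # start the cluster filesystem in local mode
rm -r /etc/corosync/*         # drop the corosync config and authkey
rm /etc/pve/corosync.conf     # drop the cluster config from pmxcfs
killall pmxcfs
systemctl start pve-cluster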
There was a warning when adding the node:
* this host already contains virtual guests
WARNING: detected error but forced to continue!
I ignored the warning, since there were no guest config files and the node was empty to begin with.
After adding, the join task hung waiting for quorum, and I cancelled it with Ctrl+C.
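For reference, the quorum state at that point can be checked with the standard tools:

pvecm status              # cluster membership and quorum summary
corosync-quorumtool -s    # quorum status straight from corosync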
When I then tried to open the web interface, everything hung. I restarted the cluster service on the master node, but it would not start. I also could not access the /etc/pve directory.
systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-05-22 03:29:54 MSK; 7s ago
Process: 23490 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
May 22 03:29:53 VT-SupportMachine1 pmxcfs[23490]: [main] crit: fuse_mount error: Transport endpoint is not connected
May 22 03:29:53 VT-SupportMachine1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
May 22 03:29:53 VT-SupportMachine1 pmxcfs[23490]: [main] notice: exit proxmox configuration filesystem (-1)
May 22 03:29:53 VT-SupportMachine1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
May 22 03:29:53 VT-SupportMachine1 systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: Stopped The Proxmox VE cluster filesystem.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: pve-cluster.service: Start request repeated too quickly.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
May 22 03:29:54 VT-SupportMachine1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
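I suspect the "fuse_mount error: Transport endpoint is not connected" line means /etc/pve is still held by a stale FUSE mount left behind by the previous pmxcfs instance, so the new one cannot mount over it. A minimal sketch of clearing that (my assumption about the cause, not a confirmed diagnosis):

findmnt /etc/pve               # check whether a stale pmxcfs mount is still present
fusermount -u /etc/pve         # unmount the dead endpoint (or: umount -l /etc/pve)
systemctl start pve-cluster    # then try the service again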
journalctl -xe
May 22 03:31:00 VT-SupportMachine1 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: A start job for unit pvesr.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has begun execution.
--
-- The job identifier is 2589688.
May 22 03:31:01 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pve-firewall[1396]: status update error: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: ipcc_send_rec[1] failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: ipcc_send_rec[2] failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: ipcc_send_rec[3] failed: Connection refused
May 22 03:31:01 VT-SupportMachine1 pvesr[23535]: Unable to load access control list: Connection refused
May 22 03:31:01 VT-SupportMachine1 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit pvesr.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 111.
May 22 03:31:01 VT-SupportMachine1 systemd[1]: pvesr.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvesr.service has entered the 'failed' state with result 'exit-code'.
May 22 03:31:01 VT-SupportMachine1 systemd[1]: Failed to start Proxmox VE replication runner.
-- Subject: A start job for unit pvesr.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has finished with a failure.
--
-- The job identifier is 2589688 and the job result is failed.
May 22 03:31:01 VT-SupportMachine1 cron[1364]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
May 22 03:31:06 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:11 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:11 VT-SupportMachine1 pve-firewall[1396]: status update error: Connection refused
May 22 03:31:16 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:21 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
May 22 03:31:21 VT-SupportMachine1 pve-firewall[1396]: status update error: Connection refused
May 22 03:31:26 VT-SupportMachine1 pve-ha-lrm[1440]: updating service status from manager failed: Connection refused
The master node still has all its data; I did not touch the database at /var/lib/pve-cluster/config.db, I only made a backup of it.
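For reference, a minimal sketch of backing up and restoring that database; pmxcfs keeps config.db open, so pve-cluster must be stopped first (the backup path here is just an example):

systemctl stop pve-cluster
cp /var/lib/pve-cluster/config.db /root/config.db.bak    # backup copy
# to roll back later:
# cp /root/config.db.bak /var/lib/pve-cluster/config.db
systemctl start pve-cluster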
When I try to open the master node's console from another working node, I get an error:
Connection failed (Error 500: hostname lookup 'VT-SupportMachine1' failed - failed to get address info for: VT-SupportMachine1: Name or service not known)
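I suspect that Error 500 is plain name resolution: the node I connect from cannot resolve VT-SupportMachine1. A sketch of one possible fix, with 192.0.2.10 as a placeholder for the master node's real cluster IP:

# on the node that fails the lookup:
echo '192.0.2.10 VT-SupportMachine1' >> /etc/hosts
getent hosts VT-SupportMachine1    # verify the name now resolves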
How can I get the master node working correctly again? The other nodes are working fine (the cluster service restarts on them without problems).