Proxmox 6.0: Can't start my 3-node cluster

NoPseudo

Member
Sep 10, 2019
Hello guys,

I'm fairly new here on the Proxmox forum, though I've been using Proxmox for a while now, so please excuse any mistakes, such as duplicating an existing thread.

A few days ago I set up a 3-node cluster running Proxmox 6.0, and it ran well until I shut it down (properly) for power maintenance at my home.

This is a personal lab, not a production one, by the way. I could reinstall all the nodes right away if needed, but I want to know why this happened first.

Node specs:

(pve-01) Dell PowerEdge R10 1: 2 sockets (24 CPUs) / 96 GB RAM / 600 GB RAID 0 (OS) & 1.2 TB RAID 5 (data)
(pve-02) Dell PowerEdge R10 2: 2 sockets (24 CPUs) / 96 GB RAM / 600 GB RAID 0 (OS) & 1.2 TB RAID 5 (data)
(pve-03) Dell PowerEdge R10 3: 2 sockets (24 CPUs) / 96 GB RAM / 600 GB RAID 0 (OS) & 1.2 TB RAID 5 (data)

(NAS: 6 TB for VM backups)


The installation part was pretty simple: I followed the usual installation steps and everything went well. I configured everything I need to run the whole cluster, such as the network, the storage cluster, etc.

I restored all my VMs from my NAS and they were running great.

Around 23:00 (Europe/Paris) I shut down every node (using the "Shutdown" button at the top of the GUI window), starting with pve-03, then pve-02, leaving only one node running for a few minutes before shutting down the third one too.

So my initial question is: has any of you already had this kind of problem, with pve-cluster not starting or something like that? I'm curious to know what solution you found.

PS: If needed, I can send some logs.
 
hi,

So my initial question is: has any of you already had this kind of problem, with pve-cluster not starting or something like that? I'm curious to know what solution you found.

it's hard to tell without more info (but it shouldn't happen). could be some misconfiguration or similar.

please send:
* pveversion -v
* journalctl -xe
* systemctl status pve-cluster
* systemctl status corosync
* systemctl status pvedaemon
* systemctl status pveproxy

and other relevant logs/journals
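
The outputs requested above could be gathered in one pass with a small script along these lines. This is only a sketch: the helper name `collect` and the report path /tmp/pve-diag.txt are made up for illustration, and the commands are the ones from the list, run unchanged.

```shell
#!/bin/sh
# Sketch: run each diagnostic command and append its output to one report.
# `collect` and the report path are illustrative names, not Proxmox tooling.
collect() {
    report=$1
    shift
    : > "$report"                       # start with an empty report
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$report"
        # A failing command must not stop the run; record its exit status.
        sh -c "$cmd" >> "$report" 2>&1 ||
            printf '(exit status %s)\n' "$?" >> "$report"
    done
}

# Only attempt the real run on a host that has the Proxmox tools installed.
if command -v pveversion >/dev/null 2>&1; then
    collect /tmp/pve-diag.txt \
        'pveversion -v' \
        'journalctl -xe' \
        'systemctl status pve-cluster' \
        'systemctl status corosync' \
        'systemctl status pvedaemon' \
        'systemctl status pveproxy'
fi
```

Since `systemctl status` returns a non-zero status for a unit that is not running, an "(exit status ...)" line under a service is expected when that service has failed; the report can then be attached as a single file.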
 
Hi,

Here are the logs you asked for:

pveversion -v

Code:
proxmox-ve: 6.0-2 (running kernel: 5.0.21-1-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-7
pve-kernel-helper: 6.0-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.11-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-8
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2

journalctl -xe

Code:
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit pvesr.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 111.
Sep 10 17:16:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvesr.service has entered the 'failed' state with result 'exit-code'.
Sep 10 17:16:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
-- Subject: A start job for unit pvesr.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has finished with a failure.
--
-- The job identifier is 386558 and the job result is failed.
Sep 10 17:17:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: A start job for unit pvesr.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has begun execution.
--
-- The job identifier is 386630.
Sep 10 17:17:01 pve-01 cron[1792]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
Sep 10 17:17:01 pve-01 CRON[15209]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 10 17:17:01 pve-01 CRON[15210]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 10 17:17:01 pve-01 CRON[15209]: pam_unix(cron:session): session closed for user root
Sep 10 17:17:01 pve-01 pvesr[15202]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:17:01 pve-01 pvesr[15202]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:17:01 pve-01 pvesr[15202]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:17:01 pve-01 pvesr[15202]: Unable to load access control list: Connection refused
Sep 10 17:17:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An ExecStart= process belonging to unit pvesr.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 111.
Sep 10 17:17:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvesr.service has entered the 'failed' state with result 'exit-code'.
Sep 10 17:17:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
-- Subject: A start job for unit pvesr.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvesr.service has finished with a failure.
--
-- The job identifier is 386630 and the job result is failed.

And the output of all the systemctl commands:

Code:
root@pve-01:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-09-10 16:59:01 CEST; 20min ago
  Process: 14873 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)

Sep 10 16:59:04 pve-01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Sep 10 16:59:06 pve-01 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Sep 10 16:59:06 pve-01 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Sep 10 16:59:06 pve-01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Sep 10 16:59:08 pve-01 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Sep 10 16:59:08 pve-01 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Sep 10 16:59:08 pve-01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Sep 10 16:59:10 pve-01 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Sep 10 16:59:10 pve-01 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Sep 10 16:59:10 pve-01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
root@pve-01:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-09-10 17:19:19 CEST; 16s ago
     Docs: man:corosync
           man:corosync.conf
           man:corosync_overview
Main PID: 15327 (corosync)
    Tasks: 9 (limit: 4915)
   Memory: 161.7M
   CGroup: /system.slice/corosync.service
           └─15327 /usr/sbin/corosync -f

Sep 10 17:19:19 pve-01 corosync[15327]:   [KNET  ] host: host: 2 has no active links
Sep 10 17:19:19 pve-01 corosync[15327]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Sep 10 17:19:19 pve-01 corosync[15327]:   [KNET  ] host: host: 3 has no active links
Sep 10 17:19:19 pve-01 corosync[15327]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Sep 10 17:19:19 pve-01 corosync[15327]:   [KNET  ] host: host: 3 has no active links
Sep 10 17:19:19 pve-01 corosync[15327]:   [CPG   ] downlist left_list: 0 received
Sep 10 17:19:19 pve-01 corosync[15327]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Sep 10 17:19:19 pve-01 corosync[15327]:   [QUORUM] Members[1]: 1
Sep 10 17:19:19 pve-01 corosync[15327]:   [MAIN  ] Completed service synchronization, ready to provide service.
Sep 10 17:19:19 pve-01 corosync[15327]:   [KNET  ] host: host: 3 has no active links
root@pve-01:~# systemctl status pvedaemon
● pvedaemon.service - PVE API Daemon
   Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2019-09-10 13:10:36 CEST; 4h 9min ago
  Process: 1856 ExecStart=/usr/bin/pvedaemon start (code=exited, status=0/SUCCESS)
Main PID: 1866 (pvedaemon)
    Tasks: 4 (limit: 4915)
   Memory: 125.8M
   CGroup: /system.slice/pvedaemon.service
           ├─1866 pvedaemon
           ├─1867 pvedaemon worker
           ├─1868 pvedaemon worker
           └─1869 pvedaemon worker

Sep 10 13:10:34 pve-01 systemd[1]: Starting PVE API Daemon...
Sep 10 13:10:36 pve-01 pvedaemon[1866]: starting server
Sep 10 13:10:36 pve-01 pvedaemon[1866]: starting 3 worker(s)
Sep 10 13:10:36 pve-01 pvedaemon[1866]: worker 1867 started
Sep 10 13:10:36 pve-01 pvedaemon[1866]: worker 1868 started
Sep 10 13:10:36 pve-01 pvedaemon[1866]: worker 1869 started
Sep 10 13:10:36 pve-01 systemd[1]: Started PVE API Daemon.
root@pve-01:~# systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-09-10 16:59:10 CEST; 20min ago
  Process: 14877 ExecStart=/usr/bin/pveproxy start (code=exited, status=255/EXCEPTION)

Sep 10 16:59:10 pve-01 systemd[1]: pveproxy.service: Service RestartSec=100ms expired, scheduling restart.
Sep 10 16:59:10 pve-01 systemd[1]: pveproxy.service: Scheduled restart job, restart counter is at 2240.
Sep 10 16:59:10 pve-01 systemd[1]: Stopped PVE API Proxy Server.
Sep 10 16:59:10 pve-01 systemd[1]: pveproxy.service: Start request repeated too quickly.
Sep 10 16:59:10 pve-01 systemd[1]: pveproxy.service: Failed with result 'exit-code'.
Sep 10 16:59:10 pve-01 systemd[1]: Failed to start PVE API Proxy Server.

PS: Only one node was up when I executed those commands; I'll start the others now.
 
pveproxy service isn't running

how about journalctl -u 'pve*' -e
that should tell us more.
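
For a compact view of which of the four services are actually up, a loop over `systemctl is-active` may help; in the status output posted earlier, pve-cluster.service is in the failed state as well as pveproxy.service. The function name `unit_states` is a hypothetical helper, and the unit list is simply the one used in this thread.

```shell
#!/bin/sh
# Sketch: print one "unit state" line per service.
# `unit_states` is an illustrative helper name.
unit_states() {
    for unit in "$@"; do
        # is-active prints the state (active, failed, inactive, ...) and
        # exits non-zero when the unit is not active, hence the `|| true`.
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        printf '%-12s %s\n' "$unit" "${state:-unknown}"
    done
}

# Only query systemd where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    unit_states pve-cluster corosync pvedaemon pveproxy
fi
```

Any unit reported as something other than "active" is a candidate for a closer look with `journalctl -u <unit>`.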
 
The output of the command:

Code:
Sep 10 17:41:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:41:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 10 17:42:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
Sep 10 17:42:01 pve-01 pvesr[15760]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:42:01 pve-01 pvesr[15760]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:42:01 pve-01 pvesr[15760]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:42:01 pve-01 pvesr[15760]: Unable to load access control list: Connection refused
Sep 10 17:42:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
Sep 10 17:42:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:42:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 10 17:43:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
Sep 10 17:43:01 pve-01 pvesr[15773]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:43:01 pve-01 pvesr[15773]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:43:01 pve-01 pvesr[15773]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:43:01 pve-01 pvesr[15773]: Unable to load access control list: Connection refused
Sep 10 17:43:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
Sep 10 17:43:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:43:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 10 17:44:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
Sep 10 17:44:01 pve-01 pvesr[15786]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:44:01 pve-01 pvesr[15786]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:44:01 pve-01 pvesr[15786]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:44:01 pve-01 pvesr[15786]: Unable to load access control list: Connection refused
Sep 10 17:44:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
Sep 10 17:44:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:44:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 10 17:45:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
Sep 10 17:45:01 pve-01 pvesr[15800]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:45:01 pve-01 pvesr[15800]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:45:01 pve-01 pvesr[15800]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:45:01 pve-01 pvesr[15800]: Unable to load access control list: Connection refused
Sep 10 17:45:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
Sep 10 17:45:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:45:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 10 17:46:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
Sep 10 17:46:01 pve-01 pvesr[15817]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:46:01 pve-01 pvesr[15817]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:46:01 pve-01 pvesr[15817]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:46:01 pve-01 pvesr[15817]: Unable to load access control list: Connection refused
Sep 10 17:46:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
Sep 10 17:46:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:46:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 10 17:47:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
Sep 10 17:47:01 pve-01 pvesr[15830]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:47:01 pve-01 pvesr[15830]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:47:01 pve-01 pvesr[15830]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:47:01 pve-01 pvesr[15830]: Unable to load access control list: Connection refused
Sep 10 17:47:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
Sep 10 17:47:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:47:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
Sep 10 17:48:00 pve-01 systemd[1]: Starting Proxmox VE replication runner...
Sep 10 17:48:01 pve-01 pvesr[15846]: ipcc_send_rec[1] failed: Connection refused
Sep 10 17:48:01 pve-01 pvesr[15846]: ipcc_send_rec[2] failed: Connection refused
Sep 10 17:48:01 pve-01 pvesr[15846]: ipcc_send_rec[3] failed: Connection refused
Sep 10 17:48:01 pve-01 pvesr[15846]: Unable to load access control list: Connection refused
Sep 10 17:48:01 pve-01 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
Sep 10 17:48:01 pve-01 systemd[1]: pvesr.service: Failed with result 'exit-code'.
Sep 10 17:48:01 pve-01 systemd[1]: Failed to start Proxmox VE replication runner.
 
is this really the entire output? you should be seeing a lot more. can you post the full output?

edit:

pipe the output to a file like this:

Code:
journalctl -u 'pve*' -e > journal.txt

and copy/attach the journal.txt file
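
Before attaching the file, a quick tally of which processes wrote the lines can show whether anything other than pvesr and systemd appears in it. The sketch below assumes the default short journal format, where the fifth field is the process tag (for example `pvesr[15760]:`); `count_by_source` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: count journal lines per emitting process.
# Assumes the default short format: "Mon DD HH:MM:SS host tag[pid]: message".
# `count_by_source` is an illustrative helper name.
count_by_source() {
    awk '$5 ~ /:$/ {
             sub(/(\[[0-9]+\])?:$/, "", $5)   # "pvesr[15760]:" -> "pvesr"
             n[$5]++
         }
         END { for (s in n) printf "%d %s\n", n[s], s }' "$1" | sort -rn
}

# Summarize the file produced by the command above, if it exists.
if [ -f journal.txt ]; then
    count_by_source journal.txt
fi
```

One possible reason the earlier output looked short is that `-e` implies `-n1000`, so only the last 1000 lines are shown. If the tally contains only pvesr and systemd, a variant such as `journalctl -u 'pve*' -b --no-pager > journal.txt` would capture everything since the current boot instead.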
 
