Cluster can't start up after rebooting all nodes

urc-ysh

Nov 10, 2021
I have a three-node PVE cluster, and I had never shut down all of the servers in it at the same time. I changed my IDC service provider, so I had to move the servers this Friday, and I ran sudo poweroff on all three servers within 5 minutes. After the move, I booted the three servers and tried to access the web interface, but it doesn't work. I then connected via SSH to all three servers to check the status of the PVE-related services. The pve-cluster service status is shown below.
Code:
ops@pve-180:~$ sudo service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2022-09-11 20:51:32 CST; 7min ago
        CPU: 9ms

Sep 11 20:51:32 pve-180 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Sep 11 20:51:32 pve-180 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Sep 11 20:51:32 pve-180 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Sep 11 20:51:32 pve-180 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Sep 11 20:51:32 pve-180 systemd[1]: Failed to start The Proxmox VE cluster filesystem.

My /etc/hosts file looks like this:
Code:
127.0.0.1 localhost
127.0.1.1 pve-180

::1             localhost ip6-localhost ip6-loopback
ff02::1         ip6-allnodes
ff02::1         ip6-allrouters
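
In the file above, the node name pve-180 is mapped only to 127.0.1.1. I don't know whether that is related to the failure, but to compare the three nodes, a minimal sketch like the following could report whether a hosts file maps the node name to a loopback address. The function names hosts_addrs and name_maps_to_loopback are made up for this example and are not part of any PVE tooling.

```shell
#!/bin/sh
# Hypothetical helper: print every address that a hosts file assigns to a name.
# $1 = path to the hosts file, $2 = host name to look up
hosts_addrs() {
    awk -v name="$2" '
        { sub(/#.*/, "") }   # drop comments before splitting into fields
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$1"
}

# Hypothetical helper: succeed if any address listed for the name is a
# loopback address (127.0.0.0/8 or ::1).
name_maps_to_loopback() {
    hosts_addrs "$1" "$2" | grep -Eq '^(127\.|::1$)'
}

# Check the local node; uname -n prints the node name.
if name_maps_to_loopback /etc/hosts "$(uname -n)"; then
    echo "$(uname -n) maps to a loopback address in /etc/hosts"
fi
```

Running it on each of the three nodes shows whether they all share the same mapping.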

I also found these errors logged in /var/log/syslog:
Code:
Sep 11 19:40:29 pve-180 pveproxy[23184]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:29 pve-180 pveproxy[23185]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:33 pve-180 ceph-osd[23168]: unable to get monitor info from DNS SRV with service name: ceph-mon
Sep 11 19:40:33 pve-180 ceph-osd[23167]: unable to get monitor info from DNS SRV with service name: ceph-mon
Sep 11 19:40:33 pve-180 ceph-osd[23167]: 2022-09-11T19:40:33.307+0800 7fa30e011f00 -1 failed for service _ceph-mon._tcp
Sep 11 19:40:33 pve-180 ceph-osd[23167]: 2022-09-11T19:40:33.307+0800 7fa30e011f00 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Sep 11 19:40:33 pve-180 ceph-osd[23167]: failed to fetch mon config (--no-mon-config to skip)
Sep 11 19:40:33 pve-180 ceph-osd[23168]: 2022-09-11T19:40:33.307+0800 7fd2477f7f00 -1 failed for service _ceph-mon._tcp
Sep 11 19:40:33 pve-180 ceph-osd[23168]: 2022-09-11T19:40:33.307+0800 7fd2477f7f00 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Sep 11 19:40:33 pve-180 ceph-osd[23168]: failed to fetch mon config (--no-mon-config to skip)
Sep 11 19:40:33 pve-180 systemd[1]: ceph-osd@3.service: Main process exited, code=exited, status=1/FAILURE
Sep 11 19:40:33 pve-180 systemd[1]: ceph-osd@3.service: Failed with result 'exit-code'.
Sep 11 19:40:33 pve-180 systemd[1]: ceph-osd@2.service: Main process exited, code=exited, status=1/FAILURE
Sep 11 19:40:33 pve-180 systemd[1]: ceph-osd@2.service: Failed with result 'exit-code'.
Sep 11 19:40:34 pve-180 pveproxy[23183]: worker exit
Sep 11 19:40:34 pve-180 pveproxy[1968]: worker 23183 finished
Sep 11 19:40:34 pve-180 pveproxy[1968]: starting 1 worker(s)
Sep 11 19:40:34 pve-180 pveproxy[1968]: worker 23192 started
Sep 11 19:40:34 pve-180 pveproxy[23184]: worker exit
Sep 11 19:40:34 pve-180 pveproxy[23192]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:34 pve-180 pveproxy[23185]: worker exit
Sep 11 19:40:34 pve-180 pveproxy[1968]: worker 23184 finished
Sep 11 19:40:34 pve-180 pveproxy[1968]: starting 1 worker(s)
Sep 11 19:40:34 pve-180 pveproxy[1968]: worker 23193 started
Sep 11 19:40:34 pve-180 pveproxy[1968]: worker 23185 finished
Sep 11 19:40:34 pve-180 pveproxy[1968]: starting 1 worker(s)
Sep 11 19:40:34 pve-180 pveproxy[1968]: worker 23194 started
Sep 11 19:40:34 pve-180 pveproxy[23193]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:34 pve-180 pveproxy[23194]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:39 pve-180 pveproxy[23192]: worker exit
Sep 11 19:40:39 pve-180 pveproxy[1968]: worker 23192 finished
Sep 11 19:40:39 pve-180 pveproxy[1968]: starting 1 worker(s)
Sep 11 19:40:39 pve-180 pveproxy[1968]: worker 23195 started
Sep 11 19:40:39 pve-180 pveproxy[23195]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:39 pve-180 pveproxy[23193]: worker exit
Sep 11 19:40:39 pve-180 pveproxy[23194]: worker exit
Sep 11 19:40:39 pve-180 pveproxy[1968]: worker 23193 finished
Sep 11 19:40:39 pve-180 pveproxy[1968]: starting 1 worker(s)
Sep 11 19:40:39 pve-180 pveproxy[1968]: worker 23196 started
Sep 11 19:40:39 pve-180 pveproxy[1968]: worker 23194 finished
Sep 11 19:40:39 pve-180 pveproxy[1968]: starting 1 worker(s)
Sep 11 19:40:39 pve-180 pveproxy[1968]: worker 23197 started
Sep 11 19:40:39 pve-180 pveproxy[23196]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:39 pve-180 pveproxy[23197]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1891.
Sep 11 19:40:43 pve-180 systemd[1]: ceph-osd@3.service: Scheduled restart job, restart counter is at 172.
Sep 11 19:40:43 pve-180 systemd[1]: ceph-osd@2.service: Scheduled restart job, restart counter is at 172.
Sep 11 19:40:43 pve-180 systemd[1]: Stopped Ceph object storage daemon osd.2.
Sep 11 19:40:43 pve-180 systemd[1]: Starting Ceph object storage daemon osd.2...
Sep 11 19:40:43 pve-180 systemd[1]: Stopped Ceph object storage daemon osd.3.
Sep 11 19:40:43 pve-180 systemd[1]: Starting Ceph object storage daemon osd.3...
Sep 11 19:40:43 pve-180 systemd[1]: Started Ceph object storage daemon osd.2.
Sep 11 19:40:43 pve-180 systemd[1]: Started Ceph object storage daemon osd.3.
Sep 11 19:40:43 pve-180 ceph-osd[23208]: did not load config file, using default settings.
Sep 11 19:40:43 pve-180 ceph-osd[23207]: did not load config file, using default settings.
Sep 11 19:40:43 pve-180 ceph-osd[23208]: 2022-09-11T19:40:43.535+0800 7ff50d196f00 -1 Errors while parsing config file!
Sep 11 19:40:43 pve-180 ceph-osd[23207]: 2022-09-11T19:40:43.535+0800 7f3a85143f00 -1 Errors while parsing config file!
Sep 11 19:40:43 pve-180 ceph-osd[23207]: 2022-09-11T19:40:43.535+0800 7f3a85143f00 -1 can't open ceph.conf: (2) No such file or directory
Sep 11 19:40:43 pve-180 ceph-osd[23208]: 2022-09-11T19:40:43.535+0800 7ff50d196f00 -1 can't open ceph.conf: (2) No such file or directory
Sep 11 19:40:44 pve-180 pveproxy[23195]: worker exit
Sep 11 19:40:44 pve-180 pveproxy[1968]: worker 23195 finished
 
The configuration and error output look similar on all three servers.
Can anyone give me some advice? Thank you!
 
