Deleted Cluster and now I can't access web UI

Deleted member 60080:
Hello,

I'm kind of a noob in the Proxmox shell. I just got an R710 that I'm going to upgrade to from the 2950 I've had for a very long time. I installed Proxmox on my new server and created a cluster on my 2950. I was having issues connecting the two, so I decided to scrap the idea and thought it would just be best to create VZdump backup files on an external hard drive. I followed this tutorial (1st reply) to delete the cluster on the 2950 (I never joined the R710 to the 2950), but when I restarted the 2950, I could no longer access the web UI. I went to the shell and ran pvecm status, but I get this error:

Cannot initialize CMAP service

When I try to start any VM using qm start, I get this error:

cluster not ready - no quorum?

When I run pvecm e 1 and then pvecm status again, I get the same "Cannot initialize CMAP service" error.

What do you think I should do? I don't need to restore the server, I just need all of my VMs so I can put them on the new server.
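For reference, the backup-and-restore route I have in mind would look something like this (the VM ID, mount point, and target storage are just examples):
Code:
# on the old 2950, with the external drive mounted at /mnt/external
vzdump 100 --dumpdir /mnt/external --mode stop --compress lzo
# copy the resulting vzdump-qemu-100-*.vma.lzo to the new server, then restore it there:
qmrestore /path/to/vzdump-qemu-100-<timestamp>.vma.lzo 100 --storage local-lvm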
 
Next time, please follow our official documentation. The mentioned thread is a bit older (possibly outdated, e.g., it mentions cman and sysvinit, which are no longer valid in current Proxmox VE 5.x), and it sometimes suggests rather odd things. https://pve.proxmox.com/pve-docs/chapter-pvecm.html#pvecm_separate_node_without_reinstall
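For reference, the separation procedure from that chapter boils down to roughly the following on the node that should become standalone again (please check the linked documentation for the exact steps for your version):
Code:
systemctl stop pve-cluster corosync
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster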

When I try to start any VM using qm start, I get this error:

cluster not ready - no quorum?

When I run pvecm e 1 and then pvecm status again, I get the same "Cannot initialize CMAP service" error.

Those two commands are expected to behave like this if there is no cluster configured, as they cannot connect to the cluster communication stack (corosync).

First ensure all required services are up and running:
Code:
systemctl restart pve-cluster pveproxy pvedaemon

If that does not work, or throws errors, I need a bit more info from you. E.g., as root, run:
Code:
systemctl list-units --failed --plain -l
ls -l /etc/corosync/ /etc/pve/corosync.conf
systemctl status pveproxy pve-cluster

and post the output here, preferably in [code] output [/code] tags.
 
When I run systemctl list-units --failed --plain -l, I see that corosync, influxdb, and pvesr have all failed
Code:
UNIT             LOAD   ACTIVE SUB    DESCRIPTION
  corosync.service loaded failed failed Corosync Cluster Engine
  influxdb.service loaded failed failed InfluxDB is an open-source, distributed, time series database
  pvesr.service    loaded failed failed Proxmox VE replication runner
 
OK, that should not matter much for your specific case. Corosync has no config anymore, thus it failed, but it's only needed in a cluster, which you got rid of.
Did you try the service restarts too, and could you please post the rest of the requested outputs?
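If you want to double-check that corosync only fails because of the missing config (and not for some other reason), its status should show an unmet start condition, roughly like this:
Code:
systemctl status corosync
# expected on a node without a cluster config:
#   Condition: start condition failed ...
#     └─ ConditionPathExists=/etc/corosync/corosync.conf was not met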
 
Did you try the service restarts too, and could you please post the rest of the requested outputs?
There is no output from the restart command.
Code:
ls -l /etc/corosync/ /etc/pve/corosync.conf
ls: cannot access '/etc/corosync/': No such file or directory

-r--r----- 1 root www-data 362 Jan  9 16:47 /etc/pve/corosync.conf
Code:
systemctl status pveproxy pve-cluster
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset:
   Active: active (running) since Fri 2019-01-11 08:20:34 PST; 47min ago
  Process: 23323 ExecStop=/usr/bin/pveproxy stop (code=exited, status=0/SUCCESS)
  Process: 23165 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUC
  Process: 23417 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCES
 Main PID: 23433 (pveproxy)
    Tasks: 4 (limit: 4915)
   Memory: 113.6M
      CPU: 1min 41.919s
   CGroup: /system.slice/pveproxy.service
           ├─23433 pveproxy
           ├─28455 pveproxy worker
           ├─28456 pveproxy worker
           └─28457 pveproxy worker

Jan 11 09:07:57 proxmox pveproxy[23433]: worker 28448 finished
Jan 11 09:07:57 proxmox pveproxy[23433]: starting 1 worker(s)
Jan 11 09:07:57 proxmox pveproxy[23433]: worker 28456 started
Jan 11 09:07:57 proxmox pveproxy[28455]: /etc/pve/local/pve-ssl.key: failed to l
Jan 11 09:07:57 proxmox pveproxy[28456]: /etc/pve/local/pve-ssl.key: failed to l
Jan 11 09:07:57 proxmox pveproxy[28449]: worker exit
Jan 11 09:07:57 proxmox pveproxy[23433]: worker 28449 finished
 
Jan 11 09:07:57 proxmox pveproxy[28455]: /etc/pve/local/pve-ssl.key: failed to l

and there's your issue: the proxy cannot load its SSL key.

can you try:
Code:
pvecm updatecerts
systemctl restart pveproxy
 
Now I get
Code:
no quorum - unable to update files
 
Oh well, the pmxcfs is in limbo from the cluster-separation attempt. /etc/pve/corosync.conf is still present, thus the configuration file system thinks it's still clustered.
To fix that do:
Code:
systemctl stop pve-cluster
# start in local mode
pmxcfs -l 
rm /etc/pve/corosync.conf 
killall pmxcfs
systemctl start pve-cluster

then repeat the commands from my previous post.
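That is, once pmxcfs is mounted normally again, run once more:
Code:
pvecm updatecerts
systemctl restart pveproxy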
 
Now only corosync and influxdb are failing
Code:
UNIT             LOAD   ACTIVE SUB    DESCRIPTION
  corosync.service loaded failed failed Corosync Cluster Engine
  influxdb.service loaded failed failed InfluxDB is an open-source, distributed, time series database
 
Corosync fails because /etc/corosync/corosync.conf does not exist
Code:
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset:
   Active: failed (Result: exit-code) since Wed 2019-01-09 16:51:35 PST; 4 days
Condition: start condition failed at Fri 2019-01-11 12:13:34 PST; 2 days ago
           └─ ConditionPathExists=/etc/corosync/corosync.conf was not met

Influxdb gives no information
Code:
● influxdb.service - InfluxDB is an open-source, distributed, time series database
   Loaded: loaded (/lib/systemd/system/influxdb.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2019-01-09 16:51:33 PST; 4 days ago
     Docs: man:influxd(1)
  Process: 1866 ExecStart=/usr/bin/influxd -config /etc/influxdb/influxdb.conf $INFLUXD_OPTS (code=exited, status=1/FAILURE)
 Main PID: 1866 (code=exited, status=1/FAILURE)
      CPU: 14ms

Also, I don't need to restore this server. I am just wondering if there is any way to get the VMs off of it and on to the new server.
 
Now only corosync and influxdb are failing

As I already said above, corosync is expected to fail, and influxdb isn't important for us right now (it's not a direct part of PVE)...
As you do not say what exactly you did and did not post all of the requested output, this is a bit hard to address, I'm afraid...

I assume you fixed pve-cluster as I said above and it is writable again (a quick check for that is sketched below), so did you rerun the updatecerts and pveproxy commands:

can you try:
Code:
pvecm updatecerts
systemctl restart pveproxy

?
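A quick way to verify that /etc/pve is writable again would be something like:
Code:
touch /etc/pve/writetest && rm /etc/pve/writetest && echo "pmxcfs is writable"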
 
Code:
root@proxmox:~# pvecm updatecerts
(re)generate node files
merge authorized SSH keys and known hosts
and systemctl does not give any output

I also noticed that local-lvm does not exist. When I try to start a VM now, I get:
Code:
root@proxmox:~# qm start 100
storage 'local-lvm' does not exists
 
Oh well, the pmxcfs is in limbo from the cluster-separation attempt. /etc/pve/corosync.conf is still present, thus the configuration file system thinks it's still clustered.
To fix that do:
Code:
systemctl stop pve-cluster
# start in local mode
pmxcfs -l
rm /etc/pve/corosync.conf
killall pmxcfs
systemctl start pve-cluster

then repeat the commands from my previous post.
Same problem here. I have two nodes. I fixed the problem on node 2, but the same commands on node 1 show this:
Code:
pmxcfs -l
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
[main] crit: fuse_mount error: File exists
[main] notice: exit proxmox configuration filesystem (-1)

I tried
Code:
 systemctl list-units --failed --plain -l
  UNIT                LOAD   ACTIVE SUB    DESCRIPTION
  corosync.service    loaded failed failed Corosync Cluster Engine
  pve-cluster.service loaded failed failed The Proxmox VE cluster filesystem
  pvesr.service       loaded failed failed Proxmox VE replication runner

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

3 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

ls -l /etc/corosync/ /etc/pve/corosync.conf
ls: cannot access '/etc/pve/corosync.conf': No such file or directory
/etc/corosync/:
total 12
-r-------- 1 root root  256 Dec 22 09:48 authkey
-rw-r--r-- 1 root root  531 Dec 22 10:37 corosync.conf
drwxr-xr-x 2 root root 4096 Jun 25 11:02 uidgid.d


systemctl status pveproxy pve-cluster
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2020-12-22 13:19:58 EST; 3h 0min ago
  Process: 12507 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=111)
  Process: 12508 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
 Main PID: 12509 (pveproxy)
    Tasks: 4 (limit: 7372)
   Memory: 142.0M
   CGroup: /system.slice/pveproxy.service
           ├─12509 pveproxy
           ├─25780 pveproxy worker
           ├─25781 pveproxy worker
           └─25784 pveproxy worker

Dec 22 16:20:45 no1 pveproxy[25780]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm
Dec 22 16:20:46 no1 pveproxy[25778]: worker exit
Dec 22 16:20:46 no1 pveproxy[25779]: worker exit
Dec 22 16:20:46 no1 pveproxy[12509]: worker 25779 finished
Dec 22 16:20:46 no1 pveproxy[12509]: worker 25778 finished
Dec 22 16:20:46 no1 pveproxy[12509]: starting 2 worker(s)
Dec 22 16:20:46 no1 pveproxy[12509]: worker 25781 started
Dec 22 16:20:46 no1 pveproxy[12509]: worker 25784 started
Dec 22 16:20:46 no1 pveproxy[25781]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm
Dec 22 16:20:46 no1 pveproxy[25784]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm

● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2020-12-22 13:19:55 EST; 3h 0min ago
  Process: 12494 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)

Dec 22 13:19:55 no1 systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
Dec 22 13:19:55 no1 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Dec 22 13:19:55 no1 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Dec 22 13:19:55 no1 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Dec 22 13:19:55 no1 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Dec 22 13:19:55 no1 systemd[1]: Failed to start The Proxmox VE cluster filesystem.

How can I fix this problem?

Thanks
 
pmxcfs -l fuse: mountpoint is not empty
Looks like the /etc/pve directory is not empty.

Make sure that the pmxcfs process is not running: ps aux | grep pmxcfs. There should only be one line of output, showing the grep command itself, and nothing with /usr/bin/pmxcfs.

If it is not running, check what's inside the /etc/pve directory and (re)move the leftovers if you know you don't need them.
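A rough sketch of that check and cleanup (the backup location is only an example; move things out only if you are sure you no longer need them):
Code:
ps aux | grep pmxcfs            # only the grep line itself should show up
ls -la /etc/pve                 # see what is left in the mountpoint
mkdir /root/pve-etc-backup      # example backup location
mv /etc/pve/* /root/pve-etc-backup/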
 
Looks like the /etc/pve directory is not empty.

Make sure that the pmxcfs process is not running: ps aux | grep pmxcfs. There should only be one line of output, showing the grep command itself, and nothing with /usr/bin/pmxcfs.

If it is not running, check what's inside the /etc/pve directory and (re)move the leftovers if you know you don't need them.
Thanks, it works.

But I tried
Code:
pvecm updatecerts
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused

It shows "Connection refused".

When I access the web UI, I found I can still control all the VMs (they are still running), but the VM icons show question marks.


Thanks
 

Attachments: 100.png, no1.png
Check if the necessary services are running. You can do so via the GUI: <Node> -> System. On a single node it is okay if the corosync service is not running. All others should be running.

I assume that the pvestatd service is either not running or has some other problem, which should be visible in the logs.
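On the shell, checking and restarting it would look roughly like this:
Code:
systemctl status pvestatd
systemctl restart pvestatd
journalctl -u pvestatd -n 50    # recent log entries, in case it keeps failing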
 
Check if the necessary services are running. You can do so via the GUI: <Node> -> System. On a single node it is okay if the corosync service is not running. All others should be running.

I assume that the pvestatd service is either not running or having some other problem which should be visible in the logs.
I restarted the pvestatd service and all the VMs came back.

Thanks
 
