PROXMOX VE: It stopped working

Rysiu

New Member
Aug 11, 2022
I have a problem with PROXMOX VE.

The PROXMOX node has stopped working.

I have the following symptoms:

Code:
root@nodename:/etc/pve/local# /usr/bin/pmxcfs
[database] crit: found entry with duplicate name (inode = 0000000002EF9C1A, parent = 000000000000000E, name = '107.conf')
[database] crit: DB load failed
[main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
[main] notice: exit proxmox configuration filesystem (-1)

and

Code:
root@nodename:/etc/pve/local# qm list
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused

Code:
root@nodename:/etc/pve# systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2022-08-11 08:43:20 CEST; 6s ago
    Process: 16178 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)

Aug 11 08:43:20 nodename systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
Aug 11 08:43:20 nodename systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Aug 11 08:43:20 nodename systemd[1]: Stopped The Proxmox VE cluster filesystem.
Aug 11 08:43:20 nodename systemd[1]: pve-cluster.service: Start request repeated too quickly.
Aug 11 08:43:20 nodename systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Aug 11 08:43:20 nodename systemd[1]: Failed to start The Proxmox VE cluster filesystem.

Code:
root@nodename:/etc/pve# journalctl -xe
Aug 11 08:43:22 nodename pveproxy[16179]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1727.
Aug 11 08:43:22 nodename pveproxy[16180]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1727.
Aug 11 08:43:22 nodename pveproxy[16160]: worker exit
Aug 11 08:43:22 nodename pveproxy[1527]: worker 16160 finished
Aug 11 08:43:22 nodename pveproxy[1527]: starting 1 worker(s)
Aug 11 08:43:22 nodename pveproxy[1527]: worker 16181 started
Aug 11 08:43:22 nodename pveproxy[16181]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1727.
Aug 11 08:43:27 nodename pveproxy[16179]: worker exit

The web panel also does not work.
What could be the problem?
 
Something went wrong with the sqlite DB that stores the contents of /etc/pve, so all services that need to access files there won't work properly.

First, make a backup of the database before you attempt to fix it:
Code:
cp /var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db.bkp
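
Before touching the database, it may also help to make sure nothing is still trying to use it, and to let sqlite check the file itself. This is only an optional sketch; the listed units are the usual PVE services:
Code:
# optional: stop the services that access the config DB
systemctl stop pve-cluster pvedaemon pveproxy pvestatd

# optional: let sqlite verify the database file before any manual edits
sqlite3 /var/lib/pve-cluster/config.db 'PRAGMA integrity_check;'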

Then start investigating. It looks like there are two entries for the 107.conf file. Open the database:

Code:
sqlite3 /var/lib/pve-cluster/config.db
First, set a few parameters to make the output easier to read:
Code:
sqlite> .header on
sqlite> .mode line

Last, run the following query and post the output here in [code][/code] tags.
Code:
sqlite> select inode,version,mtime,data from tree where name = "107.conf";
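
If you want to check whether any other files are affected as well, a query along these lines (optional, just a suggestion) lists every parent/name combination that occurs more than once:
Code:
sqlite> select parent, name, count(*) from tree group by parent, name having count(*) > 1;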
 
The select query returns:

Code:
sqlite> select inode,version,mtime,data from tree where name = "107.conf";
  inode = 49257498
version = 49257500
  mtime = 1654930792
   data = bootdisk: scsi0
cores: 2
ide2: local:iso/ubuntu-20.04.2-live-server-amd64.iso,media=cdrom
memory: 2048
name: JUMP-000
net0: virtio=4E:FD:9F:51:7B:74,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-107-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=3685e841-fd51-4100-9ead-8f6959f83e71
sockets: 1
vmgenid: c790b03b-7027-4280-a4df-5d1b3e9a1acf


  inode = 49257498
version = 49257500
  mtime = 1654930792
   data = bootdisk: scsi0
cores: 2
ide2: local:iso/ubuntu-20.04.2-live-server-amd64.iso,media=cdrom
memory: 2048
name: JUMP-000
net0: virtio=4E:FD:9F:51:7B:74,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-107-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=3685e841-fd51-4100-9ead-8f6959f83e71
sockets: 1
vmgenid: c790b03b-7027-4280-a4df-5d1b3e9a1acf

What should I do next?
 
Unless I am mistaken, those two entries look exactly the same. In that case, please run the following query.
Code:
sqlite> select * from tree where name = "107.conf";
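
If the two rows still look identical, you could additionally compare the raw bytes of the data column, e.g. with sqlite's hex() function (optional):
Code:
sqlite> select inode, hex(data) from tree where name = "107.conf";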
 
The new query returns:

Code:
sqlite> select * from tree where name = "107.conf";
  inode = 49257498
 parent = 14
version = 49257500
 writer = 0
  mtime = 1654930792
   type = 8
   name = 107.conf
   data = bootdisk: scsi0
cores: 2
ide2: local:iso/ubuntu-20.04.2-live-server-amd64.iso,media=cdrom
memory: 2048
name: JUMP-000
net0: virtio=4E:FD:9F:51:7B:74,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-107-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=3685e841-fd51-4100-9ead-8f6959f83e71
sockets: 1
vmgenid: c790b03b-7027-4280-a4df-5d1b3e9a1acf


  inode = 49257498
 parent = 14
version = 49257500
 writer = 0
  mtime = 1654930792
   type = 8
   name = 107.conf
   data = bootdisk: scsi0
cores: 2
ide2: local:iso/ubuntu-20.04.2-live-server-amd64.iso,media=cdrom
memory: 2048
name: JUMP-000
net0: virtio=4E:FD:9F:51:7B:74,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-107-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=3685e841-fd51-4100-9ead-8f6959f83e71
sockets: 1
vmgenid: c790b03b-7027-4280-a4df-5d1b3e9a1acf

I see that the result is very similar to the previous one.
 
Okay, both entries are exactly the same.
Try running
Code:
delete from tree where inode = "49257498" limit 1;

After that, running the previous query should return only one entry. Please make sure you have a backup before you run the delete query.
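
As an optional sanity check afterwards, something like this should confirm that exactly one row is left before you exit sqlite and try to start the service again:
Code:
sqlite> select count(*) from tree where name = "107.conf";
sqlite> .quit
root@nodename:~# systemctl start pve-cluster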
 
Ok. After the changes, I have a different error message:

Code:
root@nodename:~# /usr/bin/pmxcfs
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
[main] crit: fuse_mount error: File exists
[main] notice: exit proxmox configuration filesystem (-1)
 
Check what is currently located at /etc/pve and move it somewhere else. Once the directory is empty, the pve-cluster service can hopefully start again.
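
A minimal sketch of what that could look like (the target directory below is just an example; findmnt is only there to double-check that nothing is currently mounted on /etc/pve):
Code:
# make sure nothing is mounted on /etc/pve right now
findmnt /etc/pve

# see what is left over in the directory
ls -la /etc/pve

# move the leftovers out of the way (target path is just an example)
mkdir -p /root/pve-leftovers
mv /etc/pve/* /root/pve-leftovers/

# then try the service again
systemctl start pve-cluster
systemctl status pve-cluster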