Error during host rename

Airw0lf

Earlier this evening I tried renaming two standalone hosts.
One worked out as expected - the other did not: its /etc/pve folder ended up empty, and I'm not sure whether any other damage was done.

I compared the two hosts and was able to re-create some bits and pieces:
- Recreated the folder structure and logical links
- Restored the folder ./nodes with the VM-configs from a backup
- Recreated the storage.cfg
- Recreated the firewall config

The content of /etc/pve is now:
Code:
root@vigilant:/etc/pve# ls -l
total 32
-rw-r--r-- 1 root root   44 Feb 22 22:51 datacenter.cfg
drwxr-xr-x 2 root root 4096 Feb 22 23:30 firewall
lrwxrwxrwx 1 root root   14 Feb 22 22:54 local -> nodes/vigilant
lrwxrwxrwx 1 root root   18 Feb 22 22:57 lxc -> nodes/vigilant/lxc
drwxr-xr-x 3 root root 4096 Feb 22 22:38 nodes
lrwxrwxrwx 1 root root   21 Feb 22 22:58 openvz -> nodes/vigilant/openvz
drwxr-xr-x 2 root root 4096 Feb 22 22:52 priv
lrwxrwxrwx 1 root root   26 Feb 22 22:59 qemu-server -> nodes/vigilant/qemu-server
drwxr-xr-x 2 root root 4096 Feb 22 23:32 sdn
-rw-r--r-- 1 root root  762 Feb 22 22:35 storage.cfg
-rw-r--r-- 1 root root  107 Feb 22 22:31 user.cfg
drwxr-xr-x 2 root root 4096 Feb 22 23:14 virtual-guest

The ./nodes folder now:
Code:
root@vigilant:/etc/pve# ls -l nodes/vigilant
total 32
-rw-r----- 1 root root   34 Feb 22 22:40 host.fw
-rw-r----- 1 root root   83 Feb 22 22:40 lrm_status
drwxr-xr-x 2 root root 4096 Feb 22 22:40 lxc
drwxr-xr-x 2 root root 4096 Feb 22 22:40 openvz
drwx------ 2 root root 4096 Feb 22 22:40 priv
-rw-r----- 1 root root 1679 Feb 22 22:40 pve-ssl.key
-rw-r----- 1 root root 1692 Feb 22 22:40 pve-ssl.pem
drwxr-xr-x 2 root root 4096 Feb 22 22:40 qemu-server
root@vigilant:/etc/pve# ls -l ./nodes
total 4
drwxr-xr-x 6 root root 4096 Feb 22 22:40 vigilant


The host is now failing with lots of errors in the syslog.

Lots of these:
Code:
Feb 23 00:18:00 vigilant pveproxy[3177]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)>
Feb 23 00:18:00 vigilant pveproxy[3178]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)>

Some of these:
Code:
Feb 23 00:18:01 vigilant cron[964]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)

And before those there is this:
Code:
Feb 22 23:34:18 vigilant pmxcfs[1163]: [database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x0000000000>
Feb 22 23:34:18 vigilant pmxcfs[1163]: [database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x0000000000>
Feb 22 23:34:18 vigilant pmxcfs[1163]: [database] crit: DB load failed
Feb 22 23:34:18 vigilant pmxcfs[1163]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/c>
Feb 22 23:34:18 vigilant pmxcfs[1163]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 22 23:34:18 vigilant pmxcfs[1163]: [database] crit: DB load failed
Feb 22 23:34:18 vigilant pmxcfs[1163]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/c>
Feb 22 23:34:18 vigilant pmxcfs[1163]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 22 23:34:18 vigilant systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Feb 22 23:34:18 vigilant systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 22 23:34:18 vigilant systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Feb 22 23:34:18 vigilant systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
Feb 22 23:34:18 vigilant pve-firewall[975]: ipcc_send_rec[1] failed: Connection refused
Feb 22 23:34:18 vigilant pve-firewall[975]: ipcc_send_rec[2] failed: Connection refused
Feb 22 23:34:18 vigilant pve-firewall[975]: ipcc_send_rec[3] failed: Connection refused
Feb 22 23:34:18 vigilant pve-firewall[975]: Unable to load access control list: Connection refused
Feb 22 23:34:18 vigilant pve-firewall[975]: ipcc_send_rec[1] failed: Connection refused
Feb 22 23:34:18 vigilant pve-firewall[975]: ipcc_send_rec[2] failed: Connection refused
Feb 22 23:34:18 vigilant pve-firewall[975]: ipcc_send_rec[3] failed: Connection refused
Feb 22 23:34:18 vigilant systemd[1]: pve-firewall.service: Control process exited, code=exited, status=111/n/a
Feb 22 23:34:18 vigilant systemd[1]: pve-firewall.service: Failed with result 'exit-code'.
Feb 22 23:34:18 vigilant systemd[1]: Failed to start Proxmox VE firewall.

I tried to recreate the certs with the following command and results:
Code:
root@vigilant:/etc/pve# pvecm updatecerts --force
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused

The system name is vigilant. The content of the files /etc/hosts and /etc/hostname is:
Code:
root@vigilant:/etc/pve# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.139.251 vigilant.itv.lan vigilant

root@vigilant:/etc/pve# cat /etc/hostname
vigilant

I searched around the forum in an attempt to fix this.
But after several hours and numerous attempts there is still no noticeable improvement.

Any suggestions?
 
Hi,

Did you follow our wiki guide [0] for renaming the PVE node?

What does this command say: hostname -f?


[0] https://pve.proxmox.com/wiki/Renaming_a_PVE_node

Yes - I followed those instructions.

After that, I did some renaming/moving of files and folders in /etc/pve/nodes, because following the instructions resulted in a new nodes entry.
That worked fine on one of the two (standalone!) nodes - but not on the other.
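
Roughly what that moving amounted to is sketched below - from memory, so treat it as a sketch rather than the exact commands; <oldname> is a placeholder for the previous hostname:
Code:
# sketch only - <oldname> stands in for the previous hostname
mv /etc/pve/nodes/<oldname>/qemu-server/*.conf /etc/pve/nodes/vigilant/qemu-server/
mv /etc/pve/nodes/<oldname>/lxc/*.conf /etc/pve/nodes/vigilant/lxc/
rm -r /etc/pve/nodes/<oldname>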

I'm also able to read the (sqlite?) database /var/lib/pve-cluster/config.db.
However, /var/log/syslog says this (assuming the [database] tag refers to that file):
Code:
Feb 22 23:34:18 vigilant pmxcfs[1018]: [database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x0000000000>
Feb 22 23:34:18 vigilant pmxcfs[1018]: [database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x0000000000>
Feb 22 23:34:18 vigilant pmxcfs[1018]: [database] crit: DB load failed

And hostname -f gives me:
Code:
root@vigilant:/home/will# hostname -f
vigilant.itv.lan

=====

EDIT 9:35 AM:
Just realized that I didn't do the cleanup part mentioned at the end => I have now done this.
However, there is no difference in the outcome compared to the results above.
 
Thank you for the output!

Do you have two folders in the /etc/pve/nodes/ path, or do you see two nodes in the PVE GUI? (ls -la /etc/pve/nodes/)
 
Thank you for the output!

Do you have two folders in the /etc/pve/nodes/ path, or do you see two nodes in the PVE GUI? (ls -la /etc/pve/nodes/)

There is no PVE GUI - I guess because it cannot load the database and/or the certificates/keys; both error conditions show up in /var/log/syslog.

I have one folder in /etc/pve/nodes - it is called vigilant.
Code:
root@vigilant:/home/will# ls -la /etc/pve/nodes
total 12
drwxr-xr-x 3 root root 4096 Feb 22 22:38 .
drwxr-x--- 7 root root 4096 Feb 23 09:03 ..
drwxr-xr-x 6 root root 4096 Feb 22 22:40 vigilant

The content of that folder (and its subfolders) is:
Code:
root@vigilant:/home/will# ls -R -l /etc/pve/nodes/vigilant
/etc/pve/nodes/vigilant:
total 32
-rw-r----- 1 root root   34 Feb 22 22:40 host.fw
-rw-r----- 1 root root   83 Feb 22 22:40 lrm_status
drwxr-xr-x 2 root root 4096 Feb 22 22:40 lxc
drwxr-xr-x 2 root root 4096 Feb 22 22:40 openvz
drwx------ 2 root root 4096 Feb 22 22:40 priv
-rw-r----- 1 root root 1679 Feb 22 22:40 pve-ssl.key
-rw-r----- 1 root root 1692 Feb 22 22:40 pve-ssl.pem
drwxr-xr-x 2 root root 4096 Feb 22 22:40 qemu-server

/etc/pve/nodes/vigilant/lxc:
total 4
-rw-r----- 1 root root 319 Feb 22 22:40 103.conf

/etc/pve/nodes/vigilant/openvz:
total 0

/etc/pve/nodes/vigilant/priv:
total 0

/etc/pve/nodes/vigilant/qemu-server:
total 24
-rw-r----- 1 root root 565 Feb 22 22:40 100.conf
-rw-r----- 1 root root 490 Feb 22 22:40 101.conf
-rw-r----- 1 root root 380 Feb 22 22:40 102.conf
-rw-r----- 1 root root 480 Feb 22 22:40 104.conf
-rw-r----- 1 root root 409 Feb 22 22:40 105.conf
-rw-r----- 1 root root 427 Feb 22 22:40 106.conf
 
Hi again,

Could you also provide us with the output of the ls -l /etc/pve/local/pve-ssl* command? I also have a question: did you reboot the server after the node was renamed?
 
Hi again,

Could you also provide us with the output of the ls -l /etc/pve/local/pve-ssl* command? I also have a question: did you reboot the server after the node was renamed?
Hi @Moayad ,

First of all: thank you very much for your quick and to the point responses - really appreciated!

Yes - the server was rebooted with a shutdown (i.e. power cycle) - more than once, actually.
Also after doing the cleanup part mentioned in the article.

Requested output:
Code:
root@vigilant:/home/will# ls -l /etc/pve/local/pve-ssl*
-rw-r----- 1 root root 1679 Feb 22 22:40 /etc/pve/local/pve-ssl.key
-rw-r----- 1 root root 1692 Feb 22 22:40 /etc/pve/local/pve-ssl.pem


Cheers - Will
 
that looks very wrong! can you please post the output of mount | grep /etc/pve and systemctl status pve-cluster?
 
that looks very wrong! can you please post the output of mount | grep /etc/pve and systemctl status pve-cluster?

I'm aware of that, as there is no web UI and no mount - see below:

Code:
root@vigilant:/home/will# mount | grep /etc/pve

root@vigilant:/home/will# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Thu 2023-02-23 13:31:33 CET; 1h 11min ago
    Process: 1343 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
        CPU: 8ms

Feb 23 13:31:33 vigilant systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Feb 23 13:31:33 vigilant systemd[1]: Stopped The Proxmox VE cluster filesystem.
Feb 23 13:31:33 vigilant systemd[1]: pve-cluster.service: Start request repeated too quickly.
Feb 23 13:31:33 vigilant systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 23 13:31:33 vigilant systemd[1]: Failed to start The Proxmox VE cluster filesystem.

The other node, which was also renamed and is working as expected, has:
Code:
root@the-neb:/home/will# mount | grep /etc/pve
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)

root@the-neb:/home/will# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-02-22 22:14:19 CET; 16h ago
    Process: 1292 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 1296 (pmxcfs)
      Tasks: 7 (limit: 77034)
     Memory: 40.0M
        CPU: 40.123s
     CGroup: /system.slice/pve-cluster.service
             └─1296 /usr/bin/pmxcfs

Feb 22 22:14:18 the-neb systemd[1]: Starting The Proxmox VE cluster filesystem...
Feb 22 22:14:19 the-neb systemd[1]: Started The Proxmox VE cluster filesystem.
 
ah, you already mentioned in your first post that /etc/pve "is empty", I missed that.

so, you somehow corrupted the DB (did you manually attempt to edit it?).. if you have a backup of /etc/pve, I'd suggest the following

- move /var/lib/pve-cluster/* out of the way, e.g. mkdir /var/lib/pve-cluster/broken; mv /var/lib/pve-cluster/* /var/lib/pve-cluster/broken/
- move /etc/pve with the wrong content out of the way, e.g. mv /etc/pve /etc/pve-broken; mkdir /etc/pve
- now, attempt to (re)-start pve-cluster: systemctl restart pve-cluster
- if /etc/pve is still empty, check and post journalctl --since "-15min" -u pve-cluster
- if /etc/pve is now filled with content again, restore individual config files from your backup, and then reboot the node so that all services are restarted (or restart them one by one)
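
Collected into one sequence for copy-paste (the same commands as above; this assumes a standalone node like yours - adjust if anything differs on your side):
Code:
mkdir /var/lib/pve-cluster/broken
mv /var/lib/pve-cluster/* /var/lib/pve-cluster/broken/    # move the old DB out of the way
mv /etc/pve /etc/pve-broken                               # move the wrongly filled directory aside
mkdir /etc/pve
systemctl restart pve-cluster
journalctl --since "-15min" -u pve-cluster                # check this if /etc/pve stays empty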
 
ah, you already mentioned in your first post that /etc/pve "is empty", I missed that.

so, you somehow corrupted the DB (did you manually attempt to edit it?).. if you have a backup of /etc/pve, I'd suggest the following

- move /var/lib/pve-cluster/* out of the way, e.g. mkdir /var/lib/pve-cluster/broken; mv /var/lib/pve-cluster/* /var/lib/pve-cluster/broken/
- move /etc/pve with the wrong content out of the way, e.g. mv /etc/pve /etc/pve-broken; mkdir /etc/pve
- now, attempt to (re)-start pve-cluster: systemctl restart pve-cluster
- if /etc/pve is still empty, check and post journalctl --since "-15min" -u pve-cluster
- if /etc/pve is now filled with content again, restore individual config files from your backup, and then reboot the node so that all services are restarted (or restart them one by one)

Hi @fabian ,

No - I didn't try to edit the DB manually - no reason for that.
And no - I don't have a backup of the complete /etc/pve folder - only of /etc/pve/nodes.

To what extent would this make a difference to the outcome of your suggestion?


Cheers - Will
 
it would mean that you have to recreate any relevant files in /etc/pve (e.g., storage.cfg and user.cfg) and /etc/pve/priv (e.g., keys/passwords used to access storages, ACME accounts, ..), either from memory/documentation, or using the data contained in the old, broken sqlite DB (you can inspect it with sqlite3 after moving it out of the way, the contents are pretty straight-forward).
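
For example, a read-only peek with sqlite3 could look like this - a sketch that assumes the usual pmxcfs layout with a single 'tree' table (check with .schema first if unsure) and the DB already moved aside into a 'broken' directory:
Code:
# inspect the copied-aside DB (path assumes it was moved into a 'broken' subdirectory)
sqlite3 /var/lib/pve-cluster/broken/config.db ".schema"
sqlite3 /var/lib/pve-cluster/broken/config.db "SELECT inode, parent, name FROM tree;"
sqlite3 /var/lib/pve-cluster/broken/config.db "SELECT data FROM tree WHERE name = 'storage.cfg';"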

the error messages in your first post are cut-off, could you do the following and post the resulting "log" file as well?

journalctl --since "2023-02-22 23:00" --until "2023-02-22 23:59" --unit pve-cluster > log
 
it would mean that you have to recreate any relevant files in /etc/pve (e.g., storage.cfg and user.cfg) and /etc/pve/priv (e.g., keys/passwords used to access storages, ACME accounts, ..), either from memory/documentation, or using the data contained in the old, broken sqlite DB (you can inspect it with sqlite3 after moving it out of the way, the contents are pretty straight-forward).

the error messages in your first post are cut-off, could you do the following and post the resulting "log" file as well?

journalctl --since "2023-02-22 23:00" --until "2023-02-22 23:59" --unit pve-cluster > log

I already re-created bits and pieces (also mentioned when opening this thread).
But I'm not aware of what to put in for the remaining parts.

I also didn't trust myself to copy the DB and start from that - I didn't know (until your suggestion) how to put things back.

Attached is the zip of the "log" file.


Cheers - Will
 


the problem is you didn't actually restore the bits and pieces ;) /etc/pve is not a simple directory, pmxcfs needs to be mounted there. by moving both the DB and the mountpoint out of the way, pmxcfs should be able to start with a clean slate and you can then copy back the things you already recovered from /etc/pve-broken to /etc/pve
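
For instance (file names taken from earlier in this thread - copy only what you actually recovered, and only once /etc/pve shows up as a fuse mount again):
Code:
mount | grep /etc/pve     # should show /dev/fuse on /etc/pve before copying anything back
cp /etc/pve-broken/storage.cfg /etc/pve/
cp /etc/pve-broken/user.cfg /etc/pve/
cp /etc/pve-broken/nodes/vigilant/qemu-server/*.conf /etc/pve/nodes/vigilant/qemu-server/
cp /etc/pve-broken/nodes/vigilant/lxc/*.conf /etc/pve/nodes/vigilant/lxc/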
 
also, logs for pve-cluster covering the time period right before you did the rename up to "2023-02-22 23:01" would be interesting - you can use the same command but adapt --since and --until to get those!
 
also, logs for pve-cluster covering the time period right before you did the rename up to "2023-02-22 23:01" would be interesting - you can use the same command but adapt --since and --until to get those!

Well... if it helps: I can get you the logs for the past 2-3 days?
So that you can slice-and-dice things any way you want?
 
sure - if you indicate the (rough) time when you did the rename ;) if you have shell history that shows what you did when attempting the rename, that might shed some light on the root cause as well.
 
the problem is you didn't actually restore the bits and pieces ;) /etc/pve is not a simple directory, pmxcfs needs to be mounted there. by moving both the DB and the mountpoint out of the way, pmxcfs should be able to start with a clean slate and you can then copy back the things you already recovered from /etc/pve-broken to /etc/pve

Interesting concept... I removed the broken stuff, re-created the folders, rebooted, and got my web UI back.
That indeed allows me to add things back one by one, as I have these config bits documented.

I now have the LXC container and VMs running as if nothing happened...

Great stuff, this PVE technology - thanks guys! :D :D :D
 
>Feb 22 23:34:18 vigilant pmxcfs[1163]: [database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x0000000000>
>Feb 22 23:34:18 vigilant pmxcfs[1163]: [database] crit: found entry with duplicate name 'lxc' - A:(inode = 0x0000000000>

I had this problem today, after a host rename.

I tried renaming /etc/pve/nodes/old -> new, which didn't work - so I created "new" and moved qemu-server and lxc into the new dir and deleted the old ones. But apparently that led to duplicate names lxc and qemu-server inside the database, and pmxcfs won't start.

I installed "visidata" from https://www.visidata.org , which is a TUI-based database editor, and could easily delete the old DB entries (the ones with the older revision); pmxcfs works again after this.

Use at your own risk...
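
An equivalent route without extra tools would be sqlite3 itself - shown only as a hedged sketch that assumes the usual pmxcfs 'tree' table; stop the service and back up the DB first, and take the actual inode value from the SELECT (the one below is a placeholder):
Code:
systemctl stop pve-cluster
cp /var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db.bak
# list the duplicate entries and their versions
sqlite3 /var/lib/pve-cluster/config.db "SELECT inode, parent, version, name FROM tree WHERE name IN ('lxc','qemu-server');"
# delete the stale row(s) by inode - <stale_inode> is a placeholder
sqlite3 /var/lib/pve-cluster/config.db "DELETE FROM tree WHERE inode = <stale_inode>;"
systemctl start pve-cluster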
 
