[SOLVED] Mistakenly tried to change node hostname with VMs on the node

milesnthesky

New Member
Dec 26, 2023
I tried to change the hostname of my node without thoroughly reading the documentation, which states that it cannot be done on a node with VMs. This was a stupid oversight and now I have messed up my node.

I attempted to change the config.db file using the SQLite Browser to address the corrupted filesystem problem I created, but I haven't had much luck with that.

Is there anything I can do to resolve this issue?

Here is the result of systemctl status pve-cluster:
Code:
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Tue 2023-12-26 17:11:21 EST; 16min ago
    Process: 1044980 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
        CPU: 9ms

Dec 26 17:11:21 pve01 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Dec 26 17:11:21 pve01 systemd[1]: Stopped The Proxmox VE cluster filesystem.
Dec 26 17:11:21 pve01 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Dec 26 17:11:21 pve01 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Dec 26 17:11:21 pve01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.

Here is the result of journalctl -xe:
Code:
Dec 26 17:30:18 pve01 systemd[1]: Stopped The Proxmox VE cluster filesystem.
░░ Subject: A stop job for unit pve-cluster.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit pve-cluster.service has finished.
░░
░░ The job identifier is 11096 and the job result is done.
Dec 26 17:30:18 pve01 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Dec 26 17:30:18 pve01 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.
Dec 26 17:30:18 pve01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
░░ Subject: A start job for unit pve-cluster.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit pve-cluster.service has finished with a failure.
░░
░░ The job identifier is 11096 and the job result is failed.
Dec 26 17:30:18 pve01 systemd[1]: Condition check resulted in Corosync Cluster Engine being skipped.
░░ Subject: A start job for unit corosync.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit corosync.service has finished successfully.
░░
░░ The job identifier is 11174.
Dec 26 17:30:20 pve01 pve-ha-lrm[1352]: updating service status from manager failed: Connection refused
Dec 26 17:30:23 pve01 pvestatd[1294]: ipcc_send_rec[1] failed: Connection refused
Dec 26 17:30:23 pve01 pvestatd[1294]: ipcc_send_rec[2] failed: Connection refused
Dec 26 17:30:23 pve01 pvestatd[1294]: ipcc_send_rec[3] failed: Connection refused
Dec 26 17:30:23 pve01 pvestatd[1294]: ipcc_send_rec[4] failed: Connection refused
Dec 26 17:30:23 pve01 pvestatd[1294]: status update error: Connection refused
Dec 26 17:30:25 pve01 pve-ha-lrm[1352]: updating service status from manager failed: Connection refused
Dec 26 17:30:27 pve01 pve-firewall[1291]: status update error: Connection refused
 
I tried to change the hostname of my node without thoroughly reading the documentation, which states that it cannot be done on a node with VMs. This was a stupid oversight and now I have messed up my node.

It basically was not designed to be changed, ever. VMs on it or not, that's just sugarcoating the fact that you have to reinstall the node if you want to do it cleanly.

I attempted to change the config.db file using the SQLite Browser to address the corrupted filesystem problem I created, but I haven't had much luck with that.

What exactly did you change there? Because that just gets mounted into your /etc/pve as a virtual filesystem, so you might as well have edited the file contents there.
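For reference, on a stock install the backing file lives at /var/lib/pve-cluster/config.db and /etc/pve is just the pmxcfs FUSE view of it; something like this should show the relationship (paths assume a default setup):
Code:
# the database pmxcfs serves /etc/pve from (default path)
ls -l /var/lib/pve-cluster/config.db
# when pve-cluster is up, /etc/pve shows up as a FUSE mount with the same content
findmnt /etc/pve
ls -la /etc/pve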

Is there anything I can do to resolve this issue?

Change the name back? Copy config.db back from another node (with the service off) in case it got corrupted, and see if the cluster gets all happy with it on the line again?
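Roughly, that copy-back could look like this (a sketch only; "other-node" and the paths are placeholders for a default install):
Code:
# on the broken node, with the cluster filesystem stopped
systemctl stop pve-cluster
cp /var/lib/pve-cluster/config.db /root/config.db.broken    # keep the damaged copy, just in case
scp root@other-node:/var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db
systemctl start pve-cluster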
 
I changed all instances of my old hostname to my new hostname.

Hmm, so I guess I would have to spin up another node to get the config.db file off of it?
 
I suspect you will still have some missing parts in corosync.conf, etc. Also, if you were changing it on a running node while the config.db was basically in use, we do not know what happened.

Are you saying you edited all the config.dbs manually (how many nodes?), found all instances of the alias/name, and now it's not working? Do the remaining nodes have quorum? What does pvecm status say?

I thought you had at least some config.db untouched. :) Do you have any backups of that? I suggest you back up your VMs/CTs before even e.g. restarting any of the nodes now.
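Even a one-off vzdump per guest is cheap insurance once /etc/pve is mounted again (vzdump reads the guest configs from there); a minimal example, with the VMID and storage name as placeholders:
Code:
# snapshot-mode backup of a single guest to a storage that allows backups
vzdump 100 --storage local --mode snapshot --compress zstd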
 
The physical machine was running, but the pve-cluster service was not, if that makes a difference.

It's only one node.

Unfortunately another rookie mistake, I do not have any backups.
 
Oh I see, so this is no cluster, just a single node. What exactly did you do? hostnamectl, and then went on to manually edit config.db with an SQL tool while the service was off?

Do you have /etc/pve populated with files now at all?
 
I edited the hostname in /etc/hostname and /etc/hosts. Upon discovering that I broke my instance, I followed a Reddit thread to resolve the issue. I stopped the pve-cluster service and tried to edit the config.db file by updating all instances of the old hostname with the new one. This did not succeed.

I do not see any files or directories in /etc/pve.
 
Alright, I suppose you cannot really break it further any more. It would be nice to see that /etc/pve mount again, first of all.

Did you try to change everything back, to begin with? Make a copy of that config.db (too bad you do not have the original), then go and change the names back. Also change back the /etc/host{name|s} files. I would reboot. If it does not start at all, you will be best off with e.g. a live Debian to play with it further.
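A sketch of that revert, for what it's worth (pve01 is the old name from your logs; nothing here is verified against your setup):
Code:
hostnamectl set-hostname pve01      # put the old hostname back
nano /etc/hosts                     # change the new name back to pve01 by hand
# keep a copy of the current database before anything else touches it
cp /var/lib/pve-cluster/config.db /root/config.db.before-revert
reboot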
 
Interestingly, just changing the hostname in /etc/hostname and /etc/hosts back to the original, without touching the config.db, allows the pve-cluster service to start and lets me access the web administration panel. However, I cannot access the shell of the node or the consoles of VMs through the web GUI. When I attempt to access the shell I get "undefined Code: 1006" and the VM consoles just say "Failed to connect to server".
 
The issue is, you have a mix of everything. The question is not whether the service starts, but whether you get /etc/pve mounted. And even then you would need to edit the entries back to the original.
 
Well I can see files in /etc/pve now. Would the next step be to try and revert config.db?
 
As it is in some "stable" state, I would make a copy of it (config.db) before I do anything.

Then I would actually copy the content of /etc/pve somewhere aside and edit the instances of the alias in the files (it's easier than in the DB). After all that, you can literally copy the "fixed" files back into /etc/pve in one shot. It will, since the service is running, update the config.db anyhow.
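A rough sketch of that, where "newname" stands in for whatever the wrong hostname is:
Code:
# work on a copy of the mounted content
cp -a /etc/pve /root/pve-copy
# list the files that still mention the wrong name
grep -rl newname /root/pve-copy
# fix those files with an editor or sed, then copy each corrected file back into /etc/pve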

If you feel more comfortable editing the config.db, you would need to do that with the service off, then restart it. It makes no difference, it's all the same content, but I would prefer editing text files rather than juggling an SQL tool on a DB file.
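If you do go the DB route, something along these lines at least shows what is still affected, assuming the standard pmxcfs layout with its single "tree" table and that the sqlite3 CLI is available (it may not be installed by default):
Code:
systemctl stop pve-cluster
sqlite3 /var/lib/pve-cluster/config.db \
  "SELECT inode, name FROM tree WHERE name LIKE '%newname%' OR CAST(data AS TEXT) LIKE '%newname%';"
# fix the matching rows (or the same entries via /etc/pve once it mounts again), then
systemctl start pve-cluster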
 
Got it. Thanks. Let me try that and report back.
 
Just so you know, especially for a single-node install, it really is a trivial relationship between that /etc/pve filesystem and the config.db. The entries in the DB are the file contents, and they point to "inodes" of the parent dirs and all that. Then there are artificial constructs like symlinks, which the filesystem mounts on top of what's in the DB (that is more useful for cluster scenarios).

You only care about the actual files, not the symlinks. You could even edit it in place, basically, but it's better if the files change "at once" so that the names are consistent everywhere at the same time.

Another fun fact: since this is basically a trivial install, once you have the fixed content (from /etc/pve), you could literally reinstall the whole PVE but keep the VMs'/CTs' LVM/ZFS pool. If you did that, named the new install the same, and then copied in the backed-up /etc/pve content, it should start working too. But since you have not catastrophically destroyed anything else on that node, why do all that. So that's why I would just "fix the files back".
 
Gotcha, thanks for the info!

So that seemed to fix it. The "Failed to connect to server." error on the VMs still persists and, based on some troubleshooting I just did, seems to be a result of accessing the web GUI through a reverse proxy (I can access the VM consoles from the local IP, but not the reverse proxy address). Any ideas on how to fix that?
 
Can you try to run pvecm updatecerts -f? If that does not fix it, you would need to explain your reverse proxy setup and what exactly is failing. Another thing: you may want to reboot the whole thing now that it's back to life.
 
The command and reboot did not seem to change anything. My setup is relatively simple. I have an instance of Nginx Proxy Manager that is only exposed on my local network. It uses a domain I own and certificates from Let's Encrypt.
 
Fixed! I just needed to update the reverse proxy configuration to support WebSockets. Thanks for your help and happy holidays!
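For anyone finding this later: in Nginx Proxy Manager that should be the "Websockets Support" toggle on the proxy host; behind the scenes the relevant nginx directives look roughly like this (the backend address is a placeholder):
Code:
location / {
    proxy_pass https://192.168.1.10:8006;      # your Proxmox host and port
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;    # the noVNC/xterm.js consoles need the websocket upgrade
    proxy_set_header Connection "upgrade";
}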
 
