Shame on me.... I changed the hostname, now pve-cluster can't boot

Nanobug

Member
May 14, 2021
I know now that changing the hostname was a stupid idea.
It has been reverted now.
However, I can't make pve-cluster.service start again.

I have a PiKVM attached to this node, so I think I managed to capture most of the errors:
[screenshot: 2025-02-12_21-46-21.png]


Here's some info to start off with:

Bash:
root@pve2:/var/lib/pve-cluster# pveversion -v
proxmox-ve: 8.3.0 (running kernel: 6.8.12-6-pve)
pve-manager: 8.3.2 (running version: 8.3.2/3e76eec21c4a14a7)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-6
proxmox-kernel-6.8.12-6-pve-signed: 6.8.12-6
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-3-pve-signed: 6.8.12-3
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.3
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-2
pve-ha-manager: 4.0.6
pve-i18n: 3.3.2
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.3
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1

Then the hosts file and the hostname file:

Bash:
root@pve2:/var/lib/pve-cluster# cat /etc/hosts
#127.0.0.1 localhost.localdomain localhost
10.0.20.3 pve2.nanonet.local pve2

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Bash:
root@pve2:/var/lib/pve-cluster# cat /etc/hostname
pve2
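
In case it matters, the node names the cluster filesystem expects can be listed straight from the database and compared with the hostname; something like this should work (untested sketch, assuming the default config.db path and the tree schema shown further down):

Bash:
# Untested sketch: compare the current hostname with the node directories
# stored in /var/lib/pve-cluster/config.db (default path assumed).
hostname
sqlite3 /var/lib/pve-cluster/config.db \
  "SELECT name FROM tree WHERE parent = (SELECT inode FROM tree WHERE parent = 0 AND name = 'nodes');"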

Excuse my messy interfaces file; I was trying to get 10 Gbit bonding to work, and I haven't cleaned it up yet.
Code:
root@pve2:/var/lib/pve-cluster# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

# Loopback interface
auto lo
iface lo inet loopback

# enp7s0 interface with static IP
auto enp7s0
#iface enp7s0 inet manual
iface enp7s0 inet static
        address 10.0.20.3
        netmask 255.255.255.0
        gateway 10.0.20.1

# Bonding setup for enp8s0f0 and enp8s0f1
auto bond0
iface bond0 inet manual
        bond-slaves enp8s0f0 enp8s0f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2-3
        bond-lacp-rate 1
#        mtu 9000


#auto enp8s0f0
#iface enp8s0f0 inet manual

#auto enp8s0f1
#iface enp8s0f1 inet manual

#auto bond0
#iface bond0 inet manual
#       bond-slaves enp8s0f0 enp8s0f1
#       bond-miimon 100
#       bond-mode 802.3ad
#       bond-xmit-hash-policy layer2+3
#       bond-lacp-rate 1
#

# vmbr0 using enp7s0
auto vmbr0
iface vmbr0 inet static
        address 10.0.20.3/24
        gateway 10.0.20.1
        bridge-ports enp7s0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

# vmbr1 using bond0
auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#       mtu 9000

# Include additional interface configurations
source /etc/network/interfaces.d/*
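
Not related to the pve-cluster failure, but while the file is posted: `bond-xmit-hash-policy layer2-3` in the active bond0 stanza doesn't look like a valid value; the commented-out block below it uses `layer2+3`, which is the usual spelling for the bonding driver. A corrected stanza would presumably be:

Code:
# Sketch only: same bond as above, with the hash policy spelled layer2+3.
auto bond0
iface bond0 inet manual
        bond-slaves enp8s0f0 enp8s0f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3
        bond-lacp-rate 1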

journalctl on pve-cluster:
Bash:
root@pve2:/var/lib/pve-cluster# journalctl -xeu pve-cluster.service
Feb 12 21:58:51 pve2 systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit pve-cluster.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 255.
Feb 12 21:58:51 pve2 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.
Feb 12 21:58:51 pve2 systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
░░ Subject: A start job for unit pve-cluster.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit pve-cluster.service has finished with a failure.
░░
░░ The job identifier is 3233 and the job result is failed.
Feb 12 21:58:51 pve2 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ Automatic restarting of the unit pve-cluster.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
Feb 12 21:58:51 pve2 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
░░ Subject: A stop job for unit pve-cluster.service has finished
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A stop job for unit pve-cluster.service has finished.
░░
░░ The job identifier is 3409 and the job result is done.
Feb 12 21:58:51 pve2 systemd[1]: pve-cluster.service: Start request repeated too quickly.
Feb 12 21:58:51 pve2 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pve-cluster.service has entered the 'failed' state with result 'exit-code'.
Feb 12 21:58:51 pve2 systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
░░ Subject: A start job for unit pve-cluster.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit pve-cluster.service has finished with a failure.
░░
░░ The job identifier is 3409 and the job result is failed.
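
Since the systemd excerpt only shows the generic exit status 255, the underlying error can usually be seen by running pmxcfs in the foreground with debug output (sketch; the -f/-d flags are documented in the pmxcfs man page, exact output will differ per setup):

Bash:
# Sketch: run the cluster filesystem in the foreground so the real error
# (often a database problem) is printed to the terminal.
systemctl stop pve-cluster
pmxcfs -f -d
# Ctrl+C when done reading the output, then try starting the unit again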

From what I can tell from the forums, this covers the essentials of my issue.

I did have the exact same errors as this thread:
https://forum.proxmox.com/threads/pve-cluster-fails-to-start.82861/

However, the solution didn't work for me.
I also have a backup of the config.db file, as t.lamprecht suggested.

The database shows the following:
Bash:
root@pve2:/var/lib/pve-cluster# sqlite3 config.db 'PRAGMA integrity_check'
ok
root@pve2:/var/lib/pve-cluster# sqlite3 config.db .schema
CREATE TABLE tree (  inode INTEGER PRIMARY KEY NOT NULL,  parent INTEGER NOT NULL CHECK(typeof(parent)=='integer'),  version INTEGER NOT NULL CHECK(typeof(version)=='integer'),  writer INTEGER NOT NULL CHECK(typeof(writer)=='integer'),  mtime INTEGER NOT NULL CHECK(typeof(mtime)=='integer'),  type INTEGER NOT NULL CHECK(typeof(type)=='integer'),  name TEXT NOT NULL,  data BLOB);
root@pve2:/var/lib/pve-cluster# sqlite3 config.db 'SELECT inode,mtime,name FROM tree WHERE parent = 0'
0|1739373513|__version__
8|1706012227|virtual-guest
9|1706012227|priv
11|1706012227|nodes
24|1706012228|pve-www.key
30|1706012230|pve-root-ca.pem
49|1706012230|firewall
50|1706012230|ha
51|1706012230|mapping
53|1706012230|sdn
5280097|1713213472|notifications.cfg
15055938|1727877219|datacenter.cfg
18802066|1733487579|storage.cfg
22161888|1738520257|replication.cfg
22161892|1738520257|jobs.cfg
22161897|1738520257|vzdump.cron
22722776|1739361578|authkey.pub.old
22722779|1739361578|authkey.pub
22729924|1739372259|user.cfg
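
In case it helps anyone with the same problem, two checks that can be run directly against config.db are looking for orphaned rows and for duplicate names under the same parent (untested sketch; table layout taken from the schema above):

Bash:
# Untested sketch, based on the tree schema shown above.
# 1) Rows whose parent inode no longer exists:
sqlite3 /var/lib/pve-cluster/config.db \
  "SELECT inode, parent, name FROM tree WHERE parent NOT IN (SELECT inode FROM tree);"
# 2) Duplicate names under the same parent (something pmxcfs tends to choke on):
sqlite3 /var/lib/pve-cluster/config.db \
  "SELECT parent, name, COUNT(*) FROM tree GROUP BY parent, name HAVING COUNT(*) > 1;"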

I did make a backup of the database before running:
Bash:
sqlite3 config.db 'DELETE FROM tree WHERE parent = 347747 or inode = 347747'
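
For anyone following along, taking that backup is just a copy of the database file while pmxcfs is stopped; roughly:

Bash:
# Rough sketch: stop the service so the file isn't being written to, then copy it.
# The destination path/name is only an example.
systemctl stop pve-cluster
cp -a /var/lib/pve-cluster/config.db /root/config.db.bak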

I don't know where to go from here with the troubleshooting.
Can anyone point me in the right direction?
 
I hate it when this happens...
Writing it all down made me come up with the solution.

For my problem I did the following:
Code:
sqlite3 /var/lib/pve-cluster/config.db
.tables
SELECT * FROM tree WHERE name = 'qemu-server';

# Gave the following result:
14|12|14|0|1706012227|4|qemu-server|
22730368|22730367|22730368|0|1739373195|4|qemu-server|
22730494|22730367|22730495|0|1706012227|4|qemu-server|

# Then I deleted the two newest entries:
DELETE FROM tree WHERE inode = 22730368;
DELETE FROM tree WHERE inode = 22730494;


# Then started all the services again:
systemctl start pve-cluster
systemctl start pvedaemon
systemctl start pveproxy
systemctl start pvestatd

# and rebooted
reboot

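A quick way to confirm everything is back, before or instead of the reboot (sketch; output obviously varies per setup):

Bash:
# Sketch of a post-fix sanity check: the service should be active and /etc/pve
# should be mounted again with the guest configs visible.
systemctl status pve-cluster --no-pager
ls /etc/pve/qemu-server
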
I hope this can possibly help someone in the future, or even me the next time I decide to do something I shouldn't.
 
root@pve2:/var/lib/pve-cluster# cat /etc/hosts
#127.0.0.1 localhost.localdomain localhost
10.0.20.3 pve2.nanonet.local pve2
The 127.0.0.1 localhost line should always be there and NOT commented out as you did!
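
For reference, the top of /etc/hosts should look roughly like this (reusing the IP and hostname from your post):

Code:
# Example only; the IPv6 lines from your post stay as they are.
127.0.0.1 localhost.localdomain localhost
10.0.20.3 pve2.nanonet.local pve2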