Hi All,
I have been migrating all of my servers to new subnets and have been changing the hostnames to more appropriate names as I go along.
The very last server was my Proxmox hypervisor, which is where it all went wrong, mostly due to my own stupidity.
I attempted to follow: https://pve.proxmox.com/wiki/Renaming_a_PVE_node
My server is standalone and is not part of a cluster. However, I left the guests (VMs and containers) on the node when I renamed it. (Yes, I know I'm an idiot.)
When I attempted to move the configuration files I got the error:
mv: cannot move ‘/etc/pve/nodes/sauron/lxc’ to ‘/etc/pve/nodes/hpv-01/lxc’: Directory not empty
mv: cannot move ‘/etc/pve/nodes/sauron/qemu-server’ to ‘/etc/pve/nodes/hpv-01/qemu-server’: Directory not empty
After this, /etc/pve became unmounted.
I attempted to restart the cluster service, which failed:
~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: failed (Result: exit-code) since Tue 2016-11-15 18:42:56 GMT; 8s ago
Process: 28155 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 30032 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=255)
Main PID: 28153 (code=killed, signal=SEGV)
Nov 15 18:42:56 hpv-01 pmxcfs[30032]: [database] crit: found entry with duplicate name (inode = 00000000011C65D7, parent = 00000000011C653C, name = 'lxc')
Nov 15 18:42:56 hpv-01 pmxcfs[30032]: [database] crit: DB load failed
Nov 15 18:42:56 hpv-01 pmxcfs[30032]: [database] crit: found entry with duplicate name (inode = 00000000011C65D7, parent = 00000000011C653C, name = 'lxc')
Nov 15 18:42:56 hpv-01 pmxcfs[30032]: [database] crit: DB load failed
Nov 15 18:42:56 hpv-01 pmxcfs[30032]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Nov 15 18:42:56 hpv-01 pmxcfs[30032]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Nov 15 18:42:56 hpv-01 pmxcfs[30032]: [main] notice: exit proxmox configuration filesystem (-1)
Nov 15 18:42:56 hpv-01 systemd[1]: pve-cluster.service: control process exited, code=exited status=255
Nov 15 18:42:56 hpv-01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Nov 15 18:42:56 hpv-01 systemd[1]: Unit pve-cluster.service entered failed state.
I followed this guide, which partially resolved the problems with the database:
http://blog.sjas.de/posts/proxmox-unable-to-open-database.html
I removed the duplicate entries for 'lxc', 'qemu-server' and 'lrm_status'.
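For anyone trying to reproduce the duplicate check on their own copy of the database, here is a minimal read-only sketch. It assumes config.db is an SQLite file with a single table named `tree` that has `inode`, `parent` and `name` columns; please confirm that with `.schema` in the sqlite3 shell first. The function names are my own and hypothetical, and it should only ever be pointed at a copy, never the live file.

```python
import sqlite3


def open_readonly(path):
    """Open an SQLite file read-only so the inspection cannot change it."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def fmt_inode(inode):
    """Format an inode the way the pmxcfs log prints it (16 hex digits)."""
    return f"{inode:016X}"


def find_duplicate_names(conn):
    """Return (parent, name, [inodes]) for each name that appears more
    than once under the same parent.

    Assumption: a table `tree` with columns inode, parent and name.
    """
    rows = conn.execute(
        "SELECT parent, name, inode FROM tree ORDER BY parent, name, inode"
    ).fetchall()
    groups = {}
    for parent, name, inode in rows:
        groups.setdefault((parent, name), []).append(inode)
    return [(p, n, inodes) for (p, n), inodes in groups.items() if len(inodes) > 1]


def report_duplicates(conn):
    """Print one line per duplicated name, with inodes in log-style hex."""
    for parent, name, inodes in find_duplicate_names(conn):
        ids = ", ".join(fmt_inode(i) for i in inodes)
        print(f"parent={fmt_inode(parent)} name={name!r} inodes={ids}")
```

Running `report_duplicates(open_readonly("config.db.copy"))` (the file name is a placeholder) should list each duplicated name together with the inodes involved, which makes it easier to decide which row to keep before deleting anything.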
The current status of the service is as follows:
~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: failed (Result: exit-code) since Tue 2016-11-15 19:21:49 GMT; 2s ago
Process: 28155 ExecStartPost=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 38065 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=255)
Main PID: 28153 (code=killed, signal=SEGV)
Nov 15 19:21:49 hpv-01 pmxcfs[38065]: [database] crit: missing directory inode (inode = 000000000000000A)
Nov 15 19:21:49 hpv-01 pmxcfs[38065]: [database] crit: DB load failed
Nov 15 19:21:49 hpv-01 pmxcfs[38065]: [database] crit: missing directory inode (inode = 000000000000000A)
Nov 15 19:21:49 hpv-01 pmxcfs[38065]: [database] crit: DB load failed
Nov 15 19:21:49 hpv-01 pmxcfs[38065]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Nov 15 19:21:49 hpv-01 pmxcfs[38065]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Nov 15 19:21:49 hpv-01 pmxcfs[38065]: [main] notice: exit proxmox configuration filesystem (-1)
Nov 15 19:21:49 hpv-01 systemd[1]: pve-cluster.service: control process exited, code=exited status=255
Nov 15 19:21:49 hpv-01 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Nov 15 19:21:49 hpv-01 systemd[1]: Unit pve-cluster.service entered failed state.
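The "missing directory inode" message looks like it means that some row still points at a parent that no longer exists, possibly because the wrong duplicate was deleted. Below is a minimal sketch for listing such rows. It assumes config.db is an SQLite file with a `tree` table containing `inode`, `parent` and `name` columns, and that top-level entries use parent 0; both are assumptions to verify against the real schema, and the function names are hypothetical. Run it against a copy only.

```python
import sqlite3


def find_missing_parents(conn):
    """Return (inode, parent, name) for rows whose parent inode has no row.

    Assumptions: table `tree` with columns inode, parent, name, and
    parent = 0 marks entries that sit directly under the root.
    """
    return conn.execute(
        "SELECT inode, parent, name FROM tree "
        "WHERE parent != 0 AND parent NOT IN (SELECT inode FROM tree) "
        "ORDER BY inode"
    ).fetchall()


def missing_parent_ids(conn):
    """Return the distinct missing parent inodes as 16-digit hex strings,
    matching the format used in the pmxcfs log output."""
    parents = sorted({parent for _, parent, _ in find_missing_parents(conn)})
    return [f"{p:016X}" for p in parents]
```

If the assumptions hold, `missing_parent_ids` on the cleaned-up database should include 000000000000000A, and `find_missing_parents` shows which entries would need their directory restored (for example from the untouched copy of config.db).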
I have a copy of config.db from before I attempted to clean up the database, which I can provide to anyone who can assist.
Is it at all possible to recover from this?
If not, what would my best recovery strategy be?
I would prefer not to restore all guests from last night's backup if possible, as the VM and LXC container data is good.
If I reinstall Proxmox, can I simply re-attach the storage and restore the KVM and LXC configs from my backups?