PVE 8.4.12 Data Hunt for Recovery

LittleFinger

New Member
Nov 10, 2024
19
0
1
Earth
I have a borked PVE install I need to recover data off of. I can find the config files (/var/lib/lxc, I think), but I'm having issues locating the LXC and VM "drives." I was told everything I needed was in "/etc/config," but there's no config folder in this version. Maybe I looked in the wrong location?

Is there more than just the config and virtual drive data I'd need to collect before wiping and reinstalling Proxmox?

I can't add a new server to do backups to. I have SFTP access (I've been browsing via FileZilla) but no console access via the WebUI.
 
For Proxmox VE, the LXC config files are under /etc/pve/lxc, and the VM config files are under /etc/pve/qemu-server.
Have you tried using the command vzdump <VMID> to back up your VMs or LXCs?
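
For example, a hedged sketch (the dump directory here is hypothetical and must already exist and be writable):
Code:
# back up one guest to a chosen directory as a single compressed archive
vzdump 103 --dumpdir /mnt/rescue --compress zstd --mode stop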
 
Does no one reply to threads anymore?

Where can I find that data to move _manually_ from one server to another?
1) I have access to data over SFTP
2) Nothing will start up, etc., due to no quorum/cluster
3) I can't add or use other storage from my other server due to #2

I know where to find the data; I just don't know if the method I'm experimenting with (I'll see the results in the morning) will work, since nothing can be transferred normally. I installed PBS in a VM, but I don't know if I need to do anything on my handicapped server to make it see PBS, or if control is initiated by PBS.
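
(For what it's worth, PVE initiates the backups and pushes them to PBS; PBS never reaches into a PVE server on its own. On the PVE side, PBS is just another storage entry, e.g. this sketch with made-up names and addresses:)
Code:
# register a PBS datastore as storage on the PVE node (all values hypothetical)
pvesm add pbs pbs-backup --server 192.168.1.50 --datastore store1 \
    --username root@pam --password <secret> --fingerprint <sha256-fingerprint>
Note that this writes to /etc/pve/storage.cfg, so it will fail while /etc/pve is stuck read-only from the quorum loss.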
 
Please understand that "I have a borked PVE install" is not a very exhaustive description of your environment.
Simply assuming you clicked "default" on everything: what does "pvesm list local-lvm" show in an SSH shell?
 
Does no one reply to threads anymore?
You can't expect an exact answer if you don't provide an exact question: you're giving zero information on how your server/cluster is or was configured, zero information on what you did (or what happened) to bork it, and zero information on the current status. You didn't even mention you have a cluster until your third post!

Please, provide exact details and hopefully someone could lend you a hand or provide some tips.
 
Please understand that "I have a borked PVE install" is not a very exhaustive description of your environment.
Simply assuming you clicked "default" on everything: what does "pvesm list local-lvm" show in an SSH shell?
Code:
root@galaxy:~# pvesm list local-lvm
storage 'local-lvm' does not exist

root@galaxy:~# pvesm list nvme
Volid                  Format  Type              Size VMID
nvme:subvol-103-disk-0 subvol  rootdir     2147483648 103
nvme:subvol-104-disk-0 subvol  rootdir    19327352832 104
nvme:vm-105-disk-0     raw     images         1048576 105
nvme:vm-105-disk-1     raw     images     34359738368 105
nvme:vm-106-disk-0     raw     images         1048576 106
nvme:vm-800-disk-0     raw     images         1048576 800
nvme:vm-800-disk-1     raw     images         4194304 800
nvme:vm-800-disk-2     raw     images     64424509440 800
nvme:vm-801-disk-0     raw     images         1048576 801
nvme:vm-801-disk-1     raw     images     53687091200 801
nvme:vm-900-disk-0     raw     images         1048576 900
nvme:vm-900-disk-1     raw     images    274877906944 900
nvme:vm-901-disk-0     raw     images     64424509440 901
root@galaxy:~# pvesm list local-zfs
Volid                   Format  Type         Size VMID
local-zfs:vm-901-disk-0 raw     images    1048576 901
local-zfs:vm-901-disk-1 raw     images    4194304 901

As for the cluster message: I tried to set one up, and then when it didn't work (as I only had two nodes), I removed the cluster from the configuration files. One server (the one I'm having issues with) is where it started. Yeah, I know - S M R T.

I simply want to save the VM and LXC configs (found), and I have found multiple spots where the VM and LXC data is stored. What I'd like to achieve is to fire up the VMs and LXCs on my other server, just to verify they're restorable, before I reinstall PVE and migrate them back.

Is PBS capable of backing up stuff like this without permission and/or confirmation from a PVE server?
 
The best way is to recover from a backup.
But you can examine the storage defined in /etc/pve/storage.cfg on the failed server (and check the configs of the VMs/CTs).
Assuming the same storage (name and type) is defined on the other server, you can try your "copy VM config and data" method.
Was the data on a local directory storage on the faulty server?
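
For reference, a ZFS-backed entry in storage.cfg looks roughly like this (pool name taken from the pvesm output above; a sketch, not the exact file):
Code:
zfspool: nvme
        pool nvme
        content images,rootdir
        sparse 1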
 
The best way is to recover from a backup.
But you can examine the storage defined in /etc/pve/storage.cfg on the failed server (and check the configs of the VMs/CTs).
Assuming the same storage (name and type) is defined on the other server, you can try your "copy VM config and data" method.
Was the data on a local directory storage on the faulty server?
To be fair, I'm pretty sure _only_ the PVE install is screwed as a result of my fuckery.

I'm sure PVE will, however, force me to redo my NVMe and whatever drives comprise my "local-zfs" array. Bulk storage is handled by TrueNAS, which has control of the HBA card. All extra data is safe so far.
 
As for shell access: I cannot use the web UI shell prompt, but I can connect with PowerShell (and likely other clients if I wanted). So far I've found that I cannot do updates at present. FileZilla has no issue cruising the directories on both PVE servers.
 
The best way is to recover from a backup.
But you can examine the storage defined in /etc/pve/storage.cfg on the failed server (and check the configs of the VMs/CTs).
Assuming the same storage (name and type) is defined on the other server, you can try your "copy VM config and data" method.
Was the data on a local directory storage on the faulty server?
I copied VM data over to PVE #2 last night. Because it was stored on the nvme array in PVE #1, the configs expect to find it on the same storage name there.
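
(Worth noting: raw images on a ZFS storage are zvols, not plain files, so an SFTP file copy alone may not be enough. A manual move looks more like this sketch, assuming PVE #2 also has a pool named nvme; the size matches vm-105-disk-1 from the listing above:)
Code:
# on PVE #2: create a matching zvol, then stream the source block device into it
zfs create -V 32G nvme/vm-105-disk-1
ssh root@galaxy 'dd if=/dev/zvol/nvme/vm-105-disk-1 bs=1M' \
    | dd of=/dev/zvol/nvme/vm-105-disk-1 bs=1M status=progress
# then copy 105.conf into /etc/pve/qemu-server/ on the target node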
 
For Proxmox VE, the LXC config files are under /etc/pve/lxc, and the VM config files are under /etc/pve/qemu-server.
Have you tried using the command vzdump <VMID> to back up your VMs or LXCs?
Code:
root@galaxy:~# vzdump 102
INFO: starting new backup job: vzdump 102
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp451247_102 for temporary files
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2025-10-17 02:18:04
INFO: status = stopped
ERROR: Backup of VM 102 failed - unable to open file '/etc/pve/nodes/galaxy/lxc/102.conf.tmp.451247' - Permission denied
INFO: Failed at 2025-10-17 02:18:04
INFO: Backup job finished with errors
INFO: notified via target `mail-to-root`
job errors
 
So, I tried this... and it failed. What gives?

Code:
root@galaxy:~# ./maint_mode.sh

Enable maintenance mode and disable all current VM/CT bootup? (y/n) y
Disabling (and saving) current onboot settings:
update VM 105: -onboot 0
unable to open file '/etc/pve/nodes/galaxy/qemu-server/105.conf.tmp.1711794' - Permission denied
update VM 106: -onboot 0
unable to open file '/etc/pve/nodes/galaxy/qemu-server/106.conf.tmp.1711825' - Permission denied
update VM 800: -onboot 0
unable to open file '/etc/pve/nodes/galaxy/qemu-server/800.conf.tmp.1711832' - Permission denied
update VM 801: -onboot 0
unable to open file '/etc/pve/nodes/galaxy/qemu-server/801.conf.tmp.1711837' - Permission denied
unable to open file '/etc/pve/nodes/galaxy/lxc/102.conf.tmp.1711872' - Permission denied
update CT 102: -onboot 0
unable to open file '/etc/pve/nodes/galaxy/lxc/103.conf.tmp.1711881' - Permission denied
update CT 103: -onboot 0
unable to open file '/etc/pve/nodes/galaxy/lxc/104.conf.tmp.1711903' - Permission denied
update CT 104: -onboot 0

Is it possible to reinstall Proxmox over itself and not destroy the NVMe array I have? Give the array the same storage name and continue on? Would this even work, or do I need to get paid support?
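
(If the reinstall only touches the OS disk, i.e. the nvme disks are left unselected in the installer, the data pool can usually be re-attached afterwards under the same storage name. A hedged sketch:)
Code:
# on the freshly installed PVE:
zpool import -f nvme                                        # re-import the untouched pool
pvesm add zfspool nvme --pool nvme --content images,rootdir # same storage name as before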

The server is no longer trying to start everything; it's just complaining that its tampon is lit on fire over no quorum (error 500). That means no adding or using local storage to back up the VMs and LXCs into a single file for easy copying off the machine.
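
(A possible workaround, not confirmed in this thread: pmxcfs can be started in local mode when quorum is lost, which makes /etc/pve writable again so vzdump can create its temp config files.)
Code:
systemctl stop pve-cluster
pmxcfs -l                      # -l forces local mode, ignoring quorum
# run vzdump / edit configs here, then restore normal operation:
killall pmxcfs
systemctl start pve-cluster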
 
If you have a spare disc (e.g. a USB disc, or SATA/NVMe if a slot in your host is still free), you could format it with ZFS and do a zfs send/receive of your existing system to replicate all the data (including VMs/LXCs) from your host to it before the reinstall. Can you still log in? Then please run find /etc/pve and post the output here.
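
A rough sketch of that (device and pool names are placeholders; adjust to your layout):
Code:
zpool create rescue /dev/sdX        # new pool on the spare disc
zfs snapshot -r rpool@rescue        # recursive snapshot of the source pool
zfs send -R rpool@rescue | zfs recv -F rescue/rpool-copy   # replicate everything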
 
The first question was how to 'locate the LXC and VM "drives."' You were given the location in post #8, so data rescue is possible; question answered.
The second is how to repair the cluster. That depends on its current state (pvecm status output would be nice); maybe the database is defective.
How many cluster nodes do you have? If it's single-node, why do you have a cluster at all?
There should be database backups at "/var/lib/pve-cluster/backup" - have you tried them?
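
(If such backups do exist, a typical restore, sketched here with a hypothetical file name since the timestamps vary, is to stop the cluster filesystem, rebuild config.db from the SQL dump, and restart:)
Code:
systemctl stop pve-cluster
mv /var/lib/pve-cluster/config.db /var/lib/pve-cluster/config.db.broken
zcat /var/lib/pve-cluster/backup/config-<timestamp>.sql.gz | sqlite3 /var/lib/pve-cluster/config.db
systemctl start pve-cluster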
 
If you have a spare disc (e.g. a USB disc, or SATA/NVMe if a slot in your host is still free), you could format it with ZFS and do a zfs send/receive of your existing system to replicate all the data (including VMs/LXCs) from your host to it before the reinstall. Can you still log in? Then please run find /etc/pve and post the output here.
I can still sign in. The WebGUI shell is unusable though, so I'm using PowerShell for shell access and FileZilla for data browsing. I have four free drive bays.

Code:
root@galaxy:~# find /etc/pve
/etc/pve
/etc/pve/.debug
/etc/pve/.vmlist
/etc/pve/.members
/etc/pve/lxc
/etc/pve/local
/etc/pve/.rrd
/etc/pve/.version
/etc/pve/.clusterlog
/etc/pve/openvz
/etc/pve/qemu-server
/etc/pve/storage.cfg
/etc/pve/user.cfg
/etc/pve/pve-root-ca.pem
/etc/pve/priv
/etc/pve/priv/known_hosts
/etc/pve/priv/authorized_keys
/etc/pve/priv/storage
/etc/pve/priv/acme
/etc/pve/priv/pve-root-ca.srl
/etc/pve/priv/lock
/etc/pve/priv/authkey.key
/etc/pve/priv/pve-root-ca.key
/etc/pve/virtual-guest
/etc/pve/firewall
/etc/pve/replication.cfg
/etc/pve/vzdump.cron
/etc/pve/sdn
/etc/pve/sdn/fabrics
/etc/pve/datacenter.cfg
/etc/pve/mapping
/etc/pve/.corosync.conf.swp
/etc/pve/authkey.pub
/etc/pve/ha
/etc/pve/corosync.conf
/etc/pve/authkey.pub.old
/etc/pve/nodes
/etc/pve/nodes/galaxy
/etc/pve/nodes/galaxy/lxc
/etc/pve/nodes/galaxy/lxc/104.conf
/etc/pve/nodes/galaxy/lxc/103.conf
/etc/pve/nodes/galaxy/lxc/102.conf
/etc/pve/nodes/galaxy/pve-ssl.key
/etc/pve/nodes/galaxy/lrm_status
/etc/pve/nodes/galaxy/pve-ssl.pem
/etc/pve/nodes/galaxy/priv
/etc/pve/nodes/galaxy/ssh_known_hosts
/etc/pve/nodes/galaxy/openvz
/etc/pve/nodes/galaxy/qemu-server
/etc/pve/nodes/galaxy/qemu-server/106.conf
/etc/pve/nodes/galaxy/qemu-server/901.conf
/etc/pve/nodes/galaxy/qemu-server/900.conf
/etc/pve/nodes/galaxy/qemu-server/801.conf
/etc/pve/nodes/galaxy/qemu-server/800.conf
/etc/pve/nodes/galaxy/qemu-server/105.conf
/etc/pve/pve-www.key
 
The first question was how to 'locate the LXC and VM "drives."' You were given the location in post #8, so data rescue is possible; question answered.
The second is how to repair the cluster. That depends on its current state (pvecm status output would be nice); maybe the database is defective.
How many cluster nodes do you have? If it's single-node, why do you have a cluster at all?
There should be database backups at "/var/lib/pve-cluster/backup" - have you tried them?
Output:
Code:
root@galaxy:~# pvecm status
Cluster information
-------------------
Name:             SLN
Config Version:   1
Transport:        knet
Secure auth:      on

Cannot initialize CMAP service

There were only two nodes because I got ahead of myself one night when I should've known I was too sleepy to do the brain thing properly (I know, right? Go me..), and then I went into the shell and tried to reverse everything. I can see the files in "/var/lib/pve-cluster", but there is no backup folder.
Code:
root@galaxy:~# find /var/lib/pve-cluster
/var/lib/pve-cluster
/var/lib/pve-cluster/.pmxcfs.lockfile
/var/lib/pve-cluster/config.db-wal
/var/lib/pve-cluster/config.db-shm
/var/lib/pve-cluster/config.db
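
(For completeness: the documented way to turn a leftover single node back into a standalone server without reinstalling is to stop the cluster services, drop the corosync configuration, and restart pmxcfs in normal mode - a sketch:)
Code:
systemctl stop pve-cluster corosync
pmxcfs -l                      # start the cluster filesystem in local mode
rm /etc/pve/corosync.conf      # remove the cluster config
rm -rf /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster    # restart normally; node is standalone again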