stupid question - I accidentally deleted my /var directory. How bad is that?

alpha754293

Jan 8, 2023
Stupid question - I accidentally deleted my /var directory. How bad is that?

The even stupider part is that I was actually trying to create a backup of it. When the rsync wasn't working correctly, I went to delete the copy from my target location, accidentally added a / in front of the path, and thus deleted the source /var rather than the target var directory. I tried recovering from the last backup that I made of the folder, but unfortunately, I didn't back up the entire /var folder -- just /var/lib etc.

So how badly screwed am I?

(Thankfully, the system wasn't in a cluster nor anything like that.)
 
I also have a new problem: whatever config/database file I was able to recover/restore from had my old list of VMs/LXCs. But on my LVM, I still have disks there that are now technically orphaned.

The GUI has died/stopped working.

I googled how to move the disks via the CLI and it says to use the qm disk move command.

However, for example, I have a disk that's vm-4095-disk-0 when I type in lvdisplay, but when I type in either qm list or pct list, that VM/LXC doesn't exist anymore.

Does this mean that all of these orphaned disks/VMs/LXCs are dead/gone now?

Or is there some way for me to create like a "blank"/"fake" VM/LXC, attach the disk to it, to see if I can revive those VMs/LXCs?

But first things first, I need to move those disks off the LVM so that I can do a Proxmox re-install. What's the best way for me to do that if qm disk move isn't going to work because those VMs'/LXCs' config files apparently don't exist anymore?
 
Hello,

The /etc/pve directory (where all Proxmox VE configurations are located) is a FUSE filesystem backed by a database located inside /var/lib. If you restored /var/lib from a previous backup, that would be consistent with losing the latest VM/container configurations. In this scenario the configs are "gone", but the disk images are still where they used to be. You could try reconstructing the configurations by hand if you more or less remember them, which storages were used for the disk images, etc.

If you create a dummy VM with the same numerical VM id and no disks attached you can use

```
qm disk rescan --vmid $VMID
```

to automatically add the disk images that have the same numerical ID. They will be added as "Unused Disks", which can then be re-attached in the web UI.
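As a rough sketch of that sequence: the helper below only prints the commands so they can be reviewed before anything is run. The helper name plan_dummy_vm, the recovered- name prefix, and the 2048 MB of memory are placeholders I made up; the only thing that has to match is the numerical VM ID (4095 is the ID from the lvdisplay example above).

```shell
#!/bin/sh
# Print (do not run) the commands that recreate an empty VM shell and let
# Proxmox VE pick up the volumes that still carry its numeric ID.
# plan_dummy_vm is a hypothetical helper; name and memory are placeholders.
plan_dummy_vm() {
    vmid=$1
    case $vmid in
        ''|*[!0-9]*)
            echo "usage: plan_dummy_vm <numeric vmid>" >&2
            return 1 ;;
    esac
    echo "qm create $vmid --name recovered-$vmid --memory 2048"
    echo "qm disk rescan --vmid $vmid"
    echo "qm config $vmid"
}

plan_dummy_vm 4095
```

Once the printed commands look right, they can be pasted into a root shell one at a time. The last one (qm config) should then show the rescanned volumes as unusedN entries.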

> Also, is there a way for me to repair my Proxmox installation without having to do a complete re-install?

I would recommend getting the VMs running, making new backups, verifying that it is possible to restore those backups, and then restoring them on a new Proxmox VE install.
 
Thank you.

But with my having deleted the /var directory, and then only partially being able to restore some, but not all of it back, would there be a way for me to quote like "get the rest" of the /var directory back by repairing my now broken/damaged Proxmox install, without a full and complete re-install of Proxmox?

For example, in the /var directory, I've only re-created /var/lib, /var/lock/, /var/log, and /var/run folders manually, to try and help get things going a little bit.

But is there a way to get the rest of the folders/structure back?

I think that I can infer what the configs were (loosely) based on the disks (as in number of disks) and the "structure", but I will know more when I am able to boot the VM disks back up.

But right now, I had google help me write a script to dump out the LVs to a ZFS raidz2 pool that resides on the system, that way, in case that I have to wipe the OS drive, I would at least have the LVs exported/dumped out to qcow2 disk format.
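For anyone curious what such a dump script might look like, here is a minimal sketch, not the exact script I used. It assumes the volume group is called pve (the installer default) and uses /tank/lv-export as a made-up destination directory. plan_lv_export is a hypothetical helper that only prints the qemu-img commands, one per guest volume.

```shell
#!/bin/sh
# Read LV names on stdin and print one qemu-img command per guest volume.
# Assumptions: the volume group is "pve" and /tank/lv-export is a made-up
# destination directory; pass your own values as arguments 1 and 2.
plan_lv_export() {
    vg=${1:-pve}
    dest=${2:-/tank/lv-export}
    while read -r lv; do
        case $lv in
            vm-*-disk-*)
                echo "qemu-img convert -p -f raw -O qcow2 /dev/$vg/$lv $dest/$lv.qcow2" ;;
        esac
    done
}

# On the real host the input would come from: lvs --noheadings -o lv_name pve
printf '%s\n' root swap vm-4095-disk-0 | plan_lv_export
```

Volumes that do not follow the vm-ID-disk-N naming (root, swap, and so on) are skipped. The guests should be stopped while their volumes are being converted, otherwise the copy may be inconsistent.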

So I am trying to triage both things at the same time:

1) The system itself (due to my idiocy in having borked the /var folder and only having been able to restore a very tiny part of it), and
2) triaging/restoring the VMs/LXCs,
in this order of priority.

I appreciate any help that I can get for either of these two priorities, but especially the first priority.

Thank you.
 
> But with my having deleted the /var directory, and then only partially being able to restore some

That is very hard to say; the system does not expect to lose /var out of nowhere. If you restored from an older backup, it stands to reason that changes made since the backup was taken were lost, and that includes newly created guests.

> would there be a way for me to quote like "get the rest" of the /var directory back by repairing my now broken/damaged Proxmox install, without a full and complete re-install of Proxmox?

I would say that it is unlikely without a full backup. Since /var contains the state of many programs it is not clear to me whether such a backup would even work flawlessly.

> But right now, I had google help me write a script to dump out the LVs to a ZFS raidz2 pool that resides on the system, that way, in case that I have to wipe the OS drive, I would at least have the LVs exported/dumped out to qcow2 disk format.

Please use the backup functionality in Proxmox VE; it will back up both the disk images and the configurations correctly. Please do so once you have restored the VM configs to a working state.

As I mentioned, the hard part of restoring the VM config is adding the disk images and that can be done with the qm command.
 
First, back up your virtual machines and the config in /etc/pve, if still possible.

While a process is still running, you can access its open files via /proc/PID/fd/*. In this case the process is pmxcfs; get its PID with:

ps aux | grep pmxcfs | grep -v grep | awk '{print $2}'
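A small sketch of how one could look for such files: on Linux, readlink on an entry under /proc/PID/fd shows the original path, with " (deleted)" appended once the file has been unlinked. list_deleted_fds is a hypothetical helper name; it only reads /proc and changes nothing.

```shell
#!/bin/sh
# List files that a process still holds open although they were unlinked.
# list_deleted_fds is a hypothetical helper; it only reads /proc.
list_deleted_fds() {
    target_pid=$1
    for fd in /proc/"$target_pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *" (deleted)") printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}

# Example: inspect pmxcfs, but only if it is still running.
if pmxcfs_pid=$(pgrep -x pmxcfs 2>/dev/null); then
    list_deleted_fds "$pmxcfs_pid"
fi
```

Each listed /proc/PID/fd/N entry can then be copied to a safe place with cp while the process is alive. A database copied this way may be in a mid-write state, so treat the copy as a last resort.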

You should be able to copy the rest of /var (rsync -av --numeric-ids) from another Proxmox system.
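Over SSH, that could look roughly like the sketch below. The host name donor-host is a placeholder, and the extra flags are my own cautious additions rather than a tested recipe: --ignore-existing leaves files that are already present alone, --dry-run only shows what would be copied, and the two excludes are guesses meant to keep the other node's cluster database and logs out. plan_var_sync is a hypothetical helper that just prints the command.

```shell
#!/bin/sh
# Print an rsync command that pulls /var from a healthy Proxmox VE host.
# "donor-host" is a placeholder; the excludes are cautious guesses to avoid
# pulling in the other node's cluster database and logs.
plan_var_sync() {
    donor=$1
    echo "rsync -av --numeric-ids --ignore-existing --dry-run" \
         "--exclude=/lib/pve-cluster/ --exclude=/log/" \
         "root@$donor:/var/ /var/"
}

plan_var_sync donor-host
```

Run the printed command as root on the damaged system, read through the file list, and remove --dry-run only once it looks sane. The donor should be on the same Proxmox VE version, since /var also holds package state.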

That said, it is probably faster, easier, and less error-prone to back up your virtual machines or use your existing backups (you DO have existing backups, I hope) and import them on a new Proxmox installation.
 
> Please use the backup functionality in Proxmox VE, that will back up both the disk images and the configurations correctly. Please do so once you restore the VM configs to a working state.

So....for the time being -- the borked system -- I am effectively treating that as if it is in a locked/frozen state. (It's not actually in a locked/frozen state. I'm just treating as though it was.)

By exporting/dumping out the disks from LV to qcow2, I can do the recovery on another system that I have now fired up. That way, the borked system stays "intact" in its current, borked state, because I can't reboot it right now (according to Google, trying to reboot it now would be very, very bad).

So I am going to try and use my other system to restore/recover the VMs/LXCs, and then send the backups to my PBS.

> As I mentioned, the hard part of restoring the VM config is adding the disk images and that can be done with the qm command.

I might be able to recover one VM at a time. (I think that I have a pretty good shot at it because I have a separate Excel spreadsheet that tracks the manual IPv4 address assignments by hostname, so that should give me a lot of clues about what the VM config was, between the IPv4 addy and said hostname.)

From there, I can use my other system to send the backup of the recovered VMs/LXCs to PBS and whilst it's doing that, I can start to work on fixing/recovering the borked system.

> First, backup your virtual machines and the config in /etc/pve, if still possible.
>
> While a process
> (in this case pmxcfs, get the pid with
> ps aux |grep pmxcfs |grep -v grep | awk '{print $2}' )
>
> is running you can access the open files via /proc/PID/fd/*

No, that's gone now, unfortunately.

/etc/pve died. System was saying "Transport endpoint is not connected" so I googled it to try and fix it, and since it died, I can't tell if it made it worse or better. I mean, the fix did bring /etc/pve back to life, but from the prior, backed up state, and not the most recent state, which is a part of the problem due to my stupidity.


> The rest of /var you should be able to copy (rsync -av --numeric-ids) from another proxmox system.

I will have to try that, probably over the network, probably over ssh.

(I'll have to have Google AI help me with the specific/correct syntax for the command to make this happen.)


> (you DO have existing backups, i hope)

Fortunately, yes.

I run a backup monthly.

It's just the in-between backups that I don't have (which is "sad" because the backups would've kicked off like half an hour after I had borked my system, if I hadn't borked my system).

So, I am recovering from the June 1st set of backups.

(hence trying to recover/rebuild the orphaned VM disks by creating a fake/dummy VM so that I can attach the disk to it, so that it won't be orphaned anymore.)


> and import them on a new proxmox installation

Yeah....

darn darn darn darn darn (replace those with me cussing).

Again, the sad thing is that I was actually trying to make a backup of the /etc and /var directories when this happened. Damn it.
 
> So how badly screwed am I?

Not screwed at all, but you will have downtime. The way forward depends on whether the payload (the virtual disks) is located on the same disk as your system or not.

IF the virtual disks are on a separate filesystem, you're home free: just reinstall Proxmox and use the method @Maximiliano presented to recover your VMs.

IF they're on the same disk, it gets trickier: you can either install PVE on the original partition (requires some sophistication) OR, and this is the way I would advise, copy all your virtual disks to another drive and use the above method to recover.

> Again, the sad thing is that I was actually trying to make a backup of the /etc and /var directories when this happened. Damn it.

For future reference: you don't need to do that at all. The only things you would need to KNOW (and not necessarily copy by rote) are the network interfaces file, certs/keys, and storage.cfg to be able to reconstruct the entire hypervisor. The rest can and should be done with a proper backup mechanism, e.g. PBS.
 
> IF its on the same disks it gets trickier- you can either install pve on the original partition (requires some sophistication) OR- and this is the way I would advise- copy all your virtual disks to another drive and use the above method to recover.

Yeah, that's what I did by exporting/dumping out the LV to qcow2 so that it's not on my OS disk anymore.


> For future reference- you dont need to do that at all. the only things you would need to KNOW (and not necessarily copy rote) is the network interface file, certs/keys, and storage.cfg to be able to reconstruct the entire hypervisor. the rest can and should be done with a proper backup mechanism eg pbs.

My thinking was that if I am dumb enough to do this again (again, by accident) and I had at least an archive of the entire /etc and /var directories, I would just be able to pull down the archive and unpack it, and that should put pretty much everything back to the previous state, depending on how often I back up/archive said /etc and /var directories.

Next stupid question -- for my Win10 and Win11 VMs, how do I attach an existing EFI disk?

qm rescan -vmid <<vmid>> was able to bring them in as unused disks, but vm-10000-disk-0.qcow2 is the EFI disk, and when I edit that disk via the web GUI, it doesn't give me the option to assign it as the EFI disk, just IDE, SATA, SCSI, etc.

Your help is greatly appreciated.
 
Good question, I don't know.

You COULD just edit the VM's config file directly (e.g. /etc/pve/qemu-server/100.conf) and add the disk there: find the line that starts with unused0: and change it to efidisk0:

It might need other options for pre-enrolled-keys, etc.
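A minimal sketch of that edit, assuming the EFI volume really is the unused0 entry and the config has no snapshot sections. unused_to_efidisk is a hypothetical helper; it prints the rewritten config to stdout instead of touching the file, and refuses if an efidisk0 line already exists.

```shell
#!/bin/sh
# Print a VM config with the unused0 entry renamed to efidisk0.
# unused_to_efidisk is a hypothetical helper; it never modifies the file.
unused_to_efidisk() {
    conf_file=$1
    if grep -q '^efidisk0:' "$conf_file"; then
        echo "efidisk0 already present in $conf_file" >&2
        return 1
    fi
    sed 's/^unused0:/efidisk0:/' "$conf_file"
}

# Demo on a throwaway sample file rather than a real config.
sample=$(mktemp)
printf '%s\n' 'bios: ovmf' 'unused0: bigpool:10000/vm-10000-disk-0.qcow2' > "$sample"
unused_to_efidisk "$sample"
rm -f "$sample"
```

On a real system you would point it at /etc/pve/qemu-server/<vmid>.conf, compare the output with the original, and only then write it back. Extra options such as pre-enrolled-keys are deliberately left out here because they depend on how the VM was originally created.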
Sorry -- it's been busy trying to get all the VMs offloaded and backed up.

So I ended up googling it and google AI was able to help guide me.

I'm going to add this here, in case someone needs this, in the future:

1) I'm assuming that you have the disks in a pool and that you've added that pool as a storage that PVE can use (Datacenter -> Storage). E.g. I have a pair of 12 TB SAS HDDs in a ZFS mirror called bigpool and six 1 TB SATA HDDs in a raidz pool called smallpool, and I added both to PVE storage.

So given this assumption, I also copied the qcow2 exports (from my borked system's LVM) to /bigpool/disks.

I also restored the last backup from PBS (e.g. VMID 11000 is my Win11 "template" of sorts), so that will pull down the configuration, etc.

So with Win11, there's an EFI disk and a TPM state as well as the OS disk itself.

2) If you then put/copy the disks in question into your storage's ./images directory (e.g. /bigpool/images), you can run

pvesm list bigpool

and it will show the path of each volume on that storage (e.g. bigpool:11000/vm-11000-disk-0.qcow2). You'll need this path for the next step.

3) To replace the TPM state with an existing TPM state, use this command:

qm set <VMID> -tpmstate0 <storage_name>:vm-<old_vmid>-tpm.raw,version=v2.0

So in my case, I used:

qm set 11000 -tpmstate0 bigpool:11000/vm-11000-disk-0.qcow2

I don't know if the version=v2.0 is super important or not.

I also don't know if the qcow2 (vs. raw) format makes a difference or not. If it doesn't work, you can always use this command to convert the qcow2 disk back to raw:


qemu-img convert -f qcow2 -O raw /path/to/source.qcow2 /path/to/destination.raw

4) To attach an existing EFI disk, use this command:

qm set <VMID> -efidisk0 <STORAGE_ID>:<SIZE_OR_VOLUME_NAME>

So in my case, I used:

qm set 11000 -efidisk0 bigpool:11000/vm-11000-disk-2.qcow2

Again, I don't know if the qcow2 vs. raw format makes a difference. If it doesn't work/doesn't boot back up, you can always try converting it, re-attaching it, and then try to start the VM again, to see if it will work for you.
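Steps 3 and 4 can be put together as in the sketch below. plan_reattach_firmware is a hypothetical helper that only prints the commands; the volume IDs are passed exactly as pvesm list shows them. The version=v2.0 suffix comes from the template in step 3 (my own concrete command left it out), so treat it as an assumption.

```shell
#!/bin/sh
# Print the commands that re-attach an existing TPM state volume and an
# existing EFI vars volume to a restored VM, then show the resulting config.
# plan_reattach_firmware is a hypothetical helper; nothing is executed.
plan_reattach_firmware() {
    vmid=$1
    tpm_volid=$2
    efi_volid=$3
    echo "qm set $vmid -tpmstate0 $tpm_volid,version=v2.0"
    echo "qm set $vmid -efidisk0 $efi_volid"
    echo "qm config $vmid"
}

plan_reattach_firmware 11000 \
    bigpool:11000/vm-11000-disk-0.qcow2 \
    bigpool:11000/vm-11000-disk-2.qcow2
```

After running the printed commands, qm config should list both tpmstate0 and efidisk0 pointing at the copied volumes. Start the VM only after checking that.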
That's basically the gist of it.

It's been taking me a little while, as I am pulling down backups, re-attaching the disks with the newer copies from the LVM-to-qcow2 exports, firing up the VM/LXC to make sure that it is back in working order, and then creating a new backup of it. Once that is done, I delete the VM/LXC and move on to the next one.

Working with HDDs takes a while.

I am also, at the same time, pulling down all of the disks from the non-LVM storage on my now borked server, recreating those as well, making sure that they're working, and then creating an updated backup copy of those VMs/LXCs (because, as I said, if I had just waited like half an hour more or so, my monthly backup job would've run). So this, in effect, is the very, very, very manual version of that before I re-install PVE on my borked server.

It's going to take a while for me to get through it all because HDDs just don't run that fast, especially when I am doing multiple things with/to them simultaneously.

(Thank you, Google AI, for helping me with the syntax for these commands.)