/etc/pve Not Mounted - Proxmox 4.4

TheKingOfKats

New Member
Dec 28, 2016
I'm afraid I've done a number on my install, and I can no longer access /etc/pve/ or /var/lib/pve/. I tried to create a cluster, but after doing so I changed the node's hostname. When I found that I could not write to /etc/pve, I rebooted, at which point the cluster failed to reach quorum and /etc/pve was not writable. I tried to remove the node from the cluster, and after rebooting from there it appears that /etc/pve/ is either not mounted or all the data inside is gone; hopefully the former.

Ideally, if I can recover the VM/LXC configuration files I will transfer all my services to another node and reinstall fresh with 5.0 before transferring back.

On my broken install I can still access my LXC container root mounts at /rpool/data/subvol-*; however, this path does not exist on my second node running a fresh install of 4.4, where the VMs instead appear as virtual block devices under /dev/pve. What is going on here?

Thanks!
 
Hi,

this sounds like there was a problem during cluster creation.

Read these for more information:
https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pmxcfs)
https://pve.proxmox.com/wiki/Cluster_Manager

Thanks. I'm able to pull the VM/LXC configurations from the database and get the data from /rpool/data/subvol-*, but could you explain the difference between the two storage models? On my second node the /rpool/data/subvol-* directory does not exist; instead there are virtual block devices under /dev/pve.
 

Disregard that. I wasn't looking at VMs stored on ZFS on the second node. Could you verify that if I reinstall Proxmox with only my root drives attached and then import a zpool from my storage drives afterwards, Proxmox will not delete any data when I add the ZFS storage through the web interface? I doubt it would, but I just want to make sure; restoring from backups will take a long time and I'd like to avoid that if at all possible.
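
For reference, the steps I have in mind look roughly like this; the pool name tank and the storage ID tank-vmdata are only placeholders, not my actual names:

# Rough plan only - "tank" and "tank-vmdata" are placeholder names
zpool import                         # list pools found on the re-attached storage drives
zpool import tank                    # import the existing pool by name (add -f if it complains about the old host)
pvesm add zfspool tank-vmdata --pool tank --content images,rootdir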
 
In case anyone stumbles on this, I have resolved my problem by manually migrating all of my services across nodes. Here is the full story from my internal documentation:


After I tried to link the Proxmox nodes eden and zion into a cluster, a combination of a ZFS misconfiguration (leftover traces of a removed zpool, tfast), a hostname change, and a circular network dependency (Proxmox needs the network up in order to boot, but needs to boot in order to bring the network up) left the eden node inoperable, with all VM/LXC configurations moved into the cluster database and unusable without the cluster service. Although I tried to simply revert to the local storage model and remove all traces of the cluster, this ultimately tainted the install beyond reasonable repair.

I manually extracted each LXC/VM config from the SQLite database file /var/lib/pve-cluster/config.db and moved them to the zion host.

sqlite3 /var/lib/pve-cluster/config.db

select * from tree;

I went through this information and manually recreated the original config files.
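
For anyone doing the same, a query along these lines pulls just the file names and contents out of the database in one go; the name and data columns are what I relied on, but check them against your own copy of the tree table first:

# Dump the stored file names and contents (column names assumed from the tree table)
sqlite3 /var/lib/pve-cluster/config.db \
  "SELECT name, data FROM tree WHERE name LIKE '%.conf';" > extracted-configs.txt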

Before completely reinstalling the OS on eden, I needed to back up all VM/LXC data/configurations to the zion node while still maintaining permissions/ownership. Under the ZFS storage model, the LXC container root filesystems in Proxmox are owned by root and use ACLs, meaning that any tool I use must preserve ACL data. For transferring between eden and zion, I used rsync:

# Run from the broken eden node
sudo rsync -aAX /rpool/data/ root@zion.localdomain:/rpool/data/
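
A quick way to spot-check that the ACLs actually survived the transfer is to compare a single container on both sides; container ID 100 here is just an example:

# Compare ACL output for one container rootfs on both nodes (100 is an example ID)
getfacl -R --skip-base /rpool/data/subvol-100-disk-1 | head
ssh root@zion.localdomain "getfacl -R --skip-base /rpool/data/subvol-100-disk-1 | head"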

After I copied my recreated config files to /etc/pve/lxc on the functional node, I could immediately boot and run all of my containers that did not have bind-mount dependencies. ** Note: my two nodes are nearly identical in hardware configuration, but if your second node does not have the same storage IDs as the original, you will have to account for that in your config files. You could do this with a simple sed operation.
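
For example, something like the following rewrites the storage ID in every container config; both storage names here are made up, so substitute your own:

# Example only - replace both storage IDs with your actual ones
sed -i 's/old-storage:/new-storage:/g' /etc/pve/lxc/*.conf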

With my LXCs moved over, I moved on to the VMs. This turned out to be a huge hassle; luckily I only had two VMs to migrate, but if you have any more than that it would probably be worthwhile to look for an alternative method.

The VMs exist as virtual disks at /dev/rpool/vm-ID-disk-1.

I backed up the pfSense VM using dd to image the virtual disk, then restored it by creating an empty volume with the same ID on zion and using dd to copy the image to zion's empty device.

dd if=/dev/rpool/vm-113-disk-1 of=113.img bs=64k
rsync -aAX 113.img root@zion.localdomain:/home/KingOfKats/

Because /dev/rpool/vm-ID-disk-# is actually a symlink to a virtual block device /dev/zd##, I first found the backing device behind the symlink on zion and restored the backup to that.

dd if=113.img of=/dev/zd## bs=64k
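
If you don't want to chase down the zd## device by hand, readlink will resolve it for you; 113 is just my VM's ID, and the empty volume with the matching ID already exists on zion at this point:

# On zion, after creating the empty volume with the same ID
readlink -f /dev/rpool/vm-113-disk-1        # resolves the symlink to the backing /dev/zdN device
dd if=113.img of="$(readlink -f /dev/rpool/vm-113-disk-1)" bs=64k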

After I had all of my services and data moved over, I installed a fresh copy of Proxmox 5.0 on the previously broken node and essentially repeated the process in reverse. I won't be trying out any clustering for a while, and when I do I will definitely be more careful about it.