Editing storage.cfg Disaster

inbox25

Hi everyone,

I have been using Proxmox to create new servers at home and am fairly new to the software and the concept of servers in general.

I created a main Proxmox node with a couple of VMs and 4 or 5 LXCs; it is installed on a single NVMe SSD with ZFS (defaults). I also created a cluster with this machine as the main node.

I have a second, empty Proxmox node, initially installed on a single SATA SSD with an LVM setup (defaults). Today I tried to join this second node to the cluster with the first node.

Whilst the joining process was a success, I found that there was a 'question mark' icon in the 'drive' section of the second node. I did some reading online and then stupidly decided to edit the storage.cfg file. Additionally, out of panic and stupidity, I may have also deleted some disks from one of the Proxmox nodes, although I cannot remember which one specifically.

During this editing process I made some changes which have resulted in me seeing the following message when I try to access the shell of the first node:

[screenshot 1.jpg: message shown when accessing the first node's shell]

Reading this leads me to believe that I may have accidentally deleted/cleared the drive containing the root partition of the first Proxmox node. I cannot SSH into this Proxmox host either. Strangely, though, I can still see the web GUI of both nodes.

The LXCs and VMs running on the first Proxmox node are all still functioning, and I can still access the services hosted on them. The only issue is that when I shut down an LXC or VM, I cannot start it back up; it shows me this error:

Code:
TASK ERROR: zfs error: cannot open 'rpool/subvol-111-disk-0': dataset does not exist

When I try to migrate VMs/LXCs to the second node, it says the following (in addition to the previous error message):

Code:
ERROR: migration aborted (duration 00:00:03): storage migration for 'local-zfs:subvol-105-disk-0' to storage 'local-zfs' failed - zfs error: For further help on a command or topic, run: zfs help [<topic>
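
(For reference: with a root shell on the node, a check like the sketch below would show whether those datasets actually still exist. The dataset names are taken from the errors above; pvesm is the standard Proxmox storage CLI.)

Code:
# does the container's dataset still exist under the pool?
zfs list -r rpool | grep subvol
# can Proxmox still see its configured storages?
pvesm status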

I have been playing around to try to 'guess' what the original storage.cfg contained. As it stands, storage.cfg looks like this:

[screenshot 2.jpg: current contents of storage.cfg]
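
(For reference: on a stock Proxmox VE installation with ZFS, storage.cfg typically looks roughly like the sketch below. The pool name rpool and the local-zfs storage ID are the installer defaults, so treat this as a starting point rather than a guaranteed match for any given setup.)

Code:
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        sparse 1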

Given that I cannot access the terminal of the first Proxmox node via the web console and cannot SSH into it (perhaps because SSH into the root account was never enabled, or perhaps because I actually deleted the root contents of the installation drive), is there a way to rescue this? Does the above tell me that I really did delete the root contents of the first Proxmox install, or are these errors caused purely by an incorrect storage.cfg file? If the former is true, am I doomed to wiping this node and restarting from scratch? And if the latter is the case, how can I go about reverse engineering the right storage.cfg file from my current predicament?

Much thanks to anyone reading this
 
For the Proxmox node where you lost access, you can try booting from a Linux live CD with ZFS support; I think Ubuntu ISO images support ZFS out of the box. After that, check whether your ZFS pool exists and the data is present (zpool list, zfs list, etc.). If it is there, the next step is to back it up / move it somewhere safe and reinstall Proxmox on the node.
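
A minimal sketch of that check from an Ubuntu live session (assuming the ZFS userland tools are available; on some live images zfsutils-linux has to be installed first, which may require enabling the universe repository):

Code:
# install the ZFS tools into the live session if they are missing
sudo apt install zfsutils-linux
# scan the attached disks for importable pools
sudo zpool import
# if a pool (e.g. rpool) shows up, import it read-only under /mnt and inspect it
sudo zpool import -f -o readonly=on -R /mnt rpool
zpool list
zfs list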
 
Okay great, thank you. I will try this tonight and report back.

Just to clarify, will this involve some kind of "chroot"? And also mounting the old hard drives from the Ubuntu live USB in order to be able to use the zpool commands?
 
Looks like I'll be doing a brand new install. I booted from an Ubuntu live USB and ran the following commands, which returned these outputs.

It seems the disk holding the ZFS Proxmox install, plus the ISO and images disk, are completely wiped. Weirdly enough, I can still boot into Proxmox with a monitor connected to the box, and it even makes it to the login screen, but when I enter my username and password it returns me to the login screen. It also displays an incorrect IP address for Proxmox itself.

1. lsblk

[photo 20240419_204055(1).jpg: lsblk output]

2. mount /dev/nvme0n1p3 /mnt

Code:
 mount: /mnt: unknown filesystem type 'zfs_member'

3. zpool list

Code:
 no pools available

4. zfs list

Code:
 no datasets available

5. gparted

6. df -h /dev/nvme0n1

Code:
 Filesystem    Size    Used Avail Use% Mounted on
udev        16G       0   16G   0% /dev

[photo 20240419_205735.jpg]
 
Try
Code:
zfs import
to list any ZFS filesystems that can be detected. Then
Code:
zfs import filesystemname
to import them (make them detectable / often also mount them)
 
According to your screenshot it looks like ZFS might be still present.
Try running this command from LiveCD: zpool import
and after that check if any ZFS pool / dataset exists: zpool list && zfs list
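
A sketch of that sequence, with one extra fallback (pool name rpool assumed; -R keeps any mounts under /mnt so the live system is untouched):

Code:
# scan the default device paths for importable pools
sudo zpool import
# if nothing shows up, point the scan at the stable device names explicitly
sudo zpool import -d /dev/disk/by-id
# if a pool is found, import it read-only under an alternate root, then verify
sudo zpool import -f -o readonly=on -R /mnt rpool
zpool list && zfs list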
 
You are correct, it's zpool import, not zfs import.
 

Hi bomzh,
As a learning exercise for myself, which screenshots indicate that a ZFS item may still be present?

OP, trying to be kind here, but take this as your cue to implement a regular backup regimen. Make a backup before making ANY system changes, including package upgrades. You should always have something to restore from.

Hopefully this will help you in the future:

https://github.com/kneutron/ansitest/blob/master/sysadmin-bkp-edit.sh
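
For a single config file like storage.cfg, even a dated copy before each edit goes a long way (a minimal sketch; /etc/pve/storage.cfg is the standard location on a Proxmox node):

Code:
# keep a dated copy of the file before touching it
cp -a /etc/pve/storage.cfg /root/storage.cfg.bak.$(date +%F)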

Yes, unfortunately I spent the last few days blowing away the old Proxmox install and building up the new one. I am still quite new to Proxmox, so I am still learning this aspect. I currently have some LXC and VM dumps which I am going to store temporarily on an external hard drive, and will eventually configure an actual backup server on another machine. It's definitely been a teaching moment for me.
 
As a learning exercise for myself, which screenshots indicate that a ZFS item may still be present?
As you mentioned, the system boots up to the login screen, and Proxmox was installed using the ZFS option. That means the filesystem exists and is present in some state that allows it to boot.
Also, your screenshot from "lsblk" lists 3 partitions; that's how Proxmox usually partitions the disk when you choose the ZFS install option. It's like assuming "a person with a beard is male": not always true these days, but it's what most people assume, and the same applies to your ZFS case.
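
A quicker check than counting partitions, next time you are in a live session, is to ask for the filesystem type directly (a sketch; your earlier mount error already reported zfs_member, which points the same way):

Code:
# show the detected filesystem type for each partition
lsblk -o NAME,SIZE,FSTYPE /dev/nvme0n1
# a ZFS data partition should report TYPE="zfs_member"
sudo blkid /dev/nvme0n1p3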
 
