Editing storage.cfg Disaster

inbox25 · Apr 19, 2024

Hi everyone,

I have been using Proxmox to create new servers at home and am fairly new to the software and the concept of servers in general.

I created a main Proxmox node with a couple of VMs and 4 or 5 LXCs; it is installed on a single NVMe SSD with ZFS (defaults). I also created a cluster with this machine as the main node.

I have a second, empty Proxmox node, installed initially on a single SATA SSD with an LVM setup (defaults). Today I tried to join this second node to the cluster with the first node.

Whilst the joining process was a success, I found that there was some kind of 'question mark' icon in the 'drive' section of the second node. I was doing some reading online and then stupidly decided to edit the storage.cfg file. Additionally, out of panic and stupidity, I may have also deleted some disks from one of the Proxmox nodes, although I cannot remember which one specifically.

During this editing process I made some changes which have resulted in me seeing the following message when I try to access the shell of the first node:

Reading this leads me to believe that I may have accidentally deleted/cleared the drive containing the root partition of the first Proxmox node. I cannot SSH into this Proxmox host either. The only thing is that I can still see the webGUI of both nodes.

The LXCs and VMs which are running on the first Proxmox node are all still functioning and I can still access the services being hosted on them. The only issue is that when I shut down an LXC or VM, I cannot start it back up; it shows me the error:

Code:

TASK ERROR: zfs error: cannot open 'rpool/subvol-111-disk-0': dataset does not exist

When I try to migrate VMs/LXCs to the second node, it says the following (in addition to the previous error message):

Code:

ERROR: migration aborted (duration 00:00:03): storage migration for 'local-zfs:subvol-105-disk-0' to storage 'local-zfs' failed - zfs error: For further help on a command or topic, run: zfs help [<topic>

I have been playing around to try to 'guess' what the original storage.cfg contained. As it stands, the storage.cfg currently looks like this:

Given that I cannot access the terminal of the first Proxmox node via the Proxmox web console and cannot SSH into it (perhaps due to not enabling SSH directly into the root account of the Proxmox node, or perhaps by actually deleting the root contents on the Proxmox installation drive), is there a way to rescue this? Does the aforementioned information tell me that I actually accidentally deleted the root contents of the first Proxmox install or are these errors purely caused by an incorrect storage.cfg file? If the former is true, am I doomed to wiping this node and restarting from scratch? And if the latter is the case, how can I go about reverse engineering the right storage.cfg file from my current predicament?

Many thanks to anyone reading this.
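For comparison, this is roughly what the storage entries look like on stock installs, as far as I know. A single-disk ZFS install usually defines `dir: local` plus `zfspool: local-zfs` pointing at `rpool/data`; an LVM install defines `lvmthin: local-lvm` on the `pve` volume group instead. Because `/etc/pve/storage.cfg` is shared by every node in a cluster, node-local storages are normally pinned to their node with a `nodes` line, which is also the usual cure for a '?' storage icon on a node that lacks that pool. The node names `pve1`/`pve2`, the function names and the reference-file path are hypothetical. Note that the error above names `rpool/subvol-111-disk-0` with no `/data` component, which would fit a changed `pool` line, but that is an inference, not a diagnosis.

```shell
# Sketch: write a *reference* copy of typical default storage entries for a
# two-node cluster (node 1 on ZFS, node 2 on LVM-thin) and diff it against
# the live file. Node names are hypothetical placeholders.

write_reference_storage_cfg() {
    local out=$1 zfs_node=$2 lvm_node=$3
    # Property lines are indented; Proxmox itself writes a tab.
    cat > "$out" <<EOF
dir: local
	path /var/lib/vz
	content iso,vztmpl,backup

zfspool: local-zfs
	pool rpool/data
	sparse
	content images,rootdir
	nodes ${zfs_node}

lvmthin: local-lvm
	thinpool data
	vgname pve
	content rootdir,images
	nodes ${lvm_node}
EOF
}

compare_storage_cfg() {
    # -u: unified diff; -w: ignore whitespace-only differences.
    diff -u -w "$1" "${2:-/etc/pve/storage.cfg}"
}
```

Usage might look like `write_reference_storage_cfg /root/storage.cfg.reference pve1 pve2` followed by `compare_storage_cfg /root/storage.cfg.reference`; any differences are worth reviewing by hand rather than copying the reference over the live file.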

bomzh · Apr 19, 2024

For the Proxmox node where you lost access, you can try to boot from a Linux live CD with ZFS support; I think Ubuntu ISO images support ZFS out of the box. After that, check if your ZFS pool exists and the data is present (zpool list, zfs list, etc.). If it is, the next step is to back it up/move it somewhere safe and reinstall Proxmox on the node.
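A rough sketch of that sequence from the live session. It assumes the pool is still named `rpool` (the name in the error messages above), that the guest disks sit under `rpool/data` (the usual Proxmox default), and that an external disk with enough space is mounted at the hypothetical path `/media/usb`; the function name is illustrative too. `zpool import -N` brings the pool in without mounting anything, and `zfs send -R` writes a replication stream of the snapshot and all child datasets to a file.

```shell
# Sketch: copy guest disks off a Proxmox ZFS pool from a live USB.
# Assumptions (hypothetical): pool "rpool", guest datasets under rpool/data,
# external disk mounted at /media/usb. Needs the live system's ZFS tools.

rescue_pool_data() {
    local pool=${1:-rpool} dest=${2:-/media/usb}
    local snap="${pool}/data@rescue"

    # -f: pool was last used by another system (the installed Proxmox);
    # -N: do not mount datasets; -R: altroot so nothing lands on the live /.
    zpool import -f -N -R /mnt "$pool" || return 1

    # Confirm the data is actually there before copying anything.
    zfs list -r -o name,used,type "${pool}/data" || return 1

    # Recursive snapshot: every subvol/zvol captured at one point in time.
    zfs snapshot -r "$snap" || return 1

    # -R: send the snapshot plus all descendant datasets and properties.
    zfs send -R "$snap" > "${dest}/${pool}-data-rescue.zfs"
}
```

Called as `rescue_pool_data rpool /media/usb`. The resulting file can later be replayed with `zfs receive` on the fresh install.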

inbox25 · Apr 19, 2024

bomzh said:
For the Proxmox node where you lost access - you can try to boot from Linux live CD with ZFS support - I think Ubuntu ISO images support ZFS out of the box. After that, check if your ZFS pool exists and data is present (zpool list, zfs list, etc). If it is there, then next steps is to backup/move it somewhere to a safe place and reinstall Proxmox on the node.

Okay great, thank you. I will try this tonight and report back.

Just to clarify, will this involve some kind of "chroot"? And will I also need to mount the old hard drives in the Ubuntu live USB session in order to use the zpool list commands?

inbox25 · Apr 19, 2024

Looks like I'll be doing a brand new install. I booted from an Ubuntu live USB and entered the following commands, which returned the outputs below.

It seems like the disk I installed Proxmox on (with ZFS), plus the ISO and images disk, are completely wiped. Weirdly enough, I can still boot into Proxmox with a monitor connected to the Proxmox box and it even makes it to the login screen, but when I enter my username and password, it returns me to the login screen. Additionally, it displays an incorrect IP address for Proxmox itself on this computer.

1. lsblk

2. mount /dev/nvme0n1p3 /mnt

Code:

 mount: /mnt: unknown filesystem type 'zfs_member'

3. zpool list

Code:

 no pools available

4. zfs list

Code:

 no datasets available

5. gparted

6. df -h /dev/nvme0n1

Code:

 Filesystem      Size  Used Avail Use% Mounted on
 udev             16G     0   16G   0% /dev

dagservice · Apr 19, 2024

Try

Code:

zfs import

to list any ZFS filesystems that can be detected. Then

Code:

zfs import filesystemname

to import them (make them detectable / often also mount them)

bomzh · Apr 19, 2024

According to your screenshot, it looks like ZFS might still be present.
Try running this command from the live CD: zpool import
and after that check if any ZFS pool / dataset exists: zpool list && zfs list
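The same check, written out as a cautious sketch: `zpool import` with no pool name only scans disks and lists what it finds, and importing with `-o readonly=on` under an altroot of `/mnt` lets you look at the datasets without writing anything to the disk. The pool name `rpool` is the Proxmox default and an assumption here, as is the function name.

```shell
# Sketch: look for a Proxmox ZFS pool from a live CD without modifying it.
# "rpool" is the usual Proxmox pool name and is an assumption here.

inspect_pool_readonly() {
    local pool=${1:-rpool}

    # With no pool name, zpool import only lists the pools it can find.
    zpool import

    # Import read-only under /mnt; -f because another system last used it.
    zpool import -f -o readonly=on -R /mnt "$pool" || return 1

    # Show pool health/capacity, then every dataset in the pool.
    zpool list "$pool"
    zfs list -r "$pool"
}
```

Run `inspect_pool_readonly rpool`; when done, `zpool export rpool` releases the pool before rebooting.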

dagservice · Apr 19, 2024

bomzh said:
According to your screenshot it looks like ZFS might be still present.
Try running this command from LiveCD: zpool import
and after that check if any ZFS pool / dataset exists: zpool list && zfs list

You are correct, it's zpool import, not zfs import.

Kingneutron · Apr 20, 2024

OP, trying to be kind here, but take this as your cue to implement a regular backup regimen. Make a backup before making ANY system changes, including package upgrades. You should always have something to restore from.

Hopefully this will help you in the future:

https://github.com/kneutron/ansitest/blob/master/sysadmin-bkp-edit.sh
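Along the same lines, a minimal illustration of the "back up before you edit" habit for a single file. It is not the linked script; the function name, the default backup directory `/root/config-backups` and the `NO_EDIT` switch are all illustrative assumptions. Keeping the copy outside `/etc/pve` means it does not depend on the cluster filesystem being healthy.

```shell
# Sketch: keep a timestamped copy of a config file before editing it.
# Function name, backup location and NO_EDIT are illustrative assumptions.

backup_then_edit() {
    local file=$1 backup_dir=${2:-/root/config-backups}
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)

    mkdir -p "$backup_dir" || return 1
    # -p keeps mode and timestamps so the copy can be restored as-is.
    cp -p "$file" "${backup_dir}/$(basename "$file").${stamp}.bak" || return 1

    # Open an editor only when NO_EDIT is unset.
    if [ -z "${NO_EDIT:-}" ]; then
        "${EDITOR:-nano}" "$file"
    fi
}
```

For example: `backup_then_edit /etc/pve/storage.cfg`.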

inbox25 · Apr 24, 2024

bomzh said:
According to your screenshot it looks like ZFS might be still present.
Try running this command from LiveCD: zpool import
and after that check if any ZFS pool / dataset exists: zpool list && zfs list

Hi bomzh,
As a learning exercise for myself, which screenshots indicate that a ZFS item may still be present?

Kingneutron said:
OP, trying to be kind here but take this as your cue to implement a regular backup regimen. Make a backup before making ANY system changes, including package upgrades. You should always have something to restore from.

Hopefully this will help you in the future:

https://github.com/kneutron/ansitest/blob/master/sysadmin-bkp-edit.sh

Yes, unfortunately I spent the last few days blowing away the old Proxmox install and building up the new one. I am still quite new to Proxmox, so I am still learning this aspect. I currently have some LXC and VM dumps which I am going to store temporarily on an external hard drive, and I will eventually configure an actual backup server on another machine. It's definitely been a teaching moment for me.
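For the dumps themselves, a sketch that writes one `vzdump` archive per guest straight to the external drive. The mount point and guest IDs are placeholders (105 and 111 are just the IDs from the errors earlier in the thread), and the function name is illustrative. `--mode snapshot` avoids stopping guests where the storage supports it, and `--compress zstd` is one of vzdump's compression options.

```shell
# Sketch: dump selected guests to an external disk with vzdump.
# The target directory and the IDs passed in are placeholders.

dump_guests() {
    local dir=$1; shift
    local id rc=0
    for id in "$@"; do
        # One archive per guest; keep going if one guest fails.
        if ! vzdump "$id" --dumpdir "$dir" --mode snapshot --compress zstd; then
            echo "vzdump failed for guest $id" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `dump_guests /mnt/external 105 111`. vzdump also accepts several IDs in one call; looping here simply reports which guest failed. The archives can be restored with `pct restore` (containers) or `qmrestore` (VMs).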

bomzh · Apr 24, 2024

inbox25 said:
As a learning exercise for myself, which screenshots indicate that a ZFS item may still be present?

As you mentioned, the system boots up to the login screen and Proxmox was installed using the ZFS option. That means the filesystem exists and is still in a state that allows it to boot.
Also, your screenshot from "lsblk" lists 3 partitions; that's how Proxmox usually partitions a disk when you choose the ZFS install option. It's like "when I see a human being with a beard on the face, that means male": of course that's not always true these days, but it's what most people assume, and the same goes for your ZFS case.
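That reasoning can be turned into a quick heuristic from the live CD. A sketch, assuming the disk is `/dev/nvme0n1` as in the commands above (the function name is illustrative): `lsblk` reports `zfs_member` as the filesystem type of a partition that belongs to a ZFS pool, which is also why plain `mount` refused `/dev/nvme0n1p3`. Three partitions plus a `zfs_member` is a hint, not proof.

```shell
# Sketch: heuristic check for a Proxmox-style ZFS boot disk.
# Prints each partition's filesystem type, then a one-line verdict.

check_zfs_disk() {
    local disk=${1:-/dev/nvme0n1}
    local parts zfs_parts

    # -l: list format, -n: no header, -p: full paths, -o: chosen columns.
    lsblk -lnpo NAME,TYPE,FSTYPE "$disk"

    # Count partitions, and partitions that are ZFS pool members.
    parts=$(lsblk -lnpo TYPE "$disk" | grep -cw part)
    zfs_parts=$(lsblk -lnpo FSTYPE "$disk" | grep -cw zfs_member)

    if [ "$zfs_parts" -gt 0 ]; then
        echo "$disk: $parts partition(s), $zfs_parts zfs_member; try 'zpool import'"
    else
        echo "$disk: $parts partition(s), no zfs_member found"
    fi
}
```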
