f*ed up VM storage setup - need help recovering VM disk ZFS dataset

dakazze

New Member
Nov 25, 2022
First off, this is a super embarrassing fuckup that happened because I rushed setting up my backup server...

While creating some VMs I did not check the storage location, which defaulted to a NAS NFS share. The NAS is still there and the NFS share is working, but after rebooting the NAS while the VMs were running I can't start these VMs because of "TASK ERROR: volume 'NAS:103/vm-103-disk-0.qcow2' does not exist".
One of these VMs involves a lot of work to set up again, so I would really appreciate it if there was a way to get it back up so I can move the disks to where they belong.

There have been no major changes to the NAS dataset, and rebooting the host did not help, either.

That's the storage line set in the VM config:
Code:
scsi0: NAS:103/vm-103-disk-0.qcow2,iothread=1,size=32G
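
For reference, that line is a PVE volume ID rather than a plain path, so the storage layer has to resolve it. A quick way to see what PVE can and cannot find on that storage is to query it directly; a minimal check, assuming the storage really is named "NAS" as in the config:

Code:
pvesm status                            # is the 'NAS' storage enabled and active?
pvesm list NAS                          # volumes PVE currently sees on that storage
pvesm path NAS:103/vm-103-disk-0.qcow2  # the filesystem path PVE expects for this volume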


Code:
root@proxBu:/mnt# zfs list
NAME                           USED  AVAIL  REFER  MOUNTPOINT
rpool                          101G   728G    25K  /rpool
rpool/ROOT                    5.65G   728G    24K  /rpool/ROOT
rpool/ROOT/pve-1              5.65G   728G  5.65G  /
rpool/data                    57.1G   728G    25K  /rpool/data
rpool/data/subvol-100-disk-1   840M  99.2G   840M  /rpool/data/subvol-100-disk-1
rpool/data/subvol-104-disk-0  12.0M  3.99G  11.8M  /rpool/data/subvol-104-disk-0
rpool/data/vm-101-disk-0      35.5K   728G  35.5K  -
rpool/data/vm-101-disk-1      11.0G   728G  11.0G  -
rpool/data/vm-102-disk-0      81.5K   728G  44.5K  -
rpool/data/vm-103-disk-0      26.7G   728G  26.7G  -
rpool/data/vm-105-disk-0      10.7G   728G  10.2G  -
rpool/data/vm-105-state-uh    2.22G   728G  2.22G  -
rpool/data/vm-106-disk-0      5.66G   728G  5.66G  -
rpool/var-lib-vz              38.3G   728G  38.3G  /var/lib/vz

I am a noob when it comes to ZFS, so my first thought was to use zfs set mountpoint for the vm-103 disk, which just tells me "does not apply to datasets of this type".
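
(That error is actually expected: the vm-*-disk-* entries in the list above are zvols, i.e. ZFS block devices rather than filesystems, so they have no mountpoint and instead appear under /dev/zvol. A quick way to confirm this, using the dataset names from the zfs list output above:)

Code:
zfs get type rpool/data/vm-103-disk-0      # reports 'volume' for a zvol, not 'filesystem'
ls -l /dev/zvol/rpool/data/vm-103-disk-0   # the block device node the zvol is exposed as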

Any ideas?
 
If I understood correctly, you probably have a stuck (stale) NFS mount because of the NAS reboot. Did you reboot the host?
Sorry, sometimes I have a hard time expressing myself in English.

The short version:

1. Created the VM and mistakenly set its storage to "NAS" (a different machine), which is an NFS share
2. I did not notice this mistake and everything worked fine for 2-3 weeks
3. I rebooted the NAS while the VM was still up because I did not know yet
4. Ever since, I can't start the VM because "'NAS:103/vm-103-disk-0.qcow2' does not exist"
5. Rebooted the host --> NFS share is accessible, path is the same, got R/W access
6. VM still can't boot because it can't find its boot disk
7. Frustration / no idea what to do / angry at myself for the stupid mistake
 
1. Created the VM and mistakenly set its storage to "NAS" (a different machine), which is an NFS share
2. I did not notice this mistake and everything worked fine for 2-3 weeks
That's not a bad setup in itself; we also keep all VMs and LXCs on a NAS and it works great.
3. I rebooted the NAS while the VM was still up because I did not know yet
That's normally okay: the kernel throttles the application (which from the kernel's point of view is the VM) down to zero, because it temporarily cannot write its I/O data to the unavailable NFS server. Once the NFS mount is valid again, the outstanding data is written and the VM gets CPU slices (to generate new I/O) again; the mount options behind this behaviour can be checked as shown at the end of this post.
4. Ever since, I can't start the VM because "'NAS:103/vm-103-disk-0.qcow2' does not exist"
5. Rebooted the host --> NFS share is accessible, path is the same, got R/W access
6. VM still can't boot because it can't find its boot disk
That points to a problem on the NFS server, most likely related to your ZFS setup there.
7. Frustration / no idea what to do / angry at myself for the stupid mistake
I understand you, but there is still no mistake in the VM storage definition itself.
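
(The throttling described above is how a hard NFS mount behaves, which is the usual default. If in doubt, the actual mount options on the PVE host can be checked with something like:)

Code:
nfsstat -m           # mounted NFS shares with their options (hard/soft, vers, timeo, ...)
mount | grep -i nfs  # alternative: list NFS mounts and their options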
 
Thanks for taking the time to reply, I appreciate the help!

That's not a bad setup in itself; we also keep all VMs and LXCs on a NAS and it works great.
Yeah, I know, but it was still a mistake that led to the current issue.

That points to a problem on the NFS server, most likely related to your ZFS setup there.
Any idea how I might be able to find and fix the cause?

There have been no changes to the dataset, the share or the permissions. The share is accessible from the host and it has full access. Still, I can neither start the VM nor move the disk because 'NAS:103/vm-103-disk-0.qcow2' does not exist.

Sadly I don't even have an idea where to start researching -.-

The NFS target is a ZFS dataset managed by TrueNAS on a different machine. There are no obvious issues on that side, and zfs list does not show anything that is not mounted. Is it possible that the disk was deleted during a scrub because the mountpoint did not exist, or that the volume got damaged by the unexpected shutdown? (Yes, I am a noob....)
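
(For what it's worth, a scrub only verifies and repairs checksums; it does not delete files. Assuming PVE mounts the NFS storage at the default /mnt/pve/<storage-id> path, the file the error refers to should live under it, so this is one way to check from the host whether it is really gone:)

Code:
df -h /mnt/pve/NAS                # confirm the NFS share is actually mounted there
ls -la /mnt/pve/NAS/images/103/   # the directory where vm-103-disk-0.qcow2 should sit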
 
The NFS target is a ZFS dataset managed by TrueNAS on a different machine. There are no obvious issues on that side, and zfs list does not show anything that is not mounted.
root@proxBu:/mnt# zfs list
rpool/data/vm-103-disk-0 26.7G 728G 26.7G -
So "proxBu" is your TrueNAS, right?
What happens if you run "dd if=/rpool/data/vm-103-disk-0 of=/dev/null bs=1024k" on proxBu (which takes a while),
and "ls -l /dev/zvol/rpool/data/vm-103-disk-0"?
 
I'm still wondering what you are doing ... the zfs list shows a PVE installation ... and you said you have a TrueNAS VM which itself has passthrough disks. A NAS exports either shares (over the NFS or SMB protocol) or block volumes (over iSCSI) to its clients. In the case of a share, the client (PVE) creates e.g. qcow2 files for the VM (as written, 'NAS:103/vm-103-disk-0.qcow2'); in the case of iSCSI it gets a block device (on which the client could even create and mount a filesystem of its own, though probably not in this case) and creates raw images for the VM.
You are talking about mounting, but zfs list shows zvols on the PVE host ...
 
Again, thank you for taking the time here, and I am sorry that my lacking knowledge of these topics complicates things... Usually it is enough to just set things up and keep them going, but with issues like this I am completely lost -.-

So "proxBu" is your TrueNAS, right?
What happens if you run "dd if=/rpool/data/vm-103-disk-0 of=/dev/null bs=1024k" on proxBu (which takes a while),
and "ls -l /dev/zvol/rpool/data/vm-103-disk-0"?
proxBu is my PVE install on the backup server. This is the PVE that can't start the VM because of the missing disk.


The /dev/zvol/rpool/data/vm-103-disk-0 on proxBu is a second volume, which I correctly pointed to local-zfs instead of the NAS when setting up the VM, so this is not our missing boot drive.

From the 103.conf:
Code:
scsi0: NAS:103/vm-103-disk-0.qcow2,iothread=1,size=32G
scsi1: local-zfs:vm-103-disk-0,iothread=1,size=64G,ssd=1

I just ran a search for "vm-103" on the TrueNAS that should have the disk, but the search didn't return anything.
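
(In case the GUI search misses something, a recursive search from the TrueNAS shell would look roughly like this, assuming the pools are mounted under /mnt as is usual on TrueNAS:)

Code:
find /mnt -iname '*vm-103*' 2>/dev/null   # any file or directory with 'vm-103' in its name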


How about checking that disk first:
"no such file or folder": I got no rpool under /dev, and /rpool/data/ on this machine only has subvol-100 and subvol-104, nothing for vm-103.



I'm still wondering what you are doing ... the zfs list shows a PVE installation ... and you said you have a TrueNAS VM which itself has passthrough disks. A NAS exports either shares (over the NFS or SMB protocol) or block volumes (over iSCSI) to its clients. In the case of a share, the client (PVE) creates e.g. qcow2 files for the VM (as written, 'NAS:103/vm-103-disk-0.qcow2'); in the case of iSCSI it gets a block device (on which the client could even create and mount a filesystem of its own, though probably not in this case) and creates raw images for the VM.
You are talking about mounting, but zfs list shows zvols on the PVE host ...

Yeah, the zfs list is from the PVE install that has the VM which is missing its boot disk. As I said, I did not intend to set the storage target to NAS when initially setting up the VM, but I guess it defaulted to that location and I did not check before finalizing, which is why it ended up like this. There was no manual setup involved, just a simple VM creation via the PVE GUI.
I know my English skills might further complicate things at times and I am sorry for that!
 
Don't you have snapshots on your NAS (TrueNAS with ZFS) to go back to?
Oh, I do have several snapshots, which go back two weeks... just not of that dataset :rolleyes:
This dataset was meant for VM ISOs and CT templates alone, so I have no backup tasks for it....
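
(For anyone hitting this thread who does have snapshots of the affected dataset: on the TrueNAS side they can be listed and the qcow2 copied back out of the hidden .zfs directory. A rough sketch with a hypothetical pool/dataset and snapshot name:)

Code:
zfs list -t snapshot -r tank/proxmox-nfs                              # 'tank/proxmox-nfs' is a placeholder dataset name
ls /mnt/tank/proxmox-nfs/.zfs/snapshot/auto-2022-11-20/images/103/   # copy vm-103-disk-0.qcow2 back from a snapshot (placeholder names)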
 
