Host has unwanted access to ZFS boot-pool of TrueNAS guest

wonkyponky

Nov 8, 2023
Hi there,
I created a TrueNAS SCALE guest using a passed-through 9300-8i HBA, which works fine as far as the storage disks on the controller are concerned.

But here is what is weird - on the host, with
Code:
zpool import
 pool: boot-pool
     id: *****************767
  state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
 action: The pool can be imported using its name or numeric identifier.
 config:

        boot-pool   ONLINE
          *****p3   ONLINE
I have access to the system disk / boot pool of this VM, which consequently should be importable by the host .. while the disk is not mounted, of course.

I run PVE 8.1.4 and I have my VM storage on a local ZFS volume, managed by the host. This pool is perfectly fine.

But the whole thing makes me nervous - while researching I came across stories of Proxmox importing pools on its own when it shouldn't have .. I tended to blame non-blacklisted HBA drivers for that, which is not the case here ..

One could disable the ZFS pool-import service or the like, but first of all I am wondering why the boot-pool is visible to the host AT ALL!?

Anybody having an idea?
Thanks!
 
I tested a bit more and created another raw virtio-scsi disk on the ZFS VM storage, which I used to create a test-pool in my TrueNAS VM. This pool can also be seen from within the host ..

As has been suspected elsewhere, it has something to do with the fact that all zvol-backed virtual disks / partitions appear on the host as block devices (zd*). So every virtual disk I create on the ZFS storage gets listed ..
Code:
# lsblk
zd0                  230:0    0    50G  0 disk
└─zd0p1              230:1    0    50G  0 part
zd16                 230:16   0    75G  0 disk
├─zd16p1             230:17   0   100M  0 part
├─zd16p2             230:18   0    16M  0 part
├─zd16p3             230:19   0  71.8G  0 part
└─zd16p4             230:20   0   3.1G  0 part
zd32                 230:32   0    50G  0 disk
└─zd32p1             230:33   0    49G  0 part
zd48                 230:48   0   8.5G  0 disk
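
For reference, the mapping between those zdN devices and the VM disks can be checked via the /dev/zvol symlinks. This is just a sketch - the pool/dataset names below (rpool/data, vm-100-disk-0) are examples of the usual Proxmox naming, not taken from my system:
Code:
# each zvol also appears as a symlink under /dev/zvol/<pool>/<dataset>,
# with the partitions exposed as "-partN" links by udev
ls -l /dev/zvol/rpool/data/
# lrwxrwxrwx ... vm-100-disk-0       -> ../../../zd16
# lrwxrwxrwx ... vm-100-disk-0-part3 -> ../../../zd16p3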


From what I read here it seems to be a rather common problem that ZFS pools get imported "automatically" - probably by zfs-import-scan.service - which I could simply disable to prevent this.
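
For anyone finding this later, this is roughly how the scan-based import could be checked and disabled. A sketch only - the unit names are those shipped by ZFS on Linux, so verify what is actually enabled on your own node first:
Code:
# see which import units exist and whether they are enabled
systemctl status zfs-import-scan.service zfs-import-cache.service
# the scan-based import probes all visible block devices; disable it if active
systemctl disable --now zfs-import-scan.service
# the cache-based import only touches pools recorded in /etc/zfs/zpool.cache,
# so pools living inside guest zvols should not be imported by it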

At the same time I don't really get why there is no clear-cut conceptual separation between partitions - and therefore pools - that belong to VMs and the "system" pools that the PVE host works with.

Can anybody shed some light on this? I don't want to end up with a corrupted pool because it got imported by accident. PVE doesn't even see the boot-pool as being in use by another system. My test-pool, though, is reported with the status
Code:
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
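
And if one ever does have to look at such a pool from the host (emergency recovery rather than day-to-day use), a read-only import without mounting anything seems to be the usual safety measure. A sketch, and obviously nothing to run against the disk of a running VM:
Code:
# read-only, don't mount datasets (-N), use an alternate root (-R),
# which also keeps the pool out of /etc/zfs/zpool.cache
zpool import -o readonly=on -N -R /mnt/inspect boot-pool
zpool export boot-pool
# a pool "last accessed by another system" would additionally need -f,
# which is exactly the kind of step to think twice about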
 
Just to let you know: I set up a different storage on another controller, ext4 with LVM-thin. These VM volumes (with ZFS inside the guest) are isolated from the host OS ..
 
Hi,

I have had some experience with your issue. In short, I have found that passing vfio.ids=<sas hba> on my kernel cmdline does effectively (almost!) stop this from happening. It can also be done via the modules configuration, i.e. the same kind of vfio.conf file that is often used for PCIe passthrough of a GPU, to prevent the kernel drivers from being loaded at all during boot. That requires one to maintain an up-to-date initrd, which I have found to be error prone for me, so I settled on the cmdline boot-args method. Perhaps I should also add it to my vfio.conf so I cannot forget to pass it during boot?
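
For reference, a minimal sketch of both variants as they are commonly done on Proxmox - the cmdline parameter is usually written vfio-pci.ids=, and the device IDs below are placeholders to be looked up with lspci on your own box:
Code:
# find the vendor:device ID of the HBA (XXXX:XXXX below is a placeholder)
lspci -nn | grep -i -e sas -e lsi
# variant 1: kernel cmdline (systemd-boot: /etc/kernel/cmdline, GRUB: /etc/default/grub)
#   ... vfio-pci.ids=XXXX:XXXX
# variant 2: modprobe config, which must be followed by an initrd rebuild
echo "options vfio-pci ids=XXXX:XXXX" > /etc/modprobe.d/vfio.conf
update-initramfs -u -k all
proxmox-boot-tool refresh   # or update-grub, depending on the bootloader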

In any case, neither method is 100% foolproof, for you must remember to **always** add it to your boot flags and/or use a specific initrd image. I have already had one particularly nasty event where I had to manually rebuild my kernel and boot environment. It took me several iterations to get it right, and I did notice the side effect of the wrong pools being accessible to the hypervisor! In that situation I was one step away from making a critical mistake and blowing away the very pool I was trying to recover!

I ended up spending several days more than I should have before realizing another critical side effect of the same issue. Even if the pools remain non-imported, if you try to specify a particular pool while two or more pools with the same ZFS **GUID** are visible to the host hypervisor, it becomes impossible to guarantee which pool gets selected until you either fix the GUID or rename the pool. In my case I had little choice but to recover from the host, which required Proxmox to be able to see, but not import, the pools from my HBA. When not in emergency recovery, I use PCIe passthrough of this HBA to a VM dedicated to NAS functions.
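
A minimal sketch of checking and re-randomizing a pool GUID with the stock ZFS tools ("tank" is just a placeholder name; zpool reguid needs the pool to be imported and writable):
Code:
# show the current pool GUID
zpool get guid tank
# give a dd-cloned copy a fresh random GUID so it can no longer
# be mistaken for the original pool
zpool reguid tank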

Sorry, I didn't mean to add more reasons why your issue is so grave, but indeed you are not senselessly worrying about a non-issue! I didn't worry about it enough up front, and it almost cost me a lot more work than it should have.

The funny bit is, I lucked out in discovering this vfio.ids method early on, as I was merely searching for a way to shave ~20-25 seconds off every boot of the hypervisor by not initializing the HBA twice, as happened without it. The implications didn't occur to me until an almost totally unrelated disaster crept up and forced me to carefully analyze the environment my server operates in.

So, I guess the moral of my story is:

a) always make sure you randomize the ZFS GUID on pools that you have copied block-wise (`dd`) and intend to restore.

b) always have an up-to-date copy of your boot flags for your environment and be damn sure to understand the side effects of each one as well as you can!

c) AFAIK, the two methods I described at the beginning are the best ways of achieving this when one does not wish to modify systemd units, default config or whatever, as you proposed. I pray that I am wrong on this and somebody else can prove me wrong!!

d) I wish to add that, so far, the `vfio` method seems to be the best solution for me personally. I realize that not every setup will be able to get by doing it this way, but if you haven't explicitly tried it yet, I strongly suggest doing so if at all possible.

Lastly, I wanted to say that, historically, I have always made sure that mdadm, hardware RAID etc. are disabled early on, such as when using a System Rescue Disc. This has sometimes meant even disabling said hardware in the BIOS, although that doesn't always work (please don't get me started!)


Cheers,
Jeff
 
Hi Jeff,
thanks for your comments!
I also use the blacklisting plus virtual-driver-assignment method (options vfio-pci ids=XXXX:XXXX) in the modprobe config files.

Perhaps you could clarify for me: what do you mean by "maintaining an up-to-date initrd"? Is it that, for example, when updating, it gets replaced with a "mint" version that doesn't integrate the options we set in the .confs? That would affect all users working with blacklisting, though ..

Cheers
 
Hiya,

I mean keeping your initrd in sync with any local configuration modifications you might have in place, such as...

I place the `vfio.ids=` bit in `/etc/modules-load.d/vfio.conf` and in my kernel boot flags at `/etc/kernel/cmdline`. I did this thinking it would help me remember to **always** have the hardware initialized in a specific order (it mattered in my case). Bad things can result if my host begins to see the SAS HBA as usable hardware and tries to claim it before the VM that rightfully owns it has had a chance to boot.

This is the direct opposite of the normal `vfio.ids` string that I use **only** in the kernel boot flags in order to guarantee that the video subsystem loads in a particular order - that is on an entirely different box.

Huh? Meh, I'm in a mood right now... but I shall leave you with this. Cheers
 
Sorry, I know I already replied to this the other day, but I felt like my response was rather poor. I will do my best to keep this reply short and to the point - something I am not exactly well known for! :-) By *maintaining an up-to-date initrd* I mean exactly what you thought.

We must not allow a "mint" version of the configuration to override the modifications that prevent the host from ever seeing whatever hardware we are concerned with. Since I keep my changes in a separate file, accidental overwrites are less likely in that sense. I suppose I am more concerned about forgetting, perhaps in a situation like a fresh install followed by restoring particular files from a backup?
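
A quick way to verify that the installed initrd has actually picked up such changes (a sketch, assuming Debian's initramfs-tools, which Proxmox uses):
Code:
# check that the current initrd contains the vfio options
lsinitramfs /boot/initrd.img-$(uname -r) | grep -i vfio
# rebuild it if a change has not been picked up yet
update-initramfs -u -k all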

Honestly, writing this message and seeing how difficult it was to produce an answer leads me to think that I am overthinking this whole thing! Or perhaps my worry is a bit too extreme. I am not sure, but I do know that I get OCD at times, especially when it comes to these sorts of things :-)

Cheers
 
Hi, thanks for your answer(s) ;)
Sure, it is all about preventing the host from taking over the HBA. But it seems to me it should be enough to remember/document where I made those changes and, for example, check after an update whether those custom configs are still there. If they are, then the initrd and kernel should reflect them.
To be safe, last time I double-checked which driver the device is actually using ..
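
That check is basically the following (the PCI address is an example; mpt3sas is what the host would normally load for a 9300-8i):
Code:
# confirm which kernel driver has claimed the HBA
lspci -nnk -s 01:00.0
#   Kernel driver in use: vfio-pci    <- reserved for the guest, as intended
# if it instead says "mpt3sas", the host has claimed the controller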
 
Yes, I do feel confident that the comments I have added to the files in question should be enough. If they aren't, I also keep documents that track a general overview of each node, including boot flags, what they do and so forth. I always expect to have, at minimum, a backup of the node's `/etc` dir, and I sincerely doubt that I would forget something like the kernel module locations. I have been doing this for far too long! :]

So far, I have yet to have this backfire on me, and I have already been through a couple of disaster-recovery scenarios, each with its own unique challenges. I feel as though what I have set up is stress-tested by now. I am confident that I could restore an entire node from backups and snapshots should the need ever arise.

Thanks,
Jeff