[SOLVED] Disk Passthrough: Proper guide on how to passthrough disks to VMs

Apr 11, 2024
As per the title, I've searched at length in the docs, on the forum and on the net and could not find a proper, complete guide on how to correctly expose a local host disk (say an NVMe drive) to a VM. So I thought about starting this post, hoping to define a guide for future use cases.

My use case: my VM needs to be able to "natively" see the underlying disk, as the VM itself needs to manage the LVM on the disk. This is just for a second disk - it doesn't need to be the VM's OS boot disk. But I guess this might be a generic answer applicable to many.

Questions:
1. How do I achieve this with proper PCIe speed and hardware performance? My drive is a fairly fast NVMe (~50k IOPS) and will be the storage media for a BD.
2. Should I use i440fx or q35, since it is PCIe?
3. Should I enable vfio and hardware passthrough with immou=pt? What is the exact GRUB parameter for AMD-Vi (amd immou)?
4. Should I blacklist the PCI device at host level so that it does not appear as a PVE disk / cannot be used by the host?

Thank you so much in advance for your support.
 
1. How do I achieve this with proper PCIe speed and hardware performance? My drive is a fairly fast NVMe (~50k IOPS) and will be the storage media for a BD.
You don't really have any influence on this if you PCIe-passthrough the device. What's a "BD"?
2. Should I use i440fx or q35, since it is PCIe?
It does not really matter for the NVMe or the passthrough, but it may matter for other stuff.
3. Should I enable vfio and hardware passthrough with immou=pt? What is the exact GRUB parameter for AMD-Vi (amd immou)?
It makes no difference, as iommu=pt (which is not what you wrote) only influences devices that are NOT passed through.
4. Should I blacklist the PCI device at host level so that it does not appear as a PVE disk / cannot be used by the host?
Probably. It's best not to let Proxmox touch devices that are dedicated to VMs (even though passthrough breaks the principle of virtualization: hardware abstraction and independence). However, you cannot blacklist devices; you can only blacklist drivers and/or early-bind devices to vfio-pci (which I think you meant).
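For reference, early binding to vfio-pci is just a modprobe config plus an initramfs rebuild. A minimal sketch, assuming the NVMe controller's vendor:device ID is 1234:5678 (a placeholder - take the real one from lspci -nn):

Code:
# find the [vendor:device] ID of the NVMe controller
lspci -nn | grep -i nvme

# /etc/modprobe.d/vfio.conf - let vfio-pci claim the device before the nvme driver does
options vfio-pci ids=1234:5678
softdep nvme pre: vfio-pci

# rebuild the initramfs and reboot
update-initramfs -u -k all

This is only relevant if you go the PCIe passthrough route; plain disk passthrough does not need it.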
 
Thank you so much for the blazing fast reply. And apologies for the typos - I meant:
BD ==> DB
immou ==> iommu

More specifically:
It makes no difference, as iommu=pt (which is not what you wrote) only influences devices that are NOT passed through.
What I meant here is whether there is a need to enable passthrough in any way at the bootloader, or to load any additional kernel module, or whether a vanilla PVE installation is sufficient to perform disk passthrough with:

qm set <VM-ID> -scsi<n> /dev/disk/by-id/SOME-DISK-ID

vfio-pci (which I think you meant)
You are correct. This was the goal.

Could you please link a few references or give some indication of how to achieve full passthrough from an "out-of-the-box" cluster installation?

Thank you very much.
 
You can use LVM on a virtual disk just fine. Passthrough is not needed for that. Perhaps you should try it that way and see if it meets your performance goals before making things complicated, limiting your migration options, and forcing memory pre-allocation.
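For example, giving the VM a plain virtual disk from host storage is a one-liner; the VM ID, storage name and size below are just placeholders:

Code:
# allocate a new 100 GiB virtual disk on the "local-lvm" storage and attach it as scsi1
qm set 100 --scsi1 local-lvm:100

The guest then just sees another block device (e.g. /dev/sdb) and can run pvcreate/vgcreate on it like on any bare-metal disk.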

The only reason you would for sure need passthrough is if you want the VM to be able to use SMART on the drive.

ETA: Both Red Hat and Ubuntu install on LVM by default, even in a VM.
 
What I meant here is whether there is a need to enable passthrough in any way at the bootloader, or to load any additional kernel module, or whether a vanilla PVE installation is sufficient to perform disk passthrough with:

qm set <VM-ID> -scsi<n> /dev/disk/by-id/SOME-DISK-ID
For disk passthrough (Wiki), you don't need to enable IOMMU (which is not at all what iommu=pt does); that is only needed for PCIe passthrough (manual).
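For reference, should you ever need PCIe passthrough instead: on current kernels the AMD IOMMU is normally enabled automatically, so at most you add the optional iommu=pt. A sketch of the classic GRUB edit (values are examples only):

Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt"

# then apply and reboot
update-grub

None of this is required for the qm set style disk passthrough above.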
You are correct. This was the goal.

Could you please link a few references or give some indication of how to achieve full passthrough from an "out-of-the-box" cluster installation?
What do you mean by "full passthrough"? Maybe check the manual and the wiki (for which I provided a link). What you are asking has been asked many times before, and various posts on this forum will provide more details, people's experiences and links to various resources.
 
I mean, this should be the doc related to disk PT:
https://pve.proxmox.com/wiki/Passthrough_Physical_Disk_to_Virtual_Machine_(VM)

Is that really all there is to it?
That is indeed the link I provided for you.
Suppose you have two disk entries like:
Code:
/dev/disk/by-id/nvme-Dell0DM001-1CH166_Z1F41BLC
/dev/disk/by-id/nvme-Dell0DM001-1CH166_Z1F41BLC_1

Both point to the same /dev/nvme1n1. Which one should you choose?
It depends on whether you want to pass the partition or the whole drive. You decide!
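If in doubt, look at what the symlinks resolve to; that tells you which by-id name is the whole drive and which is a partition:

Code:
ls -l /dev/disk/by-id/ | grep nvme
# ... -> ../../nvme1n1    (whole drive)
# ... -> ../../nvme1n1p1  (a partition, usually also carrying a -partN suffix)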
 
Thank you very much. It is much clearer now.

I agree with you, it would be better to manage this via LVM. Indeed, my goal is to let the VM OS see the VG on the physical disk. This is needed because I need OpenEBS to directly create Kubernetes volumes on that LVM.
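If I understand it correctly, all the OpenEBS LVM engine needs is an existing VG inside the guest, which on a plain virtual disk would be the usual LVM setup (device and VG names below are placeholders):

Code:
# inside the VM, on the extra virtual disk
pvcreate /dev/sdb
vgcreate openebs-vg /dev/sdb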

I wonder now if this can be achieved without passthrough?
 
You are right. Let's just say I might have been sidetracked by worrying too much about the performance cap of virtio, or even by the (admittedly remote) idea of being able to do simple disaster recovery by removing the drive, attaching it to another Linux box and directly inspecting the device. But there are better ways to do disaster recovery, of course (and this is not it).

One possible real reason to do otherwise: the VM OS might be something like Talos Linux, which is SSH-less and which I won't be able to connect to in order to prepare the PV and VG. In this case, am I right to assume I would need to connect the disk to another VM with a shell and lvm2 to prepare the partition and then move it back to the Talos VM?

Thanks again!
 
One possible real reason to do otherwise: the VM OS might be something like Talos Linux, which is SSH-less and which I won't be able to connect to in order to prepare the PV and VG. In this case, am I right to assume I would need to connect the disk to another VM with a shell and lvm2 to prepare the partition and then move it back to the Talos VM?
I am not familiar with Talos Linux, but I would think it still has console access even if SSH is absent. If so, you should be able to use the noVNC console from the PVE menu instead of SSH.
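And even if that console turns out to be too limited, moving the disk between VMs is only a config change, so preparing it from a helper VM and handing it back should work; the VM IDs below are placeholders:

Code:
# attach the disk to a helper VM that has a shell and lvm2
qm set 101 -scsi1 /dev/disk/by-id/SOME-DISK-ID
# ... create the PV/VG there, then shut the helper down ...

# detach it from the helper and give it to the Talos VM
qm set 101 --delete scsi1
qm set 100 -scsi1 /dev/disk/by-id/SOME-DISK-ID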
 
Reasons to do pass-through (disk or PCI):
  • You really need the extra percent of performance and have verified this by testing.
  • Your virtualized NAS wants to use SMART so you have to do PCI passthrough to enable that.
  • The entire disk is dedicated to one VM anyway.
 
Reasons to do pass-through (disk or PCI):
  • You really need the extra percent of performance and have verified this by testing.
  • Your virtualized NAS wants to use SMART so you have to do PCI passthrough to enable that.
  • The entire disk is dedicated to one VM anyway.
Thanks again. Right now I only tick the last box, so I'll try the virtualized approach.

FYI: Talos only has a limited console that just reports status. It is a new approach to the cloud-native way of doing things. But I might end up getting rid of it and moving to a more manageable distro.

Best,
A
 
