Setting up a fileserver VM on Proxmox - How to handle the data.

Maxwell

Dear Forum,

I'd like to set up a fileserver (plain Debian + SMB + git) on a Proxmox host and I'm wondering what the recommended approach is for doing this in a home lab / mini production environment. The host PC has two HDD RAID 1 arrays (10 TB each) handled by a hardware RAID controller, which hold all the payload data. Proxmox and the VMs are stored on a separate NVMe SSD.

I would like to keep the data separated from the VM as much as possible. The data is backed up elsewhere and I do not necessarily need snapshots of the data drives; however, it would be nice to be able to snapshot the fileserver to save its configuration, as well as to back up the fileserver VM with PBS (again excluding the data).

One option I've thought about is to pass through either the entire RAID controller or the individual disks to the fileserver VM. My assumption would be that if I remove the controller from the fileserver VM or delete the entire VM, the data would remain unharmed? Are there any other drawbacks besides the loss of flexibility?

Another option would be to create Proxmox datastores (e.g. using LVM) on the disks and add those to the VM. However, following this approach, it seems quite easy to screw up the data / the disks' LVM configuration by incautiously deleting the fileserver VM in Proxmox. In addition, I'm not a hundred percent sure about the implications of using LVM / LVM-thin storage and how this approach scales with an increasing number of snapshots (which in my understanding would necessarily include all data drives, as there seems to be no way of disabling this in Proxmox).

Everything else, such as using ZFS, seems to be out of the picture due to the hardware RAID controller anyway.

I'd also like to avoid, as much as possible, non-standard approaches that cannot be implemented using the GUI.

Any advice on how to set this up properly would be greatly appreciated.

Thanks,

Maxwell
 
My assumption would be that if I remove the controller from the fileserver VM or delete the entire VM, the data would remain unharmed?
Correct.
(which in my understanding would necessarily include all data drives, as there seems to be no way of disabling this in Proxmox).
It's possible to exclude virtual disks from backups. Snapshots will snapshot everything.
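For example, you can untick the "Backup" checkbox on the data disk in the VM's hardware tab, or set it on the CLI; a minimal sketch (VMID 100 and the volume name are placeholders):

Code:
# keep the disk attached but exclude it from vzdump/PBS backups
qm set 100 --scsi1 local-lvm:vm-100-disk-1,backup=0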

Also keep in mind that neither a snapshot nor a RAID 1 will replace a proper backup. You should have proper backups (see the 3-2-1 backup strategy) of those 20 TB anyway.
 
Thanks for the reply... I have continued to investigate the idea of passing the RAID controller directly to the fileserver VM.

The server has several PCIe cards, namely the above-mentioned RAID controller, an HBA (for tape) and a network card (i350). At least the RAID controller and the HBA need to be passed through to the fileserver.

When checking the IOMMU groups, it seems that all PCIe cards except one end up in a single IOMMU group, while one card (currently the HBA) has a separate group of its own.
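In case it is relevant, a common way to dump the grouping on the host is a small sysfs loop like this (read-only, not Proxmox-specific):

Code:
#!/bin/bash
# list every PCI device per IOMMU group on the host
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done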

The Proxmox documentation says "For working PCI passthrough, you need a dedicated IOMMU group for all PCI devices you want to assign to a VM."
  • Is one IOMMU group allowed to hold more than one device when passing at least one / all (?) devices of the group through to the VM?
  • Are the IOMMU groups a functional requirement which must be properly set up for passthrough to work at all, or are they "just" relevant in terms of security, i.e. to avoid the possibility of accessing memory from both the host as well as from the VM?
  • Any chance to change the IOMMU groups other than by swapping the cards between PCIe slots (tried that...)? I am aware of the possibility of using the ACS kernel patch, but this seems to come with its own problems and risks: http://vfio.blogspot.com/2014/08/iommu-groups-inside-and-out.html
Any additional recommendation on how to proceed?

Thank you!
 
An IOMMU group contains all devices that can communicate with each other without the system memory controller/CPU noticing (or preventing) it. Since PCI(e) devices can read/write all of the allocated memory (DMA), this is essential for security and proper isolation of VMs and the host. If you ignore the groups, a VM will in principle have access to all of the Proxmox host's memory (and therefore also that of all other VMs).

You cannot share devices from the same IOMMU group between VMs and/or the Proxmox host (unless you break the security isolation). The Proxmox host will instantly lose all devices in the same group when you start a VM with passthrough of one or more devices from that group.

Sometimes the groups change with newer motherboard BIOS versions, but the only other remedy is changing slots or using another motherboard. Everything except one PCIe slot (and one M.2 slot) being in a single group is quite common for Ryzen motherboards, for example (except X570).
 
Thank you for the information. Just to double-check: if I plug all three cards (RAID, HBA, NIC) into the PCIe slots that make them appear in one single IOMMU group, I can pass all three cards through to the same VM? Is this considered secure?
 
If I plug all three cards (RAID, HBA, NIC) into the PCIe slots that make them appear in one single IOMMU group, I can pass all three cards through to the same VM? Is this considered secure?
Yes, that's secure, as all devices that can secretly communicate around the IOMMU then share the same (VM) memory area (assuming no use of pcie_acs_override).
But I fear that there may also be other devices (not counting PCI(e) bridges etc.) in that same group which you need on the Proxmox host; in that case the host will crash (or not be secure due to use of pcie_acs_override).
 
Ok, that makes sense. Here's what pvesh get /nodes/pmx/hardware/pci --pci-class-blacklist "" is giving me.

Code:
│ class    │ device │ id           │ iommugroup │ vendor │ device_name                                                             │
│ 0x010400 │ 0x028d │ 0000:01:00.0 │          1 │ 0x9005 │ Series 8 12G SAS/PCIe 3                                                 │
│ 0x010700 │ 0x0097 │ 0000:02:00.0 │          1 │ 0x1000 │ SAS3008 PCI-Express Fusion-MPT SAS-3                                    │
│ 0x020000 │ 0x1521 │ 0000:03:00.0 │          1 │ 0x8086 │ I350 Gigabit Network Connection                                         │
│ 0x020000 │ 0x1521 │ 0000:03:00.1 │          1 │ 0x8086 │ I350 Gigabit Network Connection                                         │
│ 0x060400 │ 0x1901 │ 0000:00:01.0 │          1 │ 0x8086 │ 6th-10th Gen Core Processor PCIe Controller (x16)                       │
│ 0x060400 │ 0x1905 │ 0000:00:01.1 │          1 │ 0x8086 │ Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x8)  │
│ 0x060400 │ 0x1909 │ 0000:00:01.2 │          1 │ 0x8086 │ Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x4)  │

Is it acceptable for the PCIe controllers to be in the same IOMMU group, or is this a problem?

Also, SeaBIOS(?) does not seem to be too happy about the controller's older firmware:

[screenshot of the SeaBIOS boot error]

Is there a realistic chance that updating the RAID controller's firmware will improve this?

Thanks!
 
Is it acceptable for the PCIe controllers to be in the same IOMMU group, or is this a problem?
I'm not sure what you are asking. If you pass one, two or three of the devices to a VM, the host loses all three (we can ignore the three PCIe controllers). Only you can decide whether that is a problem. The VM will be fine (and securely isolated from the host) with one, two or three of the devices passed through.
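If you do end up passing all three cards to the fileserver VM, a rough sketch of the CLI side using the addresses from your output (VMID 100 is a placeholder; pcie=1 assumes a q35 machine type):

Code:
# pass the RAID controller, the HBA and the whole NIC (both I350 ports) to VM 100
qm set 100 --hostpci0 0000:01:00.0,pcie=1
qm set 100 --hostpci1 0000:02:00.0,pcie=1
qm set 100 --hostpci2 0000:03:00,pcie=1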
Also, SeaBIOS(?) does not seem to be too happy about the controller's older firmware:
Maybe give OVMF (UEFI) a try? I have no experience with this at all. Maybe someone else knows.
 
Thanks for the info. Things slowly start to make sense...
  • I've updated the RAID controller's firmware to the latest version available on the Adaptec website, but no success with SeaBIOS
  • Switching to OVMF (UEFI) indeed gets things working; passthrough now works fine for all three cards (see the sketch after this list). Two important things to note here:
    • In contrast to SeaBIOS, the controllers' boot screens are never shown during the boot process of the VM
    • The VMs take somewhat longer to start (20-30 seconds) compared to plain VMs without hardware passthrough.
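For anyone following along, the switch to OVMF boils down to something like this (VMID 100 and the storage name are placeholders; the same options exist in the GUI under the VM's Hardware/Options tabs):

Code:
# switch the VM to UEFI firmware with a q35 machine type and add an EFI vars disk
qm set 100 --bios ovmf --machine q35
qm set 100 --efidisk0 local-lvm:1,efitype=4m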
What I'm still wondering about:
  • Is it correct that lspci on the Proxmox host still lists the cards which were passed through?
  • On the host PC (not the VM), for example the aacraid driver still seems to be loaded.
I'm NOT using the option iommu=pt, so according to the docs (Link) some DMA stuff is still handled by the hypervisor.
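For reference, on a GRUB-booted host those flags live on the kernel command line, roughly like this (sketch only; a systemd-boot / ZFS-root install uses /etc/kernel/cmdline and proxmox-boot-tool instead):

Code:
# /etc/default/grub -- IOMMU-related kernel parameters (iommu=pt would be appended here)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
# GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"   # variant with passthrough mode
# apply with: update-grub && reboot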

Thanks in advance for any additional insights!
 
Thanks for the info. Things slowly start to make sense...
  • I've updated the RAID controller's firmware to the latest version available on the Adaptec website, but no success with SeaBIOS
  • Switching to OVMF (UEFI) indeed gets things working; passthrough now works fine for all three cards. Two important things to note here:
    • In contrast to SeaBIOS, the controllers' boot screens are never shown during the boot process of the VM
    • The VMs take somewhat longer to start (20-30 seconds) compared to plain VMs without hardware passthrough.
Note that all the VM memory must be pinned into actual host RAM when doing passthrough (because of DMA), which can make the start slower. Maybe the devices don't have UEFI BIOSes and that's why their boot screens don't show with OVMF. I hope that's not a problem.
What I'm still wondering about:
  • Is it correct that lspci on the Proxmox host still lists the cards which were passed through?
Yes, because they are still connected to the PCIe bus and remain part of the system. The driver in use (lspci -k) will be vfio-pci instead of the device's normal driver.
  • On the host PC (not the VM), for example the aacraid driver still seems to be loaded.
lspci -k will still show Kernel modules: aacraid, but that only tells you which driver can be bound to the device. The driver might stay loaded and show up in lsmod, but that's fine. You could consider early-binding the devices to vfio-pci so that Proxmox never touches them (and thus does not load the driver, because it never needs to bind it to the device): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_host_device_passthrough (you might need some additional softdeps).
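A minimal sketch of such an early binding, using the vendor:device IDs from your pvesh output (the driver names aacraid/mpt3sas/igb are my assumption for these three cards):

Code:
# /etc/modprobe.d/vfio.conf -- claim the cards with vfio-pci before their normal drivers
options vfio-pci ids=9005:028d,1000:0097,8086:1521
softdep aacraid pre: vfio-pci
softdep mpt3sas pre: vfio-pci
softdep igb pre: vfio-pci
# then: update-initramfs -u -k all && reboot
# verify with: lspci -nnk -s 0000:01:00.0   (should show "Kernel driver in use: vfio-pci")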
I'm NOT using the option iommu=pt, so according to the docs (Link) some DMA stuff is still handled by the hypervisor.
I find the Proxmox documentation about iommu=pt unclear and different from the Linux kernel parameter documentation. Everybody seems to think it is essential for passthrough, while it only affects devices that are not passed through and merely sets their default mapping to the identity. Passed-through devices always go via the IOMMU (with a non-identity mapping) to ensure isolation.
 
Leesteken, thank you for all the insight. lspci -k indeed lists the vfio-pci driver - so that makes perfect sense.

All three cards are working fine now; however, at least on the HBA, throughput seems to be somewhat limited. For example, when reading from an LTO-6 tape, I only get 250 GByte/h rather than the expected > 500 GByte/h (which is what I observed when doing this on bare metal).

Any recommendations / ideas on how to approach this systematically? Are there any options in Proxmox worth trying out? I've already tested iommu=pt, but this does not seem to have any impact at all (which is kind of in line with the last post...).
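For context, this is the kind of raw read test I have in mind to compare the VM against bare metal (device name and block size are assumptions, adjust to your tape setup):

Code:
# raw LTO read test: rewind, then stream the tape to /dev/null and watch the rate
mt -f /dev/nst0 rewind
dd if=/dev/nst0 of=/dev/null bs=256k status=progress
# also worth a look: the HBA's negotiated PCIe link speed/width
lspci -vv -s 0000:02:00.0 | grep -i lnksta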

Many thanks again!
 
