Drive Errors on first VM creation with ZFS, BTRFS, and EXT4

mreyes

New Member
Apr 16, 2022
I have built my first Proxmox home server. Proxmox is installed on 2x 120 GB SSDs, and I have 2x 1 TB SSDs as my primary storage for VMs and containers.
When I create a new VM, it fails on boot with the errors shown below. I have tried reinstalling Proxmox and recreating the VM, but the same error occurs. I also tried BTRFS as a last resort, but that had its own errors, and I would rather use ZFS anyway. I have also tried swapping which drives serve as the Proxmox boot drive and which as local storage (120 GB vs. 1 TB), but the same error shown in the screenshot occurs.

Motherboard: B550M AORUS PRO-P
CPU: AMD Ryzen 7 5700G
SSDs: 2x 120 GB Kingston, 2x 1 TB Crucial

I am new to both Proxmox and ZFS, so I may be missing something obvious.

From the logs, it does seem that when the VM is initialized, all devices get brought down: USB keyboard, my Ethernet connection, etc. I am not passing those devices into the VM. The VM doesn't have any virtual network interfaces, as I am doing PCI passthrough of my Intel quad gigabit card into it. (Screenshot of the boot errors attached.)
 
Looks like your SATA controller (to which the SSDs are connected) is in the same IOMMU group as the Intel network controller you are passing through to the VM.
Devices in the same IOMMU group cannot be shared between different VMs or between VMs and the host (for security/isolation reasons). Therefore, the Proxmox host loses all devices in the same group as the Intel network controller and becomes unresponsive because it no longer has disks or a network connection.
On the Ryzen platform (except when using an X570 chipset), only the PCIe lanes connected to the CPU are in separate IOMMU groups. All other devices and PCIe/M.2 slots are part of one big IOMMU group and are connected via the motherboard chipset. Looking at your motherboard, you need to put the Intel network controller into the x16 PCIe slot closest to the CPU, or get a converter for the M.2 slot (which has 4 PCIe lanes) closest to the CPU.
You can get a nice overview of your IOMMU groups using this one-liner:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
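To zero in on the relevant devices, the same listing can be filtered (a small variation of the command above):

for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}    # extract the group number from the sysfs path
    printf 'IOMMU group %s ' "$n"
    lspci -nns "${d##*/}"                  # resolve the PCI address to a device description
done | grep -Ei 'sata|ethernet'

If the on-board SATA controller and the Intel ports show up with the same group number, that is exactly the situation described above.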
 

Thank you for your response! I did not know about the SATA ports being in the same group as the network card.

However, I think I would run into a similar but different problem. In my other x16 slot I already have an LSI HBA card connected to my 8 HDDs, which I planned to pass through to a TrueNAS VM. If I swap the network card and the LSI HBA card, wouldn't the issue merely be postponed? I then couldn't create the TrueNAS VM, since I couldn't PCI passthrough the HBA card without disconnecting the boot drives.
 
PCI(e) devices can read/write all memory at any time and communicate with each other via the PCI(e) bus without the CPU knowing about it. That's why devices that are not guaranteed to be isolated are put into the same IOMMU group (or because the BIOS/chipset makers don't really care). This is also the reason that ballooning can't work with PCI(e) passthrough.

Ryzen only has 20 PCIe lanes available for passthrough (12 for the APUs because of the integrated graphics), and your motherboard is also limited, with only one PCIe slot connected to the CPU. You could try a converter for the M.2 slot to PCIe x4. Or switch to an X570 motherboard. Or see if the motherboard supports bifurcation of the x16 slot into 4x4 with riser cables. Or use the aforementioned M.2 slot as the boot drive instead of the SATA drives (but you will then also lose at least a USB controller and the on-board network).
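If you want to double-check which devices hang off the CPU and which sit behind the chipset, the PCI(e) topology tree gives a quick overview:

lspci -tv    # tree view of the PCI(e) hierarchy with device names

Everything that appears below the chipset's upstream port will usually end up in the one big IOMMU group on these boards; only the CPU-connected x16 slot and the CPU-connected M.2 slot get their own groups.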

Or you can "break" all groups and multi-function devices by adding pcie_acs_override=downstream,multifunction (the patches are built into the Proxmox kernel) to the kernel parameters. Note that this also breaks security and isolation between VMs and the Proxmox host.
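Roughly, depending on how the host boots (adjust to your setup):

# GRUB-booted hosts: add the option to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_acs_override=downstream,multifunction"
# then regenerate the boot configuration and reboot:
update-grub

# Hosts booting via systemd-boot (common for ZFS-on-root UEFI installs):
# append the same option to the single line in /etc/kernel/cmdline, then:
proxmox-boot-tool refresh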
 

I see, thanks for the explanation. I looked into it further and even found other forum posts where you discussed similar concepts. I think I will first try breaking the IOMMU groups with pcie_acs_override=downstream,multifunction. I know it is insecure, so I wanted a second opinion regarding proper security practices. I am not exposing this server outside of my network other than through a single port for OpenVPN, so the only access point is through OpenVPN into the pfSense VM I am trying to set up. Would the best course of action be to move the Intel gigabit card to the top PCIe slot as first suggested, then use pcie_acs_override=downstream,multifunction to break the second PCIe slot with the LSI HBA card into its own IOMMU group so it can be passed into a TrueNAS VM? Or would that still affect the top PCIe slot?

I guess the most secure approach would be to swap the PCIe slots so the Intel gigabit card can be passed into the pfSense VM, and then drop the idea of a TrueNAS VM with a passed-through HBA card. Instead, I could create a ZFS raidz2 pool from my 8 HDDs in Proxmox directly and stand up a container to serve it as a network drive.
 
If you break the groups, the host (which still has some devices from the group) and the VM (which also has a device from the group) are no longer securely isolated. I would prefer to isolate the network devices more than the storage devices passed to a trusted VM.
A VM with passthrough (which makes ballooning impossible) that uses ZFS (on the inside) needs a lot of memory. I would go for a container that provides a local network share for a ZFS pool or subvol on the host. Then the host and the container can share the ARC, which I think is a better use of your memory.
What kind of VM gets the quad Intel network card? You could run OpenWrt in a container as well, but if you want to protect the host kernel, you'll need a VM indeed.
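As a rough sketch of that setup (pool name, disk paths and container ID are only examples, use your own /dev/disk/by-id/ paths and CT number):

# create a raidz2 pool from the 8 HDDs on the host plus a dataset for the share
zpool create -o ashift=12 tank raidz2 \
    /dev/disk/by-id/ata-HDD1 /dev/disk/by-id/ata-HDD2 /dev/disk/by-id/ata-HDD3 /dev/disk/by-id/ata-HDD4 \
    /dev/disk/by-id/ata-HDD5 /dev/disk/by-id/ata-HDD6 /dev/disk/by-id/ata-HDD7 /dev/disk/by-id/ata-HDD8
zfs create tank/share

# bind-mount the dataset into the container (here: CT 100) that will run the file share
pct set 100 -mp0 /tank/share,mp=/mnt/share

The pool then lives on the host, the ARC is shared with everything else on the host, and the container only sees the bind-mounted directory.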
 
The pfSense VM is getting the quad Intel network card. I want to run pfSense with pfBlocker and OpenVPN in that VM. I will skip breaking the IOMMU groups and just set up a local ZFS pool with a container running a file-sharing server.

I swapped the PCIe cards and now the pfSense VM is working. I am still having issues getting the network and internet access working, but that is a separate problem I can continue troubleshooting myself.

Thank you so much for your help! I greatly appreciate it, as I was lost and focusing on the wrong thing (the file system) instead of the IOMMU groups.
 
