Disks only detected when passed through to a VM, not on the host (using an HBA)

TheDragon · New Member · Jan 20, 2023
I'm currently using proxmox-ve: 7.4-1 (running kernel: 6.2.11-2-pve). I haven't upgraded to PVE 8 yet, as I'm aware of another thread describing issues with HBAs.


lsblk only shows the SSDs that are directly connected to my motherboard.



This is the output of lspci -nnk

0000:01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
Subsystem: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:3020]
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas


My problem is that the HBA is detected by the PVE host, but the attached disks are not. However, if I pass the HBA through to a VM, the disks are detected.

Could anyone offer any suggestions for how to debug this?
 
My problem is that the HBA is detected by the PVE host, but the attached disks are not. However, if I pass the HBA through to a VM, the disks are detected.

Most likely because the controller is already bound to the driver used for PCI(e) passthrough:
Kernel driver in use: vfio-pci
Check whether the PVE host sees the disks when the controller uses its "normal" driver (mpt3sas?).
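A quick way to check which driver currently claims the HBA and whether the host sees any disks behind it (a sketch; the address 01:00.0 is taken from the lspci output above):
Code:
# which driver is bound to the HBA right now?
lspci -nnk -s 01:00.0
# do any disks show up on the host, and over which transport?
lsblk -o NAME,SIZE,MODEL,TRAN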
 
Sorry to hijack this. I have the same problem: the VM sees the passed-through LSI's HDDs, but without passthrough the host does not see the HDDs.

But how do I do this step in detail?
Check whether the PVE host sees the disks when the controller uses its "normal" driver (mpt3sas?).
 
Sorry to hijack this. I have the same problem: the VM sees the passed-through LSI's HDDs, but without passthrough the host does not see the HDDs.
What is the output of lspci -nnk for the LSI PCI(e) device?
But how do I do this step in detail?
Unbind the device from vfio-pci and rebind it to the actual driver? It depends on the output of the command above.
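For reference, a minimal sketch of such an unbind/rebind via sysfs, using a placeholder address 0000:02:00.0 that has to be replaced with the HBA's address from lspci (run as root; there is no guarantee the controller resets cleanly without a reboot):
Code:
# detach the HBA from vfio-pci
echo 0000:02:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
# make sure the SAS driver is loaded, then let the kernel re-probe the device
modprobe mpt3sas
echo 0000:02:00.0 > /sys/bus/pci/drivers_probe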
 
Hey, thanks for the super quick response

Here is the output
Bash:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas
 
Hey, thanks for the super quick response

Here is the output
Bash:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas
If you want to use the device on the Proxmox host, you need to unbind it from the vfio-pci driver and bind it to the mpt3sas driver, or disconnect the device from the PCI(e) bus and rescan the bus. There is no guarantee that the device resets properly and works.
If you want to keep switching between use by a VM and use by the host, you might want to use a hookscript. Or do you want to undo the steps you took to pass this device through and only use it with the Proxmox host?
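A sketch of the disconnect-and-rescan variant mentioned above, assuming the address 0000:02:00.0 from the output you posted; the same reset caveat applies:
Code:
# remove the device from the PCI(e) bus...
echo 1 > /sys/bus/pci/devices/0000:02:00.0/remove
# ...then rescan; the kernel should bind mpt3sas as long as the device ID is not claimed by vfio-pci
echo 1 > /sys/bus/pci/rescan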
 
Most of my services are in LXC containers, to which I pass the directories of the already-mounted drives.

So I need the Proxmox host to see the HDDs on the HBA.
The VM passthrough was just a test.

Thanks again
 
So I need the Proxmox host to see the HDDs on the HBA.
The VM passthrough was just a test.
Maybe rebooting the Proxmox host is enough? Did you early-bind the device to vfio-pci in a .conf file in the /etc/modprobe.d/ directory? If so, remove that and apply the changes before rebooting. It sounds like you only need to undo the things you did for testing.
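An early bind would look roughly like the (hypothetical) line below; after removing it, the initramfs has to be regenerated for the change to take effect:
Code:
# hypothetical early bind of the HBA in e.g. /etc/modprobe.d/vfio.conf:
#   options vfio-pci ids=1000:0087
# after deleting such a line, apply and reboot:
update-initramfs -u -k all
reboot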
 
Hm, it was like that before testing as well.
I did pass a GPU through to a VM, though.

Maybe I did something wrong and also bound the HBA.
Code:
cat /etc/modprobe.d/*.conf --plain
blacklist amdgpu
blacklist radeon
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
# This file contains a list of modules which are not supported by Proxmox VE

# nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1
options zfs zfs_arc_max=8589934592
 
Hm, it was like that before testing as well.
I did pass a GPU through to a VM, though.

Maybe I did something wrong and also bound the HBA.

Code:
cat /etc/modprobe.d/*.conf
───────┬────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/blacklist.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ blacklist amdgpu
   2   │ blacklist radeon
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ # generated by nvidia-installer
   2   │ blacklist nouveau
   3   │ options nouveau modeset=0
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/pve-blacklist.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ # This file contains a list of modules which are not supported by Proxmox VE
   2   │
   3   │ # nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
   4   │ blacklist nvidiafb
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/vfio.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/zfs.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ options zfs zfs_arc_max=8589934592
───────┴────────────────────────────────────────────────────────────────
That is really hard to read, but I don't see 1000:0087 in there. What is the output of lspci -knns 02:00 after a fresh reboot of Proxmox? Maybe your current configuration is not active and you need to run update-initramfs -u before rebooting? What is the output of cat /proc/cmdline?
Devices are not bound to vfio-pci after a reboot unless you configured them to be. They are only bound to vfio-pci automatically when you pass the device through to a VM, but that should be undone by a reboot of the Proxmox host.
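The checks and the reapply step asked about above, as a copy-paste sketch:
Code:
lspci -knns 02:00          # which driver claims the HBA after the reboot?
cat /proc/cmdline          # which kernel options are actually active?
update-initramfs -u        # reapply /etc/modprobe.d changes, then reboot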
 
Thanks again for all the help.
I fixed the cat statement, sorry for that.

I will try and report back.
 
Hello all,
I also have to chime in on this topic. I have the same problem. Since I cannot boot from the disk enclosure, I use the "normal" RAID controller for that. The disks are recognized, but the log says that the disks have an unsupported sector size:
Code:
Sep 22 11:00:04 pve kernel: sd 1:0:3:0: [sde] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:6:0: [sdh] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:5:0: [sdg] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:4:0: [sdf] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:7:0: [sdi] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:8:0: [sdj] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:9:0: [sdk] Unsupported sector size 4160.
The correct drivers are loaded and ready to use.
Code:
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2108 [Liberator] [1000:0079] (rev 05)
        Subsystem: Fujitsu Technology Solutions RAID Ctrl SAS 6G 5/6 512MB (D2616) [1734:1176]
        Kernel driver in use: megaraid_sas
        Kernel modules: megaraid_sas
06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8e SAS2.1 HBA [1000:3040]
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas
 
After fresh reboot:

Code:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas

Code:
❯ cat /proc/cmdline --plain
BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt reboot=force
 
Hello all,
I also have to chime in on this topic. I have the same problem. Since I cannot boot from the disk enclosure, I use the "normal" RAID controller for that. The disks are recognized, but the log says that the disks have an unsupported sector size:
Code:
Sep 22 11:00:04 pve kernel: sd 1:0:3:0: [sde] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:6:0: [sdh] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:5:0: [sdg] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:4:0: [sdf] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:7:0: [sdi] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:8:0: [sdj] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:9:0: [sdk] Unsupported sector size 4160.
The correct drivers are loaded and ready to use.
Code:
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2108 [Liberator] [1000:0079] (rev 05)
        Subsystem: Fujitsu Technology Solutions RAID Ctrl SAS 6G 5/6 512MB (D2616) [1734:1176]
        Kernel driver in use: megaraid_sas
        Kernel modules: megaraid_sas
06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8e SAS2.1 HBA [1000:3040]
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas
I know some enterprise drives use 520-byte sectors instead of 512, and 4160 appears to be 8×520 rather than the more common 8×512 (4096, i.e. 4K).
Maybe the drives can be reformatted to 4K? I think you need to find out how to do that from the manufacturer of the drives.
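If the drives themselves support it, 520-byte-sector disks are usually reformatted with sg_format from the sg3_utils package; a sketch, using /dev/sde from the log above (this wipes the disk, can take hours, and only works if the controller passes SCSI commands through to the drive):
Code:
apt install sg3-utils
# check the current logical block size first
sg_readcap --long /dev/sde
# reformat to 4096-byte sectors (destroys all data on the disk)
sg_format --format --size=4096 /dev/sde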
 
After fresh reboot:

Code:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas

Code:
❯ cat /proc/cmdline --plain
BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt reboot=force
Is this before starting any VM? Kernel driver in use: vfio-pci can only be caused by something you configured somewhere or by starting a VM with passthrough of 02:00.
Did you reapply your configuration with update-initramfs -u -k all and proxmox-boot-tool refresh?
Please try to remember what you changed and change it back. I'm out of ideas about where you could have configured this, so I can only suggest reinstalling Proxmox.
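For completeness, reapplying the boot-time configuration looks like this (which of the two tools matters depends on how the host boots):
Code:
update-initramfs -u -k all
proxmox-boot-tool refresh
reboot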
 
Yeah, it's really weird.
I never intended to pass this HBA through.

I can't figure out how to make it behave "normally".

I will remove the GPU passthrough from the VM and reboot again.
 
OK, that actually did it.

I simply removed the PCI hardware device from the VM in the web UI (it was a GPU).

I do not understand why it would also apply to the LSI.

Any ideas? I used the GPU's hardware IDs for vfio.
 
OK, that actually did it.

I simply removed the PCI hardware device from the VM in the web UI (it was a GPU).

I do not understand why it would also apply to the LSI.

Any ideas? I used the GPU's hardware IDs for vfio.
Check your IOMMU groups; maybe both devices are in the same group, so the Proxmox host loses the device when you pass through any other device in that group. Put the device in another PCIe slot to see whether it moves to another IOMMU group (which might also change its PCI address from 02:00 to something else). Your motherboard determines the IOMMU groups.
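A common way to list the IOMMU groups on the host is a small loop over sysfs (a sketch):
Code:
# print every PCI device together with the IOMMU group it belongs to
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=$(basename "$(dirname "$(dirname "$d")")")
    printf 'Group %s: %s\n' "$g" "$(lspci -nns "${d##*/}")"
done | sort -V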
 
You are correct.

Code:
Group 2:        [8086:0c01]     00:01.0  PCI bridge                               Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller
                [8086:0c05]     00:01.1  PCI bridge                               Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller
                [10de:13c2] [R] 01:00.0  VGA compatible controller                GM204 [GeForce GTX 970]
                [10de:0fbb]     01:00.1  Audio device                             GM204 High Definition Audio Controller
                [1000:0087] [R] 02:00.0  Serial Attached SCSI controller          SAS2308 PCI-Express Fusion-MPT SAS-2
 
OK, this is basically fixed.

I used `pcie_acs_override=downstream,multifunction` as a kernel option in GRUB and was able to split them into individual groups (see the sketch after the listing below), which in turn means I can pass through just the GPU to the VM and leave the HBA untouched.

Code:
Group 14: [10de:13c2] [R] 01:00.0 VGA compatible controller GM204 [GeForce GTX 970]
Group 15: [10de:0fbb] 01:00.1 Audio device GM204 High Definition Audio Controller
Group 16: [1000:0087] [R] 02:00.0 Serial Attached SCSI controller SAS2308 PCI-Express Fusion-MPT SAS-2
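For reference, a sketch of where that option goes on a GRUB-booted host (systemd-boot installs use /etc/kernel/cmdline and proxmox-boot-tool refresh instead); note that the ACS override weakens the isolation between the devices it splits apart:
Code:
# /etc/default/grub (one line, appended to whatever options are already there)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
# apply and reboot
update-grub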
Thanks for all your help @leesteken
 
