Disks only detected when passed through to a VM, not on the host (using an HBA)

TheDragon · New Member · Jan 20, 2023
I'm currently using proxmox-ve: 7.4-1 (running kernel: 6.2.11-2-pve). I haven't upgraded to PVE 8 yet, as I'm aware of another thread describing issues with HBAs.


lsblk only shows the SSDs that are directly connected to my motherboard.



This is the output of lspci -nnk

0000:01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
Subsystem: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:3020]
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas


My problem is that the HBA is detected by the PVE host, but the attached disks are not. However, if I pass the HBA through to a VM, the disks are detected.

Could anyone offer any suggestions for how to debug this?
 
My problem is that the HBA is detected by the PVE host, but the attached disks are not. However, if I pass the HBA through to a VM, the disks are detected.

Most likely because the controller is already bound to the driver used for PCI(e) passthrough:
Kernel driver in use: vfio-pci
Check whether the PVE host sees the disks when the controller uses its "normal" driver (mpt3sas?).
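A quick way to check which driver currently claims the HBA and whether the host sees any disks behind it (a sketch; the address 01:00.0 is taken from the lspci output above):
Code:
# which driver is bound to the HBA right now?
lspci -nnk -s 01:00.0
# do any disks show up on the host, and over which transport?
lsblk -o NAME,SIZE,MODEL,TRAN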
 
Sorry to hijack this. I have the same problem: the VM sees the passed-through LSI's HDDs, but without passthrough the host does not see the HDDs.

But how do I do this step in detail?
Check whether the PVE host sees the disks when the controller uses its "normal" driver (mpt3sas?).
 
Sorry to hijack this. I have the same problem: the VM sees the passed-through LSI's HDDs, but without passthrough the host does not see the HDDs.
What is the output of lspci -nnk for the LSI PCI(e) device?
But how do I do this step in detail?
Unbind the device from vfio-pci and rebind it to the actual driver? It depends on the output of the command above.
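For reference, a minimal sketch of such an unbind/rebind via sysfs, using a placeholder address 0000:02:00.0 that has to be replaced with the HBA's address from lspci (run as root; there is no guarantee the controller resets cleanly without a reboot):
Code:
# detach the HBA from vfio-pci
echo 0000:02:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind
# make sure the SAS driver is loaded, then let the kernel re-probe the device
modprobe mpt3sas
echo 0000:02:00.0 > /sys/bus/pci/drivers_probe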
 
Hey, thanks for the super quick response

Here is the output
Bash:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas
 
Hey, thanks for the super quick response

Here is the output
Bash:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas
If you want to use the device on the Proxmox host, you need to unbind it from the vfio-pci driver and bind it to the mpt3sas driver, or disconnect the device from the PCI(e) bus and rescan the bus. There is no guarantee that the device resets properly and works.
If you want to keep switching between use by a VM and use by the host, you might want to use a hookscript. Or do you want to undo the steps you took to pass this device through and only use it with the Proxmox host?
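A sketch of the disconnect-and-rescan variant mentioned above, assuming the address 0000:02:00.0 from the output you posted; the same reset caveat applies:
Code:
# remove the device from the PCI(e) bus...
echo 1 > /sys/bus/pci/devices/0000:02:00.0/remove
# ...then rescan; the kernel should bind mpt3sas as long as the device ID is not claimed by vfio-pci
echo 1 > /sys/bus/pci/rescan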
 
Most of my services are in LXC containers, to which I pass the directories of the already-mounted drives.

So I need the Proxmox host to see the HDDs on the HBA.
The VM passthrough was just a test.

Thanks again
 
So I need the Proxmox host to see the HDDs on the HBA.
The VM passthrough was just a test.
Maybe rebooting the Proxmox host is enough? Did you early-bind the device to vfio-pci in a .conf file in the /etc/modprobe.d/ directory? If so, remove that and apply the changes before rebooting. It sounds like you only need to undo the things you did for testing.
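An early bind would look roughly like the (hypothetical) line below; after removing it, the initramfs has to be regenerated for the change to take effect:
Code:
# hypothetical early bind of the HBA in e.g. /etc/modprobe.d/vfio.conf:
#   options vfio-pci ids=1000:0087
# after deleting such a line, apply and reboot:
update-initramfs -u -k all
reboot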
 
Hm, it was like that before testing as well.
I did pass a GPU through to a VM, though.

Maybe I did something wrong and also bound the HBA.
Code:
cat /etc/modprobe.d/*.conf --plain
blacklist amdgpu
blacklist radeon
# generated by nvidia-installer
blacklist nouveau
options nouveau modeset=0
# This file contains a list of modules which are not supported by Proxmox VE

# nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
blacklist nvidiafb
options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1
options zfs zfs_arc_max=8589934592
 
Hm, it was like that before testing as well.
I did pass a GPU through to a VM, though.

Maybe I did something wrong and also bound the HBA.

Code:
cat /etc/modprobe.d/*.conf
───────┬────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/blacklist.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ blacklist amdgpu
   2   │ blacklist radeon
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/nvidia-installer-disable-nouveau.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ # generated by nvidia-installer
   2   │ blacklist nouveau
   3   │ options nouveau modeset=0
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/pve-blacklist.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ # This file contains a list of modules which are not supported by Proxmox VE
   2   │
   3   │ # nvidiafb see bugreport https://bugzilla.proxmox.com/show_bug.cgi?id=701
   4   │ blacklist nvidiafb
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/vfio.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1
───────┼────────────────────────────────────────────────────────────────
       │ File: /etc/modprobe.d/zfs.conf
───────┼────────────────────────────────────────────────────────────────
   1   │ options zfs zfs_arc_max=8589934592
───────┴────────────────────────────────────────────────────────────────
That is really hard to read, but I don't see 1000:0087 in there. What is the output of lspci -knns 02:00 after a fresh reboot of Proxmox? Maybe your current configuration is not active and you need to run update-initramfs -u before rebooting? What is the output of cat /proc/cmdline?
Devices are not bound to vfio-pci after a reboot unless you configured them to be. They are only bound to vfio-pci automatically when you pass the device through to a VM, but that should be undone by a reboot of the Proxmox host.
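The checks and the reapply step asked about above, as a copy-paste sketch:
Code:
lspci -knns 02:00          # which driver claims the HBA after the reboot?
cat /proc/cmdline          # which kernel options are actually active?
update-initramfs -u        # reapply /etc/modprobe.d changes, then reboot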
 
Thanks again for all the help.
I fixed the cat statement, sorry for that.

I will try and report back.
 
Hello all,
I also have to chime in on this topic. I have the same problem. Since I cannot boot from the disk enclosure, I use the "normal" RAID controller for that. The disks are recognized, but the log says that the disks have an unsupported sector size:
Code:
Sep 22 11:00:04 pve kernel: sd 1:0:3:0: [sde] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:6:0: [sdh] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:5:0: [sdg] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:4:0: [sdf] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:7:0: [sdi] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:8:0: [sdj] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:9:0: [sdk] Unsupported sector size 4160.
The correct drivers are loaded and ready to use.
Code:
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2108 [Liberator] [1000:0079] (rev 05)
        Subsystem: Fujitsu Technology Solutions RAID Ctrl SAS 6G 5/6 512MB (D2616) [1734:1176]
        Kernel driver in use: megaraid_sas
        Kernel modules: megaraid_sas
06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8e SAS2.1 HBA [1000:3040]
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas
 
After fresh reboot:

Code:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas

Code:
❯ cat /proc/cmdline --plain
BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt reboot=force
 
Hello all,
I also have to chime in on this topic. I have the same problem. Since I cannot boot from the disk enclosure, I use the "normal" RAID controller for that. The disks are recognized, but the log says that the disks have an unsupported sector size:
Code:
Sep 22 11:00:04 pve kernel: sd 1:0:3:0: [sde] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:6:0: [sdh] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:5:0: [sdg] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:4:0: [sdf] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:7:0: [sdi] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:8:0: [sdj] Unsupported sector size 4160.
Sep 22 11:00:04 pve kernel: sd 1:0:9:0: [sdk] Unsupported sector size 4160.
The correct drivers are loaded and ready to use.
Code:
01:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2108 [Liberator] [1000:0079] (rev 05)
        Subsystem: Fujitsu Technology Solutions RAID Ctrl SAS 6G 5/6 512MB (D2616) [1734:1176]
        Kernel driver in use: megaraid_sas
        Kernel modules: megaraid_sas
06:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8e SAS2.1 HBA [1000:3040]
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas
I know some enterprise drives use 520-byte sectors instead of 512, and 4160 appears to be 8×520 rather than the more common 8×512 (4096, i.e. 4K).
Maybe the drives can be reformatted to 4K? I think you need to find out how to do that from the manufacturer of the drives.
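If the drives themselves support it, 520-byte-sector disks are usually reformatted with sg_format from the sg3_utils package; a sketch, using /dev/sde from the log above (this wipes the disk, can take hours, and only works if the controller passes SCSI commands through to the drive):
Code:
apt install sg3-utils
# check the current logical block size first
sg_readcap --long /dev/sde
# reformat to 4096-byte sectors (destroys all data on the disk)
sg_format --format --size=4096 /dev/sde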
 
After fresh reboot:

Code:
02:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 [1000:0087] (rev 05)
        Subsystem: Broadcom / LSI 9207-8i SAS2.1 HBA [1000:3020]
        Kernel driver in use: vfio-pci
        Kernel modules: mpt3sas

Code:
❯ cat /proc/cmdline --plain
BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt reboot=force
Is this before starting any VM? Kernel driver in use: vfio-pci can only be caused by something you configured somewhere or by starting a VM with passthrough of 02:00.
Did you reapply your configuration with update-initramfs -u -k all and proxmox-boot-tool refresh?
Please try to remember what you changed and change it back. I'm out of ideas about where you could have configured this, so I can only suggest reinstalling Proxmox.
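For completeness, reapplying the boot-time configuration looks like this (which of the two tools matters depends on how the host boots):
Code:
update-initramfs -u -k all
proxmox-boot-tool refresh
reboot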
 
Yeah, it's really weird.
I never intended to pass this HBA through.

I can't figure out how to make it behave "normally".

I will remove the GPU passthrough from the VM and reboot again.
 
OK, that actually did it.

I simply removed the PCI hardware device from the VM in the web UI (it was a GPU).

I do not understand why it would also apply to the LSI.

Any ideas? I used the GPU's hardware IDs for vfio.
 
OK, that actually did it.

I simply removed the PCI hardware device from the VM in the web UI (it was a GPU).

I do not understand why it would also apply to the LSI.

Any ideas? I used the GPU's hardware IDs for vfio.
Check your IOMMU groups; maybe both devices are in the same group, so the Proxmox host loses the device when you pass through any other device in that group. Put the device in another PCIe slot to see whether it moves to another IOMMU group (which might also change its PCI address from 02:00 to something else). Your motherboard determines the IOMMU groups.
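A common way to list the IOMMU groups on the host is a small loop over sysfs (a sketch):
Code:
# print every PCI device together with the IOMMU group it belongs to
for d in /sys/kernel/iommu_groups/*/devices/*; do
    g=$(basename "$(dirname "$(dirname "$d")")")
    printf 'Group %s: %s\n' "$g" "$(lspci -nns "${d##*/}")"
done | sort -V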
 
You are correct.

Code:
Group 2:        [8086:0c01]     00:01.0  PCI bridge                               Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller
                [8086:0c05]     00:01.1  PCI bridge                               Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller
                [10de:13c2] [R] 01:00.0  VGA compatible controller                GM204 [GeForce GTX 970]
                [10de:0fbb]     01:00.1  Audio device                             GM204 High Definition Audio Controller
                [1000:0087] [R] 02:00.0  Serial Attached SCSI controller          SAS2308 PCI-Express Fusion-MPT SAS-2
 
OK, this is basically fixed.

I used `pcie_acs_override=downstream,multifunction` as a kernel option in GRUB and was able to split them into individual groups (see the sketch after the listing below), which in turn means I can pass through just the GPU to the VM and leave the HBA untouched.

Code:
Group 14: [10de:13c2] [R] 01:00.0 VGA compatible controller GM204 [GeForce GTX 970]
Group 15: [10de:0fbb] 01:00.1 Audio device GM204 High Definition Audio Controller
Group 16: [1000:0087] [R] 02:00.0 Serial Attached SCSI controller SAS2308 PCI-Express Fusion-MPT SAS-2
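For reference, a sketch of where that option goes on a GRUB-booted host (systemd-boot installs use /etc/kernel/cmdline and proxmox-boot-tool refresh instead); note that the ACS override weakens the isolation between the devices it splits apart:
Code:
# /etc/default/grub (one line, appended to whatever options are already there)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction"
# apply and reboot
update-grub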
Thanks for all your help @leesteken
 
