PCIe Passthrough LSI SAS2008 problems

Cipher

New Member
Jul 15, 2015
Germany, Hamburg
Hi all,

I'm trying to get PCIe passthrough running with Proxmox 4.0.57 and an IBM M1115 controller crossflashed to the LSI IT firmware. The mainboard is a Supermicro X10SLL-F with a Xeon E3-1226 v3.

I used Rockstor (CentOS) at first and tried setting up a btrfs pool, then installed Debian 8 to see whether I get the same problems in another "clean" OS.

The controller is passed through to the VM and I can see the HDDs (4x 6TB WD Red), but I can't really use them:

  • Accessing the SMART values is very spotty at best
  • "device is read only" errors from Rockstor when adding a btrfs pool
  • mount errors (wrong fs type, bad option, bad superblock on /dev/sda1, missing codepage or helper program, or other error) for both XFS and ext3 filesystems
  • disks lose their partitioning seemingly at random or after some time; I could not reproduce this
  • smartctl not working as expected (tried different options such as -d sat, no change); inside Rockstor, SMART values would show in 1 of 10 tries
Code:
root@rockstor:~# smartctl -i /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===
Device Model:     WDC WD60EFRX-68MYMN1
Serial Number:    WD-WX21D15A74Y0
LU WWN Device Id: 5 0014ee 2b6886b68
Firmware Version: 82.00A82
User Capacity:    6,001,175,126,016 bytes [6.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Dec  3 10:21:21 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled


root@rockstor:~# smartctl -i /dev/sda
smartctl 6.4 2014-10-07 r4002 [x86_64-linux-3.16.0-4-amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org


Short INQUIRY response, skip product id
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
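
To put a number on the "1 of 10 tries" observation, a small retry helper might be useful. This is only a sketch: `count_ok` is a hypothetical helper name, and it assumes that smartctl exits with a non-zero status whenever it prints "A mandatory SMART command failed".

```shell
#!/bin/sh
# count_ok N CMD...: run CMD N times and print "successes/N".
# A run counts as a success when CMD exits with status 0.
count_ok() {
    n=$1
    shift
    ok=0
    i=0
    while [ "$i" -lt "$n" ]; do
        if "$@" >/dev/null 2>&1; then
            ok=$((ok + 1))
        fi
        i=$((i + 1))
    done
    echo "$ok/$n"
}

# Example (inside the VM, as root):
#   count_ok 10 smartctl -i /dev/sda
```

Running it once with ballooning enabled and once with fixed memory would make the two configurations directly comparable.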

fdisk output; the disks lose their partitioning at random (they come "back" after a reboot, but mounting is still not possible):
Code:
root@rockstor:~# fdisk -l


Disk /dev/vda: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xb850e335


Device Boot Start End Sectors Size Id Type
/dev/vda1 * 2048 64286719 64284672 30.7G 83 Linux
/dev/vda2 64288766 67106815 2818050 1.4G 5 Extended
/dev/vda5 64288768 67106815 2818048 1.4G 82 Linux swap / Solaris


Disk /dev/sdc: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sdb: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/sda: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
The backup GPT table is corrupt, but the primary appears OK, so that will be used.


Disk /dev/sdd: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 0EDA213C-9ACE-4751-ABFC-5DF85E4F918D


Device Start End Sectors Size Type
/dev/sdd1 2048 11721045134 11721043087 5.5T Linux filesystem
fdisk output some 20 minutes earlier:
Code:
root@rockstor:~# fdisk -l


Disk /dev/vda: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xb850e335


Device Boot Start End Sectors Size Id Type
/dev/vda1 * 2048 64286719 64284672 30.7G 83 Linux
/dev/vda2 64288766 67106815 2818050 1.4G 5 Extended
/dev/vda5 64288768 67106815 2818048 1.4G 82 Linux swap / Solaris


Disk /dev/sdc: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: E1F3C335-1D64-4F29-AEE2-EA14F2149B63


Device Start End Sectors Size Type
/dev/sdc1 2048 11721045134 11721043087 5.5T Linux filesystem


Disk /dev/sdb: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 70E83063-BA1F-446B-B61B-BA14D5D512CE


Device Start End Sectors Size Type
/dev/sdb1 2048 11721045134 11721043087 5.5T Linux filesystem


Disk /dev/sda: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D0DDBDFB-6DC9-4990-BF21-46838C3F8628


Device Start End Sectors Size Type
/dev/sda1 2048 11721045134 11721043087 5.5T Linux filesystem


Disk /dev/sdd: 5.5 TiB, 6001175126016 bytes, 11721045168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 0EDA213C-9ACE-4751-ABFC-5DF85E4F918D


Device Start End Sectors Size Type
/dev/sdd1 2048 11721045134 11721043087 5.5T Linux filesystem
but lsblk still shows the partitions:
Code:
root@rockstor:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 5.5T 0 disk
└─sda1 8:1 0 5.5T 0 part
sdb 8:16 0 5.5T 0 disk
└─sdb1 8:17 0 5.5T 0 part
sdc 8:32 0 5.5T 0 disk
└─sdc1 8:33 0 5.5T 0 part
sdd 8:48 0 5.5T 0 disk
└─sdd1 8:49 0 5.5T 0 part
sr0 11:0 1 247M 0 rom
vda 254:0 0 32G 0 disk
├─vda1 254:1 0 30.7G 0 part /
├─vda2 254:2 0 1K 0 part
└─vda5 254:5 0 1.4G 0 part [SWAP]

Creating a filesystem:
Code:
root@rockstor:~# mkfs.ext3 /dev/sdb1
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 1465130385 4k blocks and 183144448 inodes
Filesystem UUID: 18f2512e-f6bc-47f1-850b-0cec02534612
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544


Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

root@rockstor:~# mount /dev/sda1 /mnt
mount: wrong fs type, bad option, bad superblock on /dev/sda1,
       missing codepage or helper program, or other error


       In some cases useful info is found in syslog - try
       dmesg | tail or so.
root@rockstor:~# fsck /dev/sda1
fsck from util-linux 2.25.2
e2fsck 1.42.12 (29-Aug-2014)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/sda1


The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

root@rockstor:~# e2fsck -b 8193 /dev/sda
e2fsck 1.42.12 (29-Aug-2014)
e2fsck: Bad magic number in super-block while trying to open /dev/sda


The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Code:
root@rockstor:~# lspci -nnk
01:00.0 Serial Attached SCSI controller [0107]: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
        Subsystem: LSI Logic / Symbios Logic Device [1000:3020]
        Kernel driver in use: mpt2sas

qm monitor 100
Code:
info pci
Bus 1, device 0, function 0:
SAS controller: PCI device 1000:0072
IRQ 10.
BAR0: I/O at 0x7000 [0x70ff].
BAR1: 64 bit memory at 0xfe8c0000 [0xfe8c3fff].
BAR3: 64 bit memory at 0xfe880000 [0xfe8bffff].
BAR6: 32 bit memory at 0xffffffffffffffff [0x0007fffe].
id "hostpci0"

Proxmox log output
Code:
Dec 2 22:55:28 intrepid kernel: [ 680.200164] vfio-pci 0000:01:00.0: enabling device (0400 -> 0403)

vm 100.conf
Code:
balloon: 2048
bootdisk: virtio0
cores: 4
cpu: host
ide2: local:iso/debian-8.1.0-amd64-netinst.iso,media=cdrom,size=247M
machine: q35
hostpci0: 01:00.0,pcie=1
memory: 8192
name: rockstor
net0: e1000=56:51:10:FE:F0:A5,bridge=vmbr0
numa: 0
ostype: l26
smbios1: uuid=4fc0047a-ac89-45f3-9d02-2c1c251364ab
sockets: 1
unused0: local:100/vm-100-disk-1.qcow2
virtio0: local:100/vm-100-disk-2.qcow2,cache=writeback,size=32G

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 rootdelay=10 scsi_mod.scan=sync"
Code:
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
#pci_stub
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
kvm
kvm_intel
Code:
/etc/initramfs-tools/modules
# List of modules that you want to include in your initramfs.
# They will be loaded at boot time in the order below.
#
# Syntax:  module_name [args ...]
#
# You must run update-initramfs(8) to effect this change.
#
# Examples:
#
# raid1
# sd_mod
#pci_stub ids=1000:0072
vfio-pci ids=1000:0072
Code:
 dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 0x00000000DDA63A50 000080 (v01 INTEL  BDW      00000001 INTL 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.028070] DMAR: Host address width 39
[    0.028071] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.028077] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap d2008c20660462 ecap f010da
[    0.028078] DMAR: RMRR base: 0x000000df697000 end: 0x000000df6a5fff
[    0.028080] DMAR-IR: IOAPIC id 8 under DRHD base  0xfed90000 IOMMU 0
[    0.028080] DMAR-IR: HPET id 0 under DRHD base 0xfed90000
[    0.028081] DMAR-IR: x2apic is disabled because BIOS sets x2apic opt out bit.
[    0.028082] DMAR-IR: Use 'intremap=no_x2apic_optout' to override the BIOS setting.
[    0.028226] DMAR-IR: Enabled IRQ remapping in xapic mode
[    0.557899] DMAR: No ATSR found
[    0.557961] DMAR: dmar0: Using Queued invalidation
[    0.557968] DMAR: Setting RMRR:
[    0.557977] DMAR: Setting identity map for device 0000:00:14.0 [0xdf697000 - 0xdf6a5fff]
[    0.557998] DMAR: Setting identity map for device 0000:00:1a.0 [0xdf697000 - 0xdf6a5fff]
[    0.558015] DMAR: Setting identity map for device 0000:00:1d.0 [0xdf697000 - 0xdf6a5fff]
[    0.558027] DMAR: Prepare 0-16MiB unity mapping for LPC
[    0.558032] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    0.558041] DMAR: Intel(R) Virtualization Technology for Directed I/O
The controller is added via the QEMU/VM conf file (hostpci0: 01:00,pcie=1,driver=vfio), and all the necessary options are set according to https://pve.proxmox.com/wiki/Pci_passthrough1 and https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
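
One quick sanity check for a setup like this is to confirm which kernel driver actually owns the controller on each side. The sketch below only parses `lspci -nnk` output; `driver_in_use` is a hypothetical helper name. The expectation that the host shows `vfio-pci` while the VM runs is an assumption based on the vfio-pci log line above, and the guest should show `mpt2sas` as in the lspci output above.

```shell
#!/bin/sh
# driver_in_use: read `lspci -nnk` output for one device on stdin and
# print the value of its "Kernel driver in use:" line (empty if none).
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p' | head -n 1
}

# Example (address 01:00.0 taken from the VM config above):
#   lspci -nnk -s 01:00.0 | driver_in_use
```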

I'm at a total loss here; everything seems fine. There are not even any errors in the log files (host and VM). Hopefully someone has an idea how to tackle this.

Cheers and thanks for reading this wall of text :)
 
Mhh, I don't have any trouble booting the VM; it boots up as quickly as expected. Interesting thing though: I had set the RAM to variable, between 2 GB and 8 GB. I just set it to a fixed 8 GB and the problem seems to be resolved. I'll do some more tests and see if it's stable.
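
For anyone who wants to check their own VM for this combination, here is a minimal sketch. `check_balloon` is a hypothetical helper, and the rule it applies (flag a config that has a `hostpciN:` line together with a non-zero `balloon:` value) is only an assumption drawn from the observation above, not a documented Proxmox requirement.

```shell
#!/bin/sh
# check_balloon CONF: inspect a VM config file and report whether PCI
# passthrough (a hostpciN: line) is combined with a non-zero balloon value.
# Prints "no-passthrough", "ok", or a warning; returns 1 on a warning.
check_balloon() {
    conf=$1
    # No passthrough device configured: nothing to check.
    if ! grep -q '^hostpci[0-9]*:' "$conf"; then
        echo "no-passthrough"
        return 0
    fi
    # First "balloon:" value in the file, empty if the key is absent.
    balloon=$(sed -n 's/^balloon:[[:space:]]*//p' "$conf" | head -n 1)
    case "$balloon" in
        ''|0) echo "ok" ;;
        *)    echo "warn: balloon=$balloon with hostpci"
              return 1 ;;
    esac
}

# Example (path assumed for VM 100 on a standard install):
#   check_balloon /etc/pve/qemu-server/100.conf
```

If it warns, `qm set 100 --balloon 0` should disable the balloon driver for VM 100.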
 
Did you ever get this working? I'm having almost the exact same symptoms with an LSI 9200-8e. Tons of errors in dmesg output, though. The adapter goes into a reset loop when I try to do any I/O. I upgraded to the latest testing kernel, and still no go.

I'm thinking it may not be possible for it to work correctly due to my E3-1240 v2's lack of ACS on the PCIe root port.
 
Yeah, it works. The FreeNAS VM has been running with fixed RAM since I set it up in December.
 
WAIT...it's working now?! Standard machine type (not q35) and OVMF work. I'm not passing pcie=1 on the mapping, either. It's always the last config you try, I guess.
 
Do you use FreeNAS?
I wanted to try it, but I can't get the FreeNAS install CD to boot with UEFI/OVMF.
 
Hi,

I would upgrade, because you wrote "with proxmox 4.0.57". This is an old version, and we had PCI passthrough problems in the past.
 
I am having the same problems...
Hardware: Supermicro X11SSH-CTF mainboard with onboard LSI SAS3008 controller. I am trying to pass this controller through to a Linux VM. I also pass through a network card (a so-called "virtual function" (SR-IOV) of the onboard Intel network card) and an additional HDD connected to the onboard SATA controller.

But the same thing happens to me as to "Cipher": partitions randomly show up and disappear. Sometimes "fdisk -l" tells me the GPT is corrupt. Sometimes it doesn't show any HDD connected to the controller and just gives me controller errors like "mpt3sas_cm0: fault_state(0x2622)!". The network card has mysterious problems too: sometimes it does not get a DHCP lease, sometimes no traffic flows.

It's working when:
- I don't use a SCSI-controller in the VM (no matter which one)
- and I don't pass through the single HDD connected via SATA to my mainboard
- and I disable ballooning

If I add a SCSI- or virtio-device to the VM or if I pass through the HDD (even as SATA-device) or if I enable ballooning, then I get these problems.

My GRUB command line:
Code:
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on pcie_acs_override=downstream ixgbe.max_vfs=15"

and my VM-configuration:
Code:
autostart: 0
balloon: 2048
boot: cd
bootdisk: scsi0
cores: 8
cpu: Skylake-Client
cpulimit: 7.5
cpuunits: 2048
hostpci0: 01:00.0,pcie=1
hostpci1: 05:10.1,pcie=1
hotplug: disk,network,usb
keyboard: de
machine: q35
memory: 16384
name: t-data
numa: 0
onboot: 0
ostype: l26
sata0: local:iso/archlinux-2017.09.01-x86_64.iso,media=cdrom,size=518M
scsi0: /dev/disk/by-id/ata-ST4000DM000-1F2168_Z304A8Y9,aio=native,backup=0,cache=none,rerror=report,size=3907018584K,werror=report
scsihw: virtio-scsi-pci
shares: 2000
smbios1: family=Tobby-VMs,manufacturer=Tobby,product=Tobby-Data-VM,uuid=5b3799dd-c8aa-42a9-9b4f-a3df5fac766c,version=2.2.experimental
sockets: 1
startup: down=30
tablet: 0
usb0: spice,usb3=0
usb1: spice,usb3=0
usb2: spice,usb3=0
usb3: spice,usb3=0
vga: qxl
watchdog: i6300esb,action=poweroff

Using the standard machine type and/or OVMF doesn't help, not even with ballooning disabled (balloon: 0).

Ah, and "pveversion -v" says:
Code:
proxmox-ve: 5.0-21 (running kernel: 4.10.17-3-pve)
pve-manager: 5.0-31 (running version: 5.0-31/27769b1f)
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
pve-kernel-4.10.17-3-pve: 4.10.17-21
pve-kernel-4.10.17-1-pve: 4.10.17-18
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-5
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
 
I do not know your system, but I would say you have a problem with your IOMMU groups.
Onboard controllers normally use shared PCIe lanes, which causes problems.
 
Hmm doesn't look like shared PCIe-lanes:
x11ssh-ctf.jpg

IOMMU-groups look ok, too:
Code:
root@t-hyper:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/17/devices/0000:05:11.1
/sys/kernel/iommu_groups/7/devices/0000:00:1c.4
/sys/kernel/iommu_groups/25/devices/0000:05:13.1
/sys/kernel/iommu_groups/15/devices/0000:05:10.5
/sys/kernel/iommu_groups/5/devices/0000:00:1c.0
/sys/kernel/iommu_groups/23/devices/0000:05:12.5
/sys/kernel/iommu_groups/13/devices/0000:05:10.1
/sys/kernel/iommu_groups/3/devices/0000:00:16.1
/sys/kernel/iommu_groups/3/devices/0000:00:16.0
/sys/kernel/iommu_groups/21/devices/0000:05:12.1
/sys/kernel/iommu_groups/11/devices/0000:05:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/18/devices/0000:05:11.3
/sys/kernel/iommu_groups/8/devices/0000:00:1f.4
/sys/kernel/iommu_groups/8/devices/0000:00:1f.2
/sys/kernel/iommu_groups/8/devices/0000:00:1f.0
/sys/kernel/iommu_groups/26/devices/0000:05:13.3
/sys/kernel/iommu_groups/16/devices/0000:05:10.7
/sys/kernel/iommu_groups/6/devices/0000:00:1c.2
/sys/kernel/iommu_groups/24/devices/0000:05:12.7
/sys/kernel/iommu_groups/14/devices/0000:05:10.3
/sys/kernel/iommu_groups/4/devices/0000:00:17.0
/sys/kernel/iommu_groups/22/devices/0000:05:12.3
/sys/kernel/iommu_groups/12/devices/0000:05:00.1
/sys/kernel/iommu_groups/2/devices/0000:00:14.2
/sys/kernel/iommu_groups/2/devices/0000:00:14.0
/sys/kernel/iommu_groups/20/devices/0000:05:11.7
/sys/kernel/iommu_groups/10/devices/0000:04:00.0
/sys/kernel/iommu_groups/10/devices/0000:03:00.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/19/devices/0000:05:11.5
/sys/kernel/iommu_groups/9/devices/0000:01:00.0
/sys/kernel/iommu_groups/27/devices/0000:05:13.5

So my virtual-function-network-card is in group 13 without anything else, my SAS-controller is in group 9 without anything else. Looks good to me. :(
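
The raw `find` listing is hard to scan by eye, so a small filter that sorts it and reports only groups holding more than one device may help. This is a sketch; `group_iommu` and `shared_groups` are hypothetical helper names, and they only reshape the paths that `find` prints.

```shell
#!/bin/sh
# group_iommu: read sysfs paths such as
#   /sys/kernel/iommu_groups/9/devices/0000:01:00.0
# on stdin and print "group device" pairs, sorted numerically by group.
group_iommu() {
    sed -n 's#^.*/iommu_groups/\([0-9][0-9]*\)/devices/\(.*\)$#\1 \2#p' |
        sort -k1,1n -k2,2
}

# shared_groups: same input, but print only groups that contain more than
# one device, one line per group: "group: dev1 dev2 ...".
shared_groups() {
    group_iommu |
        awk '{ n[$1]++; d[$1] = d[$1] " " $2 }
             END { for (g in n) if (n[g] > 1) print g ":" d[g] }' |
        sort -n
}

# Example (on the host):
#   find /sys/kernel/iommu_groups/ -type l | shared_groups
```

Applied to the listing above, only groups 2, 3, 8 and 10 contain more than one device, and neither 0000:01:00.0 nor 0000:05:10.1 is among them.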
 
It's missing the ZFS modules so it can't boot
Code:
Failed to load ZFS modules.
Manually load the modules and exit.


Busybox v1.22.1 (Debian 1:1.22.0-19+b3) built in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
/ #
 
