@Richard Isted I'm not sure if you need both lines, but regardless, could you attach your dmesg again?
Thanks, I just left the config in GRUB_CMDLINE_LINUX_DEFAULT for now. Attached current dmesg. Really appreciate the help, thanks.
update-grub after that change and then rebooted the host?
update-grub output after editing /etc/default/grub which I presume is what I should have been doing.
update-grub
Generating grub configuration file ...
W: This system is booted via proxmox-boot-tool:
W: Executing 'update-grub' directly does not update the correct configs!
W: Running: 'proxmox-boot-tool refresh'
Copying and configuring kernels on /dev/disk/by-uuid/83FF-019E
Copying kernel and creating boot-entry for 6.2.16-19-pve
Copying kernel and creating boot-entry for 6.2.16-3-pve
Found linux image: /boot/vmlinuz-6.2.16-19-pve
Found initrd image: /boot/initrd.img-6.2.16-19-pve
Found linux image: /boot/vmlinuz-6.2.16-3-pve
Found initrd image: /boot/initrd.img-6.2.16-3-pve
Adding boot menu entry for UEFI Firmware Settings ...
done
Adding pci=realloc=off or reserve=0x80000000,0xfffffff to /etc/kernel/cmdline and running proxmox-boot-tool refresh resolved the issue for me. As I'm booting from a ZFS disk it seems that /etc/default/grub is ignored.
pci=realloc=off as my boot option, can always change to reserve=0x80000000,0xfffffff if that is a better option.
reserve works before I jumped to conclusions.
dmesg logs on lines starting with BIOS-e820, where it initially identifies and reserves memory blocks:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007398dfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007398e000-0x000000007458dfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007458e000-0x000000007dcd1fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007dcd2000-0x000000007ddd9fff] ACPI NVS
[ 0.000000] efi: Remove mem79: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
[ 0.000000] efi: Not removing mem80: MMIO range=[0xfed1c000-0xfed1ffff] (16KB) from e820 map
[ 1.552235] mpt3sas 0000:0b:00.0: BAR 1: can't reserve [mem 0x809c0000-0x809c3fff 64bit]
[ 1.552243] mpt2sas_cm1: pci_request_selected_regions: failed
[ 1.552288] mpt2sas_cm1: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:12348/_scsih_probe()!
mmconfig to happen so it doesn't attempt to seek out "unused" blocks.
dmesg and looking for the failure on the can't reserve message while loading the mpt3sas driver, then check if it's within a previously reclaimed block. If so, then simply add this block to a reserve kernel parameter.
pci=nommconf OR pci=realloc=off, but to be honest I would stay away from those two as I'm not quite sure how they could impact other devices you may have.
Nice work here and big thanks for sharing your findings!
A recent patch introduced in 6.2 tries to reclaim memory from the BIOS early in the boot process, after the BIOS has reported it as reserved. Later in the boot process that memory is used for MMIO (to communicate with devices such as PCIe devices). The problem is that on some platforms this isn't a safe operation, because some of the memory blocks previously reported as reserved aren't actually usable, so any device that happens to be assigned that block fails (in our case the SAS controller, but it really could be anything else).
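The check described above (find the can't reserve failure in dmesg, then see whether its address falls inside a range that was removed from the e820 map) can be sketched in Python. This is only a rough illustration: the function name find_conflicts is hypothetical, and the regular expressions are an assumption based on the dmesg lines quoted in this thread, so other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the dmesg lines quoted in this thread (assumption:
# other kernels may phrase these messages differently).
REMOVED_RE = re.compile(
    r"efi: Remove mem\d+: MMIO range=\[0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\]"
)
FAILED_RE = re.compile(
    r"BAR \d+: can't reserve \[mem 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)"
)


def find_conflicts(dmesg_text):
    """Return (failed_bar, removed_range) pairs where the BAR lies inside the range."""
    removed = [(int(a, 16), int(b, 16)) for a, b in REMOVED_RE.findall(dmesg_text)]
    failed = [(int(a, 16), int(b, 16)) for a, b in FAILED_RE.findall(dmesg_text)]
    conflicts = []
    for bar_start, bar_end in failed:
        for rng_start, rng_end in removed:
            # The BAR conflicts if it is fully contained in a removed range.
            if rng_start <= bar_start and bar_end <= rng_end:
                conflicts.append(((bar_start, bar_end), (rng_start, rng_end)))
    return conflicts


if __name__ == "__main__":
    sample = """\
[ 0.000000] efi: Remove mem79: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
[ 0.000000] efi: Not removing mem80: MMIO range=[0xfed1c000-0xfed1ffff] (16KB) from e820 map
[ 1.552235] mpt3sas 0000:0b:00.0: BAR 1: can't reserve [mem 0x809c0000-0x809c3fff 64bit]
"""
    for (bs, be), (rs, re_) in find_conflicts(sample):
        print(f"BAR {bs:#x}-{be:#x} lies inside removed range {rs:#x}-{re_:#x}")
```

On a real host the text would come from the saved dmesg output rather than the inline sample. A match only suggests which range is a candidate for the reserve parameter; it does not confirm that reserving it is safe on a given platform.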
IMO it seems reasonable. After all the work you put in to dissect this issue, though, it might be best if you reply to the aforementioned patch that introduced this with all the details you gathered. We certainly can jump in too, and definitely would cherry-pick any resulting patches.
This issue came from a patch introduced in 6.2.
It seems to have bitten other users as well, not just us owners of this specific hardware. After it got merged, it quickly broke some laptop hardware, and the developers patched that by skipping the reclaim when the memory chunk is too small.
To avoid needing any of these workarounds at all, I would love some help bringing this up with the upstream kernel, and maybe the Ubuntu kernel. Does that seem reasonable, @t.lamprecht?
Sorry, I am new to grub and kernel parameters. I am trying to add pci=realloc=off but I don't know where to do that. Is it here:
menuentry 'Install Proxmox VE (Graphical)' --class debian --class gnu-linux --class gnu --class os {
echo 'Loading Proxmox VE Installer ...'
linux /boot/linux26 ro ramdisk_size=16777216 rw quiet splash=silent
echo 'Loading initial ramdisk ...'
initrd /boot/initrd.img
}
I don't see any entries that start with GRUB, so I am not sure what to do. Any help would be appreciated.
It's the linux entry. There are already settings there, e.g. splash=silent. Just add it after that.
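To make "just add it after that" concrete, here is a small Python sketch that appends a kernel parameter to the end of a command line and shows the resulting linux line from the installer entry quoted in this thread. The helper name append_kernel_param is hypothetical and only illustrates where the parameter goes; it does not edit any boot configuration itself.

```python
def append_kernel_param(line, param):
    """Append a kernel parameter to a command line unless it is already present."""
    tokens = line.split()
    if param in tokens:
        return line  # already set, leave the line untouched
    return line.rstrip() + " " + param


if __name__ == "__main__":
    # The linux line from the installer menuentry quoted in this thread.
    linux_line = "linux /boot/linux26 ro ramdisk_size=16777216 rw quiet splash=silent"
    print(append_kernel_param(linux_line, "pci=realloc=off"))
```

The same idea applies to the single line in /etc/kernel/cmdline on an installed system booted via proxmox-boot-tool, which, as reported earlier in the thread, needs proxmox-boot-tool refresh afterwards.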
proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-3
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
amd64-microcode: 3.20230808.1.1~deb12u1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
openvswitch-switch: 3.1.0-2+deb12u1
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.7
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
root@pve:~# lsmod |grep mpt
mptctl 40960 1
mptbase 110592 1 mptctl
mpt3sas 364544 14
raid_class 12288 1 mpt3sas
scsi_transport_sas 53248 2 ses,mpt3sas
root@pve:~# dpkg -l |grep mpt
ii libopenmpt0:amd64 0.6.9-1 amd64 module music library based on OpenMPT -- shared library
ii mpt-status 1.2.0-8+hwraid1+Debian.stretch.9.9 amd64 get RAID status out of mpt (and other) HW RAID controllers
03:00.0 RAID bus controller [0104]: Broadcom / LSI MegaRAID SAS 2208 [Thunderbolt] [1000:005b] (rev 05)
DeviceName: Integrated RAID
Subsystem: Dell PERC H710P Mini (for monolithics) [1028:1f34]
Kernel driver in use: megaraid_sas
Kernel modules: megaraid_sas
01:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [1000:0072] (rev 03)
DeviceName: LSI SAS 1068
Subsystem: Broadcom / LSI 9211-8i [1000:3020]
Kernel driver in use: mpt3sas
Kernel modules: mpt3sas
The thread title is "No SAS2008 after upgrade". Is that not what you have?
@donhwyo: You're using mpt3sas, that works. The problem is only with megaraid_sas.