@Richard Isted I'm not sure if you need both lines, but regardless, could you attach your dmesg again?
Thanks, I just left the config in GRUB_CMDLINE_LINUX_DEFAULT for now. Attached current dmesg. Really appreciate the help, thanks.
Have you run update-grub after that change and then rebooted the host?
Here is the update-grub output after editing /etc/default/grub, which I presume is what I should have been doing:
update-grub
Generating grub configuration file ...
W: This system is booted via proxmox-boot-tool:
W: Executing 'update-grub' directly does not update the correct configs!
W: Running: 'proxmox-boot-tool refresh'
Copying and configuring kernels on /dev/disk/by-uuid/83FF-019E
Copying kernel and creating boot-entry for 6.2.16-19-pve
Copying kernel and creating boot-entry for 6.2.16-3-pve
Found linux image: /boot/vmlinuz-6.2.16-19-pve
Found initrd image: /boot/initrd.img-6.2.16-19-pve
Found linux image: /boot/vmlinuz-6.2.16-3-pve
Found initrd image: /boot/initrd.img-6.2.16-3-pve
Adding boot menu entry for UEFI Firmware Settings ...
done
Adding pci=realloc=off or reserve=0x80000000,0xfffffff to /etc/kernel/cmdline and running proxmox-boot-tool refresh resolved the issue for me. As I'm booting from a ZFS disk, it seems that /etc/default/grub is ignored.
For now I have pci=realloc=off as my boot option; I can always change to reserve=0x80000000,0xfffffff if that is a better option.
[...] reserve works before I jumped to conclusions.
See the dmesg logs, on the lines starting with BIOS-e820, where it initially identifies and reserves memory blocks:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007398dfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007398e000-0x000000007458dfff] reserved
[ 0.000000] BIOS-e820: [mem 0x000000007458e000-0x000000007dcd1fff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007dcd2000-0x000000007ddd9fff] ACPI NVS
[ 0.000000] efi: Remove mem79: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
[ 0.000000] efi: Not removing mem80: MMIO range=[0xfed1c000-0xfed1ffff] (16KB) from e820 map
[ 1.552235] mpt3sas 0000:0b:00.0: BAR 1: can't reserve [mem 0x809c0000-0x809c3fff 64bit]
[ 1.552243] mpt2sas_cm1: pci_request_selected_regions: failed
[ 1.552288] mpt2sas_cm1: failure at drivers/scsi/mpt3sas/mpt3sas_scsih.c:12348/_scsih_probe()!
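The check discussed in this thread (does a failing BAR fall inside a block the kernel removed from the e820 map?) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular expressions are written against the log lines quoted above and may not match other kernel versions, and the function names find_conflicts and reserve_param are hypothetical. The helper computes the second value of reserve= as end - start + 1, assuming the kernel's reserve=<base>,<size> format; the thread itself used 0xfffffff, so treat the computed value as a suggestion to verify rather than a confirmed fix.

```python
import re

# Assumed patterns, modelled on the dmesg lines quoted in this thread.
# "efi: Not removing memNN" lines deliberately do not match REMOVED_RE.
REMOVED_RE = re.compile(
    r"efi: Remove mem\d+: MMIO range=\[0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)\]"
)
FAILED_RE = re.compile(
    r"can't reserve \[mem 0x([0-9a-fA-F]+)-0x([0-9a-fA-F]+)"
)


def find_conflicts(dmesg_text):
    """Return (failed_bar, removed_range) pairs where the BAR lies inside a removed range."""
    removed, failed = [], []
    for line in dmesg_text.splitlines():
        m = REMOVED_RE.search(line)
        if m:
            removed.append((int(m.group(1), 16), int(m.group(2), 16)))
            continue
        m = FAILED_RE.search(line)
        if m:
            failed.append((int(m.group(1), 16), int(m.group(2), 16)))
    return [
        (bar, rng)
        for bar in failed
        for rng in removed
        if rng[0] <= bar[0] and bar[1] <= rng[1]
    ]


def reserve_param(start, end):
    """Format an inclusive address range as reserve=<base>,<size> (assumed format)."""
    return f"reserve={start:#x},{end - start + 1:#x}"


# Sample input copied from the log excerpt in this thread.
SAMPLE = """\
[ 0.000000] efi: Remove mem79: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
[ 0.000000] efi: Not removing mem80: MMIO range=[0xfed1c000-0xfed1ffff] (16KB) from e820 map
[ 1.552235] mpt3sas 0000:0b:00.0: BAR 1: can't reserve [mem 0x809c0000-0x809c3fff 64bit]
"""

conflicts = find_conflicts(SAMPLE)
for bar, rng in conflicts:
    print(f"BAR {bar[0]:#x}-{bar[1]:#x} lies in removed range {rng[0]:#x}-{rng[1]:#x}")
    print(reserve_param(*rng))
```

To run it against a real host, one could feed it the saved output of dmesg instead of SAMPLE. An empty result would suggest the failure has a different cause than the reclaimed block.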
[...] mmconfig to happen so it doesn't attempt to seek out "unused" blocks.
[...] dmesg and looking for the failure in the can't reserve message while the mpt3sas driver loads, then check whether it's within a previously reclaimed block. If so, simply add this block to a reserve kernel parameter.
[...] pci=nommconf OR pci=realloc=off, but to be honest I would stay away from those two, as I'm not sure whether they can impact other devices you may have.

A recent patch introduced in 6.2 tries to reclaim memory back from the BIOS early in the boot process, after the BIOS has reported it as reserved. Later during boot, that memory is used for MMIO (to communicate with devices such as PCIe ones). The problem is that on some platforms this isn't a safe operation: some of the memory blocks previously reported as reserved aren't actually usable, so any device that happens to be assigned that block fails (in our case the SAS controller, but it really could be anything else).

This issue came from a patch introduced in 6.2.
It also seems to have bitten other users, not just us owners of this specific hardware. After it got merged, it quickly broke some other laptop hardware, and the developers patched that by skipping the reclaim when the memory chunk is too small.
To avoid any of these workarounds at all, I would love some help bringing this up to the kernel upstream, and maybe to the Ubuntu kernel. Does that seem reasonable @t.lamprecht?

Nice work here and big thanks for sharing your findings!
IMO it seems reasonable. After all the work you put in to dissect this issue, it might be best if you reply to the aforementioned patch that introduced this with all the details you gathered, though. We certainly can jump in too, and definitely would cherry-pick any resulting patches.
Sorry, I am new to grub and kernel parameters. I am trying to add pci=realloc=off but I don't know where to do that. Is it here:
Code:
menuentry 'Install Proxmox VE (Graphical)' --class debian --class gnu-linux --class gnu --class os {
    echo 'Loading Proxmox VE Installer ...'
    linux /boot/linux26 ro ramdisk_size=16777216 rw quiet splash=silent
    echo 'Loading initial ramdisk ...'
    initrd /boot/initrd.img
}
I don't see any entries that start with GRUB, so I am not sure what to do. Any help would be appreciated.
It's the linux entry. There are already settings there, e.g. splash=silent. Just add it after that.
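To make the suggested edit concrete, here is a small Python sketch that appends a parameter to the end of the linux line in a menu entry like the one quoted above. It is an illustration only: the function name add_param_to_linux_lines is hypothetical, it assumes the kernel line starts with the word linux (as in the quoted entry), and it works on text in memory rather than touching any file on the installer media.

```python
def add_param_to_linux_lines(grub_text, param):
    """Append param to every line whose first word is 'linux', unless already present."""
    out = []
    for line in grub_text.splitlines():
        words = line.split()
        if words and words[0] == "linux" and param not in words:
            line = line.rstrip() + " " + param
        out.append(line)
    return "\n".join(out)


# Menu entry copied from the post above.
ENTRY = """\
menuentry 'Install Proxmox VE (Graphical)' --class debian --class gnu-linux --class gnu --class os {
    echo 'Loading Proxmox VE Installer ...'
    linux /boot/linux26 ro ramdisk_size=16777216 rw quiet splash=silent
    echo 'Loading initial ramdisk ...'
    initrd /boot/initrd.img
}"""

patched = add_param_to_linux_lines(ENTRY, "pci=realloc=off")
print(patched)
```

The only line that changes is the one beginning with linux, which ends up with pci=realloc=off after splash=silent. Running the function a second time leaves the text unchanged, so the parameter is not duplicated.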