[SOLVED] Updated to VE 7.0 - no web gui, DMAR errors on console

Hi,
I just updated my homelab machine (HP ProLiant Gen8) to VE 7.0. However, the web GUI is now not accessible, and I am getting these two errors on the console at boot:

DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [01:00.0] fault index 18 [fault reason 38] Blocked an interrupt request due to source-id verification failure

Using my ProLiant's iLO HTML console, I can access the command line, but the above-mentioned error often appears and the machine locks up. The server also cannot ping anything, so there might be a network issue as well.

(My long shot: some time ago I made some attempts at PCIe passthrough, so I might have settings lingering from that, perhaps causing an issue now? There are no auto-starting machines that use the passthrough devices anymore.)

Any ideas, please? (Sorry, I know this is a bit short on details, but this just happened.)
 

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
5,207
1,513
164
South Tyrol/Italy
shop.proxmox.com
Hi,

did you use the opt-in 5.11 kernel or the 5.4 kernel previously? You could try rebooting into the older kernel, which should still be installed on the system, to rule out a regression there.
 

t.lamprecht

> (My long shot: some time ago I made some attempts at PCIe passthrough, so I might have settings lingering from that, perhaps causing an issue now? There are no auto-starting machines that use the passthrough devices anymore.)
More information on that would definitely be good. Also, more of the kernel log (dmesg) surrounding the above error would help.
 
I believe I used 5.4 kernel before.

OK, I booted using the 5.4 kernel and I am not seeing the quoted errors anymore. Typing through the console is also much faster (it was very sluggish on 5.11). However, the network still seems to be down (I cannot ping my router), which of course explains why the web GUI is not accessible... What would be the easiest way to troubleshoot that? (The device is a Broadcom NetXtreme BCM5720 2-port Ethernet adapter.)

Sorry, the iLO console is slow, and I cannot copy/paste there... grrh.
 
I guess I have two problems:
- new kernel having issues with my HW -> slow typing plus the new errors
- upgrade messing something up with my network setup

For now I keep running on 5.4 kernel, no errors, but no network. As far as I see, my old eno1 and eno2 are still there (correct MAC address). In my /etc/network/interfaces:
- eno1 and eno2 declared as iface enoX inet manual
- auto bond0 with iface bond0 inet manual, bond-slaves eno1 eno2
- auto vmbr0 with iface vmbr0 inet static, manual IP address (as before)/24, bridge-ports bond0

ip link show shows that eno1 and eno2 are 'broadcast, multicast, up, lower_up'. bond0 is up but 'no-carrier' and vmbr0 is up.
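
A minimal sketch for narrowing down a state like this from the console. The helper name no_carrier_ifaces is made up for illustration, and the sketch assumes the one-line-per-interface format that "ip -o link show" prints:

```shell
# Print the names of interfaces whose flags include NO-CARRIER.
# Expects `ip -o link show` style output (one interface per line) on stdin.
no_carrier_ifaces() {
    # Fields are separated by ": ", so $2 is the interface name;
    # strip a trailing "@if2"-style peer suffix if there is one.
    awk -F': ' '/NO-CARRIER/ { sub(/@.*/, "", $2); print $2 }'
}

# Usage on the host (not run here):
#   ip -o link show | no_carrier_ifaces
#   cat /proc/net/bonding/bond0   # link state of each slave as the bonding driver sees it
```

If the slaves show a link but the bond itself has no carrier, the bonding mode and the switch-side LAG settings are worth comparing before suspecting the NIC.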

There is a veth102i0@if2, veth108i0@if2 etc, total of five and I don't know what these are.

I am seeing other threads and comments about network issues, guess I am not the only one and somebody will figure this out.
 

t.lamprecht

> - new kernel having issues with my HW -> slow typing plus the new errors
I got my hands on an HP Gen8 to test. I can reproduce the sluggish TTY and see the one error message, but if I connect over SSH all is well, so it seems that the issue only affects the console framebuffer and/or input devices.

> There is a veth102i0@if2, veth108i0@if2 etc, total of five and I don't know what these are.
Those are the virtual NICs of the containers; you can read the VMID from the name (for example, veth102i0 belongs to the container with VMID 102).
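
As a rough sketch of that naming scheme, the following splits such a name into its parts. The function name veth_info is hypothetical, and the sketch assumes names of the form veth<VMID>i<N> as seen in this thread:

```shell
# Split a container veth name such as "veth102i0@if2" into the
# container VMID and the index of the container's network device.
veth_info() {
    name=${1%%@*}                 # drop the "@if2" peer suffix, if any
    case $name in
        veth*i*) ;;               # looks like veth<VMID>i<N>
        *) return 1 ;;            # anything else is not a container veth
    esac
    rest=${name#veth}             # e.g. "102i0"
    printf 'vmid=%s net=%s\n' "${rest%%i*}" "${rest#*i}"
}

# Usage (not run here):
#   veth_info veth102i0@if2
#   pct config 102        # shows the configuration of that container
```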

Can you post the whole ip addr output (censor public IPs, if any)?

> I am seeing other threads and comments about network issues, guess I am not the only one and somebody will figure this out.
"Network issues" is a very wide area with a thousand possible causes, from misconfiguration to a kernel regression or even issues with the HW (NIC/switch).

Do you have the newest firmware installed on the HP?
 
> I got my hands on an HP gen8 to test, I can reproduce the sluggish TTY and see the one error message, but if I connect over SSH all is well there, so it seems that that issue is affecting the console framebuffer and/or input devices only.

OK, good to know. I also noticed that the HTML5 console sometimes stops accepting input, so I might have yet another separate issue there.

> Can you post the whole ip addr output (censor public IPs, if any)?

At the moment no, as I can't copy-paste. The output does not include any IP addresses.

> Network issues is a very wide area with a thousand possible different causes, from misconfiguration to kernel regression or even issues with the HW (NIC/switch).

I am referring to a couple of posts about networking not working after the upgrade to VE 7. I've seen the discussion about the MAC address changing on the bridge interface, but I can't see how that would affect me (I am at home, and I am not doing any MAC address whitelisting). I did replace my main switch some time ago, and I am having doubts about its configuration (it did, however, work fine with 6.4 for weeks). I might need to dig up my old switch and test it when I am next at the server...

I do not get DHCP offers at all when I boot from the install ISO (either 6.4 or 7.0), so yeah, dunno what's going on at the moment...
 

t.lamprecht

FYI, we got a very similar report in the Bugtracker: https://bugzilla.proxmox.com/show_bug.cgi?id=3507
There the user could mitigate the issues by adding intel_iommu=off to the kernel command line, which an HPE knowledge base entry even recommends as a solution for a seemingly similar issue:
https://support.hpe.com/hpesc/public/docDisplay?docId=kc0131952en_us&docLocale=en_US
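
After adding such a parameter and rebooting, it can be worth confirming that the running kernel actually received it. A minimal sketch follows; the helper name has_kernel_param is made up for illustration:

```shell
# Succeed if the kernel command line read from stdin contains
# exactly the parameter given as $1 (whole-word match).
has_kernel_param() {
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Usage on the host (not run here):
#   has_kernel_param intel_iommu=off < /proc/cmdline && echo "parameter is active"
```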

> At the moment no, as I can't copy-paste. The output does not include any IP address'.
But you see all network cards there? Also no issue in the syslog (journalctl -b)?
 

robingr

I can confirm the issue on a MicroServer Gen8, but even with the error the server boots up successfully after a while.
Just tested the
Code:
intel_iommu=off
and
Code:
intremap=no_x2apic_optout
workarounds; unfortunately, they do not work.

Code:
[    0.136176] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)

[    1.643849] DMAR: DRHD: handling fault status reg 2
[    1.643917] DMAR: [INTR-REMAP] Request device [01:00.0] fault index 19 [fault reason 38] Blocked an interrupt request due to source-id verification failure

Latest ILO and Firmware.
 

janos

Hello,
Yesterday I upgraded from Proxmox 6 (with the 5.4 kernel) to Proxmox 7 on an HP DL360 Gen8 server.

This server had been working for several years without any issues.

After we upgraded to Proxmox 7, we started to experience random freezes (no kernel trace or similar, it simply freezes).

I saw these error messages in the syslog:
Code:
DMAR: DRHD: handling fault status reg 102
DMAR: [INTR-REMAP] Request device [01:00.0] fault index 21 [fault reason 38] Blocked an interrupt request due to source-id verification failure

I grepped the logs for this and had never seen this kind of error before; it only started to appear after the upgrade.

On Google I found some IOMMU-related issues with this error:

Now, as a workaround, I booted the 5.4 kernel, and it is working without issues.
 
Sorry for the late update; I finally fixed the issue by redoing the /etc/network/interfaces file step by step. I believe the main issue after all was related to the bridge MAC address: my "new" Dell switch was by default doing the LAG (two Gb Ethernet ports) based on Layer 2 (MAC addresses), and switching that to Layer 3 (IP addresses) was probably what solved it.

I still have the DMAR errors in the syslog (while using the 5.11 kernel), but the server runs OK, and as I can access it over SSH or the web GUI, I am not too concerned about the sluggish iLO interface right now. I'll need to test whether the intel_iommu=off parameter helps at all.

Thanks!
 

ctolzane

Hello,

I have an HP G8 as well and I had exactly the same console issue with Proxmox 7 (slow, and most of the time no longer accepting iLO virtual keyboard input). Adding the option "nointremap" solved the problem.

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on nointremap"

Christophe.
 

hoomanjavadpoor

Same issue on a DL360 G8; I fixed it by enabling the IOMMU and adding some GRUB options for the graphics card (in my case the error occurred because I have an Nvidia GPU).

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on pcie_acs_override=downstream,multifunction video=efifb:off"
 

haxxa

I have an Intel NIC populating the PCIE slot in my HP MicroServer Gen 8.

Reverting to kernel 5.4 fixes the issue, and so does the "intremap=off" kernel parameter when using kernel 5.11 or later. Use "intremap=off" rather than "nointremap", as the latter is deprecated. I also set "intel_iommu=off" as I don't use the IOMMU.
 
Sofia
dcp.solutions
Hello Friends,

I have the same issue on ProLiant DL380e Gen8. PVE 7 SMP PVE 5.11.22-10 (Tue, 28 Sep 2021 08:15:41 +0200) x86_64 GNU/Linux
I can confirm slow console video performance.

However, this works:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off intremap=off"

In the end, we are using this:

Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=igfx_off intremap=off"

Code:
01:00.1 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200EH


oladarula

had the same issue.

in /etc/default/grub:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off intremap=off"
Running update-grub and rebooting fixed it for me.
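
For anyone who would rather script the edit than do it by hand, here is a minimal sketch. The function name add_kernel_params is hypothetical; it assumes GNU sed (as on Debian/Proxmox) and a single GRUB_CMDLINE_LINUX_DEFAULT="..." line in the file:

```shell
# Append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT in a grub
# defaults file, skipping parameters that are already present.
# It does NOT remove conflicting values (e.g. an existing intel_iommu=on),
# and parameters must not contain "|", "&" or backslashes.
add_kernel_params() {
    file=$1; shift
    # Current value between the double quotes
    current=$(sed -n 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"$/\1/p' "$file")
    new=$current
    for param in "$@"; do
        case " $new " in
            *" $param "*) ;;                    # already there
            *) new="${new:+$new }$param" ;;     # append with a space
        esac
    done
    cp -- "$file" "$file.bak"                   # keep a backup copy
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$new\"|" "$file"
}

# Usage on the host (not run here), followed by regenerating the
# boot config and a reboot:
#   add_kernel_params /etc/default/grub intel_iommu=off intremap=off
```

Check the resulting line before regenerating the boot configuration, especially if the file already carried an intel_iommu= setting.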

Thanks for your help guys!
 

RolandK

I found an upstream Linux kernel bug report for this at https://bugzilla.kernel.org/show_bug.cgi?id=214795

Edit: I think this is NOT the same issue, but it is related, as iLO + IOMMU seem to interfere somehow.

on my server i'm getting:
[12925.378093] DMAR: DRHD: handling fault status reg 2
[12925.378158] DMAR: [INTR-REMAP] Request device [01:00.0] fault index 17 [fault reason 38] Blocked an interrupt request due to source-id verification failure

and 01:00.0 is:

# lspci |grep 01:00.0
01:00.0 System peripheral: Hewlett-Packard Company Integrated Lights-Out Standard Slave Instrumentation & System Support (rev 05)
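
The same lookup can be sketched as a small helper that pulls the PCI address out of the fault lines. The name dmar_fault_devices is made up for illustration, and the pattern assumes the message format quoted in this thread:

```shell
# Print each distinct PCI address found in
# "[INTR-REMAP] Request device [bb:dd.f]" lines read from stdin.
dmar_fault_devices() {
    sed -n 's/.*\[INTR-REMAP] Request device \[\([0-9a-fA-F:.]*\)].*/\1/p' | sort -u
}

# Usage on the host (not run here):
#   dmesg | dmar_fault_devices | while read -r dev; do lspci -s "$dev"; done
```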





> update-grub and reboot fixed it for me.
On recent Proxmox systems that are booted via proxmox-boot-tool, use "proxmox-boot-tool refresh" (formerly "pve-efiboot-tool refresh") after changing the GRUB config:


Code:
root@pve-hp:/etc/default# update-grub
Generating grub configuration file ...
W: This system is booted via proxmox-boot-tool:
W: Executing 'update-grub' directly does not update the correct configs!
W: Running: 'proxmox-boot-tool refresh'
 