Stuck immediately after Bootloader / start of Kernel Boot

silverstone

I am seeing several Issues, NOT only with Proxmox VE 9 / Debian Trixie but also with e.g. Kernel 6.8 or even Kernel 6.5, on multiple Xeon E3 v3 Systems based on Supermicro X10SLL-F / X10SLM-F Platforms.

This is what it looks like immediately after the Bootloader is exited and the Kernel starts the Boot Process:
1754983014536.png

It's stuck there.

Any idea what's going on ?

I need to run Kernel 5.15.131 in order to get it to boot :rolleyes: .

Possibly a very old PCIe Device such as a Mellanox NIC ConnectX or ConnectX-2 ?

I also tried enabling Above 4G Decoding in the BIOS, but that didn't help unfortunately :(.
 
I tried both with SOL (Serial over LAN) and a local Null-Modem Cable connected between the DB9 Serial Port of one Host and the DB9 Serial Port of the other Host.

Nothing is working :( .

I tried with: minicom --device /dev/X --baudrate 115200 for X equal to:
  • ttyS0
  • ttyS1
  • ttyS2
  • tty0
  • tty1
  • tty2

And while I could for a brief Period get something over SOL through IPMIView, I couldn't manage to get that working using ipmitool so that I could log it in real Time unfortunately.
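For logging SOL in real time with ipmitool, a minimal sketch could look like the following. The function name sol_log, the BMC address and the user are placeholders (assumptions, not taken from this thread); it assumes ipmitool is installed, the BMC accepts the lanplus interface, and the password is exported in IPMI_PASSWORD so that -E can pick it up.

```shell
# Hypothetical wrapper: attach to Serial-over-LAN and append everything
# that arrives to a log file while still showing it on screen.
# Usage: sol_log <bmc-host> <bmc-user> <logfile>
sol_log() {
    # -I lanplus : IPMI v2.0 interface, which SOL requires
    # -E         : read the password from $IPMI_PASSWORD
    ipmitool -I lanplus -H "$1" -U "$2" -E sol activate | tee -a "$3"
}

# Example (not executed here; address and user are made up):
#   export IPMI_PASSWORD='...'
#   sol_log 192.0.2.10 ADMIN sol-boot.log
```

If a previous session is still open, "ipmitool ... sol deactivate" with the same connection options should release it; inside a session the escape sequence ~. ends it.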

Any other Idea ?
 
Any Idea ?

EDIT 1: several Months back, for a similar Issue (digging through my Emails since Forum Search is quite bad), @t.lamprecht suggested adding the Option earlyprintk=vga,keep to the Kernel Command Line, so I might as well try that.

Weird that Serial Debugging doesn't work though. Is the Freeze / Panic happening way too early for Serial Debugging to work (less than 1 Second after the Kernel starts booting) ?
 
According to the block diagram that Supermicro provides, your UARTs are both coming directly from the Aspeed AST2400 BMC. Plus there seems to be a suspicious "Serial Mux" BIOS setting in the "IPMI" submenu.

It might be that both UARTs are exclusively provided for SOL usage by default, and that only by switching that "Mux" setting you get them routed directly to the COM1/COM2 headers mentioned in the manual. Maybe then you can get some real debug output with "console=ttySx,115200n8 earlyprintk".
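As a rough illustration of the suggested parameters, a fragment for /etc/default/grub might look like this. It assumes GRUB2 and that the port in question turns out to be ttyS1 (GRUB unit 1); both are assumptions, so substitute the index that matches the board. As far as I know, earlyprintk only accepts ttyS0 and ttyS1 by name, and the trailing ",keep" leaves the early console active after the regular console takes over.

```shell
# /etc/default/grub (fragment, sketch only)
# Assumption: the UART to watch is ttyS1; adjust ttyS1 / --unit=1 together.

# Kernel messages go to both the local screen and the serial port.
GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS1,115200n8 earlyprintk=serial,ttyS1,115200,keep"

# Let the GRUB menu itself appear on the serial line as well.
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --unit=1 --speed=115200 --word=8 --parity=no --stop=1"
```

The change only takes effect after regenerating the GRUB configuration (update-grub on Debian-based systems).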
 
Thank you for your Reply :) .

When I was in BIOS the other Day, I am pretty sure it was under the Advanced Features Submenu, NOT the IPMI one.

COM1 was set to COM by Default, whereas COM2 was set to SOL by Default.

Logically COM1 would be the DB9 Serial Port in the Back whereas COM2 (if NOT set to SOL) would be the Headers (since that would NOT be connected, unless you install a PCI Slot DB9 Serial Port Adapter), but maybe they are mixed up. It's possible of course.

Until now my Assumption was that the Kernel freezes / panics so quickly that even Serial Debugging is not initialized yet.

Of course I could try to also switch COM1 and COM2 around. I could get the GRUB Menu showing up in Supermicro IPMIView over SOL, although the Freeze/Panic is so fast that I couldn't see anything after that.
 
The screenshot you posted is from PCI(e) initialization. There should be a lot of output before that.
So no, I doubt, that the kernel crashes "that fast". "0.34 seconds" (according to your screenshot) might sound early, but in terms of kernel boot, there is a lot going on before that.

Do you still have the "quiet" commandline parameter somewhere in your Linux kernel commandline for the bootloader?
If yes, get rid of it. Otherwise you can also add "loglevel=7" for maximum output verbosity.
Oh and make sure to edit the right file. Some setups use systemd-boot (/etc/kernel/cmdline), while others use grub2 (/etc/default/grub). Double-check that in any case!
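The "drop quiet, add loglevel=7" step can be sketched as a small helper that rewrites a command-line string. The function name verbose_cmdline and the sample command line are made up for illustration; the helper only transforms text and does not touch any file.

```shell
# Hypothetical helper: remove "quiet" (and any existing loglevel=...) from
# a kernel command line and append loglevel=7.
# Note: the argument is split on whitespace, so it should not contain
# shell glob characters.
verbose_cmdline() {
    out=""
    for arg in $1; do
        case "$arg" in
            quiet|loglevel=*) ;;              # drop these
            *) out="${out:+$out }$arg" ;;     # keep everything else
        esac
    done
    printf '%s\n' "${out:+$out }loglevel=7"
}

verbose_cmdline "root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet"
# prints: root=ZFS=rpool/ROOT/pve-1 boot=zfs loglevel=7
```

The result would go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (followed by update-grub) or into /etc/kernel/cmdline (followed by proxmox-boot-tool refresh), depending on which bootloader the system really uses. After the next boot, cat /proc/cmdline shows what the kernel actually received.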

And BTW, is there a specific reason that you limited your trials to up to "ttyS2"? It might be that the kernel still sees the chipset UARTs (depending on how Supermicro set up the BIOS) and lists the Aspeed ones only above that. So maybe try even higher numbers.
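One way to see which ttyS numbers are backed by a real UART, from a kernel that does boot (e.g. 5.15), is to read /proc/tty/driver/serial. The sketch below filters out entries the kernel reports as uart:unknown; the function name detected_uarts is an assumption, and reading the proc file usually needs root.

```shell
# Hypothetical helper: list serial ports for which the kernel detected a UART.
# With no argument it reads /proc/tty/driver/serial; a file path can be
# passed instead (handy for checking a saved copy from another host).
detected_uarts() {
    awk '/uart:/ && !/uart:unknown/ {
             sub(":", "", $1)                 # "0:" -> "0"
             print "/dev/ttyS" $1, $2, $3     # device, UART type, I/O port
         }' "${1:-/proc/tty/driver/serial}"
}

# Example (as root, not executed here):
#   detected_uarts
```

"dmesg | grep ttyS" on the working kernel gives a second view of the same information, including the I/O port and IRQ each ttyS was bound to.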
 
The screenshot you posted is from PCI(e) initialization. There should be a lot of output before that.
So no, I doubt, that the kernel crashes "that fast". "0.34 seconds" (according to your screenshot) might sound early, but in terms of kernel boot, there is a lot going on before that.
Alright, Fingers crossed :).

Do you still have the "quiet" commandline parameter somewhere in your Linux kernel commandline for the bootloader?
If yes, get rid of it. Otherwise you can also add "loglevel=7" for maximum output verbosity.
I think I added a debug Parameter too (also used to debug Initramfs), but it might well be that there is a quiet Parameter that I missed. Thanks for the Tip. Let me check that :).

Oh and make sure to edit the right file. Some setups use systemd-boot (/etc/kernel/cmdline), while others use grub2 (/etc/default/grub). Double-check that in any case!
That's surely NOT an Issue in my Case :). It's always GRUB2. For how "bad" everybody considers it compared to Systemd or ZFSBootMenu especially with ZFS (and I had to go back to /boot on a mdadm RAID-1 Mirror of EXT4 Partition because it would NOT work, even with compatibility Mode set, when ZFS Snapshots got created), I am using that pretty much everywhere (minus the ARM Single Board Computers).

And BTW, is there a specific reason that you limited your trials to up to "ttyS2"? It might be that the kernel still sees the chipset UARTs (depending on how Supermicro set up the BIOS) and lists the Aspeed ones only above that. So maybe try even higher numbers.
My logic was that since COM1 & COM2 are the Physical Ports, they would be the First ones (plus maybe one "spare", or ttyS0 being reserved by something else on the ASPEED BMC).

Is there any Way to find out which one is really active ?

I couldn't get it to work at all even in a Booted Environment though (when I booted Kernel 5.15.131 I did try to send some Serial Messages from one PC to the other, but unfortunately it didn't work).
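For the host-to-host test in a booted environment, a bare-bones check without minicom could look like this. The helper names serial_setup and serial_send are made up, and /dev/ttyS0 in the usage lines is only an example; the stty call assumes GNU coreutils. Hardware flow control is switched off here on the assumption that the Null-Modem Cable may not carry the handshake lines.

```shell
# Hypothetical helpers for a raw serial link test.

# Put the port into a known state: 8N1, raw, no echo, no RTS/CTS,
# ignore modem control lines.
# Usage: serial_setup <device> [baud]
serial_setup() {
    stty -F "$1" "${2:-115200}" cs8 -cstopb -parenb raw -echo -crtscts clocal
}

# Write one line of text to the port.
# Usage: serial_send <device> <text>
serial_send() {
    printf '%s\r\n' "$2" > "$1"
}

# Example (not executed here):
#   receiver:  serial_setup /dev/ttyS0 && cat /dev/ttyS0
#   sender:    serial_setup /dev/ttyS0 && serial_send /dev/ttyS0 "hello"
```

If nothing arrives on any port pair, repeating the same test against a device that is known to talk RS-232 (such as the Switch console) helps separate a Port problem from a Cable problem.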

Furthermore I remember I once had to Troubleshoot a Network Switch using DB9 Serial, and I'm pretty sure I was using /dev/ttyS0 or /dev/ttyS1 in that Case (different Situation, the System was already booted up and the Supermicro PC was NOT trying to send Debug Information via a Null-Modem Cable like I am trying here).

Otherwise I should probably order a few USB Adapters. I have plenty of CH340/CH341/CP2102 and Similar Adapters (mostly bought to troubleshoot Microcontrollers, some to flash some BIOS Chips via SPI), but they are almost all breakout Type OR it's NOT a USB Connection :rolleyes:.
 
As I just re-read your posting, this seems like it might be helpful:
https://www.seeedstudio.com/blog/2019/12/11/rs232-vs-ttl-beginner-guide-to-serial-communication/

As you wrote about all those different, but most likely all TTL-level, USB to UART adapters, the above might help your understanding.
You will certainly need something that talks RS232, not just TTL-level. Otherwise you will never see output on the DB-9 connection of your board.

That might also explain why SOL works for you. No physical level translation required there. :-)
 
I'm not sure if that is the Issue.

As I said, when I connected the DB9 on the Back of one of these Supermicro Motherboards to a Zyxel GS1910-24 Switch, I could get it to work without Issues with minicom. Since RS232 seemed fine there, why do you think that's the Issue ?

I agree that with the Adapters that I listed (CH340/CH341/CP2102) I would also need a TTL to RS-232 Adapter (unsure about the exact P/N, I ordered some from Aliexpress already).

However the current Procedure is to just use a Serial DB9 Female - DB9 Female (Null-Modem) Cable as it is. Also, when I had to connect via minicom to the Zyxel GS1910-24 Switch, it was a Serial DB9 Female - DB9 Male Cable. Again, no level Shifting required, thus I believe none is required between 2 Supermicro Motherboards using the Null-Modem Cable.

Since the Problem seems to be connecting the DB9 Ports of the 2 Supermicro Motherboards with the Null-Modem Cable, I wonder if the Serial Port Redirection in the BIOS needs to be different between the "Host" (Master) and the "Target" (Slave, the one I want to troubleshoot). For COM1 I mean ...
 
Don't get me wrong, but this should not be a very hard task.
So, the logical conclusion would be, that you either have defective hardware, or that you are doing something wrong.
You could also have a look at the output on the COM header of your board with an oscilloscope. Maybe that could give some hints about why it doesn't work as intended.
 
No it should not.

I agree, the most likely explanation is something very stupid going on. Probably with the User :p. I just fail to see what exactly ...

This Week I'm completely under Water at work, so no Time to play around with Scopes. But yeah, when Software fails, you have to look at the Hardware :rolleyes:.
 
Final idea for today, considering the board's age:
Did you already check the voltage level of the CMOS battery? The evil range somewhere between 0-2V is always a good explanation for "stupid" or "weird" stuff going on. If it's nothing else, that is. :-)
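If the BMC exposes the battery voltage, it can be read without opening the case. The sketch below assumes that "ipmitool sensor" lists a sensor whose name contains VBAT (an assumption about this board, not confirmed in the thread) and uses the 2 V limit mentioned above as the default threshold. The function names are made up.

```shell
# Hypothetical helpers for a CMOS battery check via the BMC.

# Extract the VBAT reading from "ipmitool sensor" output on stdin.
# Expects the usual pipe-separated layout: name | value | unit | status ...
vbat_volts() {
    awk -F'|' '$1 ~ /VBAT/ { gsub(/ /, "", $2); print $2; exit }'
}

# Succeeds (exit 0) if the given voltage is at or below the threshold.
# Usage: vbat_is_low <volts> [threshold, default 2.0]
vbat_is_low() {
    awk -v v="$1" -v t="${2:-2.0}" 'BEGIN { exit !(v + 0 <= t + 0) }'
}

# Example (not executed here):
#   v=$(ipmitool sensor | vbat_volts)
#   vbat_is_low "$v" && echo "CMOS battery looks weak: $v V"
```

A multimeter on the coin cell remains the more reliable check, since the sensor may be missing or report "na" on some firmware versions.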
 
Good Point :). It could be running low indeed.

As an intermediary Step I think I could try to see if the Port is "alive" on both Systems (at separate Steps) to a Zyxel GS1910-24 Switch, and check if they work at all.

If they do then it might also be a faulty Null-Modem Cable.