[SOLVED] Proxmox 6.x installation / kernel boot fails on Dell R6525 Epyc 7543 (Milan)

May 19, 2021
Hi, we just got our brand new baby from Dell, an R6525 with dual Epyc 7543 (Milan) CPUs.

Since it will become part of our PVE cluster, I tried to install Proxmox 6.3 on it without success; the same happened with Proxmox 6.4.

The system gets stuck after the following boot message; at least, nothing more is shown after it:

`Loading initial RAM-Disk`

What I tried so far:

- installing a current Fedora 34 works (kernel 5.11)
- removing quiet from the kernel command line does not help, and neither does adding nomodeset as a kernel parameter (things I found elsewhere)
- GParted 1.1 and 1.3 do not boot either (same behaviour)
- I assume it's a display problem, since my virtual media shows continuous loading from the media after booting (as if the system is still running); with GParted in particular the loading stops, and if I blindly press <return>, it continues loading ...
- in the installed Fedora system I added the kernel/initrd from one of our running cluster nodes (kernel 5.4.73-1-pve) and tried to boot it; again, not a single line of output once the kernel should come up
- the graphics adapter is: VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
- lspci output: see the attachment below

Do you have any ideas? Or is the 6.4 installer just not ready yet?

Thanks
 

Attachments

  • lspci.txt
    14.6 KB · Views: 7
Stoiko, thanks for the quick response! I just tried that version, but it gets stuck as well. The virtual disk does not load anymore, but there is no output on the bootloader ...

Please let me know if I can be of any help debugging the issue.

Screenshot 2021-05-19 at 17.07.00.png
 
Hmm - Things I would try:
* actually dd the ISO to a physical USB-key and boot from that (had my fair share of failing installs from the ISO due to iDrac/ILO/IPMI remote disk-attachment having problems)
* try booting the debug mode - this should bring you to a shell rather soon after reading the initrd
* remove the quiet-flag from the grub entry for the debug mode
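
For the first suggestion, here is a minimal sketch of writing the ISO to a USB key and checking the copy. The helper name `write_iso` and the paths in the usage line are placeholders, not something from this thread, and `conv=fsync` assumes GNU dd. Double-check the target device first, since dd overwrites it without asking.

```shell
# write_iso ISO TARGET: copy an installer image to TARGET and verify the copy.
# Hypothetical helper; TARGET would be the whole USB device, not a partition.
write_iso() {
    iso=$1
    target=$2
    # conv=fsync (GNU dd) flushes the data to the target before dd exits.
    dd if="$iso" of="$target" bs=1M conv=fsync 2>/dev/null || return 1
    # Compare only as many bytes as the ISO holds; the key is usually larger.
    size=$(($(wc -c < "$iso")))
    cmp -n "$size" "$iso" "$target"
}
```

Usage would look like `write_iso /path/to/proxmox-ve.iso /dev/sdX` (run as root, with `/dev/sdX` replaced by the real device).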

I hope this helps!
 
Booting the PVE 5.11 kernel with earlycon=efifb gives me a (horribly slow) console and shows the kernel output.

But at the end it disables the boot console efifb0 and I'm stuck (or rather blind) again ...

sshot.jpg
 
The kernel command line does not look like it's from the installer?
(the kernel on the ISO is /boot/linux26.. and proxdebug is missing)

Otherwise, since it's a rack server, maybe try using the serial port as the console (it can be forwarded to the iDRAC, and at least in older models it was possible to connect to it when connecting to the iDRAC via ssh)
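
A rough sketch of what the serial console suggestion means for the kernel command line: drop `quiet` and add `console=` parameters, with the serial one last so that it becomes the primary console. The helper name `serial_cmdline`, the port `ttyS0` and the settings `115200n8` are assumptions; they have to match the serial settings configured in the BIOS/iDRAC of the actual machine.

```shell
# serial_cmdline CMDLINE: print CMDLINE without "quiet" and with a serial
# console appended. ttyS0 and 115200n8 are assumptions, not values from
# this thread.
serial_cmdline() (
    set -f                      # no globbing while splitting the command line
    out=
    for word in $1; do
        case $word in
            quiet) ;;           # drop: hides the kernel messages
            console=*) ;;       # drop: consoles are re-added once below
            *) out="$out${out:+ }$word" ;;
        esac
    done
    printf '%s\n' "$out${out:+ }console=tty0 console=ttyS0,115200n8"
)
```

For example, `serial_cmdline "$(cat /proc/cmdline)"` prints the currently booted command line in that form, which can then be typed into the bootloader's edit screen.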
 
You are right, it's not; I copied the kernel and the initrd out and tried a manual boot.

So, I attached the virtual COM port now and can see the full boot log (attached). But in debug mode it tells me I'm in a shell, yet when I type commands, they do not get executed.

Comparing the boot log to the Fedora system, it seems to me that the mgag200 driver is not loaded, so fbcon cannot switch to it. Is there a way in Proxmox to load modules via the kernel cmdline?
 

Attachments

  • serial-boot.txt
    150 KB · Views: 2
I just double-checked with my Fedora install; the console itself seems to work:


Code:
Fedora 34 (Server Edition)
Kernel 5.11.20-300.fc34.x86_64 on an x86_64 (ttyS0)

Web console: https://fedora:9090/ or https://xxx:9090/

fedora login: root
Password:
Last login: Wed May 19 19:35:14 from xxx
[root@fedora ~]#
 
I guess this is the reason for the non-working rescue shell on the serial console in debug mode (init script from the initrd): the shell is attached to /dev/tty1, not to the serial port.

Code:
debugsh() {
    setsid sh -c '/bin/sh </dev/tty1 >/dev/tty1 2>&1'
}
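
As an illustration only (this is not the installer's code and not an official fix), a variant of that function could pick its device from the last `console=` parameter on the kernel command line instead of hard-coding /dev/tty1. The names `console_dev` and `debugsh_serial` are made up for this sketch.

```shell
# console_dev CMDLINE: print the device node for the last console= parameter
# in CMDLINE, falling back to /dev/tty1 (the device the original hard-codes).
console_dev() (
    set -f                             # no globbing while splitting
    dev=tty1
    for word in $1; do
        case $word in
            console=*)
                dev=${word#console=}   # e.g. ttyS0,115200n8
                dev=${dev%%,*}         # strip the baud/parity options
                ;;
        esac
    done
    printf '/dev/%s\n' "$dev"
)

# Hypothetical variant of debugsh that opens the shell on that device.
debugsh_serial() {
    tty=$(console_dev "$(cat /proc/cmdline)")
    setsid sh -c "/bin/sh <$tty >$tty 2>&1"
}
```

With `console=ttyS0,115200n8` as the last console parameter, `console_dev` resolves to /dev/ttyS0, so the debug shell would read from and write to the serial line.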
 
After patching the initrd, adding all modules, and booting it into a shell, I found the following error while inserting the mgag200 module:

Code:
mgag200 0000:62:00.0: [drm] *ERROR* can't reserve VRAM

As one last try, I simply booted the default 6.4 installer and logged everything to the serial console (attached file).

Here too, I guess once the installer is supposed to start, one can see the very same error:

Code:
[   41.463098] mgag200 0000:62:00.0: vgaarb: deactivate vga console
[   41.469427] mgag200 0000:62:00.0: [drm] *ERROR* can't reserve VRAM
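
To pull lines like these out of a long serial capture, and to compare a PVE boot against the Fedora one, a small filter is enough. The helper name `gfx_lines` and the pattern list are assumptions; extend the pattern as needed.

```shell
# gfx_lines FILE: print the kernel log lines about the graphics/console
# handover from a saved serial log. The pattern list is only a starting point.
gfx_lines() {
    grep -E 'mgag200|efifb|fbcon|vgaarb|\*ERROR\*' "$1"
}
```

For example, `gfx_lines serial-boot-install.txt` on the attached log should show the two lines above together with whatever else touched the framebuffer before them.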

For now I am at my wits' end :-(
 

Attachments

  • serial-boot-install.txt
    162.4 KB · Views: 0
The PVE installer needs a working graphical display, since it runs as an X11 application for now, so even if you get a console it won't help you install the system with the PVE installer. You can always try to install PVE on top of Debian:
https://pve.proxmox.com/wiki/Install_Proxmox_VE_on_Debian_Buster

Otherwise, regarding your issue, I found the following links which might help:

https://elrepo.org/bugs/view.php?id=668

https://bbs.archlinux.org/viewtopic.php?id=235124
(this indicates that it might be a kernel config issue)

could you try adding the mgag200.nomodeset=0 parameter to the kernel commandline?
 
Thanks Stoiko,

- the patching of the initrd was for debugging purposes only, to figure out the root of the problem
- a kernel config issue is not the problem: neither the Fedora 34 kernel nor the default PVE kernel has CONFIG_X86_SYSFB=y set
- the mgag200.nomodeset=0 parameter did not help

I will try to boot an old, already installed PVE system now, and if this fails I will try Debian Buster as a base.

Further thoughts are still welcome, since I think this hardware will become very common in the next years...
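
For anyone who wants to repeat the kernel config comparison, a minimal sketch follows. The helper name `config_is_set` is made up, and it assumes the distribution ships the kernel config as `/boot/config-<version>`, which Debian-based systems and Fedora normally do.

```shell
# config_is_set OPTION FILE: succeed if OPTION is built in (=y) or built as
# a module (=m) according to the kernel config FILE.
config_is_set() {
    grep -Eq "^$1=(y|m)\$" "$2"
}
```

Usage: `config_is_set CONFIG_X86_SYSFB "/boot/config-$(uname -r)" && echo set || echo "not set"`, run once on each system.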
 
Next update on the issue: installing Proxmox 6.4 on top of Debian works, but the console output stops working as soon as the PVE kernel 5.4.114-1 is installed and booted. Blindly logging in on the non-responding console and running commands works, as does logging in via SSH!

The error is still:

Code:
[ 284.895948] mgag200 0000:62:00.0: remove_conflicting_pci_framebuffers: bar 1: 0x9e008000 -> 0x9e00bfff
[ 284.895950] mgag200 0000:62:00.0: remove_conflicting_pci_framebuffers: bar 2: 0x9d800000 -> 0x9dffffff
[ 284.895951] mgag200 0000:62:00.0: remove_conflicting_pci_framebuffers: passed res_id (0) is not a memory bar
[ 284.895954] mgag200 0000:62:00.0: vgaarb: deactivate vga console
[ 284.896333] [drm:mgag200_driver_load [mgag200]] *ERROR* can't reserve VRAM
[ 284.896339] mgag200 0000:62:00.0: Fatal error during GPU init: -6
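
"can't reserve VRAM" suggests the driver could not claim a memory region. One possible reason (an assumption, not something confirmed in this thread) is that another entry, such as a firmware framebuffer, already holds that range. /proc/iomem lists who claims which range, and the sketch below prints every entry containing a given address. The helper name `iomem_owners` is made up, and the address in the usage line is simply the bar 2 start from the log, used as an example.

```shell
# iomem_owners ADDR: read /proc/iomem-style text on stdin and print the name
# of every entry whose range contains ADDR (hex, without the 0x prefix).
iomem_owners() {
    addr=$((0x$1))
    while IFS= read -r line; do
        range=${line%% : *}                          # "  start-end"
        range=$(printf '%s' "$range" | tr -d ' ')    # drop the indentation
        start=${range%-*}
        end=${range#*-}
        if [ "$addr" -ge $((0x$start)) ] && [ "$addr" -le $((0x$end)) ]; then
            printf '%s\n' "${line#* : }"             # the owner's name
        fi
    done
}
```

Usage: `iomem_owners 9d800000 < /proc/iomem`, run as root, because unprivileged reads of /proc/iomem show all addresses as zero.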
 
