[SOLVED] Proxmox Won't Boot with Latest Kernel 5.13.19-2-pve

DavidKahl

Hi All,

I hope this reaches you all well.

I just ran updates on my server, one of which was the latest kernel, 5.13.19-2-pve. After a reboot the system wouldn't boot into Proxmox VE. Only after selecting the older kernel, 5.13.19-1-pve, in the GRUB menu would the system actually boot. Every reboot reverts to the newer kernel, so the older one has to be selected manually each time.

I've tried to uninstall the newer kernel, but that results in an error and wants to uninstall PVE as well, so I'm not sure what my other options are. I also tried setting the GRUB default to boot entry "1", which should be the older version, but that didn't work either.

Any Ideas?

NB: On boot there were NOT OK entries for two old zpool entries. These weren't shown in the zpool list, so I'm unsure whether that has any impact.

Resources: pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Resources: GRUB settings (/etc/default/grub):
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""

Appreciate your assistance.

D
 

Steffenprox

Hello, I don't think you are using GRUB in your configuration. Check this wiki page: https://pve.proxmox.com/wiki/Host_Bootloader

My system, with an AMD Ryzen™ 5 4500U, uses systemd-boot and also could not start with kernel 5.13.19-2-pve.

I managed to make the previous kernel the default by highlighting the 5.13.19-1-pve entry in the boot menu and pressing "d", which marks it as the default. See: https://www.man7.org/linux/man-pages/man7/sd-boot.7.html

My Proxmox Server starts and breaks after a few seconds with the 5.13.19-1-pve Kernel.
 

Attachments

  • 5.13.19-2-pve-.log
    334 KB · Views: 18

DavidKahl

Hi Steffen,

Thanks for the support.

When I check with efibootmgr -v this is the Proxmox output:

Boot0001* proxmox HD(2,GPT,ca72790a-27b1-4118-80a5-c46fd6868c6c,0x800,0x100000)/File(\EFI\PROXMOX\GRUBX64.EFI)

By my reading, GRUB is being used. Furthermore, when I check the settings pertaining to systemd-boot, all the files are empty.
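
For anyone who wants to script this check, here is a minimal sketch that classifies the boot loader from efibootmgr -v output. The function name detect_bootloader is made up for illustration, and the sketch assumes the loader file names match the usual ones (grubx64.efi for GRUB, as in the output above, and systemd-bootx64.efi for systemd-boot). If both entries exist, the firmware's boot order decides which one is really used, so treat the result as a hint only.

```shell
# Classify the UEFI boot loader from `efibootmgr -v` output read on stdin.
# Assumption: GRUB entries reference grubx64.efi and systemd-boot entries
# reference systemd-bootx64.efi (matched case-insensitively).
detect_bootloader() {
    local input
    input=$(cat)
    if printf '%s\n' "$input" | grep -qi 'systemd-bootx64\.efi'; then
        echo "systemd-boot"
    elif printf '%s\n' "$input" | grep -qi 'grubx64\.efi'; then
        echo "grub"
    else
        echo "unknown"
    fi
}
```

Usage would be something like: efibootmgr -v | detect_bootloader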
 

tauceti

I have the same issue with the new kernel... 5.13.19-2-pve where I cannot boot anymore :(
I also use grub and have an AMD system.
How can I remove the new kernel without removing the pve packages as David mentioned?

OK I got it:
check the apt logs:
Start-Date: 2021-12-04 21:23:24
Commandline: apt upgrade
Install: pve-headers-5.13.19-2-pve:amd64 (5.13.19-4, automatic), pve-kernel-5.13.19-2-pve:amd64 (5.13.19-4, automatic)
Upgrade: pve-kernel-5.13:amd64 (7.1-4, 7.1-5), pve-headers-5.13:amd64 (7.1-4, 7.1-5), pve-kernel-helper:amd64 (7.1-4, 7.1-6)
End-Date: 2021-12-04 21:24:01

So first install the old version (7.1-4) of each upgraded package: pve-kernel-5.13, pve-headers-5.13 and pve-kernel-helper,
e.g. with apt install pve-kernel-5.13=7.1-4
Then remove pve-headers-5.13.19-2-pve and pve-kernel-5.13.19-2-pve,
then run apt autoremove and reboot.
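
The steps above can be sketched as a small script. This is only a sketch under assumptions: the package names and versions are the ones from the apt log in this post, while DRY_RUN, run and rollback_kernel are hypothetical helpers added here. By default the commands are only printed, so nothing is changed until DRY_RUN=0 is set and the function is run as root from the working kernel.

```shell
# Rollback sketch: print the commands by default, execute only if DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # dry run: show the command instead of running it
    else
        "$@"
    fi
}

rollback_kernel() {
    # 1. Reinstall the previous versions of the upgraded packages.
    run apt install pve-kernel-5.13=7.1-4 pve-headers-5.13=7.1-4 pve-kernel-helper=7.1-4
    # 2. Remove the kernel and headers that the upgrade pulled in.
    run apt remove pve-headers-5.13.19-2-pve pve-kernel-5.13.19-2-pve
    # 3. Clean up leftover dependencies; reboot manually afterwards.
    run apt autoremove
}
```

Calling rollback_kernel with the default settings just lists the three apt commands, which makes it easy to compare them with your own apt log before running anything.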
 

nicedevil

Same on my side; I have an HM80 from Minisforum.
It took me ages to find out what the issue was.

On boot it stops right after AMD GPU initialization, and then after around 5 minutes (maybe less, maybe more) it reports that the "synchronize boot up for ifupdown" job is the problem.

1638699764450.png

On system shutdown it stops at
1638699786987.png

The solution is/was to select the working pve-kernel-5.13.19-1 kernel (or any other working one) at the Proxmox boot menu.
Then install the old version like tauceti did, with apt install pve-kernel-5.13=7.1-4, and remove the new (broken) kernel with dpkg -P pve-kernel-5.13.19-2-pve. Reboot and the world is working again...
 

moose

Same problem on my AMD based system, a HP MicroServer Gen 10 with AMD Opteron(tm) X3421 APU CPU. My INTEL based Zotac ZBox with Intel Celeron CPU N3150 @ 1.60GHz CPU is not affected.

I can confirm the problem occurs just after initializing the AMD GPU (better said, trying to do so ;-) ) because the screen gets black and the boot process hangs afterwards ...

FYI, I've attached 3 files from the /var/log directory: debug, kern.log and messages (because I can't upload files without an extension, I had to add .log to the debug and messages files). I separated the output sections from booting kernel 5.13.19-2-pve (which hangs) and 5.11.22-7-pve (which works fine) by 3 blank lines. Maybe this will help the 'kernel composers' at Proxmox to find the bug and compile a new kernel without it ;-)

Greetinx

moose
 

Attachments

  • debug.log
    33.6 KB · Views: 5
  • kern.log
    728.7 KB · Views: 3
  • messages.log
    798.6 KB · Views: 2

tauceti

Awesome thanks! I have the same errors in the logs also with ifupdown and also with system shutdown
 

nicedevil

Now a question for the people here who know more about Linux than I do :D

How can we block the update to 5.13.19-2-pve?

OK, found a solution, but I guess it will block all future 5.13.x kernels as well, right? Just holding 5.13.19-2-pve won't block it at all :(

apt-mark hold 'pve-kernel-5.13'
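
A possibly narrower alternative to the hold is an APT pin that refuses only the one kernel build. This is an untested sketch: the package name and version come from the pveversion output at the top of the thread, the Package/Pin/Pin-Priority syntax is the standard apt_preferences format (a negative priority prevents that version from being installed), and the function name and the file name under /etc/apt/preferences.d/ are hypothetical choices. Whether apt then simply keeps pve-kernel-5.13 back on upgrade is an assumption that should be checked on a test box.

```shell
# Write an APT pin that blocks one specific kernel build instead of holding
# the whole pve-kernel-5.13 series. The default file name is a made-up example.
write_kernel_pin() {
    local dest="${1:-/etc/apt/preferences.d/block-pve-kernel-5.13.19-2}"
    cat > "$dest" <<'EOF'
Package: pve-kernel-5.13.19-2-pve
Pin: version 5.13.19-4
Pin-Priority: -1
EOF
}
```

After writing the file, apt-cache policy pve-kernel-5.13.19-2-pve should show the pinned priority, and the pin can be dropped again by deleting the file.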
 

aC23

I ran into the same issue with kernel 5.13.19-2-pve after upgrading to Proxmox 7.1 from v6. Booting using an older kernel is what resolved it. Is this just a faulty kernel?
 

DavidKahl

@moose: I have the same on the Proliant with the exact same specs. An Intel server isn't affected.

@tauceti: I'll try the fix later -and then revert.

Thanks in advance!
 

DavidKahl

Hi All,

So the fix provided by @tauceti & @nicedevil works, thanks a bunch for that.

On the other hand, I still believe there is an issue in the 5.13 kernel: the reboot sequence doesn't physically shut down the device before booting it back up. When you send the reboot/shutdown command, the device stays physically on. Any suggestions on that?

BR

David
 
Kephin

Running into this issue as well, specifically with the -2 kernel. -1 boots just fine.

Also put an HP ProLiant MicroServer Gen10 with a dual-core AMD Opteron on the stack of affected machines, in the same vein.

This is my home box with the community repo. Is this also a thing on enterprise?
I can't physically access my work machines nearly as easily.

I'm also seeing the issue @DavidKahl points out with the -1 kernel: the reboot sequence doesn't physically shut down the device before booting it back up, and after a reboot/shutdown command the device stays physically on.

Edit:

As for booting a headless box with the older kernel:
you can just edit the GRUB defaults file to select the older kernel by default.
This change doesn't do anything other than dictate which option the GRUB menu auto-selects, exactly what you do when you select one with a keyboard.

It might be the safer option, as it's non-destructive.

/etc/default/grub

Then change the value of GRUB_DEFAULT from 0 to "1>2" (with quotes),
which in my case selects the advanced boot options and, in the submenu that follows, the kernel I want to boot.

The numbers represent the menu options you would choose, with 0 being the first option and > introducing an option in a submenu.

This 1>2 is correct for me, but double-check for yourself, as it depends entirely on the installed kernels and on whatever other options you may have defined in the GRUB menu.

After that, run update-grub, and your box at least auto-boots correctly for now.

When you download and install kernels this obviously changes, so keep this change in the back of your head. It's just a workaround until this problem's root cause is somehow addressed.
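
On a headless box it can help to read the index paths out of the generated menu instead of counting on screen. A minimal sketch, assuming the usual generated layout (top-level entries start at column 0, entries inside a submenu are indented, titles are in single quotes) and the standard Debian location /boot/grub/grub.cfg. The function name list_grub_entries is made up for illustration.

```shell
# Print every GRUB menu entry together with the index path that
# GRUB_DEFAULT expects, e.g. "1>2" for the third entry of the second menu item.
list_grub_entries() {
    awk -F"'" '
        /^menuentry /  { printf "%d\t%s\n", top++, $2; next }
        /^submenu /    { parent = top++; child = 0; in_sub = 1
                         printf "%d\t%s\n", parent, $2; next }
        /^}/           { in_sub = 0; next }   # a closing brace at column 0 ends the block
        in_sub && /^[ \t]+menuentry / { printf "%d>%d\t%s\n", parent, child++, $2 }
    ' "${1:-/boot/grub/grub.cfg}"
}
```

Running list_grub_entries prints one line per entry with its index path and title, so the value to put into GRUB_DEFAULT can be copied from the line that names the kernel you want.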
 

DavidKahl

Hi @Kephin ,

Thanks for elaborating on this. My server is headless as well, so moving the system is a pain in the backside, but it's hooked up to a monitor until I fix this issue.

I've uninstalled all kernels but "5.13.19-1", so I guess in GRUB I should configure this as "1>0"? Let me know your thoughts, it would be much appreciated.

Best

D
 
Kephin

If you've already removed the problematic kernel and you auto-boot correctly now, there's no real value in doing this, I think.
But if this is the latest kernel you have installed, then 1>0 would be correct, assuming 1 (being the second menu option in the first menu) actually opens the advanced options/kernel list for you as well.
However, if you re-install the -2 kernel, it would likely become 1>2, so again, keep this in mind.
It's not intended as a deploy-and-forget solution, just as a quick and dirty workaround without actually juggling packages.

The easiest way to know for sure which options you need is to hook up a screen and keyboard and write down what you have to choose to boot what you want.
It's more for future reference, or for anybody else running into this.
It's pretty easy to convince GRUB to use a different default this way for troubleshooting purposes.
 

DavidKahl

Thanks All!

Just to Summarise this Thread:

If your system won't boot properly with the latest kernel, "5.13.19-2-pve" (AMD systems seem to be affected), boot the previous kernel and use the following steps:

Code:
apt install pve-kernel-5.13=7.1-4

Then remove the new (non-booting) kernel with the following:

Code:
dpkg -P pve-kernel-5.13.19-2-pve

Hold the pve-kernel-5.13 package so the update is not pulled in again (this blocks all further 5.13 kernel updates until you unhold it):

Code:
apt-mark hold 'pve-kernel-5.13'

--
If you're still having issues, modify the GRUB configuration:

Code:
nano /etc/default/grub

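Then change the value of GRUB_DEFAULT from 0 to "1>2" (with quotes).

Note: "1>2" selects the Advanced options submenu (the second top-level entry) and then the third entry inside it, which here is the older kernel. Check your own menu, as the numbers depend on the kernels installed.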

Run the following to update your boot system

Code:
update-grub
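
For a headless box, the manual edit in nano can also be done non-interactively. A minimal sketch, assuming the defaults file already contains a GRUB_DEFAULT= line; set_grub_default is a hypothetical helper name, and it keeps a .bak copy of the file next to the original.

```shell
# Set GRUB_DEFAULT in a GRUB defaults file.
# $1 = new value, e.g. 1>2    $2 = file (defaults to /etc/default/grub)
# Note: the value must not contain "|" or "&", which are special to sed here.
set_grub_default() {
    local value="$1"
    local file="${2:-/etc/default/grub}"
    cp "$file" "$file.bak"     # keep a backup of the previous settings
    sed "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$value\"|" "$file.bak" > "$file"
}
```

For example, set_grub_default '1>2' followed by update-grub. GRUB also accepts menu entry titles in place of the numbers (joined with > for submenus), which should be less sensitive to new kernels shifting the order.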


Special thanks to @tauceti @Kephin @nicedevil
 
Kephin

My tip is entirely separate from the paths outlined earlier in this topic.
If you run into this issue and have not done anything yet, my suggested GRUB config change alone is enough to get a bootable system again.

Deleting the offending kernel and/or excluding it from updates is not required if you just tell GRUB which kernel to boot.
It's likely this issue will be resolved at some point, and then you'll want the new kernel again, or possibly even the "broken" one with a change elsewhere.
Deleting it will complicate troubleshooting/testing.

Edit:

This issue is not solved; it merely has a workaround!
 

nicedevil

That's absolutely correct, but it is still not a big deal to just do a

Code:
apt-mark unhold 'pve-kernel-5.13'

and refresh the update page afterwards. Both solutions will work :)

@DavidKahl my machine shuts down normally, as it should, on 5.13.19-1. Maybe a BIOS setting isn't correct?

BTW: I have 2 servers running Proxmox in my homelab. Both have AMD CPUs: one is a 4800U and the other a 3600X. The latter doesn't have the issue with kernel 5.13.19-2. It is running on an ASRockRack X470DU (or something like this).
 

MaPf

It might not belong here, but my Proxmox Mail Gateway (PMG) runs on a small Ryzen 1500B and does not start on any 5.13 kernel. I have to set GRUB to boot 5.11.22, the kernel from before the PMG update.
 
Kephin

Just adding info: I've been able to extract the following from kern.log on my box:

Code:
Dec  6 20:24:45 arcturus kernel: [   20.359574] amdgpu 0000:00:01.0: amdgpu: amdgpu_device_ip_init failed
Dec  6 20:24:45 arcturus kernel: [   20.359580] amdgpu 0000:00:01.0: amdgpu: Fatal error during GPU init
Dec  6 20:24:45 arcturus kernel: [   20.359586] amdgpu 0000:00:01.0: amdgpu: amdgpu: finishing device.
Dec  6 20:24:45 arcturus kernel: [   20.362059] BUG: kernel NULL pointer dereference, address: 00000000000001db
Dec  6 20:24:45 arcturus kernel: [   20.362071] #PF: supervisor read access in kernel mode
Dec  6 20:24:45 arcturus kernel: [   20.362075] #PF: error_code(0x0000) - not-present page
Dec  6 20:24:45 arcturus kernel: [   20.362079] PGD 0 P4D 0
Dec  6 20:24:45 arcturus kernel: [   20.362084] Oops: 0000 [#1] SMP NOPTI
Dec  6 20:24:45 arcturus kernel: [   20.362090] CPU: 1 PID: 433 Comm: systemd-udevd Tainted: P           O      5.13.19-2-pve #1
Dec  6 20:24:45 arcturus kernel: [   20.362096] Hardware name: HPE ProLiant MicroServer Gen10/ProLiant MicroServer Gen10, BIOS 5.12 02/19/2020
Dec  6 20:24:45 arcturus kernel: [   20.362101] RIP: 0010:smu8_dpm_powergate_acp+0xc/0x40 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.362790] Code: 7a f7 fd ff 44 89 ea 4c 89 e7 31 c9 be 13 00 00 00 e8 68 f7 fd ff 31 c0 41 5c 41 5d 5d c3 0f 1f 44 00 00 48 8b 87 c0 01 00 00 <40> 38 b0 db 01 00 00 74 23 55 31 d2 48 89 e5 40 84 f6 74 0c be 0b
Dec  6 20:24:45 arcturus kernel: [   20.362799] RSP: 0018:ffffa7cb0184b860 EFLAGS: 00010286
Dec  6 20:24:45 arcturus kernel: [   20.362804] RAX: 0000000000000000 RBX: ffff981688ce0000 RCX: 000000000000000a
Dec  6 20:24:45 arcturus kernel: [   20.362810] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff98168965d800
Dec  6 20:24:45 arcturus kernel: [   20.362813] RBP: ffffa7cb0184b880 R08: 000000000000000f R09: 0000000000000000
Dec  6 20:24:45 arcturus kernel: [   20.362817] R10: ffff981688502c01 R11: ffff981688502c00 R12: ffff98168965d800
Dec  6 20:24:45 arcturus kernel: [   20.362821] R13: ffffffffc12ac300 R14: ffff981688ce0010 R15: ffff981688ce0000
Dec  6 20:24:45 arcturus kernel: [   20.362825] FS:  00007fcf4293a8c0(0000) GS:ffff981777480000(0000) knlGS:0000000000000000
Dec  6 20:24:45 arcturus kernel: [   20.362830] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec  6 20:24:45 arcturus kernel: [   20.362834] CR2: 00000000000001db CR3: 0000000102dd2000 CR4: 00000000001506e0
Dec  6 20:24:45 arcturus kernel: [   20.362839] Call Trace:
Dec  6 20:24:45 arcturus kernel: [   20.362845]  ? pp_set_powergating_by_smu+0x1ee/0x2b0 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.363487]  amdgpu_dpm_set_powergating_by_smu+0x70/0x100 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.364133]  acp_hw_fini+0x154/0x160 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.364705]  amdgpu_device_fini+0x1d3/0x49f [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.365504]  amdgpu_driver_unload_kms+0x43/0x70 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.366171]  amdgpu_driver_load_kms.cold+0x46/0x83 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.366890]  amdgpu_pci_probe+0x12a/0x1b0 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.367473]  local_pci_probe+0x48/0x80
Dec  6 20:24:45 arcturus kernel: [   20.367484]  pci_device_probe+0x105/0x1c0
Dec  6 20:24:45 arcturus kernel: [   20.367490]  really_probe+0x24b/0x4c0
Dec  6 20:24:45 arcturus kernel: [   20.367499]  driver_probe_device+0xf0/0x160
Dec  6 20:24:45 arcturus kernel: [   20.367503]  device_driver_attach+0xab/0xb0
Dec  6 20:24:45 arcturus kernel: [   20.367508]  __driver_attach+0xb2/0x140
Dec  6 20:24:45 arcturus kernel: [   20.367513]  ? device_driver_attach+0xb0/0xb0
Dec  6 20:24:45 arcturus kernel: [   20.367517]  bus_for_each_dev+0x7e/0xc0
Dec  6 20:24:45 arcturus kernel: [   20.367522]  driver_attach+0x1e/0x20
Dec  6 20:24:45 arcturus kernel: [   20.367525]  bus_add_driver+0x135/0x1f0
Dec  6 20:24:45 arcturus kernel: [   20.367530]  driver_register+0x91/0xf0
Dec  6 20:24:45 arcturus kernel: [   20.367534]  __pci_register_driver+0x57/0x60
Dec  6 20:24:45 arcturus kernel: [   20.367538]  amdgpu_init+0x77/0x1000 [amdgpu]
Dec  6 20:24:45 arcturus kernel: [   20.368226]  ? 0xffffffffc14e8000
Dec  6 20:24:45 arcturus kernel: [   20.368233]  do_one_initcall+0x48/0x1d0
Dec  6 20:24:45 arcturus kernel: [   20.368242]  ? kmem_cache_alloc_trace+0xfb/0x240
Dec  6 20:24:45 arcturus kernel: [   20.368250]  do_init_module+0x62/0x290
Dec  6 20:24:45 arcturus kernel: [   20.368255]  load_module+0x265e/0x2720
Dec  6 20:24:45 arcturus kernel: [   20.368261]  __do_sys_finit_module+0xc2/0x120
Dec  6 20:24:45 arcturus kernel: [   20.368266]  __x64_sys_finit_module+0x1a/0x20
Dec  6 20:24:45 arcturus kernel: [   20.368270]  do_syscall_64+0x61/0xb0
Dec  6 20:24:45 arcturus kernel: [   20.368277]  ? switch_fpu_return+0x49/0xc0
Dec  6 20:24:45 arcturus kernel: [   20.368283]  ? exit_to_user_mode_prepare+0x8f/0x1b0
Dec  6 20:24:45 arcturus kernel: [   20.368288]  ? syscall_exit_to_user_mode+0x27/0x50
Dec  6 20:24:45 arcturus kernel: [   20.368292]  ? __x64_sys_mmap+0x33/0x40
Dec  6 20:24:45 arcturus kernel: [   20.368296]  ? do_syscall_64+0x6e/0xb0
Dec  6 20:24:45 arcturus kernel: [   20.368301]  ? do_syscall_64+0x6e/0xb0
Dec  6 20:24:45 arcturus kernel: [   20.368305]  ? sysvec_apic_timer_interrupt+0x4e/0x90
Dec  6 20:24:45 arcturus kernel: [   20.368310]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
Dec  6 20:24:45 arcturus kernel: [   20.368315]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Dec  6 20:24:45 arcturus kernel: [   20.368319] RIP: 0033:0x7fcf4278d9b9
Dec  6 20:24:45 arcturus kernel: [   20.368324] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a7 54 0c 00 f7 d8 64 89 01 48
Dec  6 20:24:45 arcturus kernel: [   20.368332] RSP: 002b:00007ffd7c5d6cd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Dec  6 20:24:45 arcturus kernel: [   20.368340] RAX: ffffffffffffffda RBX: 00005556206fb360 RCX: 00007fcf4278d9b9
Dec  6 20:24:45 arcturus kernel: [   20.368344] RDX: 0000000000000000 RSI: 00007fcf42931e2d RDI: 000000000000001a
Dec  6 20:24:45 arcturus kernel: [   20.368348] RBP: 0000000000020000 R08: 0000000000000000 R09: 00005556206fb830
Dec  6 20:24:45 arcturus kernel: [   20.368351] R10: 000000000000001a R11: 0000000000000246 R12: 00007fcf42931e2d
Dec  6 20:24:45 arcturus kernel: [   20.368355] R13: 0000000000000000 R14: 00005556206fcd90 R15: 00005556206fb360
Dec  6 20:24:45 arcturus kernel: [   20.368360] Modules linked in: amdgpu(+) amd64_edac edac_mce_amd kvm_amd iommu_v2 ccp gpu_sched drm_ttm_helper kvm ttm irqbypass drm_kms_helper cec crct10dif_pclmul joydev rc_core input_leds ghash_clmulni_intel i2c_algo_bit aesni_intel fb_sys_fops syscopyarea sysfillrect sysimgblt crypto_simd pcspkr k10temp 8250_dw fam15h_power cryptd efi_pstore mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc drm ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_generic usbmouse usbkbd usbhid hid xhci_pci xhci_pci_renesas ehci_pci ahci crc32_pclmul i2c_piix4 ehci_hcd xhci_hcd tg3 libahci video
Dec  6 20:24:45 arcturus kernel: [   20.368510] CR2: 00000000000001db
Dec  6 20:24:45 arcturus kernel: [   20.368576] ---[ end trace f610c6dc2aa70ce3 ]---

This seems to happen right after it logs:

Code:
Dec  6 20:24:45 arcturus kernel: [   20.358459] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Dec  6 20:24:45 arcturus kernel: [   20.358653] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
Dec  6 20:24:45 arcturus kernel: [   20.358665] kfd kfd: amdgpu: Error initializing iommuv2
Dec  6 20:24:45 arcturus kernel: [   20.359554] kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
Dec  6 20:24:45 arcturus kernel: [   20.359568] kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:9874

It logs the same on the working kernel, except that there it doesn't log "amdgpu: amdgpu_device_ip_init failed".
Excerpt from the working kernel's boot:

Code:
Dec  6 20:28:06 arcturus kernel: [   21.313155] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Dec  6 20:28:06 arcturus kernel: [   21.313244] kfd kfd: amdgpu: error getting iommu info. is the iommu enabled?
Dec  6 20:28:06 arcturus kernel: [   21.313253] kfd kfd: amdgpu: Error initializing iommuv2
Dec  6 20:28:06 arcturus kernel: [   21.314039] kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
Dec  6 20:28:06 arcturus kernel: [   21.314046] amdgpu 0000:00:01.0: amdgpu: SE 1, SH per SE 1, CU per SH 8, active_cu_number 4
Dec  6 20:28:06 arcturus kernel: [   21.315519] [drm] fb mappable at 0x1FF7B9000
Dec  6 20:28:06 arcturus kernel: [   21.315523] [drm] vram apper at 0x1FF000000
Dec  6 20:28:06 arcturus kernel: [   21.315525] [drm] size 8294400
Dec  6 20:28:06 arcturus kernel: [   21.315526] [drm] fb depth is 24
Dec  6 20:28:06 arcturus kernel: [   21.315527] [drm]    pitch is 7680
Dec  6 20:28:06 arcturus kernel: [   21.315644] fbcon: amdgpudrmfb (fb0) is primary device

I don't have physical access right now to look for an IOMMU setting in my machine's BIOS and see whether enabling or disabling it helps.
If anybody else with a MicroServer can look at this angle, it would probably be helpful.
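
To check another box for the same failure, a minimal sketch that searches a kernel log for the markers from the excerpt above. The function name find_amdgpu_failures is made up, and /var/log/kern.log as the default path is an assumption based on the log files mentioned in this thread.

```shell
# Print the lines (with line numbers) that match the amdgpu failure markers
# seen in the excerpt above. Returns non-zero if nothing matches.
find_amdgpu_failures() {
    grep -nE 'amdgpu_device_ip_init failed|Fatal error during GPU init|kernel NULL pointer dereference' \
        "${1:-/var/log/kern.log}"
}
```

If this prints nothing for a boot of the -2 kernel, the hang on that machine may have a different cause than the one logged here.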
 
