[SOLVED] Upgrade error 5.4 to 6 - No reboot

wonko6x9

My system is an HP ProLiant DL180 G6 with two Xeons, 48 GB of RAM, twelve 2 TB hard drives, and one 10 TB drive.

When upgrading my Proxmox installation from 5.4 to 6.2-6, the install seemed to go well.

On reboot the system would not boot with errors like the following:

A single instance of - can not request for apei bert registers

and then a loop of variations on

rcu_sched detected stalls on CPUs/tasks
rcu_sched kthread starved for xx jiffies!
RCU grace-period kthread stack dump
info: task swapper/0:1 blocked for more than
tainted: g 5.4.44-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
watchdog detected hard lockup on cpu 0



This went on for a couple of hours. I rebooted, selected "Advanced options" in the boot menu, and chose kernel 4.15.xx instead of 5.4.44. With that kernel I have been able to boot.

I am at a loss how to proceed to solve the problem at this point though.

Any thoughts?
 
I appreciate the rapid reply. Here is the output:

pveversion -v

proxmox-ve: 6.2-1 (running kernel: 4.15.18-30-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-26-pve: 4.15.18-54
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.18-10-pve: 4.15.18-32
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph-fuse: 12.2.13-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

uname -r
4.15.18-30-pve


Note that uname reports 4.15.18-30-pve because I selected that kernel in the boot menu instead of 5.4.44, which was hanging on boot. At least that is my presumption.
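
To compare the running kernel against the ones that are installed, something like the following could be used. It is only a sketch: list_kernels is a made-up helper name, and it assumes kernel images sit in /boot as vmlinuz-<version>, which is the usual layout on a Debian-based install.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel versions that have an image in a boot
# directory (default /boot), oldest first. Assumes files named vmlinuz-<version>.
list_kernels() {
    boot_dir="${1:-/boot}"
    for image in "$boot_dir"/vmlinuz-*; do
        [ -e "$image" ] || continue   # nothing matched: the glob stays literal, skip it
        basename "$image" | sed 's/^vmlinuz-//'
    done | sort -V                    # version-aware sort (GNU coreutils)
}
```

Running list_kernels next to uname -r shows whether the machine booted the newest installed kernel or an older one picked from the boot menu.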

Again, thanks for your help on this.
 
So it appears I have found something that at least solved my problem. I installed irqbalance and set it to run in one-shot mode ("ONESHOT=yes") at boot. This lets irqbalance distribute the IRQs across multiple CPUs early in the boot process, preventing the lockup that was happening on CPU 0. It then exits, and the system runs as normal.

Actually, the whole boot process time was cut down significantly, which makes me think this may have been an issue that was percolating for some time. 5.4.44-1-pve is up and running fine.
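
For anyone who wants to check whether interrupts are piling up on CPU 0, one rough way is to total the per-CPU columns of /proc/interrupts. This is a sketch only: irq_totals is an illustrative name and not part of irqbalance, and the optional file argument exists just so the function can be tried on a saved copy.

```shell
#!/bin/sh
# Hypothetical helper: sum the interrupt counts per CPU.
# Reads /proc/interrupts by default, or a file in the same format.
irq_totals() {
    awk '
        # Header line: one column name per CPU (CPU0, CPU1, ...).
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) name[i] = $i; next }
        # ERR and MIS are single global counters, not per-CPU values.
        $1 == "ERR:" || $1 == "MIS:" { next }
        {
            # Field 1 is the IRQ label; fields 2..ncpu+1 are the per-CPU counts.
            for (i = 1; i <= ncpu; i++)
                if ($(i + 1) ~ /^[0-9]+$/) total[i] += $(i + 1)
        }
        END { for (i = 1; i <= ncpu; i++) printf "%s %d\n", name[i], total[i] }
    ' "${1:-/proc/interrupts}"
}
```

Calling irq_totals with no argument prints one line per CPU. A CPU0 total that dwarfs all the others would be consistent with the lockup described above, although it does not prove that was the cause.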

Whew!
 
I have the same problem. Could you write out the commands you used for the install?
 
I can't give you exact commands on your machine, because the ones I needed seem to be a bit different from the ones others use.

I can tell you what I did though:

1. Install irqbalance:

Code:
# apt-get update
# apt-get install irqbalance

2. Edit the defaults file. On some systems this is /etc/sysconfig/irqbalance, but mine was in /etc/default/, so my commands were:

Code:
# cd /etc/default/
# nano irqbalance

I like nano; if you prefer vi, just replace nano with vi.

You then look for this line:

Code:
#ONESHOT=

change it to:

Code:
ONESHOT=yes

Then (if using nano) press Ctrl-X to exit, and answer Y to save the change.
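
If you would rather not edit the file by hand, the same change could be scripted roughly like this. Treat it as a sketch under assumptions: set_oneshot is a made-up name, the default path is the one from my machine, and the ONESHOT= line is taken from the step above. Other irqbalance versions may name the variable differently, in which case the function changes nothing and returns a non-zero status.

```shell
#!/bin/sh
# Hypothetical helper: switch an irqbalance defaults file to one-shot mode.
# Keeps a backup next to the original as <file>.bak.
set_oneshot() {
    conf="${1:-/etc/default/irqbalance}"
    cp -p "$conf" "$conf.bak" || return 1
    # Replace "#ONESHOT=..." or "ONESHOT=..." with "ONESHOT=yes".
    sed -i 's/^#\{0,1\}ONESHOT=.*/ONESHOT=yes/' "$conf"
    # Succeed only if the setting is really in the file now.
    grep -q '^ONESHOT=yes$' "$conf"
}
```

Run it as root, for example set_oneshot /etc/default/irqbalance, and check the return status before rebooting. To undo the change, copy the .bak file back over the original.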

If you have trouble finding the file, search from the root directory:

Code:
# find / -name irqbalance

This will return several matches, and you can narrow it down from there.
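
Since only two likely locations have come up in this thread, a quicker check than a full search might look like the sketch below. find_irqbalance_conf is an illustrative name, and the optional prefix argument is only there so the function can be tried against a test directory instead of the live system.

```shell
#!/bin/sh
# Hypothetical helper: print the first irqbalance defaults file that exists.
# Only the two paths mentioned in this thread are checked.
find_irqbalance_conf() {
    root="${1:-}"
    for candidate in /etc/default/irqbalance /etc/sysconfig/irqbalance; do
        if [ -f "$root$candidate" ]; then
            printf '%s\n' "$root$candidate"
            return 0
        fi
    done
    return 1   # neither location exists; fall back to the find command
}
```

With no argument it checks the real paths. If it prints nothing, the find command above is still the way to go.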

It was pretty easy. Reboot, and you should be good if that is the issue. The trick for me was finding the file to edit. Note that there is one in init.d, but you do NOT want to edit it there. It specifically refers you back to the default file to edit the behavior. Make sure you check that, and you should be good.

Hope that helps!
 
Thank you very much for providing detailed information. Frankly, I wasn't expecting a quick answer.

[Attached screenshot: _2020-07-18-21-22-53-01.gif]

After hitting the screen above, I went back to Proxmox 5.4. However, I would like to try your method in a separate virtual machine; I am sure it will work.
 
No problem. This was a weird enough one, and it took enough google-fu to figure out, that I thought it warranted some additional notes for others to see. It isn't often that a problem comes up that five hundred thousand people haven't already posted answers to; usually the issue is sorting out the good answers from the bad ones.

This may be one of the "bad" answers, but it worked for me, and seems logical, sound, and unlikely to do additional harm if it isn't the problem. It should also be easy to reverse should it cause someone's system to puke.

Let me know how it goes for you!
 