Server hanging: "blocked for more than 120 seconds"

altano

Active Member
Apr 6, 2019
California, US
alan.norbauer.com
Out of the blue, my server started hanging, and I don't know why. When it hangs, a few things happen:

1) The web GUI, SSH, and all remote access become totally unresponsive
2) All my VMs are hung as well
3) I can't interact with the machine, but the console shows errors that look like:

[Screenshot of the console errors, attached 2019-05-13]

The first part transcribed, for better searchability:

Code:
     INFO: task kworker/u256:5:279 blocked for more than 120 seconds.
Tainted: P           O     4.15.18-14-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
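After a reboot, these reports can be recovered from the previous boot's kernel log, provided the journal is persistent (for example `journalctl -k -b -1 > prev-boot.log`), and summarized to see which tasks keep getting stuck. Below is a minimal Python sketch; the regular expression is written against the `INFO: task <name>:<pid> blocked for more than <N> seconds.` line quoted above and may need adjusting for other kernel versions, and the script name `hung_tasks.py` and file name `prev-boot.log` are hypothetical.

```python
import re
import sys
from collections import Counter

# Matches the hung-task detector line, e.g.
# "INFO: task kworker/u256:5:279 blocked for more than 120 seconds."
# The task name may itself contain ':' (kworker/u256:5), so the greedy .+
# backtracks to the last ":<digits> blocked" on the line.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)


def parse_hung_tasks(text):
    """Return (comm, pid, seconds) tuples for every hung-task report in text."""
    return [
        (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_RE.finditer(text)
    ]


def summarize(text):
    """Count how often each task name was reported as blocked."""
    return Counter(comm for comm, _pid, _secs in parse_hung_tasks(text))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python hung_tasks.py prev-boot.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for comm, count in summarize(f.read()).most_common():
            print(f"{count:4d}  {comm}")
```

The output is one line per task name with its number of reports; a single storage-related task dominating the list would point in a different direction than many unrelated tasks all blocking at once.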

I've found a number of other threads where people have SIMILAR error messages, but the errors are either in their guest VMs rather than on the Proxmox host itself, or seem unrelated in other ways.

Usually, after a reboot everything is fine again, so I'm not sure how to diagnose this problem. Can anyone think of anything that might help me figure out what's going on here?

Thanks!
 
Code:
root@red:~# pvesm status
Name              Type     Status           Total            Used       Available        %
local              dir     active        28510260         2562444        24476536    8.99%
local-lvm      lvmthin     active        62562304               0        62562304    0.00%
nfsproxmox         nfs     active     13341864960       153466880     13188398080    1.15%

All my VMs are on "nfsproxmox", which is an NFS share backed by a ZFS pool on another machine. "local" is the M.2 NVMe SSD that Proxmox is installed on.
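Since the VM disks live on an NFS share, one thing that may be worth ruling out (an assumption, not something established in this thread) is an I/O stall on that mount. Linux NFS mounts are `hard` by default, so requests to a server that stops responding are retried indefinitely and the calling task sits in uninterruptible sleep, which is the state the hung-task detector reports. Below is a minimal Python sketch that lists the NFS mounts in `/proc/mounts` with their hard/soft mode and `timeo` value; the function name is hypothetical.

```python
import os


def nfs_mounts(mounts_text):
    """Return (mountpoint, fstype, mode, timeo) for each NFS mount in /proc/mounts text."""
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Only real NFS client mounts; this skips e.g. the "nfsd" pseudo-filesystem.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = fields[3].split(",")
        # Linux NFS mounts are "hard" unless "soft" (or "softerr") was requested.
        mode = "soft" if {"soft", "softerr"} & set(opts) else "hard"
        # timeo is in tenths of a second; None if the option is not listed.
        timeo = next((o.split("=", 1)[1] for o in opts if o.startswith("timeo=")), None)
        results.append((fields[1], fields[2], mode, timeo))
    return results


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mountpoint, fstype, mode, timeo in nfs_mounts(f.read()):
            print(f"{mountpoint}  {fstype}  {mode}  timeo={timeo}")
```

Proxmox mounts NFS storages under `/mnt/pve/<storage-id>`, so the entry of interest here would be `/mnt/pve/nfsproxmox`. A `hard` mount is not a bug in itself; the point is only to know whether a stalled NFS server could explain tasks blocking on the host.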

And thanks for replying!
 
Ugh, this is happening every few days. My server is unusable. Anyone have any ideas?
I have the same issue. I can't get error logs because it's a remote server. It's been happening for some time now. I checked my hosting logs, and over a 3-month period I've been rebooting my server on average every 2.25 days. My hosting panel shows it as online, but I'm unable to SSH into the server, and all my VMs, containers, and the GUI are unresponsive.
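An average like this is the time between the first and last recorded boot divided by the number of gaps between boots; assuming a 3-month window of roughly 90 days, 2.25 days per reboot works out to about 40 forced reboots. Below is a minimal Python sketch, assuming the boot timestamps have already been collected (for example from the hosting panel's logs, or from `journalctl --list-boots` on a host with a persistent journal); the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime


def mean_interval_days(boot_times):
    """Average gap, in days, between consecutive boots.

    The consecutive gaps sum to (last - first), so the mean is that span
    divided by the number of gaps, len(boot_times) - 1.
    """
    times = sorted(boot_times)
    if len(times) < 2:
        raise ValueError("need at least two boot timestamps")
    span = times[-1] - times[0]
    return span.total_seconds() / 86400 / (len(times) - 1)


# Hypothetical boot timestamps, e.g. copied from a provider's reboot log.
boots = [
    datetime(2019, 1, 1, 0, 0),
    datetime(2019, 1, 3, 6, 0),
    datetime(2019, 1, 5, 12, 0),
]
print(mean_interval_days(boots))  # 2.25
```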
 
I went about two weeks without this hang, and it just happened again. Interestingly, my containers and VMs are running fine; I just can't access the Proxmox web GUI or SSH into the host. The local console is also hung. Does anyone have ideas on how I might go about diagnosing this problem WHILE the machine is in this state?

Once I reboot, I can look at some of the logging I turned on, but since the machine is still semi-usable, I figured someone might have some ideas.
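To see what is actually stuck while the host is in this state, any shell that still responds (or a script started beforehand) can list tasks in uninterruptible sleep (`D` state), which are the ones the hung-task detector reports. As root, `echo w > /proc/sysrq-trigger` also asks the kernel to dump blocked tasks with their stack traces to the kernel log. Below is a minimal Python sketch that reads `/proc/<pid>/stat` and `/proc/<pid>/wchan` directly; it only walks top-level `/proc/<pid>` entries, not the per-thread `/proc/<pid>/task/` directories, and the function names are hypothetical.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so it is delimited by the first '(' and the last ')'.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """Return (pid, comm, wchan) for every process in state 'D' (uninterruptible sleep)."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan or its stat file was unreadable
        if state != "D":
            continue
        try:
            # wchan names the kernel function the task is sleeping in, if readable.
            with open(os.path.join(proc, entry, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = ""
        found.append((pid, comm, wchan))
    return found


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pid, comm, wchan in blocked_tasks():
        print(f"{pid:>7}  {comm:<20}  {wchan or '-'}")
```

Running it a few times a minute apart helps separate tasks that are briefly in `D` state during normal I/O from ones that never leave it.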
 
PSA: SuperMicro released a BIOS update (R 1.0a) for this board, and I've had 11 days of uptime without a crash, so I think the BIOS update may have fixed the problem. The update comes with no information, so I have no idea whether they even intended to address this issue, but I'm hopeful it was an intentional hardware fix. Obviously I won't know until my uptime exceeds all previous crashes by a wide margin (a month?).

tl;dr: if you're having trouble with the SM EPYC 3000 boards, update your BIOS.
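To check which board and BIOS revision a host is running without rebooting into setup, the kernel exposes the DMI data under `/sys/class/dmi/id/` (`dmidecode -s bios-version` reports the same value from the command line). Below is a small Python sketch that reads those files; the function name is hypothetical, and the result should be compared against the release listed on SuperMicro's download page for the board, since the exact version string SuperMicro reports is not shown in this thread.

```python
from pathlib import Path

# Board and firmware identifiers exposed by the kernel; these files are
# normally world-readable (the serial-number files are not included here).
DMI_FIELDS = ("board_vendor", "board_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(base="/sys/class/dmi/id"):
    """Return whichever DMI fields can be read under base; missing files are skipped."""
    info = {}
    for name in DMI_FIELDS:
        try:
            info[name] = (Path(base) / name).read_text().strip()
        except OSError:
            continue
    return info


if __name__ == "__main__":
    for key, value in read_dmi().items():
        print(f"{key:13s} {value}")
```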
 


Interestingly, the crashes seem to have stopped for me: 40 days of uptime now. My BIOS was already on the latest firmware, so it must have been something inside Proxmox that got fixed for me.
 
Interesting. Do you have the M11SDV-8C-LN4F? If so, the BIOS was only released on 6/13, so if you have 40 days of uptime, you definitely don't have the update :)
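That reasoning comes down to comparing the last boot time with the BIOS release date: uptime is counted from the last boot, and a flashed BIOS only takes effect after a reboot, so a host with 40 days of uptime that last booted before a 6/13 release cannot be running it. Below is a minimal Python sketch, assuming the release year is 2019 (the thread's other dates suggest it, but the post does not say) and reading uptime from `/proc/uptime`; the function names are hypothetical.

```python
from datetime import datetime, timedelta


def read_uptime(path="/proc/uptime"):
    """Seconds since boot: the first field of /proc/uptime."""
    with open(path) as f:
        return float(f.read().split()[0])


def boot_time(uptime_seconds, now):
    """Wall-clock time of the last boot, given the uptime observed at `now`."""
    return now - timedelta(seconds=uptime_seconds)


def booted_since(release, uptime_seconds, now):
    """True if the host has rebooted on or after `release` (e.g. a BIOS release date).

    A new BIOS only takes effect after a reboot, so False means the running
    firmware predates the release; True alone does not prove it was flashed.
    """
    return boot_time(uptime_seconds, now) >= release


# Assumption: the "6/13" release is June 13, 2019.
RELEASE = datetime(2019, 6, 13)
```

On a live host, `booted_since(RELEASE, read_uptime(), datetime.now())` gives the answer directly; a `True` result still needs the installed version confirmed, for example with `dmidecode -s bios-version`.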

Either way it doesn't matter if your issue is fixed. I'm glad to hear it!

I'll report back when I have more uptime (or don't).
 
