Proxmox crash

sylsyl · May 13, 2025

I have a couple of PCs running Proxmox - but I am far from a Linux guru.

I recently installed Immich in an Ubuntu VM. It is under heavy load importing a few hundred thousand photos, and I have seen my Proxmox machine “crash”.

By crash, I mean it becomes unresponsive across all VMs and the PVE web interface. It doesn’t respond to ping.

I would have blamed it on overheating, but the PC does have intel vPro (a cheap man’s ipmi) and I can remotely control it. Using vPro, I can access the CLI and log in. This is the surprising bit! I can run commands, but ping from PVE doesn’t reach anywhere. After rebooting, netstat shows that the CPU went to near zero when it “crashed”, but obviously it was still collecting some data.

Rebooting the machine from the CLI makes things work again, usually for a few hours until it happens again.

How can I find out what the problem is?

LnxBil · May 13, 2025

After "it hangs", look at the output of dmesg to see whether any problems are reported there.

mkoeppl · May 13, 2025

Hi,

do you have any log output (journalctl -xe) when that happens?

sylsyl · May 13, 2025

dmesg contains lots of copies of:

Code:

[14781.638259] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                 TDH                  <fd>
                 TDT                  <24>
                 next_to_use          <24>
                 next_to_clean        <fc>
               buffer_info[next_to_clean]:
                 time_stamp           <100bf0538>
                 next_to_watch        <fd>
                 jiffies              <100dcf940>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>

journalctl -xe contains lots of similar stuff, e.g.:

Code:

May 13 13:06:04 pve2 pvestatd[1073]: storage 'isos' is not online
May 13 13:06:04 pve2 corosync-qdevice[1062]: Can't connect to qnetd host. (-5986): Network address not available (in use?)
May 13 13:06:04 pve2 apcupsd[845]: Communications with UPS lost.
May 13 13:06:05 pve2 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                               TDH                  <fd>
                               TDT                  <24>
                               next_to_use          <24>
                               next_to_clean        <fc>
                             buffer_info[next_to_clean]:
                               time_stamp           <100bf0538>
                               next_to_watch        <fd>
                               jiffies              <100dcb300>
                               next_to_watch.status <0>
                             MAC Status             <80083>
                             PHY Status             <796d>
                             PHY 1000BASE-T Status  <3800>
                             PHY Extended Status    <3000>
                             PCI Status             <10>
May 13 13:06:07 pve2 pvestatd[1073]: pbs: error fetching datastores - 500 Can't connect to 192.168.61.2:8007 (No route to host)
May 13 13:06:07 pve2 pvestatd[1073]: status update time (9.519 seconds)

It doesn't go back as far as the start of the event.

I take it this means the network has given up the ghost for some reason? The machine is a Dell Optiplex 7070 with a built-in Ethernet port.
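Since `journalctl -xe` only shows the tail of the log, it may help to filter the kernel log for the hang reports and look at the first and last one. The following is a minimal sketch, not something posted in the thread: `summarize_hangs` is a hypothetical helper name, and reading the previous boot with `-b -1` assumes the journal is stored persistently on this host.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): read kernel log text on stdin
# and report how many "Hardware Unit Hang" lines it contains, plus the
# first and last matching line so the start of the event can be located.
summarize_hangs() {
  awk '
    /Detected Hardware Unit Hang/ {
      if (count == 0) first = $0   # remember the earliest report
      last = $0                    # keep overwriting to end with the latest
      count++
    }
    END {
      printf "count: %d\n", count
      if (count > 0) {
        print "first: " first
        print "last:  " last
      }
    }
  '
}

# Usage, previous boot (assumes a persistent journal):
#   journalctl -k -b -1 --no-pager | summarize_hangs
# Usage, current boot:
#   dmesg | summarize_hangs
```

The timestamp on the `first:` line gives a point to search around for whatever happened just before the NIC stopped transmitting.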

mkoeppl · May 13, 2025

This could potentially be due to a known problem. Please try to disable offloading and see if it works then: https://forum.proxmox.com/threads/e1000-driver-hang.58284/page-7#post-368615
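For reference, disabling offloading on an e1000e port is usually done with `ethtool -K`. The sketch below is an assumption about what that looks like here, not a copy of the linked post: the interface name `eno1` is taken from the log output above, the choice of TSO and GSO as the offloads to turn off is a common starting point rather than a confirmed fix, and `offload_off_cmd` is a hypothetical helper that only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the ethtool command that
# turns off TCP segmentation offload (tso) and generic segmentation
# offload (gso) on the given interface. It does not change anything itself.
offload_off_cmd() {
  iface="$1"
  printf 'ethtool -K %s tso off gso off\n' "$iface"
}

# Print the command for review:
#   offload_off_cmd eno1
# Apply it as root once it looks right:
#   eval "$(offload_off_cmd eno1)"
# Check the resulting offload settings:
#   ethtool -k eno1
```

A setting made this way does not survive a reboot. If it helps, one common way to make it permanent is a `post-up` line with the same command in the interface's stanza in `/etc/network/interfaces`.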

sylsyl · May 13, 2025

Thanks for the pointer. I'll try the fix(es) and see if things improve. I'll add a post to the thread to say yet another poor schmuck got pwned.
