[SOLVED] PVE crashes unexpectedly

msachse

New Member
May 29, 2026
Hello everyone,
I could use some help figuring out why my PVE instance crashes randomly. I am running kernel 6.14.8-2-pve on a Lenovo ThinkStation P360, a regular desktop machine that I repurposed to run PVE.

Pulling the logs, I can see the following when PVE becomes unavailable and I have to do a hard reboot:

Code:
May 15 00:00:00 pm kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                             TDH                  <b9>
                             TDT                  <d3>
                             next_to_use          <d3>
                             next_to_clean        <b8>
                           buffer_info[next_to_clean]:
                             time_stamp           <1257dfe98>
                             next_to_watch        <b9>
                             jiffies              <125ca9e00>
                             next_to_watch.status <0>
                           MAC Status             <80083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>

Any suggestions on what I can do to prevent PVE from crashing? I am barely using the hardware, so I assume it is not a heat-related problem.
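To see how often the hang occurs and whether it lines up with the crashes, the hang reports can be pulled out of the kernel log of the boot that ended in the hard reset. A minimal sketch, assuming the default `journalctl` short output format shown above; the function name `list_nic_hangs` is hypothetical:

```bash
#!/usr/bin/env bash
# Print the timestamp of every e1000e "Detected Hardware Unit Hang" report
# for one interface. Reads kernel log lines on stdin.
list_nic_hangs() {
  # $1 = interface name, e.g. eno1
  # In journalctl's short format the first three fields are "Mon DD HH:MM:SS".
  awk -v pat="${1:?usage: list_nic_hangs IFACE}: Detected Hardware Unit Hang" \
    'index($0, pat) { print $1, $2, $3 }'
}
```

Usage: `journalctl -k -b -1 --no-pager | list_nic_hangs eno1`, where `-k` limits output to kernel messages and `-b -1` selects the previous boot. This only works if the journal is stored persistently, which appears to be the case here since the pre-crash log survived the reboot.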
 
The node crashed again within 48 hours. I'm adding the journalctl log from before the crash in case it helps pin down the cause. Any suggestions?

Code:
May 31 10:51:14 pm pvedaemon[1282]: <root@pam> successful auth for user 'mirko@pve'
May 31 11:06:15 pm pvedaemon[1282]: <root@pam> successful auth for user 'mirko@pve'
May 31 11:17:01 pm CRON[238143]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 11:17:01 pm CRON[238145]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 11:17:01 pm CRON[238143]: pam_unix(cron:session): session closed for user root
May 31 11:21:16 pm pvedaemon[1280]: <root@pam> successful auth for user 'mirko@pve'
May 31 11:36:17 pm pvedaemon[1281]: <root@pam> successful auth for user 'mirko@pve'
May 31 11:42:52 pm pveproxy[123178]: worker exit
May 31 11:42:52 pm pveproxy[1291]: worker 123178 finished
May 31 11:42:52 pm pveproxy[1291]: starting 1 worker(s)
May 31 11:42:52 pm pveproxy[1291]: worker 242671 started
May 31 11:43:38 pm pveproxy[123177]: worker exit
May 31 11:43:38 pm pveproxy[1291]: worker 123177 finished
May 31 11:43:38 pm pveproxy[1291]: starting 1 worker(s)
May 31 11:43:38 pm pveproxy[1291]: worker 242809 started
May 31 11:44:12 pm pveproxy[123179]: worker exit
May 31 11:44:12 pm pveproxy[1291]: worker 123179 finished
May 31 11:44:12 pm pveproxy[1291]: starting 1 worker(s)
May 31 11:44:12 pm pveproxy[1291]: worker 242941 started
May 31 11:51:45 pm pvedaemon[1281]: <root@pam> successful auth for user 'mirko@pve'
May 31 12:07:45 pm pvedaemon[1280]: <root@pam> successful auth for user 'mirko@pve'
May 31 12:17:01 pm CRON[248560]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 12:17:01 pm CRON[248562]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 12:17:01 pm CRON[248560]: pam_unix(cron:session): session closed for user root
May 31 12:23:46 pm pvedaemon[1281]: <root@pam> successful auth for user 'mirko@pve'
May 31 12:49:15 pm systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
May 31 12:49:15 pm systemd-tmpfiles[254243]: /usr/lib/tmpfiles.d/legacy.conf:14: Duplicate line for path "/run/lock", ignoring.
May 31 12:49:15 pm systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
May 31 12:49:15 pm systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
May 31 13:17:01 pm CRON[259154]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 13:17:01 pm CRON[259156]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 13:17:01 pm CRON[259154]: pam_unix(cron:session): session closed for user root
May 31 14:17:01 pm CRON[269660]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 14:17:01 pm CRON[269662]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 14:17:01 pm CRON[269660]: pam_unix(cron:session): session closed for user root
May 31 14:38:20 pm kernel: hrtimer: interrupt took 13834 ns
May 31 15:17:01 pm CRON[280904]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 15:17:01 pm CRON[280906]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 15:17:01 pm CRON[280904]: pam_unix(cron:session): session closed for user root
May 31 15:20:13 pm systemd[1]: Starting apt-daily.service - Daily apt download activities...
May 31 15:20:13 pm systemd[1]: apt-daily.service: Deactivated successfully.
May 31 15:20:13 pm systemd[1]: Finished apt-daily.service - Daily apt download activities.
May 31 16:17:01 pm CRON[291806]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 16:17:01 pm CRON[291808]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 16:17:01 pm CRON[291806]: pam_unix(cron:session): session closed for user root
May 31 17:11:43 pm chronyd[1031]: Leap second list /usr/share/zoneinfo/leap-seconds.list needs update
May 31 17:17:01 pm CRON[302682]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 17:17:01 pm CRON[302684]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 17:17:01 pm CRON[302682]: pam_unix(cron:session): session closed for user root
May 31 18:17:01 pm CRON[313554]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 18:17:01 pm CRON[313556]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 18:17:01 pm CRON[313554]: pam_unix(cron:session): session closed for user root
May 31 19:17:01 pm CRON[324010]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 19:17:01 pm CRON[324012]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 19:17:01 pm CRON[324010]: pam_unix(cron:session): session closed for user root
May 31 20:17:01 pm CRON[334392]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 20:17:01 pm CRON[334394]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 20:17:01 pm CRON[334392]: pam_unix(cron:session): session closed for user root
May 31 21:10:13 pm systemd[1]: Starting apt-daily.service - Daily apt download activities...
May 31 21:10:13 pm systemd[1]: apt-daily.service: Deactivated successfully.
May 31 21:10:13 pm systemd[1]: Finished apt-daily.service - Daily apt download activities.
May 31 21:17:01 pm CRON[345399]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 31 21:17:01 pm CRON[345401]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 31 21:17:01 pm CRON[345399]: pam_unix(cron:session): session closed for user root
May 31 21:56:21 pm kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                             TDH                  <f8>
                             TDT                  <aa>
                             next_to_use          <aa>
                             next_to_clean        <f7>
                           buffer_info[next_to_clean]:
                             time_stamp           <107257a8d>
                             next_to_watch        <f8>
                             jiffies              <107258240>
                             next_to_watch.status <0>
                           MAC Status             <80083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>
May 31 21:56:23 pm kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                             TDH                  <f8>
                             TDT                  <aa>
                             next_to_use          <aa>
                             next_to_clean        <f7>
                           buffer_info[next_to_clean]:
                             time_stamp           <107257a8d>
                             next_to_watch        <f8>
                             jiffies              <107258a40>
                             next_to_watch.status <0>
                           MAC Status             <80083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>
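The two reports above describe the same stall seen twice: TDH, TDT, next_to_use, next_to_clean, time_stamp and next_to_watch are identical, and only jiffies has advanced. The difference between jiffies and the buffer's time_stamp gives a rough idea of how long the oldest unfinished transmit buffer has been waiting. A minimal sketch of that arithmetic; the helper names are hypothetical, and the millisecond conversion assumes a kernel tick rate (CONFIG_HZ) of 1000, which the 2048 jiffies between two reports logged 2 s apart suggests but does not prove:

```bash
#!/usr/bin/env bash
# Jiffies elapsed between two hex values from an e1000e hang report,
# e.g. buffer_info time_stamp and jiffies. Shell arithmetic accepts the
# 0x prefix, so values can be pasted without the surrounding < >.
jiffies_delta() {
  # $1 = earlier value, $2 = later value
  echo $(( $2 - $1 ))
}

# Convert jiffies to milliseconds for a given tick rate.
# HZ=1000 is an assumption; check the running kernel's config before relying on it.
jiffies_to_ms() {
  # $1 = jiffies, $2 = HZ (defaults to 1000)
  echo $(( $1 * 1000 / ${2:-1000} ))
}
```

For the 21:56:21 report, `jiffies_delta 0x107257a8d 0x107258240` gives 1971, and between the two reports `jiffies_delta 0x107258240 0x107258a40` gives 2048, matching the 2-second gap in the timestamps.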
 
The node has seen another crash within 48 hours. Any suggestions? I'm not sure about the best way to share the log files; let me know if they would help.


Furthermore, here is the offload configuration I set on the NIC:
Code:
root@pm:~# ethtool -k eno1
Features for eno1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
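The `ethtool -k` listing is long, and changes made with `ethtool -K` do not survive a reboot, so a quick filter helps confirm that the three offloads the workaround disables (TSO, GSO and GRO) are actually off. A minimal sketch that reads `ethtool -k` output on stdin; the function name `offloads_disabled` is hypothetical:

```bash
#!/usr/bin/env bash
# Succeed (exit 0) only if TSO, GSO and GRO are all reported "off" in
# `ethtool -k IFACE` output read from stdin; print any that are still on.
offloads_disabled() {
  awk '
    # Match only the top-level feature lines, not the indented sub-features.
    /^(tcp-segmentation|generic-segmentation|generic-receive)-offload:/ {
      seen++
      if ($2 != "off") { print "still on: " $1; bad = 1 }
    }
    # Fail if any is on, or if one of the three lines is missing.
    END { exit (bad || seen != 3) }
  '
}
```

Usage: `ethtool -k eno1 | offloads_disabled && echo "TSO/GSO/GRO are off"`.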
 
I'm not sure whether this will solve the problem, but I created a script, triggered by a systemd timer, that re-applies the NIC settings above after every reboot.

I will monitor the node for a week and post an update here if there are no further crashes.
 
Here is what I did to make the NIC change persist across reboots:

Code:
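# SSH into the PVE host, then run the commands below as root
# (drop the "sudo" prefix if sudo is not installed):
sudo nano /usr/local/bin/prevent_NIC_hang.sh

# Paste the following content:
#!/bin/bash
# Disable GSO, GRO and TSO on eno1 and log whether ethtool succeeded
if ethtool -K eno1 gso off gro off tso off; then
    echo "$(date -Is) offloads disabled on eno1" >> /var/log/prevent_NIC_hang.log
else
    echo "$(date -Is) ethtool failed on eno1" >> /var/log/prevent_NIC_hang.log
fi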

# Make script executable:
sudo chmod +x /usr/local/bin/prevent_NIC_hang.sh

# Create the systemd service file
sudo nano /etc/systemd/system/delayed-script.service

# Paste the following content:
[Unit]
Description=Delayed startup script
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/prevent_NIC_hang.sh

[Install]
WantedBy=multi-user.target

# Create the matching systemd timer file
sudo nano /etc/systemd/system/delayed-script.timer

# Paste the following content:
[Unit]
Description=Runs the delayed script 10 minutes after boot

[Timer]
OnBootSec=10m
Unit=delayed-script.service

[Install]
WantedBy=timers.target

# Enable and start the timer
sudo systemctl daemon-reload
sudo systemctl enable --now delayed-script.timer
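An enabled timer only shows that the job is scheduled; to confirm after the next reboot (and the 10-minute delay) that the oneshot service actually ran and exited cleanly, its properties can be inspected. A minimal sketch; the helper name `oneshot_ran_ok` is hypothetical, and treating an empty or `n/a` `ExecMainStartTimestamp` as "not started in this boot" is an assumption about how `systemctl show` prints unset timestamps:

```bash
#!/usr/bin/env bash
# Reads `systemctl show -p Result -p ExecMainStatus -p ExecMainStartTimestamp UNIT`
# output on stdin; succeeds only if the unit started in this boot and exited cleanly.
oneshot_ran_ok() {
  local key value result="" status="" started=""
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      Result) result="$value" ;;
      ExecMainStatus) status="$value" ;;
      ExecMainStartTimestamp) started="$value" ;;
    esac
  done
  # An empty (or "n/a") start timestamp means the service has not run yet.
  [ -n "$started" ] && [ "$started" != "n/a" ] &&
    [ "$result" = "success" ] && [ "$status" = "0" ]
}
```

Usage: `systemctl show -p Result -p ExecMainStatus -p ExecMainStartTimestamp delayed-script.service | oneshot_ran_ok && echo "NIC script ran"`, followed by `cat /var/log/prevent_NIC_hang.log` to see what the script logged.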