Network issue after upgrade to 9.0

rellek74

Aug 1, 2023

Hey there. Ever since I upgraded to 9.0, I've had an issue where network traffic just... stops.

VMs/LXCs stop passing traffic.
The web GUI stops loading.

The first time it happened, two days ago, I figured something had locked up, so I rebooted the server and everything was fine.

This time, I restarted networking from the console (systemctl restart networking.service). After that the web GUI was accessible again, but none of the VMs/LXCs were. I checked systemctl status and everything was green. I finally got them back online by disabling and re-enabling each individual network card.
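If "each individual network card" means the host-side links, the same down/up bounce can be done from the shell instead of one card at a time. Below is a rough Python sketch of that, assuming the interface names in INTERFACES (eno1 comes from the log below; vmbr0 is only assumed to be the usual Proxmox bridge, so check `ip link` for the real names). By default it only prints the commands; the real run needs root.

```python
import subprocess

# Hypothetical interface list: eno1 appears in the kernel log, vmbr0 is an
# assumed bridge name. Replace with the names shown by `ip link`.
INTERFACES = ["eno1", "vmbr0"]


def bounce_commands(interfaces):
    """Build `ip link set` commands that take each interface down, then up."""
    cmds = []
    for iface in interfaces:
        cmds.append(["ip", "link", "set", "dev", iface, "down"])
        cmds.append(["ip", "link", "set", "dev", iface, "up"])
    return cmds


def bounce(interfaces, dry_run=True):
    """Print the commands (dry run) or execute them in order (requires root)."""
    for cmd in bounce_commands(interfaces):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    bounce(INTERFACES)  # pass dry_run=False to actually bounce the links
```

This only mirrors the manual workaround; it doesn't address whatever is hanging the NIC in the first place.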

Where should I look for logs to figure out what happened? pvereport looks fine...
 
I noticed that the journal in the GUI stopped showing entries sometime two days ago, so I checked the journal via SSH, and it was filled with these errors over and over:
Code:
Sep 10 04:07:05 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <dc>
                              TDT                  <31>
                              next_to_use          <31>
                              next_to_clean        <db>
                            buffer_info[next_to_clean]:
                              time_stamp           <102c2c28c>
                              next_to_watch        <dc>
                              jiffies              <102c3fd00>
                              next_to_watch.status <0>
                            MAC Status             <80083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
-- Boot 3fc9ad9a1b874707a11b12a9586600cc --
Sep 10 07:10:27 pve kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PCI0.XHC.RHUB.HS02._PLD], AE_ALREADY_EXISTS (20240827/d>
-- Boot 5a88482811454bbfbd878c01024ce2b4 --
Sep 10 04:07:07 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <dc>
                              TDT                  <31>
                              next_to_use          <31>
                              next_to_clean        <db>
                            buffer_info[next_to_clean]:
                              time_stamp           <102c2c28c>
                              next_to_watch        <dc>
                              jiffies              <102c40501>
                              next_to_watch.status <0>
                            MAC Status             <80083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
-- Boot 3fc9ad9a1b874707a11b12a9586600cc --
Sep 10 07:10:27 pve kernel: ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20240827/psobject-220)
-- Boot 5a88482811454bbfbd878c01024ce2b4 --
Sep 10 04:07:09 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <dc>
                              TDT                  <31>
                              next_to_use          <31>
                              next_to_clean        <db>
                            buffer_info[next_to_clean]:
                              time_stamp           <102c2c28c>
                              next_to_watch        <dc>
                              jiffies              <102c40cc0>
                              next_to_watch.status <0>
                            MAC Status             <80083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
-- Boot 3fc9ad9a1b874707a11b12a9586600cc --
Sep 10 07:10:27 pve kernel: ACPI: Skipping parse of AML opcode: Method (0x0014)
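To compare the hang reports side by side, a small Python sketch can pull the hex fields out of each "Detected Hardware Unit Hang" block. The SAMPLE below is a trimmed copy of the first two reports above (only the fields the sketch uses). The stall figure is simply jiffies minus time_stamp; converting it to seconds depends on the kernel's CONFIG_HZ, which I haven't confirmed for this kernel (check with `grep CONFIG_HZ= /boot/config-$(uname -r)`).

```python
import re

# Trimmed excerpt of the first two hang reports from the journal above.
SAMPLE = """\
Sep 10 04:07:05 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <dc>
                              TDT                  <31>
                            buffer_info[next_to_clean]:
                              time_stamp           <102c2c28c>
                              jiffies              <102c3fd00>
Sep 10 04:07:07 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <dc>
                              TDT                  <31>
                            buffer_info[next_to_clean]:
                              time_stamp           <102c2c28c>
                              jiffies              <102c40501>
"""

# Header line: capture the interface name (e.g. eno1).
HEADER = re.compile(r"e1000e \S+ (\S+): Detected Hardware Unit Hang:")
# Indented "name   <hex>" field lines, e.g. "TDH  <dc>" or "PHY Status  <796d>".
FIELD = re.compile(r"^\s+([\w.\[\] -]+?)\s+<([0-9a-fA-F]+)>\s*$")


def parse_hangs(text):
    """Return one dict per hang report: interface name plus hex fields as ints."""
    reports = []
    current = None
    for line in text.splitlines():
        m = HEADER.search(line)
        if m:
            current = {"iface": m.group(1)}
            reports.append(current)
            continue
        f = FIELD.match(line)
        if f and current is not None:
            current[f.group(1)] = int(f.group(2), 16)
        elif not line.startswith(" "):
            current = None  # a non-indented line (e.g. "-- Boot ... --") ends the report


    return reports


def stall_jiffies(report):
    """Jiffies elapsed since the stuck descriptor's time_stamp was taken."""
    return report["jiffies"] - report["time_stamp"]


if __name__ == "__main__":
    for r in parse_hangs(SAMPLE):
        print(r["iface"], "TDH", hex(r["TDH"]), "TDT", hex(r["TDT"]),
              "stalled for", stall_jiffies(r), "jiffies")
```

On these two reports the stall grows from 80500 to 82549 jiffies while time_stamp and TDH stay put, which fits the driver's own message that transmit descriptors aren't being completed. The roughly 2000-jiffy step between reports logged about two seconds apart hints at HZ near 1000 (that would put the stall around 80 s), but that is an inference, not something the log states.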

After enough of those, I started seeing errors that log2ram was full. I vacuumed the journal down to 500M and restarted log2ram, but I only did that after the first time the networking froze.
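To catch the log2ram mount filling up before the journal stops writing, a quick Python check like the one below could run from cron. It assumes log2ram keeps /var/log on a RAM-backed mount (so that is the path that fills), and the 90% threshold is an arbitrary example value.

```python
import shutil

# Assumptions: /var/log is the log2ram-backed mount; 0.90 is an example threshold.
LOG_PATH = "/var/log"
THRESHOLD = 0.90


def usage_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is in use."""
    du = shutil.disk_usage(path)
    return du.used / du.total if du.total else 0.0


def check(path=LOG_PATH, threshold=THRESHOLD):
    """Print the usage of `path` and warn when it reaches `threshold`."""
    frac = usage_fraction(path)
    if frac >= threshold:
        print(f"{path} is {frac:.0%} full; consider journalctl --vacuum-size=500M")
    else:
        print(f"{path} is {frac:.0%} full")
    return frac


if __name__ == "__main__":
    check()
```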