Some vms on node caught in "restart" loop

whiggs · Jul 28, 2025

Hello all. I am running into a very annoying issue. I have several Windows vms (not all) running on a particular host that are caught in a restart loop. By restart loop, I mean the vms will successfully boot up, load the desktop, and then immediately restart (as in the restart screen shows and everything). See the video below for the vm behavior:

https://youtube.com/shorts/X-oGfo2Vp9o

Here is the interesting part. If I power off the virtual machines, remove their virtual nics, and then power them back on, they don't reboot. Then I can just go into the hardware tab, re-add the virtual nic to the vm, and it all seems to be good to go. Everything is fine. That is, until I restart the vm again. Then the vms go right back into the boot loop. I don't understand what is going on. I did take a look at the log for the node, and there does appear to be something there, but I am not sure how to interpret it. Can anyone help me figure out what is going on?

fba · Jul 29, 2025

Hello,

The attached log contains reports of corrected errors from your Broadcom NetXtreme BCM5720 Gigabit Ethernet network adapter. If it is a Dell or HP server, you might still want to run its hardware diagnostics.

Code:

Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]: It has been corrected by h/w and requires no further action
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]: event severity: corrected
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:  Error 0, type: corrected
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   section_type: PCIe error
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   port_type: 0, PCIe end point
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   version: 3.0
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   command: 0x0546, status: 0x0010
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   device_id: 0000:5e:00.0
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   slot: 0
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   secondary_bus: 0x00
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x1657
Jul 28 17:09:43 bigprox kernel: {89}[Hardware Error]:   class_code: 020000
Jul 28 17:09:43 bigprox kernel: tg3 0000:5e:00.0: AER: aer_status: 0x00000080, aer_mask: 0x00003000
Jul 28 17:09:43 bigprox kernel: tg3 0000:5e:00.0:    [ 7] BadDLLP               
Jul 28 17:09:43 bigprox kernel: tg3 0000:5e:00.0: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
Jul 28 17:09:46 bigprox kernel: tg3 0000:5e:00.0: AER: aer_status: 0x00000080, aer_mask: 0x00003000
Jul 28 17:09:46 bigprox kernel: tg3 0000:5e:00.0:    [ 7] BadDLLP               
Jul 28 17:09:46 bigprox kernel: tg3 0000:5e:00.0: AER: aer_layer=Data Link Layer, aer_agent=Receiver ID
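As a rough aid for reading the AER lines above, here is a minimal Python sketch that decodes the aer_status and aer_mask values and counts events per PCI device. The bit names follow the PCIe correctable error status register as the Linux kernel prints them. The function names (decode_bits, summarize) and the regular expression are my own assumptions, written against the log format shown here only.

```python
import re

# Bit positions of the PCIe AER Correctable Error Status register,
# using the short names the Linux kernel prints (e.g. "BadDLLP").
AER_CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

# Assumption: matches lines shaped like the ones in this thread, e.g.
# "... kernel: tg3 0000:5e:00.0: AER: aer_status: 0x00000080, aer_mask: 0x00003000"
AER_LINE = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"AER: aer_status: (?P<status>0x[0-9a-fA-F]+), "
    r"aer_mask: (?P<mask>0x[0-9a-fA-F]+)"
)


def decode_bits(value):
    """Return the names of all known correctable-error bits set in value."""
    return [
        name
        for bit, name in sorted(AER_CORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


def summarize(lines):
    """Count AER status lines per device and collect the decoded bit names."""
    summary = {}
    for line in lines:
        match = AER_LINE.search(line)
        if not match:
            continue  # not an aer_status line
        entry = summary.setdefault(
            match["device"],
            {"driver": match["driver"], "count": 0, "errors": set(), "masked": set()},
        )
        entry["count"] += 1
        entry["errors"].update(decode_bits(int(match["status"], 16)))
        entry["masked"].update(decode_bits(int(match["mask"], 16)))
    return summary
```

Fed the two aer_status lines above, this reports two BadDLLP events for 0000:5e:00.0. Bits set in aer_mask mark error types that the device is configured not to report, so BadDLLP itself is not masked here.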

For the restarts: Is there any related event shown in the Windows logs? The reboot event might contain a reason, or some nearby event might shed some light.
Event IDs to look for (source):

Event ID 41: This event indicates that Windows restarted without a complete shutdown.
Event ID 1074: This event is logged when an application is responsible for the system shutdown or restart. It also indicates when a user restarted or shut down the system by using the Start menu or by pressing Ctrl+Alt+Del.
Event ID 6006: This event indicates that Windows was shut down properly.
Event ID 6008: This event indicates an improper or dirty shutdown. It is logged when the most recent shutdown was unexpected.
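To make the triage concrete, here is a small Python sketch that takes the event IDs found around one reboot and gives a first guess at what kind of restart it was. The function name classify_restart, the labels it returns, and the precedence (unclean before requested before clean) are my own assumptions, not anything Windows defines.

```python
# Event IDs from the list above, with the short meaning given there.
RESTART_EVENTS = {
    41: "restarted without a complete shutdown",
    1074: "shutdown or restart requested by an application or user",
    6006: "clean shutdown",
    6008: "previous shutdown was unexpected",
}


def classify_restart(event_ids):
    """Return a rough label for a reboot based on the event IDs seen near it.

    Hypothetical helper: the ordering below is an assumption chosen so
    that signs of an unclean restart win over everything else.
    """
    seen = {eid for eid in event_ids if eid in RESTART_EVENTS}
    if 41 in seen or 6008 in seen:
        return "unclean"
    if 1074 in seen:
        return "requested"
    if 6006 in seen:
        return "clean"
    return "unknown"
```

A "requested" result would point at Event ID 1074, whose details should name the process or user that asked for the restart.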

Could you share the contents of /etc/network/interfaces and the config of one of the affected vms (qm config <vmid>), so we can better understand the setup?
