I just added a second node and migrated some VMs to it. The new node keeps going offline on the network, but it is still accessible via the local terminal. Is this a NIC issue?
Apr 12 13:32:09 proxmox2 corosync[951]: [KNET ] link: host: 1 link: 0 is down
Apr 12 13:32:09 proxmox2 corosync[951]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Apr 12 13:32:09 proxmox2 corosync[951]: [KNET ] host: host: 1 has no active links
Apr 12 13:32:10 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41> next_to_use <41> next_to_clean <c5>buffer_info[next_to_clean]: time_stamp <1010f8c1c> next_to_watch <c6> jiffies <1010f9640> next_to_watch.status <0>MAC Status <80083>PHY Status <796d>PHY 1000BASE-T Status <3800>PHY Extended Status <3000>PCI Status <10>
Apr 12 13:32:11 proxmox2 corosync[951]: [TOTEM ] Token has not been received in 2250 ms
Apr 12 13:32:11 proxmox2 corosync[951]: [TOTEM ] A processor failed, forming new configuration: token timed out (3000ms), waiting 3600ms for consensus.
Apr 12 13:32:12 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41> next_to_use <41> next_to_clean <c5>buffer_info[next_to_clean]: time_stamp <1010f8c1c> next_to_watch <c6> jiffies <1010f9e00> next_to_watch.status <0>MAC Status <80083>PHY Status <796d>PHY 1000BASE-T Status <3800>PHY Extended Status <3000>PCI Status <10>
Apr 12 13:32:14 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41> next_to_use <41> next_to_clean <c5>buffer_info[next_to_clean]: time_stamp <1010f8c1c> next_to_watch <c6> jiffies <1010fa600> next_to_watch.status <0>MAC Status <80083>PHY Status <796d>PHY 1000BASE-T Status <3800>PHY Extended Status <3000>PCI Status <10>
Apr 12 13:32:15 proxmox2 corosync[951]: [QUORUM] Sync members[1]: 2
Apr 12 13:32:15 proxmox2 corosync[951]: [QUORUM] Sync left[1]: 1
Apr 12 13:32:15 proxmox2 corosync[951]: [TOTEM ] A new membership (2.49) was formed. Members left: 1
Apr 12 13:32:15 proxmox2 corosync[951]: [TOTEM ] Failed to receive the leave message. failed: 1
Apr 12 13:32:15 proxmox2 pmxcfs[860]: [dcdb] notice: members: 2/860
Apr 12 13:32:15 proxmox2 pmxcfs[860]: [status] notice: members: 2/860
Apr 12 13:32:15 proxmox2 corosync[951]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Apr 12 13:32:15 proxmox2 corosync[951]: [QUORUM] Members[1]: 2
Apr 12 13:32:15 proxmox2 corosync[951]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 12 13:32:15 proxmox2 pmxcfs[860]: [status] notice: node lost quorum
Apr 12 13:32:15 proxmox2 pmxcfs[860]: [dcdb] crit: received write while not quorate - trigger resync
Apr 12 13:32:15 proxmox2 pmxcfs[860]: [dcdb] crit: leaving CPG group
Apr 12 13:32:15 proxmox2 pve-ha-lrm[1020]: unable to write lrm status file - unable to open file '/etc/pve/nodes/proxmox2/lrm_status.tmp.1020' - Permission denied
Apr 12 13:32:15 proxmox2 pvestatd[986]: storage 'omv' is not online
Apr 12 13:32:15 proxmox2 pvestatd[986]: status update time (5.135 seconds)
Apr 12 13:32:16 proxmox2 pmxcfs[860]: [dcdb] notice: start cluster connection
Apr 12 13:32:16 proxmox2 pmxcfs[860]: [dcdb] crit: cpg_join failed: 14
Apr 12 13:32:16 proxmox2 pmxcfs[860]: [dcdb] crit: can't initialize service
Apr 12 13:32:16 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41> next_to_use <41> next_to_clean <c5>buffer_info[next_to_clean]: time_stamp <1010f8c1c> next_to_watch <c6> jiffies <1010fadc0> next_to_watch.status <0>MAC Status <80083>PHY Status <796d>PHY 1000BASE-T Status <3800>PHY Extended Status <3000>PCI Status <10>
Apr 12 13:32:18 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41> next_to_use <41> next_to_clean <c5>buffer_info[next_to_clean]: time_stamp <1010f8c1c> next_to_watch <c6> jiffies <1010fb580> next_to_watch.status <0>MAC Status <80083>PHY Status <796d>PHY 1000BASE-T Status <3800>PHY Extended Status <3000>PCI Status <10>
Apr 12 13:32:20 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41> next_to_use <41> next_to_clean <c5>buffer_info[next_to_clean]: time_stamp <1010f8c1c> next_to_watch <c6> jiffies <1010fbd40> next_to_watch.status <0>MAC Status <80083>PHY Status <796d>PHY 1000BASE-T Status <3800>PHY Extended Status <3000>PCI Status <10>
Apr 12 13:32:22 proxmox2 pmxcfs[860]: [dcdb] notice: members: 2/860
Apr 12 13:32:22 proxmox2 pmxcfs[860]: [dcdb] notice: all data is up to date
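To check whether the e1000e hangs line up with the corosync link loss, a short Python sketch like the one below could tally the relevant events. It assumes the syslog line layout shown in the log above. The function name `summarize` and the embedded sample (shortened copies of lines from the log) are illustrative only, and this is a rough aid for reading the log, not a diagnosis.

```python
import re

# Shortened copies of lines from the log above (register dump trimmed).
# In practice the text would come from the full journal output instead.
SAMPLE = """\
Apr 12 13:32:09 proxmox2 corosync[951]: [KNET ] link: host: 1 link: 0 is down
Apr 12 13:32:10 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41>
Apr 12 13:32:11 proxmox2 corosync[951]: [TOTEM ] Token has not been received in 2250 ms
Apr 12 13:32:12 proxmox2 kernel: e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang: TDH <c6> TDT <41>
Apr 12 13:32:15 proxmox2 pmxcfs[860]: [status] notice: node lost quorum
"""

# timestamp, hostname, process name, optional [pid], then the message
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) ([^\[:]+)(?:\[\d+\])?: (.*)$"
)


def summarize(log_text):
    """Collect timestamps of NIC hangs, corosync link-down and quorum loss."""
    events = {"hangs": [], "link_down": [], "quorum_lost": []}
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a syslog-style line
        stamp, _host, proc, msg = match.groups()
        if proc == "kernel" and "Detected Hardware Unit Hang" in msg:
            events["hangs"].append(stamp)
        elif proc == "corosync" and "[KNET ] link:" in msg and "is down" in msg:
            events["link_down"].append(stamp)
        elif "node lost quorum" in msg:
            events["quorum_lost"].append(stamp)
    return events


if __name__ == "__main__":
    result = summarize(SAMPLE)
    for kind, stamps in result.items():
        first = stamps[0] if stamps else "n/a"
        print(f"{kind}: {len(stamps)} event(s), first at {first}")
```

If the hang timestamps consistently sit within a second or two of the link-down and quorum-loss entries, that would point toward the NIC or its driver and away from the cluster configuration.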