How to fix this: received packet on bond0 with own address as source address?

@spirit Thanks for the suggestion!
But as I said, the first problem is that it only happens occasionally, so I might think everything is OK and then one fine day the cluster would not start up.
The second thing I mentioned is that ifupdown2 ignores 'post-up' statements on a no-subscription cluster.
So I don't trust ifupdown2 anymore.
Is there any reason to switch to ifupdown2 (except applying changes without reboot)? Is ifupdown deprecated?
 
ifupdown is not deprecated, but a lot of new features are coming to ifupdown2 (vxlan, ...).
So I'm trying to get it really stable.
I asked you about udevadm settle because ifupdown1 runs it before starting and ifupdown2 does not.

About the post-up: can you send your /etc/network/interfaces with the post-up lines?
 
Ok, so I'm getting a flood of this same error, and then a Proxmox reboot, on all three of my nodes when I reboot my router?!??

I tried ifupdown2 and the problem continued. I added the sleeps to ifenslave (which helped my bonds pick up the interfaces that were being left out) - and I've confirmed both ends are set for 802.3ad/LACP. I'm stumped.
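
For anyone wanting to double-check the host side of the bond, here is a minimal sketch that reads the text the Linux bonding driver exposes under /proc/net/bonding/ and reports the mode plus the link state of each member. The function name and the sample text (interface names eno1/eno2) are made up for illustration, and the field labels are assumed to match what the driver prints on your kernel.

```python
def parse_bonding_status(text):
    """Return (bonding mode, {member interface: MII status}) from bonding status text."""
    mode = None
    members = {}
    current = None  # member section currently being read; None while in the bond header
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            members[current] = None
        elif key == "MII Status" and current is not None:
            # The first "MII Status" line belongs to the bond itself and is skipped.
            members[current] = value
    return mode, members


# Hypothetical sample, shaped like /proc/net/bonding/bond0 (not taken from this thread).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eno1
MII Status: up

Slave Interface: eno2
MII Status: down
"""

mode, members = parse_bonding_status(SAMPLE)
print(mode)
print(members)
```

On a real node you would pass in the contents of /proc/net/bonding/bond0 instead of SAMPLE; a member that is missing or shows "down" right after boot would match the "interfaces being left out" symptom.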

The only thing left I can think to do is disable untagged VLAN 1 on the member ports facing Proxmox boxes, which is not an idea I love, and really shouldn't be required - it also makes zero sense that that would trigger it, but only when I reboot my router (on a LACP LAG off the same switch) - but it's all I've got left.

My playtime is up and the network is in use again, so I'll have to give it a try another night. Any thoughts/insight would be greatly appreciated in the meantime.
 
What is your router model?

This error should mean that a packet sent by your host is coming back to your host (for example because the switch is flooding it to all ports, or because there is a network loop).

Maybe it is a router software bug that shows up when you reboot it. (Have you tried a hard power-off of the router to compare?)
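
To compare a soft reboot with a hard power-off, it may help to count how often the warning shows up per bridge and port in the kernel log. The following is only a sketch: the regular expression assumes the message looks like the one in the thread title with a bridge name in front of it, and the sample log lines (host name pve1, bridge vmbr0, MAC address) are invented for illustration.

```python
import re
from collections import Counter

# Assumed message shape; the prefix and the trailing "(addr:..., vlan:...)"
# part may differ between kernel versions.
PATTERN = re.compile(
    r"(?P<bridge>\S+): received packet on (?P<port>\S+) "
    r"with own address as source address"
)


def count_own_address_warnings(lines):
    """Count matching warnings, keyed by (bridge, port)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("bridge"), match.group("port"))] += 1
    return counts


# Hypothetical log lines, not copied from any real system.
SAMPLE_LOG = [
    "Oct 10 21:14:03 pve1 kernel: vmbr0: received packet on bond0 with own "
    "address as source address (addr:aa:bb:cc:dd:ee:ff, vlan:0)",
    "Oct 10 21:14:03 pve1 kernel: vmbr0: received packet on bond0 with own "
    "address as source address (addr:aa:bb:cc:dd:ee:ff, vlan:0)",
    "Oct 10 21:14:04 pve1 kernel: bond0: link status definitely up",
]

for (bridge, port), n in count_own_address_warnings(SAMPLE_LOG).items():
    print(f"{bridge} via {port}: {n} warnings")
```

Feeding it the kernel log saved from each test (for example the output of dmesg or journalctl -k written to a file) would show whether the flood looks the same in both cases.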
 
What is your router model?

Asus RT-AC5300 w/ Merlin WRT 384.19

This error should mean that a packet sent by your host is coming back to your host (for example because the switch is flooding it to all ports, or because there is a network loop).

Yeah, but that shouldn't make Debian/Proxmox crash and burn should it?

Maybe it is a router software bug that shows up when you reboot it. (Have you tried a hard power-off of the router to compare?)

Oooo, this sounds like a very good test. I'll give it a try as soon as I can and report back!
 
Yeah, but that shouldn't make Debian/Proxmox crash and burn should it?

If it's really a loop with traffic amplification, it could overload the corosync process. (I have seen that once: corosync only uses one core, and it was at 100%.)
If HA is enabled, that could reboot the hosts.
 
Alright, just had to reboot the router due to flaky internet, so I took the opportunity to test this theory and sadly, all four boxes again have 1 minute of uptime when I get back online :-/

One of them has a bond with only a single link (for an easy switch to dual links when I get around to ordering a USB NIC) - maybe I'll try taking it out of the bond configuration and see if that one still goes down.
 
I'll let you guess which one does not have a bond, lol -

post-router-reboot.png


I'm going to try disabling the untagged VLAN 1 on the member ports of one of the other units and then try another reboot tonight to see if that fixes it.