How to fix this: received packet on bond0 with own address as source address ?

spirit · Feb 19, 2020

@albert_a

For ifupdown2, you could try adding this line in

/lib/systemd/system/networking.service
"
ExecStartPre=/usr/sbin/udevadm settle
"

Just before
"
ExecStart=/sbin/ifup -a
"

It should force the renaming of interfaces to complete before the interfaces are started.

albert_a · Feb 19, 2020

@spirit Thanks for the suggestion!
But as I said, the first problem is that it happens only occasionally, so I may think everything is OK, and then one fine day the cluster will not start up.
The second thing I mentioned is that ifupdown2 ignores 'post-up' statements on a no-subscription cluster.
So... no more trust in ifupdown2.
Is there any reason to switch to ifupdown2 (other than applying changes without a reboot)? Is ifupdown deprecated?

spirit · Feb 19, 2020

albert_a said:
@spirit Thanks for the suggestion!
But as I said, the first problem is that it happens only occasionally, so I may think everything is OK, and then one fine day the cluster will not start up.
The second thing I mentioned is that ifupdown2 ignores 'post-up' statements on a no-subscription cluster.
So... no more trust in ifupdown2.
Is there any reason to switch to ifupdown2 (other than applying changes without a reboot)? Is ifupdown deprecated?

ifupdown is not deprecated, but a lot of new features are coming to ifupdown2 (VXLAN, ...).
So I'm trying to get it really stable.
I asked you about udevadm settle because ifupdown1 runs it before starting and ifupdown2 does not.

About the post-up: can you send your /etc/network/interfaces with the post-up statements?

albert_a · Feb 20, 2020

spirit said:
but a lot of new features are coming to ifupdown2 (VXLAN, ...).

Great, thanks for the info!

spirit said:
About the post-up: can you send your /etc/network/interfaces with the post-up statements?

Sure, I've sent you the /etc/network/interfaces of all the nodes of that cluster by PM.

Hyacin · Aug 15, 2020

Ok, so I'm getting a flood of this same error, and then a Proxmox reboot, on all three of my nodes when I reboot my router?!??

I tried ifupdown2 and the problem continued. I added the sleeps to ifenslave (which helped my bonds pick up the interfaces that were being left out) - and I've confirmed both ends are set for 802.3ad/LACP. I'm stumped.

The only thing left I can think to do is disable untagged VLAN 1 on the member ports facing Proxmox boxes, which is not an idea I love, and really shouldn't be required - it also makes zero sense that that would trigger it, but only when I reboot my router (on a LACP LAG off the same switch) - but it's all I've got left.

My playtime is up and the network is in use again, so I'll have to give it a try another night. Any thoughts/insight would be greatly appreciated in the meantime.
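For the 802.3ad/LACP check mentioned above, the host side can be read from the bonding driver's status file. The following is only a minimal sketch: it assumes the usual `Key: value` layout of `/proc/net/bonding/bond0`, and the interface names `eno1` and `eno2` in the sample text are hypothetical, not taken from this thread.

```python
from pathlib import Path


def parse_bonding(text):
    """Return (mode, slaves) parsed from /proc/net/bonding/<bond> text.

    slaves maps each slave interface name to its MII status and
    aggregator ID, as reported in that slave's section of the file.
    """
    mode = None
    slaves = {}
    current = None  # stays None until the first "Slave Interface" line
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = slaves.setdefault(value, {"mii": None, "aggregator": None})
        elif current is not None and key == "MII Status":
            current["mii"] = value
        elif current is not None and key == "Aggregator ID":
            current["aggregator"] = value
    return mode, slaves


# Illustrative sample only (hypothetical interface names and values).
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eno1
MII Status: up
Aggregator ID: 1

Slave Interface: eno2
MII Status: down
Aggregator ID: 2
"""

if __name__ == "__main__":
    path = Path("/proc/net/bonding/bond0")
    text = path.read_text() if path.exists() else SAMPLE
    mode, slaves = parse_bonding(text)
    print("mode:", mode)
    for name, info in slaves.items():
        print(name, info)
```

A slave that is down, or that sits in a different aggregator than the others, is a hint that it never joined the LAG, which would match the "interfaces being left out" symptom.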

spirit · Aug 15, 2020

Hyacin said:
Ok, so I'm getting a flood of this same error, and then a Proxmox reboot, on all three of my nodes when I reboot my router?!??

What is your router model?

This error should mean that a packet sent by your host is coming back to your host (for example, the switch is flooding it to all ports, or there is a network loop).

Maybe it is a router software bug triggered when you reboot it. (Have you tried a hard power-off of your router to compare?)
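To see which port the returning packets arrive on, and how often, the kernel warnings can be tallied. This is a rough sketch, not a definitive tool: the message text comes from the thread title, while the `(addr:..., vlan:...)` suffix, the `vmbr0` prefix, and the MAC address in the sample lines are assumptions about how the kernel log looks on a given system.

```python
import re
from collections import Counter

# Matches the warning from the thread title; the "(addr:..., vlan:...)"
# suffix is treated as optional because its presence is an assumption.
PATTERN = re.compile(
    r"received packet on (?P<port>\S+) with own address as source address"
    r"(?: \(addr:(?P<addr>[0-9a-fA-F:]{17})(?:, vlan:(?P<vlan>\d+))?\))?"
)


def tally_own_address_warnings(lines):
    """Count warnings per (port, source MAC); MAC is None if not logged."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group("port"), match.group("addr"))] += 1
    return counts


# Illustrative log lines only (hypothetical bridge name and MAC address).
SAMPLE_LOG = [
    "kernel: vmbr0: received packet on bond0 with own address as source address (addr:aa:bb:cc:dd:ee:01, vlan:0)",
    "kernel: vmbr0: received packet on bond0 with own address as source address (addr:aa:bb:cc:dd:ee:01, vlan:0)",
    "kernel: some unrelated message",
]

if __name__ == "__main__":
    for (port, addr), count in tally_own_address_warnings(SAMPLE_LOG).items():
        print(port, addr, count)
```

Feeding it the saved output of `dmesg` or `journalctl -k` from around the time of the router reboot would show whether the flood is limited to one bond and one MAC address.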

Hyacin · Aug 15, 2020

spirit said:
What is your router model?

Asus RT-AC5300 w/ Merlin WRT 384.19

spirit said:
This error should mean that a packet sent by your host is coming back to your host (for example, the switch is flooding it to all ports, or there is a network loop).

Yeah, but that shouldn't make Debian/Proxmox crash and burn, should it?

spirit said:
Maybe it is a router software bug triggered when you reboot it. (Have you tried a hard power-off of your router to compare?)

Oooo, this sounds like a very good test. I'll give it a try as soon as I can and report back!

spirit · Aug 15, 2020

Hyacin said:
Yeah, but that shouldn't make Debian/Proxmox crash and burn, should it?

If it's really a loop, with network amplification, it could overload the corosync process. (I have seen that once: as corosync only uses one core, it was at 100%.)
If HA is enabled, it could reboot the hosts.

Hyacin · Aug 19, 2020

spirit said:
If it's really a loop, with network amplification, it could overload the corosync process. (I have seen that once: as corosync only uses one core, it was at 100%.)
If HA is enabled, it could reboot the hosts.

Alright, just had to reboot the router due to flaky internet, so I took the opportunity to test this theory and sadly, all four boxes again have 1 minute of uptime when I get back online :-/

One of them has a bond with only a single link (for an easy switch to dual links when I get around to ordering a USB NIC) - maybe I'll try taking it out of the bond configuration and see if that one still goes down.

Hyacin · Aug 21, 2020

I'll let you guess which one does not have a bond, lol -

I'm going to try disabling the untagged VLAN 1 on the member ports of one of the other units and then try another reboot tonight to see if that fixes it.
