Losing Access After Every Restart

First of all, my mistake: you have eno4, but I talked about eno5. Sorry about that, but I think you get the idea.

> so i disconnected 1 interface and left 1 interface connected and restarted proxmox server to verify and once again right after restart i still didn't receive a connection back

How did you disconnect the interface? Physically, or just by bringing it down from the command line? Because if you only bring it down with a command, it will be back online after the next reboot. If you want to completely test the "single interface" hypothesis, disable eno4 in /etc/network/interfaces (probably by commenting out the whole section:


Code:
auto eno4
iface eno4 inet manual

and remove it from:

Code:
bond-slaves eno3 eno4

will be enough).
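Put together, the relevant part of /etc/network/interfaces would then look roughly like this (just a sketch based on the snippets above; everything else stays unchanged):

```text
# temporarily disabled for the single-interface test:
#auto eno4
#iface eno4 inet manual

# ...and in the bond0 section, eno4 removed from the slave list:
        bond-slaves eno3
```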

So, please check this hypothesis. I know it will be frustrating and time-consuming, but if things work with a single interface in the bond (ideally tested with each of them), we will leave the issue much less space to hide :)

Best regards,
NT
I physically removed it from the host and checked, and apparently it was the same thing. I did some additional work with my colleague today and we managed to get it working (sort of) even after a restart, but only with one interface in use, which is weird. I will reply back tomorrow with the current config; maybe that way some other issues can be found here. Thanks for all the assistance so far!
 
I sent a reply yesterday, but apparently I was signed out and it was never posted here; I only realised this when I came back for an update. Anyway, what I tried to send is that my colleague and I tried some additional steps yesterday and got it working (sort of), and that I would post an update here with the current config from the web UI, in case it sheds some more light on the matter. It seems that even after the Proxmox server has restarted, it does not initiate the request for an IP unless it is somehow triggered manually. Please see the current config below:

Code:
auto lo
iface lo inet loopback

auto ens2f0
iface ens2f0 inet manual

auto ens2f1
iface ens2f1 inet manual

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto eno3
iface eno3 inet manual

auto eno4
iface eno4 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2 eno3 eno4
        bond-mode 802.3ad
        bond-miimon 100
        #bond-mode balance-rr
        bond-xmit-hash-policy layer2+3
        bond-lcap-rate fast

auto vmbr0
iface vmbr0 inet static
        address 192.168.3.8/24
        gateway 192.168.3.10
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 5,7
 
Is it possible to dump the bond status after a reboot, while it is still not working?
Something like:
# cat /proc/net/bonding/bond0 > $SOME_TMP_FILE_1

Then bring up the connection manually and do the same command, but to another file, e.g.:

# cat /proc/net/bonding/bond0 > $SOME_TMP_FILE_2

Then, please paste the contents of the two files here and let's try to find something suspicious.
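To spot what actually changes between the two states, a plain diff of the two dumps is usually enough. A small stand-in example (the file names and contents here are placeholders for the real dumps):

```shell
# Stand-ins for the two bond status dumps taken before/after the fix:
printf 'MII Status: down\n' > /tmp/bond_before.txt
printf 'MII Status: up\n'   > /tmp/bond_after.txt

# Every line that differs points at what changed when the link came up.
# diff exits non-zero when the files differ, hence the "|| true".
diff -u /tmp/bond_before.txt /tmp/bond_after.txt || true
```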

Best,
/NT
 
Yes, here are the logs, as requested.

While it is not working, it gives:
Code:
Ethernet Channel Bonding Driver: v6.8.4-2-pve

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 34:80:0d:01:56:94
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 9
        Partner Key: 1
        Partner Mac Address: 00:00:00:00:00:00

Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:94
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 1
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:95
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 2
    port state: 77
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eno3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:96
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 3
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eno4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:97
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 4
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

After manually bringing up the connection:

Code:
Ethernet Channel Bonding Driver: v6.8.4-2-pve

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 34:80:0d:01:56:94
Active Aggregator Info:
        Aggregator ID: 1
        Number of ports: 1
        Actor Key: 9
        Partner Key: 1001
        Partner Mac Address: f4:74:70:3b:e2:53

Slave Interface: eno1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:94
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 1
    system mac address: f4:74:70:3b:e2:53
    oper key: 1001
    port priority: 1
    port number: 19
    port state: 63

Slave Interface: eno2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:95
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: churned
Actor Churned Count: 0
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 2
    port state: 77
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eno3
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:96
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 3
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1

Slave Interface: eno4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 34:80:0d:01:56:97
Slave queue ID: 0
Aggregator ID: 4
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1
details actor lacp pdu:
    system priority: 65535
    system mac address: 34:80:0d:01:56:94
    port key: 9
    port priority: 255
    port number: 4
    port state: 69
details partner lacp pdu:
    system priority: 65535
    system mac address: 00:00:00:00:00:00
    oper key: 1
    port priority: 255
    port number: 1
    port state: 1
 
And one more question here (sorry that I missed it the previous time):

> After manually bringing up connection:

How do you do it? By a console command (which one?), or in some other way (how)?

Best,
/NT
 
It is weird, actually. I only noticed it while we were troubleshooting, when you suggested using tcpdump: that is how the negotiation starts. So basically, running the tcpdump command from the console is what brings up the connection.
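That points at a known side effect: by default, tcpdump puts the capture interface into promiscuous mode, and on some NIC/driver combinations that kick to the receive path may be what (re)starts the negotiation. A way to test this hypothesis, sketched below (eno1 is taken from the config in this thread; adjust as needed):

```shell
IFACE=eno1   # one of the bond slaves from the config above

if ip link show "$IFACE" >/dev/null 2>&1; then
    # LACP control frames use ethertype 0x8809. With -p, tcpdump does
    # NOT switch the NIC into promiscuous mode; so if the bond only
    # comes up when -p is omitted, the promiscuous-mode side effect is
    # the trigger, not tcpdump itself.
    tcpdump -p -n -i "$IFACE" -c 5 ether proto 0x8809
else
    echo "interface $IFACE not present on this machine"
fi
```

If the capture with -p does not wake the bond up but a plain `tcpdump -i eno1` does, that narrows the problem down considerably.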
 
Hah... that really sounds strange... and interesting :)

I will be honest: I used some AI to analyse the information you provided. Here is what it found (and what it recommends checking):

1. ---
"Partner Mac Address: 00:00:00:00:00:00

Partner Churn State: churned

In a healthy LACP (802.3ad) setup, the "Partner Mac Address" should be the MAC of your physical switch. The fact that it is all zeros means your server has not received a single LACP Control Protocol Data Unit (PDU) from the switch.
"

IMHO, that means you should check your switch-side configuration (probably LACP is not configured properly on the switch).

2. ---
"Number of ports: 1

Active Aggregator ID: 2 (only eno2 is active)

Even though you have four interfaces (eno1 through eno4), they are all in different Aggregator IDs (1, 2, 3, and 4). Because they can't see a common partner, the bonding driver is forced to pick just one interface to keep the network alive, rather than "bonding" them into a single 4Gbps pipe.
"

IMHO, this points in the same direction: the configuration on the switch.

3. ---
"Critical Typo:

In your configuration, you have:
bond-lcap-rate fast

It should be:
bond-lacp-rate fast (The 'p' and 'c' are swapped).
"

And the recommended change is:

Code:
auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2 eno3 eno4
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate fast            # Fixed the typo here
        bond-xmit-hash-policy layer2+3
        bond-min-links 1               # Ensures the bond stays up if at least 1 link is alive
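The status dumps above actually confirm this typo independently: they report "LACP rate: slow" even though the config asks for fast, because the misspelled option was silently ignored. After fixing it and reloading the network config (e.g. with ifreload -a), the kernel's effective setting can be read back from sysfs; a sketch (bond0 as named in this thread):

```shell
if [ -r /sys/class/net/bond0/bonding/lacp_rate ]; then
    # After the fix this should read "fast 1"; with the typo in place
    # the driver stays at its default, "slow 0".
    cat /sys/class/net/bond0/bonding/lacp_rate
else
    echo "no bond0 on this machine"
fi
```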

Last but not least: when things work, on eno1 we see
...
details partner lacp pdu:
...
system mac address: f4:74:70:3b:e2:53
....

while on all the others it is:

....
details partner lacp pdu:
...
system mac address: 00:00:00:00:00:00
....

which again leads me to the idea of a switch-side misconfiguration.

I would proceed as follows:

- On the host side, I would leave the bond with only one interface (say eno1). I would comment out the others, and if I had the chance, I would even unplug their cables :)

- On the switch itself, I would put an IP address from the same network (192.168.3.x/24).

In this configuration (Linux host + 802.3ad bond + vmbr0 + eno1 only <--> switch with 4 ports in LACP), I would first verify that everything works at least between the host and the switch.

Once I am sure of the above, I would start adding the remaining 3 interfaces one by one, making sure after each one that things continue to work.

Only when all 4 interfaces are back in the host's bond, and I am convinced the connection to the switch is solid, would I start investigating problems with the connection to the gateway (if any remain).
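For the first step, the bond stanza would shrink to something like this (a sketch only; the iface stanzas for eno2..eno4 can stay in the file, they just leave the bond):

```text
auto bond0
iface bond0 inet manual
        bond-slaves eno1
        bond-mode 802.3ad
        bond-miimon 100
        bond-lacp-rate fast
        bond-xmit-hash-policy layer2+3
```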

I'm looking forward to your feedback.

Best,
NT
 
OK then, will do. I will try all of these next week; I was out of the office most of today. Thanks for the usual assistance!