It turned out to be a problem on the Cisco end. I don't see where in running-config this is defined, but evidently, besides defining each vlan interface, you also have to enter "vlan XX" to activate layer 2 vlan support for that ID.
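For anyone landing here with the same symptoms, the fix can be sketched roughly as follows. This is a minimal sketch, not a dump from the actual switch: the VLAN names are illustrative placeholders, and it assumes the switch is allowed to create VLANs locally (VTP server or transparent mode). As far as I understand, in the default VTP server mode the layer 2 VLAN definitions are stored in vlan.dat on flash rather than in running-config, which would explain why they don't show up there.

```
! Create the layer 2 VLANs; defining "interface VlanXX" alone does not do this
configure terminal
 vlan 101
  ! name is a placeholder, pick your own
  name ceph-cluster
 vlan 1000
  ! name is a placeholder, pick your own
  name ceph-public
 end
! Check that 101 and 1000 now show up as active VLANs
show vlan brief
! Check that they are allowed and forwarding on the port-channel trunks
show interfaces trunk
```

If a VLAN is listed under "allowed" on a trunk but missing from the "active in management domain" section of "show interfaces trunk", that is a hint the layer 2 VLAN was never created.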
I think this should be a pretty simple thing to configure, but I've been beating my head against it for a full day now. I think the issue is on the Proxmox side rather than the switch side, since I've done similar labs with VMware in the past and don't recall having such an issue.
- I have 3 new Proxmox nodes (v8.2.7) in a cluster, named px1, px2, px3
- I'm using a Cisco Catalyst 3650 switch
- Each Proxmox node uplinks eno1 to an access-mode port in vlan 1001 (for management)
- Each Proxmox node uplinks ens2f0,1,2,3 to a separate LACP channel-group on the Catalyst
- The LACP groups are trunks that tag every vlan:
  - 100 = proxmox clustering
  - 17 = office lab uplink for internet
  - 1000,101 = ceph pub/cluster
On the Proxmox side I configured it as follows:
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto ens2f0
iface ens2f0 inet manual

auto ens2f1
iface ens2f1 inet manual

auto ens2f2
iface ens2f2 inet manual

auto ens2f3
iface ens2f3 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves ens2f0 ens2f1 ens2f2 ens2f3
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
#uplink

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.1/24
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
#oob mgmt

auto vlan17
iface vlan17 inet static
    address 10.0.17.40/24
    gateway 10.0.17.252
    vlan-raw-device bond0
#office

auto vlan100
iface vlan100 inet static
    address 192.168.3.1/24
    vlan-raw-device bond0
#pxcluster

auto vlan101
iface vlan101 inet static
    address 192.168.4.1/24
    vlan-raw-device bond0
#cphclst

auto vlan1000
iface vlan1000 inet static
    address 192.168.2.1/24
    vlan-raw-device bond0
#cphpub

source /etc/network/interfaces.d/*
- Vlan 17 works. I can get to the internet and hosts can ping each other.
- Vlan 100 works. Proxmox cluster is set up and hosts can ping each other.
- Vlan 1001 (native eno1) works. I can manage the hosts with my laptop cabled to another access port on the catalyst.
- Vlan 101 and 1000 don't work. I can't ping the switch or other hosts.
The ARP table on a Proxmox node looks like this; it's not resolving MAC addresses for vlan 101 and 1000:
Code:
Address          HWtype  HWaddress          Flags Mask  Iface
192.168.1.2      ether   4c:d9:8f:ab:4b:1e  C           vmbr0     #px2
192.168.1.3      ether   4c:d9:8f:ab:4e:0e  C           vmbr0     #px3
192.168.1.123    ether   98:e7:43:36:1f:32  C           vmbr0     #laptop
192.168.1.250    ether   00:5d:73:d0:ea:f5  C           vmbr0     #switch
10.0.17.41       ether   b0:26:28:95:8f:88  C           vlan17    #px2
10.0.17.42       ether   b0:26:28:93:84:94  C           vlan17    #px3
10.0.17.43       ether   00:5d:73:d0:ea:e0  C           vlan17    #switch
10.0.17.252      ether   00:50:56:b4:16:ac  C           vlan17    #gateway
192.168.3.2      ether   b0:26:28:95:8f:88  C           vlan100   #px2
192.168.3.3      ether   b0:26:28:93:84:94  C           vlan100   #px3
192.168.3.250    ether   00:5d:73:d0:ea:d1  C           vlan100   #switch
192.168.2.2              (incomplete)                   vlan1000  #px2
192.168.2.3              (incomplete)                   vlan1000  #px3
192.168.2.250            (incomplete)                   vlan1000  #switch
192.168.4.2              (incomplete)                   vlan101   #px2
192.168.4.3              (incomplete)                   vlan101   #px3
192.168.4.250            (incomplete)                   vlan101   #switch
Bizarrely, from the switch, I can ping all the hosts on every vlan, but it always shows the vlan 101/1000 addresses as being on vlan 100:
Code:
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.0.17.40              0   b026.2893.88a8  ARPA   Vlan17
Internet  10.0.17.41             53   b026.2895.8f88  ARPA   Vlan17
Internet  10.0.17.42             53   b026.2893.8494  ARPA   Vlan17
Internet  10.0.17.43              -   005d.73d0.eae0  ARPA   Vlan17
Internet  192.168.1.1             0   d094.6664.0bec  ARPA   Vlan1001
Internet  192.168.1.2             0   4cd9.8fab.4b1e  ARPA   Vlan1001
Internet  192.168.1.3             0   4cd9.8fab.4e0e  ARPA   Vlan1001
Internet  192.168.1.250           -   005d.73d0.eaf5  ARPA   Vlan1001
Internet  192.168.2.1           195   b026.2893.88a8  ARPA   Vlan100
Internet  192.168.2.2           208   b026.2895.8f88  ARPA   Vlan100
Internet  192.168.2.3           203   b026.2893.8494  ARPA   Vlan100
Internet  192.168.2.250           -   005d.73d0.eadf  ARPA   Vlan1000
Internet  192.168.3.1             0   b026.2893.88a8  ARPA   Vlan100
Internet  192.168.3.2             0   b026.2895.8f88  ARPA   Vlan100
Internet  192.168.3.3             0   b026.2893.8494  ARPA   Vlan100
Internet  192.168.3.250           -   005d.73d0.ead1  ARPA   Vlan100
Internet  192.168.4.1           212   b026.2893.88a8  ARPA   Vlan100
Internet  192.168.4.2           227   b026.2895.8f88  ARPA   Vlan100
Internet  192.168.4.3           191   b026.2893.8494  ARPA   Vlan100
Internet  192.168.4.250           -   005d.73d0.eac1  ARPA   Vlan101
On the switch the config is pretty simple:
Code:
interface Port-channel1
 description Proxmox01
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
!
interface Port-channel2
 description Proxmox02
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
!
interface Port-channel3
 description Proxmox03
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
!
interface GigabitEthernet1/0/1
 description LACP-Proxmox01
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 1 mode active
!
interface GigabitEthernet1/0/2
 description LACP-Proxmox01
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 1 mode active
!
interface GigabitEthernet1/0/3
 description LACP-Proxmox01-Ceph
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 1 mode active
!
interface GigabitEthernet1/0/4
 description LACP-Proxmox01-Ceph
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 1 mode active
!
interface GigabitEthernet1/0/5
 description LACP-Proxmox02
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 2 mode active
!
interface GigabitEthernet1/0/6
 description LACP-Proxmox02
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 2 mode active
!
interface GigabitEthernet1/0/7
 description LACP-Proxmox02-CEPH
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 2 mode active
!
interface GigabitEthernet1/0/8
 description LACP-Proxmox02-CEPH
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 2 mode active
!
interface GigabitEthernet1/0/9
 description LACP-Proxmox03
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 3 mode active
!
interface GigabitEthernet1/0/10
 description LACP-Proxmox03
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 3 mode active
!
interface GigabitEthernet1/0/11
 description LACP-Proxmox03-CEPH
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 3 mode active
!
interface GigabitEthernet1/0/12
 description LACP-Proxmox03-CEPH
 switchport trunk allowed vlan 17,100,101,1000,1001
 switchport mode trunk
 channel-group 3 mode active
!
interface GigabitEthernet1/0/13
 switchport access vlan 1001
 switchport mode access
!
interface GigabitEthernet1/0/14
 switchport access vlan 1001
 switchport mode access
!
interface GigabitEthernet1/0/15
 switchport access vlan 1001
 switchport mode access
!
interface GigabitEthernet1/0/16
 switchport access vlan 1001
 switchport mode access
!
interface GigabitEthernet1/0/25
 switchport access vlan 17
 switchport mode access
!
interface Vlan1
 no ip address
 shutdown
!
interface Vlan17
 description office net
 ip address 10.0.17.43 255.255.255.0
!
interface Vlan100
 ip address 192.168.3.250 255.255.255.0
!
interface Vlan101
 description ceph cluster
 ip address 192.168.4.250 255.255.255.0
!
interface Vlan1000
 description Ceph
 ip address 192.168.2.250 255.255.255.0
!
interface Vlan1001
 description OOB-MGMT
 ip address 192.168.1.250 255.255.255.0
!
Besides rebooting everything, I've tried a lot of things to get it to work: instead of defining vlanXX interfaces I tried bridges with dot notation, adding a vlan-aware bridge on bond0 and then defining the vlans on that, and using a separate LACP bond just for 1000 and 101. Same behavior every time. The switch can always reach the 1000 and 101 addresses (apparently on vlan 100), but the hosts can't reach the switch or each other. Vlan 17 and 100 seem to work fine otherwise.
The one thing I notice is that the MAC address for each vlan interface is the same, and maybe that's causing an issue? I tried setting individual MAC addresses for each vlan in /etc/network/interfaces, but that didn't seem to matter.