proxmox 7.0 sdn beta test

frybin · Jul 30, 2021

spirit said:
OK, got it. Do you use the Proxmox firewall on these nodes? (I'm not sure where the TCP reset is coming from.) The routing seems to be OK.

I don't use the Proxmox firewall; I think it is turned off at both the Datacenter and Node level.

aderumier · Jul 30, 2021

frybin said:
I don't use the Proxmox firewall; I think it is turned off at both the Datacenter and Node level.

Maybe try this on the exit node: sysctl -w net.ipv4.conf.all.rp_filter=0

I'll be back from holiday next week, and I'll do more tests then.
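
A hedged aside on this sysctl: as far as the kernel's ip-sysctl documentation describes it, source validation on an interface uses the higher of `net.ipv4.conf.all.rp_filter` and `net.ipv4.conf.<iface>.rp_filter`, so setting only the `all` value to 0 does not relax filtering on an interface whose own value is still 1 or 2. The helper below is only an illustrative sketch of that rule; the function name and the `vmbr0` interface in the usage comment are assumptions, not something taken from this thread.

```shell
#!/bin/sh
# Sketch: effective rp_filter for one interface = max(conf.all, conf.<iface>).
# Arguments are the two raw sysctl values (0, 1 or 2).
effective_rp_filter() {
    all_value=$1
    iface_value=$2
    if [ "$all_value" -gt "$iface_value" ]; then
        echo "$all_value"
    else
        echo "$iface_value"
    fi
}

# Usage on a live host (vmbr0 is only an example interface name):
#   effective_rp_filter "$(cat /proc/sys/net/ipv4/conf/all/rp_filter)" \
#                       "$(cat /proc/sys/net/ipv4/conf/vmbr0/rp_filter)"
```

If the result is non-zero for the interface that receives the return traffic, the per-interface value would need to be lowered as well.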

frybin · Jul 30, 2021

aderumier said:
Maybe try this on the exit node: sysctl -w net.ipv4.conf.all.rp_filter=0

I'll be back from holiday next week, and I'll do more tests then.

Running

Code:

sysctl -w net.ipv4.conf.all.rp_filter=0

on the exit node did not work.

spirit · Aug 2, 2021

frybin said:
Running

Code:

sysctl -w net.ipv4.conf.all.rp_filter=0

on the exit node did not work.

Hi,
I'm back from holiday.

can you try

sysctl -w net.ipv4.tcp_l3mdev_accept=1

on the exit node, then restart ssh or pveproxy.
Then you should be able to reach the exit node IP from the VM.

(I don't know about the other, non-exit nodes of this cluster. Do you have the problem there too? They should be routed like your other cluster nodes.)
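
For readers following along: `sysctl -w` only changes the running kernel, so the setting is lost on reboot. A minimal sketch of persisting it, assuming a drop-in file under `/etc/sysctl.d/` (the file name `90-sdn-vrf.conf` and the helper name are hypothetical), could look like this. As far as the kernel documentation describes it, `tcp_l3mdev_accept=1` lets a listening socket in the default VRF accept connections that arrive through a VRF device, which is why services such as ssh or pveproxy are involved here.

```shell
#!/bin/sh
# Sketch: write a single "key = value" line into a sysctl drop-in file.
# $1 = sysctl key, $2 = value, $3 = target file (hypothetical path in usage).
persist_sysctl() {
    key=$1
    value=$2
    target=$3
    printf '%s = %s\n' "$key" "$value" > "$target"
}

# Usage as root on the exit node (file name is an assumption):
#   persist_sysctl net.ipv4.tcp_l3mdev_accept 1 /etc/sysctl.d/90-sdn-vrf.conf
#   sysctl --system                  # reload all sysctl drop-ins
#   systemctl restart ssh pveproxy   # restart the services, as suggested above
```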

tisc0 · Aug 2, 2021

Hello,
Not sure if I'm supposed to push my specific problem here or create a new topic ?

Let's go, I guess you'll tell me or move it if it's not appropriate.

Last week, I was playing successfully with 2 clusters and SDN vxlan, with vnet non-vlan-aware, and subnets (let's assume I've been reading properly but maybe not perfectly the documentation ? Multiple times, though, and it's quite short).

Today, in another cluster, freshly and automatically installed by Scaleway (proxmox 6.4-13) and using what they call RPNv2 (supposed to be a VXLAN able to transport whatever we need in it), I get errors while trying to create a vNIC in containers or VMs:

Click OK, and we're back in the config window. Click OK again:

Here is the config in the /etc/network/interfaces of the 2 nodes in that cluster :

Bash:

auto lo
iface lo inet loopback

iface ens3f0 inet manual

iface ens3f1 inet manual
        mtu 9000

# WAN IP
auto vmbr0
iface vmbr0 inet static
        address xx.xx.xx.xx/24
        gateway xx.xx.xx.xx
        bridge-ports ens3f0
        bridge-stp off
        bridge-fd 0


# Preparing LAN interface
auto vmbr1
iface vmbr1 inet manual
        bridge-ports ens3f1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 8900

# Attaching a VLAN on vmbr1 - I could attach many, all given by service provider Scaleway
# This is the network used to create the cluster
auto vmbr1.2017
iface vmbr1.2017 inet static
        address 10.20.17.2/24
        mtu 8800

## I also tried this very straightforward config, but the same errors occurred:
#auto ens3f1.2017
#iface ens3f1.2017 inet static
#       address 10.20.17.1/24


source /etc/network/interfaces.d/*

On the other node, it's similar, with 10.20.17.1/24 for the LAN (and its own public IP).
This network has been used to create the cluster and enroll the nodes:

Bash:

root@mynode1:~# pvecm status
Cluster information
-------------------
Name:             ClusterV2
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Aug  2 18:10:43 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.43
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.20.17.1 (local)
0x00000002          1 10.20.17.2

I don't get what I did wrong. Only a VLAN-aware vnet works (and then it's no longer possible to define subnets).

Thanks for any help. Sorry if I didn't give you some crucial material to help your understanding; I will post whatever you need.

spirit · Aug 2, 2021

@tisc0

can you send /etc/pve/sdn/*.cfg files ?

When you configure a non-VLAN-aware vnet (this should be the default anyway, unless you want to propagate VLANs on top of VXLAN), do you set any VLAN tag in the VM NIC options? (This should be forbidden.)
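
One way to check for a stray VLAN tag without clicking through the GUI is to look at the `netN:` lines of the guest config, where a tag shows up as a `tag=<id>` option. The function below is a minimal sketch under that assumption; its name is hypothetical, and the example paths in the usage comment follow the usual `/etc/pve/qemu-server/<vmid>.conf` and `/etc/pve/lxc/<vmid>.conf` layout.

```shell
#!/bin/sh
# Sketch: print the NIC lines of one guest config file that carry a VLAN tag.
# $1 = path to a guest config file.
nics_with_vlan_tag() {
    grep -E '^net[0-9]+:.*,tag=[0-9]+' "$1"
}

# Usage (VMID 100 is only an example):
#   nics_with_vlan_tag /etc/pve/qemu-server/100.conf
#   nics_with_vlan_tag /etc/pve/lxc/100.conf
```

Any line printed for a NIC attached to a non-VLAN-aware vnet is a candidate for the error described above.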

tisc0 · Aug 3, 2021

Hi @spirit !
Thank you, it works. Sorry for that nonsense of mine: I had indeed put a VLAN ID in the VM NIC options, and it's actually not forbidden.
Could you also help with the right MTU value? Our service provider's VLAN accepts 9000; should I reduce it in the zone params or somewhere else?
Thanks again

spirit · Aug 3, 2021

tisc0 said:
Hi @spirit !
Thank you, it works. Sorry for that nonsense of mine: I had indeed put a VLAN ID in the VM NIC options, and it's actually not forbidden.

OK. The GUI still needs support for this; I'll try to send a patch soon (and at least return a correct error message).

tisc0 said:
Could you also help with the right MTU value? Our service provider's VLAN accepts 9000; should I reduce it in the zone params or somewhere else?
Thanks again

If you use VXLAN, you need to lower it by 50 bytes, so 8850 max. You can set it in the zone, but it should also be set inside the guest. (The default is 1500 in the guest anyway.)
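
The 50 bytes are the usual VXLAN-over-IPv4 encapsulation overhead: 14 (outer Ethernet) + 20 (outer IPv4) + 8 (UDP) + 8 (VXLAN header). A small sketch of the arithmetic follows; the function name is hypothetical, and reading the 8850 figure as "8900 configured on vmbr1, minus 50" is an assumption based on the interfaces file posted earlier in the thread.

```shell
#!/bin/sh
# Sketch: largest MTU usable inside a VXLAN tunnel over an IPv4 underlay.
VXLAN_OVERHEAD_IPV4=50   # 14 Ethernet + 20 IPv4 + 8 UDP + 8 VXLAN

# $1 = MTU of the underlay interface that carries the VXLAN traffic.
vxlan_inner_mtu() {
    echo $(( $1 - $VXLAN_OVERHEAD_IPV4 ))
}

# Examples:
#   vxlan_inner_mtu 8900   # underlay at 8900, as on vmbr1 above
#   vxlan_inner_mtu 9000   # underlay at the provider's full 9000
```

The result is an upper bound for both the zone MTU and the MTU set inside the guests; the underlay value to plug in is the smallest MTU along the path between the nodes.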

frybin · Aug 4, 2021

spirit said:
Hi,
I'm back from holiday.

can you try

sysctl -w net.ipv4.tcp_l3mdev_accept=1

on the exit node, then restart ssh or pveproxy.
Then you should be able to reach the exit node IP from the VM.

(I don't know about the other, non-exit nodes of this cluster. Do you have the problem there too? They should be routed like your other cluster nodes.)

Hi @spirit, it ended up working, thanks for the help. I don't have other nodes added to this cluster since I am still testing new features.

tisc0 · Aug 9, 2021

Hi,
Trying to remove a subnet in SDN, with following error :

delete sdn subnet object failed: cannot delete subnet '10.26.0.0/24', not empty (500)

I grepped /etc/pve/nodes thoroughly for the vnet it's supposed to use, but it seems no guest is using it.

Any idea ?

Edit: I deleted all entries directly in /etc/pve/sdn/subnets.cfg and it worked. Is the error expected behavior ?
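
For reference, a grep of that kind could be sketched as below. It assumes guest NICs appear as `netN:` lines with a `bridge=<vnet>` option in the config files under `/etc/pve/nodes`; the function name and the `myvnet` name in the usage comment are hypothetical.

```shell
#!/bin/sh
# Sketch: list guest config files that attach a NIC to the given vnet/bridge.
# $1 = vnet (bridge) name, $2 = directory to search recursively.
guests_using_bridge() {
    bridge=$1
    dir=$2
    # Match bridge=<name> followed by a comma or end of line, so that
    # "vnet1" does not also match "vnet10".
    grep -rlE "^net[0-9]+:.*bridge=${bridge}(,|\$)" "$dir"
}

# Usage (vnet name is only an example):
#   guests_using_bridge myvnet /etc/pve/nodes
```

No output means no guest config references the vnet; snapshot sections inside the same files are covered too, since they repeat the `netN:` lines.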

spirit · Aug 9, 2021

tisc0 said:
Hi,
Trying to remove a subnet in SDN, with following error :

delete sdn subnet object failed: cannot delete subnet '10.26.0.0/24', not empty (500)

I grepped /etc/pve/nodes thoroughly for the vnet it's supposed to use, but it seems no guest is using it.

Any idea ?

Edit: I deleted all entries directly in /etc/pve/sdn/subnets.cfg and it worked. Is the error expected behavior ?

pveversion -v ?

Do you have a gateway defined on the subnet? (Maybe try removing it first; this was a bug fixed recently, I don't remember exactly when.)

tisc0 · Aug 9, 2021

root@ahuntz:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.124-1-pve: 5.4.124-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-network-perl: 0.6.0
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1

&& no gateway was defined on the subnet

spirit · Aug 10, 2021

tisc0 said:
root@ahuntz:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.124-1-pve: 5.4.124-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-network-perl: 0.6.0
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1

&& no gateway was defined on the subnet

OK, this is a bug fixed in 0.6.1, but that is only for Proxmox 7. (New updates will only be provided for Proxmox 7, as it is still in beta.)

https://git.proxmox.com/?p=pve-network.git;a=commit;h=34c4c6d74fd8245d21828231f895808d8649f965

tisc0 · Aug 16, 2021

Hi guyz,
Glad to talk here again

(but maybe I shouldn't?)

I just integrated 2 nodes into a cluster with SDN. The first one was a little capricious, and a reboot (or proper firewall rules 0:-)) deployed my SDN conf on it.

The second one is still stuck in pending status, despite a reboot, many clicks on deploy, and restarting pve-cluster.
The IP of that node, on the right network (where it can ping its peers), is in the peer list of my zones.

I did not want to create /etc/network/interfaces.d/sdn manually, but maybe I should? Thinking that might not solve the problem... (though WD40 is really helping my old truck to start when it's a bit hard ^^)

Actually I did it... and of course it didn't solve the problem: when applying the network config, my node is not in the list of `reloadnetworkall` tasks, but it is in the SDN hosts list with pending status.

libpve-network-perl and ifupdown2 are installed.

Forgot something ?
Thanks !

spirit · Aug 16, 2021

tisc0 said:
Hi guyz,
Glad to talk here again (but maybe I shouldn't?)

I just integrated 2 nodes into a cluster with SDN. The first one was a little capricious, and a reboot (or proper firewall rules 0:-)) deployed my SDN conf on it.

so, firewall problem ? can you give more details ?

tisc0 said:
The second one is still stuck in pending status, despite a reboot, many clicks on deploy, and restarting pve-cluster.
The IP of that node, on the right network (where it can ping its peers), is in the peer list of my zones.

I did not want to create /etc/network/interfaces.d/sdn manually, but maybe I should? Thinking that might not solve the problem... (though WD40 is really helping my old truck to start when it's a bit hard ^^)

no, you don't need to create /etc/network/interfaces.d/sdn manually.
just to be sure, it's not created currently ?

when you apply config, do you see any error in the global "reloadnetworkall" task ?
do you see a task "SRV networking - reload" for the second node ?

tisc0 said:
Actually I did it... and of course it didn't solve the problem: when applying the network config, my node is not in the list of `reloadnetworkall` tasks, but it is in the SDN hosts list with pending status.

libpve-network-perl and ifupdown2 are installed.

Forgot something ?

Do you have "source /etc/network/interfaces.d/*" in /etc/network/interfaces?
(If not, you could see zones in pending state, even if /etc/network/interfaces.d/sdn is generated.)
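
A quick way to check for that line is sketched below; the function name is hypothetical, and the pattern deliberately ignores commented-out copies of the line.

```shell
#!/bin/sh
# Sketch: succeed only if the given interfaces file contains an active
# (uncommented) "source /etc/network/interfaces.d/*" line.
# $1 = path to the interfaces file.
has_sdn_source_line() {
    grep -Eq '^[[:space:]]*source[[:space:]]+/etc/network/interfaces\.d/\*' "$1"
}

# Usage:
#   has_sdn_source_line /etc/network/interfaces \
#       && echo "source line present" \
#       || echo "source line missing"
```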

tisc0 · Aug 16, 2021

Hi @spirit,

I was too optimistic in thinking I had solved the first node's SDN problem with the missing firewall rules. Actually, there are inconsistencies in our firewall conf, since nodes without the supposedly needed rules still get the SDN deployed locally. A colleague told me that, according to the doc somewhere, the rules for cluster operation are not to be defined manually but are managed by the cluster. Agreed?

Yes, the 'source...' line is there in interfaces, and I deleted the sdn file I had put there manually: it doesn't get updated anyway when I push another deploy.
And there is no error in the task list when applying the SDN deploy, but indeed no line for our second node.

Digging.

spirit · Aug 16, 2021

tisc0 said:
Hi @spirit,

I was too optimistic in thinking I had solved the first node's SDN problem with the missing firewall rules. Actually, there are inconsistencies in our firewall conf, since nodes without the supposedly needed rules still get the SDN deployed locally. A colleague told me that, according to the doc somewhere, the rules for cluster operation are not to be defined manually but are managed by the cluster. Agreed?

tisc0 said:
Yes, the 'source...' line is there in interfaces, and I deleted the sdn file I had put there manually: it doesn't get updated anyway when I push another deploy.
And there is no error in the task list when applying the SDN deploy, but indeed no line for our second node.

Hmm, that's strange. It's as if the configuration generation is not called on the second node.
When you do "apply sdn", the node where you are logged in calls the second node with the command:
"pvesh set /nodes/$secondnodename/network"

It's done through the pveproxy tcp/8006 port. (It really needs to be open between nodes, and the ssh port too.)

can you try this command manually ?
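
A minimal sketch of running that command by hand for one node is shown below. The wrapper function, its `DRY_RUN` switch, and the node name `mynode2` are illustrative assumptions; only the `pvesh set /nodes/<node>/network` call itself comes from the thread. By default the wrapper only prints the command, so nothing is reloaded until `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: trigger the network reload on one node, the same call the
# "apply" action is described as making for each node.
# $1 = node name. DRY_RUN defaults to 1 (print only).
reload_node_network() {
    node=$1
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "pvesh set /nodes/${node}/network"
    else
        pvesh set "/nodes/${node}/network"
    fi
}

# Usage (node name is only an example):
#   reload_node_network mynode2             # show the command
#   DRY_RUN=0 reload_node_network mynode2   # actually run it
```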

tisc0 · Aug 16, 2021

"pvesh set /nodes/$secondnodename/network"

It worked \o/
And now "apply" does work for that node too.
Thanks a lot !
But... I'd like to understand where/how it got messed. An idea ?

spirit · Aug 16, 2021

tisc0 said:
It worked \o/
And now "apply" does work for that node too.
Thanks a lot !
But... I'd like to understand where/how it got messed. An idea ?

Hmm, I really don't understand, sorry... The apply button really only launches this command for each node, nothing else.

frybin · Aug 16, 2021

Hello,

I am back at it again with another issue.

Yesterday I was having an issue where my PVE was unable to access an OpenID inside one of its SDNs, and I realized it could not ssh to hosts inside an SDN network, although they could ssh to it. (It seems the PVE packets never reached the vrf_evpnzone interface.) Before posting in the forums, I saw that there were package updates I could apply to my host, so I tried that.

I applied the updates and now my networks are unable to communicate outside of the host again. Specifically, the VMs in the 10.2.0.0/24 network can reach out, and the packets leave the PVE host, but when the PVE host receives the reply packets on its interface, it does not forward them to the SDN network, essentially causing no outside network connectivity. Please let me know what else is needed to help troubleshoot this.

dpkg -l|grep frr:

Code:

ii  frr                                  7.5.1-1.1                      amd64        FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
ii  frr-pythontools                      7.5.1-1.1                      all          FRRouting suite - Python tools
