proxmox 7.0 sdn beta test

Ok, got it. Do you use the Proxmox firewall on these nodes? (I'm not sure where the TCP reset is coming from.) The routing seems to be ok.
I don't use the Proxmox firewall, and I have it turned off at the Datacenter and Node level, I think.
 
I don't use the Proxmox firewall, and I have it turned off at the Datacenter and Node level, I think.
Maybe try this on the exit node: sysctl -w net.ipv4.conf.all.rp_filter=0
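
For reference, a quick sketch of how that could be checked and applied (purely illustrative; note that the kernel uses the higher of the "all" and per-interface rp_filter values, so per-interface settings may also need to be relaxed):

Code:
# show the current reverse-path-filter settings for all interfaces
sysctl -a 2>/dev/null | grep '\.rp_filter'
# disable strict reverse-path filtering globally, as suggested above
sysctl -w net.ipv4.conf.all.rp_filter=0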

I'll be back from holiday next week, and I'll do more tests.
 
Running
Code:
sysctl -w net.ipv4.conf.all.rp_filter=0
on the exit node did not work.
Hi,
I'm back from holiday.

can you try

sysctl -w net.ipv4.tcp_l3mdev_accept=1

on the exit-node, then restart ssh or pveproxy.
Then you should be able to reach the exit node IP from the VM.

(I don't know about the other nodes (the non-exit nodes) of this cluster: do you have the problem there too? It should be routed like your other clusters' nodes.)
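
A minimal sketch of those steps on the exit node (assuming the standard Debian/PVE service names ssh and pveproxy):

Code:
# let sockets accept connections that arrive via an l3mdev/VRF device
sysctl -w net.ipv4.tcp_l3mdev_accept=1
# restart the daemons so their listening sockets pick up the new setting
systemctl restart ssh pveproxy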
 
Hello,
Not sure if I'm supposed to post my specific problem here or create a new topic?

Let's go, I guess you'll tell me or move it if it's not appropriate.

Last week, I was playing successfully with 2 clusters and SDN VXLAN, with a non-VLAN-aware vnet and subnets (let's assume I've read the documentation properly, though maybe not perfectly? I went through it multiple times, and it's quite short).

Today, in another cluster, freshly and automatically installed by Scaleway (Proxmox 6.4-13), and using what they call RPNv2 (supposedly a VXLAN able to transport whatever we need in it), I get errors while trying to create a vNIC in containers or VMs:

Screenshot from 2021-08-02 17-33-47.png
Click OK, and we're back in the config window. Click OK again:

Screenshot from 2021-08-02 17-33-59.png

Here is the config from /etc/network/interfaces on the 2 nodes of that cluster:


Bash:
auto lo
iface lo inet loopback

iface ens3f0 inet manual

iface ens3f1 inet manual
        mtu 9000

# WAN IP
auto vmbr0
iface vmbr0 inet static
        address xx.xx.xx.xx/24
        gateway xx.xx.xx.xx
        bridge-ports ens3f0
        bridge-stp off
        bridge-fd 0


# Preparing LAN interface
auto vmbr1
iface vmbr1 inet manual
        bridge-ports ens3f1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 8900

# Attaching a VLAN on vmbr1 - I could attach many, all given by service provider Scaleway
# This is the network used to create the cluster
auto vmbr1.2017
iface vmbr1.2017 inet static
        address 10.20.17.2/24
        mtu 8800

## I also tried with this very straightforward config, but the same errors occurred:
#auto ens3f1.2017
#iface ens3f1.2017 inet static
#       address 10.20.17.1/24


source /etc/network/interfaces.d/*

On the other node, it's similar, with 10.20.17.1/24 for the LAN (and its own public IP).
This network has been used to create the cluster and enroll the nodes:

Bash:
root@mynode1:~# pvecm status
Cluster information
-------------------
Name:             ClusterV2
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Aug  2 18:10:43 2021
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.43
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.20.17.1 (local)
0x00000002          1 10.20.17.2


I don't get what I did wrong. Only a VLAN-aware vnet works (and then it's no longer possible to define subnets).

Thanks for any help, and sorry if I didn't give you some crucial material to help your understanding; I'll post whatever you need.
 
@tisc0

Can you send the /etc/pve/sdn/*.cfg files?

When you configure a non-VLAN-aware vnet (this should be the default anyway, unless you want to propagate VLANs on top of VXLAN), do you set any VLAN tag in the VM NIC options? (This should be forbidden.)
 
Hi @spirit !
Thank you, it works. Sorry for that nonsense of mine: I did indeed put a VLAN ID in the VM NIC options, and it's actually not forbidden.
Could you also help with the right value for the MTU? Our service provider's VLAN accepts 9000; should I reduce it in the zone params or somewhere else?
Thanks again
 
Hi @spirit !
Thank you, it works. Sorry for that nonsense of mine: I did indeed put a VLAN ID in the VM NIC options, and it's actually not forbidden.
Ok, the GUI still needs support for this; I'll try to send a patch soon (and at least make it show a correct error message).

Could you also help with the right value for the MTU? Our service provider's VLAN accepts 9000; should I reduce it in the zone params or somewhere else?
Thanks again
If you use VXLAN, you need to lower it by 50 bytes, so 8850 max. You can set it in the zone, but it should also be done inside the guest. (The default is 1500 in the guest anyway.)
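
As a purely illustrative sketch of the guest side (assuming a Linux guest whose NIC is eth0, a placeholder name): the zone side is just the MTU field in the VXLAN zone options, while inside the guest the interface MTU has to be raised to match.

Bash:
# inside the guest: use jumbo frames minus the 50-byte VXLAN overhead
ip link set dev eth0 mtu 8850
# make it persistent through the guest's own network configuration (distro-specific)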
 
Hi,
I'm back from holiday.

can you try

sysctl -w net.ipv4.tcp_l3mdev_accept=1

on the exit-node, then restart ssh or pveproxy.
Then you should be able to reach the exit node IP from the VM.

(I don't know about the other nodes (the non-exit nodes) of this cluster: do you have the problem there too? It should be routed like your other clusters' nodes.)
Hi @spirit, it ended up working, thanks for the help. I don't have other nodes added to this cluster since I am still testing the new features out.
 
Hi,
Trying to remove a subnet in SDN, I get the following error:

delete sdn subnet object failed: cannot delete subnet '10.26.0.0/24', not empty (500)

I did a big grep for the vnet it's supposed to use in /etc/pve/nodes, but it seems no guest is using it.

Any idea ?

Edit: I deleted all entries directly in /etc/pve/sdn/subnets.cfg and it worked. Is the error expected behavior?
 
Hi,
Trying to remove a subnet in SDN, I get the following error:

delete sdn subnet object failed: cannot delete subnet '10.26.0.0/24', not empty (500)

I did a big grep for the vnet it's supposed to use in /etc/pve/nodes, but it seems no guest is using it.

Any idea ?

Edit: I deleted all entries directly in /etc/pve/sdn/subnets.cfg and it worked. Is the error expected behavior?

pveversion -v ?

Do you have a gateway defined on the subnet? (Maybe try to remove it first; this was a bug fixed recently, I don't remember exactly when.)
 
Code:
root@ahuntz:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.124-1-pve: 5.4.124-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-network-perl: 0.6.0
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1

And no gateway was defined on the subnet.
 
Code:
root@ahuntz:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.124-1-pve: 5.4.124-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-network-perl: 0.6.0
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.12-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1

And no gateway was defined on the subnet.
Ok, this is a bug fixed in 0.6.1, but it's only for Proxmox 7. (New updates will only be provided for Proxmox 7, as SDN is still in beta.)

https://git.proxmox.com/?p=pve-network.git;a=commit;h=34c4c6d74fd8245d21828231f895808d8649f965
 
Hi guys,
Glad to talk here again :) (but maybe I shouldn't?)

I just integrated 2 nodes into a cluster with SDN. The first one was a little capricious, and a reboot (or proper firewall rules :-)) got my SDN conf deployed on it.

The second one is still stuck in pending status, despite a reboot, many clicks on deploy, and a restart of pve-cluster.
The IP of that node, on the right network (which pings its peers just fine), is in the peer list of my ZONES.

I did not want to create /etc/network/interfaces.d/sdn manually, but maybe I should? I'm thinking that might not solve the problem... (though WD-40 really does help my old truck start when it's being difficult ^^)

Actually I did it... and of course it didn't change the fact that when applying the network config, my node is not in the list of `reloadnetworkall` tasks, but it is in the SDN hosts list with pending status.

libpve-network-perl and ifupdown2 are installed.

Did I forget something?
Thanks !
 
Hi guys,
Glad to talk here again :) (but maybe I shouldn't?)

I just integrated 2 nodes into a cluster with SDN. The first one was a little capricious, and a reboot (or proper firewall rules :-)) got my SDN conf deployed on it.
So, a firewall problem? Can you give more details?

The second one is still stuck in pending status, despite a reboot, many clicks on deploy, and a restart of pve-cluster.
The IP of that node, on the right network (which pings its peers just fine), is in the peer list of my ZONES.

I did not want to create /etc/network/interfaces.d/sdn manually, but maybe I should? I'm thinking that might not solve the problem... (though WD-40 really does help my old truck start when it's being difficult ^^)
no, you don't need to create /etc/network/interfaces.d/sdn manually.
Just to be sure, it's not currently created?

When you apply the config, do you see any error in the global "reloadnetworkall" task?
Do you see a task "SRV networking - reload" for the second node?

Actually I did it... and of course it didn't change the fact that when applying the network config, my node is not in the list of `reloadnetworkall` tasks, but it is in the SDN hosts list with pending status.

libpve-network-perl and ifupdown2 are installed.

Did I forget something?

do you have "source /etc/network/interfaces.d/* " in /etc/network/interfaces ?
(if not, you could see zones in pending state, even if e /etc/network/interfaces.d/sdn is generated)
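
A quick way to verify both points, for example:

Code:
# confirm the sourcing line is present in the main interfaces file
grep -n 'source /etc/network/interfaces.d/' /etc/network/interfaces
# check whether the generated SDN config actually exists
ls -l /etc/network/interfaces.d/sdn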
 
Hi @spirit,

I was being optimistic, thinking I had solved the first node's SDN problem with missing firewall rules. Actually, there are inconsistencies in our firewall conf, since nodes without the supposedly needed rules still get the SDN deployed locally. A colleague told me that somewhere in the docs it says the rules needed for the cluster to function are not to be defined manually but are managed by the cluster; agreed?

Yes, the 'source ...' line in interfaces is there, and I deleted the sdn file I had put there manually: it doesn't get updated anyway when I push another deploy.
And there is no error in the task list when applying the SDN deploy, but indeed no line for our second node.

Digging.
 
Hi @spirit,

I was being optimistic, thinking I had solved the first node's SDN problem with missing firewall rules. Actually, there are inconsistencies in our firewall conf, since nodes without the supposedly needed rules still get the SDN deployed locally. A colleague told me that somewhere in the docs it says the rules needed for the cluster to function are not to be defined manually but are managed by the cluster; agreed?
Yes, the 'source ...' line in interfaces is there, and I deleted the sdn file I had put there manually: it doesn't get updated anyway when I push another deploy.
And there is no error in the task list when applying the SDN deploy, but indeed no line for our second node.
Mmm, that's strange; it's like the configuration generation is not being called on the second node.
when you do "apply sdn", the node where you are logged, is calling the second node with command:
"'pvesh set /nodes/$secondnodename/network"

It's done through the pveproxy tcp/8006 port. (It really needs to be open between nodes, and the ssh port too.)

Can you try this command manually?
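
For example (the node name is a placeholder; the port checks assume nc from netcat is available):

Code:
# trigger the network config reload on the second node, as the SDN apply does
pvesh set /nodes/<secondnodename>/network
# verify that the API and ssh ports are reachable from this node
nc -zv <second-node-ip> 8006
nc -zv <second-node-ip> 22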
 
"'pvesh set /nodes/$secondnodename/network"
It worked \o/
And now "apply" does work for that node too.
Thanks a lot!
But... I'd like to understand where/how it got messed up. Any idea?
 
Hello,

I am back at it again with another issue.

Yesterday I was having an issue where my PVE host was unable to access an OpenID server inside one of its SDNs, and I realized it could not SSH to hosts inside an SDN network, though they could SSH to it. (It seems the PVE host's packets never reached the vrf_evpnzone interface.) So, before posting in the forums, I saw that there were package updates I could apply to my host, so I tried that.

I applied the updates and now my networks are unable to communicate outside of the host again. Specifically, the VMs in the 10.2.0.0/24 network can reach out, and the packets leave the PVE host, but when the PVE host receives the reply packets on its interface, it does not forward those packets to the SDN network, essentially leaving no outside network connectivity. Please let me know what else is needed to help troubleshoot this.

dpkg -l|grep frr:
Code:
ii  frr                                  7.5.1-1.1                      amd64        FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
ii  frr-pythontools                      7.5.1-1.1                      all          FRRouting suite - Python tools
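
In case it helps the troubleshooting, a rough diagnostic sketch (the VRF name is taken from the post above; tcpdump and iproute2 are assumed to be installed on the host):

Code:
# watch whether the reply packets ever make it into the EVPN VRF
tcpdump -ni vrf_evpnzone
# inspect the routes the host knows inside that VRF
ip route show vrf vrf_evpnzone
# check that the FRR daemons are running after the package update
systemctl status frr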
 
