PVE update broke ipfilter for lxc secondary interfaces

Jesse Norell

Member
Jun 19, 2019
I did not find this in the bug tracker or forum, so I'm just checking whether this is a known issue before filing a bug.

2 weeks ago I updated a small cluster and found that a few containers which previously worked fine were having partial networking issues. The problem affects only containers that have multiple network interfaces with ipfilter configured for them in PVE: traffic to the primary interface is allowed, but traffic to all secondary interfaces is blocked. Containers with only one interface continue to work fine with ipfilter enabled.

As an example, we have a nameserver with three interfaces/ip addrs:
Code:
id      name    bridge  firewall        vlan tag        mac address     ip address      gateway
net0    eth0    vmbr0   Yes             320             96:67:...       x.x.x.2/26      x.x.x.1
net1    eth1    vmbr0   Yes             320             34:29:...       x.x.x.4/26     
net2    eth2    vmbr0   Yes             320             A6:F9:...       x.x.x.44/26

We have firewall rules for that container which allow what we need (DNS, ping and SSH in, and various things out), and this had been working fine for months. After the latest update, the .2 IP address was still alive (answered ping, DNS queries and SSH), but the .4 address stopped answering anything. If I change Firewall > Options > IP filter to No, it starts working again.

I just updated to the latest kernel and the problem persists (likely it's in the firewall rule generation, not the kernel...). I can provide more info if useful, but these seem to be the most relevant package versions:

Code:
# dpkg --list | grep '^ii  pve-'
ii  pve-cluster                     5.0-37                         amd64        Cluster Infrastructure for Proxmox Virtual Environment
ii  pve-container                   2.0-39                         all          Proxmox VE Container management tool
ii  pve-docs                        5.4-2                          all          Proxmox VE Documentation
ii  pve-edk2-firmware               1.20190312-1                   all          edk2 based firmware modules for virtual machines
ii  pve-firewall                    3.0-22                         amd64        Proxmox VE Firewall
ii  pve-firmware                    2.0-6                          all          Binary firmware code for the pve-kernel
ii  pve-ha-manager                  2.0-9                          amd64        Proxmox VE HA Manager
ii  pve-i18n                        1.1-4                          all          Internationalization support for Proxmox VE
ii  pve-kernel-4.15                 5.4-4                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-4.15.18-14-pve       4.15.18-39                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.15.18-15-pve       4.15.18-40                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-4.15.18-16-pve       4.15.18-41                     amd64        The Proxmox PVE Kernel Image
ii  pve-libspice-server1            0.14.1-2                       amd64        SPICE remote display system server library
ii  pve-manager                     5.4-6                          amd64        Proxmox Virtual Environment Management Tools
ii  pve-qemu-kvm                    3.0.1-2                        amd64        Full virtualization on x86 hardware
ii  pve-xtermjs                     3.12.0-1                       amd64        HTML/JS Shell client

I will save iptables rules to a file both when IP filter is enabled and disabled, then review and post those soon.
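For the record, this is roughly how I'm capturing them (a command sketch, run as root on the node; the set name below is from my container 138 as shown later, yours will differ):

```shell
# Dump the rules (with packet counters) in both states for comparison.
iptables-save -c > /tmp/rules-ipfilter-off.txt
# ... set Firewall > Options > IP filter to Yes in the GUI, wait for
# pve-firewall to regenerate the rules, then:
iptables-save -c > /tmp/rules-ipfilter-on.txt
diff -u /tmp/rules-ipfilter-off.txt /tmp/rules-ipfilter-on.txt
# Also dump the ipset the filter rules reference:
ipset list PVEFW-138-ipfilter-net0-v4
```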
 
This is the iptables output with IP filter disabled (all interfaces working) and with it enabled (only the eth0 x.x.x.2 address works), on a node with only the single container 138 running. The only difference I see between the two is the inclusion of rules in the veth138i#-OUT chains to drop anything not matching that interface's IP address, e.g.:
Code:
Chain veth138i0-OUT (1 references)
 pkts bytes target     prot opt in     out     source               destination
    0     0 PVEFW-SET-ACCEPT-MARK  udp  --  *      *       0.0.0.0/0            0.0.0.0/0           [goto]  udp spt:68 dpt:67
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            MAC ! 96:67:CB:31:49:7E
    0     0 DROP       all  --  *      *       0.0.0.0/0            0.0.0.0/0            ! match-set PVEFW-138-ipfilter-net0-v4 src

The ipset for each interface (eg. PVEFW-138-ipfilter-net0-v4) includes the correct address for each interface (also attached).

One thing that I do find inconsistent/interesting is that all traffic appears to traverse the net0 interface, not net1 or net2. E.g. on the host, if I run 'tcpdump -ni veth138i0' I see all traffic for .2, .4 and .44; if I run 'tcpdump -ni veth138i1' I only see broadcast traffic. Likewise from within the container, tcpdump shows all traffic on eth0, with only broadcast traffic coming in on eth1 or eth2.

With traffic for all IPs using only that one interface, it would not surprise me to see the counters on the PVEFW-138-ipfilter-net0-v4 rule increase while traffic to .4 and .44 is dropped. But that doesn't happen: the counters stay at 0, and I don't understand why (I'm sure I just missed something in my read through the chains).

Related to this, if I ping .2, .4 or .44 from another machine and then check its arp entries, they all have the same MAC address:
Code:
Address                  HWtype  HWaddress           Flags Mask            Iface
x.x.x.2              ether   96:67:cb:31:49:7e   C                     eth0
x.x.x.4              ether   96:67:cb:31:49:7e   C                     eth0
x.x.x.44             ether   96:67:cb:31:49:7e   C                     eth0
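To double-check that without trusting the ARP cache, each address can be queried explicitly (a sketch using arping from iputils, run from another machine on the segment; 'eth0' here is that machine's own interface):

```shell
# Send fresh ARP requests for each IP; each reply prints the answering MAC.
arping -c 2 -I eth0 x.x.x.2
arping -c 2 -I eth0 x.x.x.4
arping -c 2 -I eth0 x.x.x.44
```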

Unfortunately I no longer have any machines on the older, working version to compare firewall rule changes against, if there even were any. Any ideas to try, or other info I can gather? Should I file a bug in the tracker?

Thanks!
 


are those 'x.x.x.x' identical for all three containers? could you include the container and firewall configs and output of 'pve-firewall compile'?
 
are those 'x.x.x.x' identical

Yes, they are 3 ip addresses on the same subnet.

for all three containers?

I assume you mean for all three interfaces (three interfaces, one container).

could you include the container and firewall configs and output of 'pve-firewall compile'?
Sure, /etc/pve/lxc/138.conf:
Code:
arch: amd64
cores: 4
hostname: ns1.kci.net
memory: 2048
net0: name=eth0,bridge=vmbr0,firewall=1,gw=x.x.x.1,hwaddr=96:67:CB:31:49:7E,ip=x.x.x.2/26,tag=31,type=veth
net1: name=eth1,bridge=vmbr0,firewall=1,hwaddr=34:29:9E:01:D3:6F,ip=x.x.x.4/26,tag=31,type=veth
net2: name=eth2,bridge=vmbr0,firewall=1,hwaddr=A6:F9:32:E2:F5:9F,ip=x.x.x.44/26,tag=31,type=veth
onboot: 1
ostype: debian
protection: 1
rootfs: storage-volume-00:vm-138-disk-0,size=8G
swap: 2048
unprivileged: 1
lxc.mount.entry: /dev/random var/spool/postfix/dev/random none bind,ro 0 0
lxc.mount.entry: /dev/urandom var/spool/postfix/dev/urandom none bind,ro 0 0
lxc.mount.entry: /dev/random var/bind9/chroot/dev/random none bind,ro 0 0
lxc.mount.entry: /dev/null var/bind9/chroot/dev/null none bind,ro 0 0
lxc.mount.entry: /sys/kernel/debug sys/kernel/debug none bind,optional 0 0
lxc.mount.entry: /sys/kernel/security sys/kernel/security none bind,optional 0 0
lxc.mount.entry: /sys/fs/pstore sys/fs/pstore none bind,optional 0 0
lxc.mount.entry: /sys/kernel/config sys/kernel/config none bind,optional 0 0
lxc.prlimit.nofile: 1024:65536
lxc.apparmor.profile: generated
lxc.apparmor.allow_nesting: 1

/etc/pve/firewall/138.fw (with ip filter disabled) is:
Code:
[OPTIONS]

log_level_out: alert
log_level_in: alert
ipfilter: 0
enable: 1
policy_out: DROP

[RULES]

OUT DNS(ACCEPT) # DNS server needs to talk everywhere
OUT Rsync(ACCEPT) # rsync for sanesecurity signature updates -- maybe can switch to https ?
OUT SMTP(ACCEPT) -dest +srv_smtp
OUT SSH(ACCEPT) -dest +srv_ssh_backup # backup server ssh
OUT Web(ACCEPT) # web traffic for apt updates - move to proxy
OUT MySQL(ACCEPT) -dest x.x.x.100 # mysql to master
OUT Ping(ACCEPT)
IN DNS(ACCEPT)
IN Ping(ACCEPT)
IN SSH(ACCEPT) -source +management

I'll get the 'pve-firewall compile' output sanitized and include here soon (probably monday).

Thanks!
 
yes, all three interfaces. the pve-firewall output (working and broken) should show exactly what is going on.
 
ok, this is the 'pve-firewall compile' output. I notice some rules for arp in it (which I did not see in the iptables output) which might explain the behavior: after I enable IP filter, sometimes the change/problem takes effect almost immediately, and sometimes it takes a while (up to 30-50 seconds at times) to show up - recent arp entries timing out could certainly explain that.
 


Doing a little testing, the issue does seem to be related to which interface (and hence which MAC address?) is used for each IP address. In my case, .2 is the eth0 address, .4 is eth1 and .44 is eth2, and I forced the use of the corresponding interface via source routing (performed inside the container):
Code:
# ip rule add from x.x.x.4/32 table 101
# ip route add default via x.x.x.1 dev eth1 table 101
# ip route add x.x.x.0/26 dev eth1 table 101
# ip rule add from x.x.x.44/32 table 102
# ip route add default via x.x.x.1 dev eth2 table 102
# ip route add x.x.x.0/26 dev eth2 table 102
# echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
# echo 1 > /proc/sys/net/ipv4/conf/default/arp_filter

.2 already used eth0, so it was unchanged. Table numbers are mostly arbitrary, but I'll use one per interface (maybe they should move to a higher range, say starting at 10000?). I think the arp_filter settings were unnecessary, but they seem correct.

After this, I checked (via tcpdump) that each ip address was using a different interface, and each showed a different mac address on remote machines. I then enabled ip filter for the container, and all ip addresses continued working.
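In case anyone wants this to survive container restarts, here is a sketch of the same policy routing expressed in /etc/network/interfaces inside a Debian container (interface name, addresses and table number are from my setup above; eth2 would get an analogous stanza with table 102):

```
auto eth1
iface eth1 inet static
    address x.x.x.4/26
    # re-create the per-address routing policy at ifup
    post-up ip rule add from x.x.x.4/32 table 101
    post-up ip route add x.x.x.0/26 dev eth1 table 101
    post-up ip route add default via x.x.x.1 dev eth1 table 101
    pre-down ip rule del from x.x.x.4/32 table 101
```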

I browsed some of the PVE changelogs/commits for the latest release but could not determine offhand which change affected this. In any case, maybe it would be better to have PVE set up the container environment similar to what I did here, source-routing traffic for each IP out the corresponding interface, rather than finding what changed and reverting to the old behavior?
 
I guess this broke with the addition of ARP filtering (https://git.proxmox.com/?p=pve-firewall.git;a=commit;h=401c141b36315dbbfaf88b293a44654a8610793b) ?

IMHO the root problem is that you define three interfaces with an address each in the same subnet, instead of one interface with three addresses. With the latter, you'd still need to manually edit the ipset to get proper ipfilter support, but then the ipfiltering does not depend on having the right setup inside the container..

maybe we should add proper support for multiple/additional IPs on container interfaces, at least for distros that support it..
 
maybe we should add proper support for multiple/additional IPs on container interfaces
Seems like a "yes" to me. I don't remember specifically why that was setup the way it was, I think it's simply because it was easy and it worked (and didn't involve .pve-ignore files, etc.).
 
IMHO the root problem is that you define three interfaces with an address each in the same subnet, instead of having one interface with three addresses. with the later, you'd still need to manually edit the ipset as well to get proper ipfilter support, but then the ipfiltering is not dependent on having the right setup inside the container..

That note about manually editing the ipset was key; I just tested that, and it is sufficient to get this working now. I.e. in my example, create an ipset 'ipfilter-net0' for the container which contains all 3 IP addresses (x.x.x.2, x.x.x.4, x.x.x.44), then enable IP filtering, and everything works.
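For anyone else hitting this, the workaround as a config fragment - this should be roughly what the relevant parts of /etc/pve/firewall/138.fw look like afterwards (per the pve-firewall per-VM IPSet syntax; the same ipset can also be created in the GUI under the container's Firewall > IPSet):

```
[OPTIONS]

ipfilter: 1

[IPSET ipfilter-net0]

x.x.x.2 # net0
x.x.x.4 # net1
x.x.x.44 # net2
```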

If/when proper support for multiple IPs per container is added, maintaining the ipset with the known IP addresses should probably be done automatically for you, but it's a pretty simple workaround for the current implementation. Is there any downside to doing this, or is it simply not what was originally envisioned?

Thanks....
 
That note about manually editing the ipset was key; I just tested that, and it is sufficient to get this working now. I.e. in my example, create an ipset 'ipfilter-net0' for the container which contains all 3 IP addresses (x.x.x.2, x.x.x.4, x.x.x.44), then enable IP filtering, and everything works.

If/when proper support for multiple IPs per container is added, maintaining the ipset with the known IP addresses should probably be done automatically for you, but it's a pretty simple workaround for the current implementation. Is there any downside to doing this, or is it simply not what was originally envisioned?

Thanks....

I think the main obstacle is that getting multiple IPs on a single interface inside the container is a bit cumbersome in some distros - but I'll put investigating full support on my post-PVE-6.0 todo list ;)
 
