Problem when activating the global firewall

jalet

New Member
Mar 23, 2026
I've got a working six-node stretched Proxmox VE 9.1 + Ceph cluster, with hosts split across two datacenters (three in each). In a third datacenter I've got a Proxmox VE 9.1 virtual machine (on vSphere) which acts as the Proxmox + Ceph tie-breaker.

Each host has 4x 25 Gbit/s interfaces in an LACP bond, defined as below. There's no physical capacity for a dedicated corosync network and we don't want to sacrifice two 25 Gbit/s ports for corosync, so corosync also runs on the bond. We know that's bad practice, but we are reusing old hardware for a POC, so it will do. Below is an excerpt from /etc/network/interfaces on one of the six nodes:

Code:
auto bond0
iface bond0 inet manual
    bond-slaves nic0 nic1 nic2 nic3
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000
    bond-lacp-rate fast
#LACP aggregate

auto vmbr0
iface vmbr0 inet static
    address 10.250.42.11/24
    gateway 10.250.42.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-789 793-841 843-4094
    bridge-pvid 1
#Bridge + Management

auto vlan790
iface vlan790 inet static
    address 10.244.31.131/25
    mtu 9000
    vlan-raw-device bond0
#Corosync

auto vlan791
iface vlan791 inet static
    address 10.244.32.11/25
    mtu 9000
    vlan-raw-device bond0
    post-up ip route add 10.244.74.0/25 via 10.244.32.1
    post-down ip route del 10.244.74.0/25
#Ceph Frontend

auto vlan792
iface vlan792 inet static
    address 10.244.32.131/25
    mtu 9000
    vlan-raw-device bond0
#Ceph Backend
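For reference, here is how I sanity-check that the bond and the VLAN interfaces actually came up as defined above (standard iproute2/sysfs commands, nothing Proxmox-specific):

```shell
# Show LACP negotiation state for the bond (mode, partner info, per-slave status)
cat /proc/net/bonding/bond0

# Confirm each VLAN interface is up, tagged on bond0, and has the expected MTU
ip -d link show vlan790
ip -d link show vlan791
ip -d link show vlan792

# Verify the static route to the tie-breaker network added by the post-up hook
ip route show 10.244.74.0/25
```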

All is fine and good until we activate the firewall with the following /etc/pve/firewall/cluster.fw file, at which point Ceph goes down in flames, although the nodes seem to remain accessible from the GUI. Fortunately it's very robust: as soon as we delete /etc/pve/firewall/cluster.fw, Ceph heals correctly.

Code:
[OPTIONS]
enable: 1
policy_in: DROP
policy_out: ACCEPT
policy_forward: ACCEPT

# For SysAdmins
[IPSET management]
10.165.0.0/24

# SSH / GUI network
[IPSET cluster_mgmt]
10.250.42.0/24

# Corosync network
[IPSET cluster_sync]
10.244.31.128/25

# Ceph Public network including tie-breaker network on 3rd site
[IPSET cluster_fceph]
10.244.32.0/25
10.244.74.0/25

# Ceph Cluster network
[IPSET cluster_bceph]
10.244.32.128/25

[RULES]
IN ACCEPT -i lo
IN Ping(ACCEPT)
IN ACCEPT -m conntrack --ctstate ESTABLISHED,RELATED
# Proxmox VE management network to management network
IN ACCEPT -i vmbr0 -source +cluster_mgmt -destination +cluster_mgmt
# Corosync to Corosync network
IN ACCEPT -i vlan790@bond0 -source +cluster_sync -destination +cluster_sync
# Ceph Public to Ceph Public network (including tie-breaker)
IN ACCEPT -i vlan791@bond0 -source +cluster_fceph -destination +cluster_fceph
# Ceph Cluster network to Ceph Cluster network
IN ACCEPT -i vlan792@bond0 -source +cluster_bceph -destination +cluster_bceph
# Ceph Cluster to Ceph Public network (is this needed at all ?)
IN ACCEPT -i vlan791@bond0 -source +cluster_bceph -destination +cluster_fceph
# Ceph Public to Ceph Cluster network (is this needed at all, Ceph's doc doesn't seem to imply so ?)
IN ACCEPT -i vlan792@bond0 -source +cluster_fceph -destination +cluster_bceph

From reading the documentation, I believe pve-firewall automatically adds rules allowing sysadmins in the management IPSET to use the GUI/SSH.

So what is missing that causes Ceph to stop working as soon as this is activated?
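For anyone debugging the same thing: the ruleset pve-firewall actually generates can be inspected directly, which makes it easier to spot options the parser silently dropped (the commands below are the standard pve-firewall CLI and iptables tooling):

```shell
# Parse the configuration and print the ruleset it would generate;
# syntax errors in cluster.fw are reported here
pve-firewall compile

# Check whether the firewall is enabled and running
pve-firewall status

# Inspect the rules actually loaded into the kernel (PVEFW-* chains)
iptables-save | grep -i PVEFW
```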
 
@dietmar I'm not sure; ip addr returns the names as above, so I really don't know for certain. I seem to remember having tried both, but I'll try again tomorrow. Thanks for your feedback.
 
@dietmar you were mostly right, but in fact there were other problems.

It seems that the config parser is very, very picky about the syntax, which doesn't seem to match the official documentation. I'd say it's buggy...
Also, I need to restart the pve-firewall service for the syntax-error messages to land in the systemd service logs; otherwise I'm flying blind.
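Concretely, this is what I have to do before the parser errors become visible (assuming the standard systemd unit name):

```shell
# Restart the service so cluster.fw is re-parsed
systemctl restart pve-firewall

# Only after the restart do the syntax-error messages show up here
journalctl -u pve-firewall --since "5 minutes ago"
```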

For example, if I use --iface in a rule, it gets ignored; instead I must use -i.

The correct "working" (it seems) configuration is as follows:

Code:
[OPTIONS]
enable: 1
policy_in: DROP
policy_out: ACCEPT
policy_forward: ACCEPT

# For SysAdmins, automatically used
[IPSET management]
10.165.0.0/24

# GUI / SSH
[IPSET cluster_mgmt]
10.250.42.0/24

# Corosync
[IPSET cluster_sync]
10.244.31.128/25

# Ceph Public network, including tie-breaker network on 3rd site
[IPSET cluster_fceph]
10.244.32.0/25
10.244.74.0/25

# Ceph Cluster network
[IPSET cluster_bceph]
10.244.32.128/25

[RULES]
# Allow all on loopback
IN ACCEPT -i lo
# Allow all ICMP
IN ACCEPT -p icmp
# From in-band management to in-band management
IN ACCEPT -i vmbr0 --source +cluster_mgmt --dest +cluster_mgmt
# From corosync to corosync
IN ACCEPT -i vlan790 --source +cluster_sync --dest +cluster_sync
# From Ceph Public to Ceph public, including tie-breaker
IN ACCEPT -i vlan791 --source +cluster_fceph --dest +cluster_fceph
# From Ceph Cluster to Ceph Cluster
IN ACCEPT -i vlan792 --source +cluster_bceph --dest +cluster_bceph
#
# Are these two really needed ???
# From Ceph Cluster to Ceph Public
IN ACCEPT -i vlan791 --source +cluster_bceph --dest +cluster_fceph
# From Ceph Public to Ceph Cluster
IN ACCEPT -i vlan792 --source +cluster_fceph --dest +cluster_bceph

Anyway thanks a lot for your help, it seems to work fine now.