Problem when activating the global firewall

jalet

New Member
Mar 23, 2026
I've got a working 6-node stretched Proxmox VE 9.1 + Ceph cluster, with hosts split across two datacenters (3 in each). In a third datacenter I've got a Proxmox VE 9.1 virtual machine (on vSphere) which acts as the Proxmox + Ceph tie-breaker.

Each host has 4x 25 Gbit/s interfaces in an LACP bond defined as below. There's no physical support for a dedicated corosync network and we don't want to sacrifice two 25 Gbit/s ports for corosync, so corosync also runs on the bond. We know that's bad practice, but we are reusing old hardware for a POC, so it will be fine. Below is an excerpt from /etc/network/interfaces on one of the 6 nodes:

Code:
auto bond0
iface bond0 inet manual
    bond-slaves nic0 nic1 nic2 nic3
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000
    bond-lacp-rate fast
#LACP aggregate

auto vmbr0
iface vmbr0 inet static
    address 10.250.42.11/24
    gateway 10.250.42.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-789 793-841 843-4094
    bridge-pvid 1
#Bridge + Management

auto vlan790
iface vlan790 inet static
    address 10.244.31.131/25
    mtu 9000
    vlan-raw-device bond0
#Corosync

auto vlan791
iface vlan791 inet static
    address 10.244.32.11/25
    mtu 9000
    vlan-raw-device bond0
    post-up ip route add 10.244.74.0/25 via 10.244.32.1
    post-down ip route del 10.244.74.0/25
#Ceph Frontend

auto vlan792
iface vlan792 inet static
    address 10.244.32.131/25
    mtu 9000
    vlan-raw-device bond0
#Ceph Backend
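
A side note on the interface names: ip addr displays the VLAN devices as vlan790@bond0, but the @bond0 suffix only indicates the parent device; the name the kernel (and thus an iptables -i match) actually uses is plain vlan790. For comparison, an equivalent stanza using the bond0.<vlan-id> naming convention, where the raw device is implied by the name, would look like this (a sketch reusing the corosync addresses from above, untested here):

Code:
auto bond0.790
iface bond0.790 inet static
    address 10.244.31.131/25
    mtu 9000
#Corosync - vlan-raw-device is implied by the bond0.790 name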

All is fine and good until we activate the firewall with the following /etc/pve/firewall/cluster.fw file, at which point Ceph goes down in flames, though the nodes seem to remain accessible from the GUI. Fortunately it's very robust: as soon as we delete /etc/pve/firewall/cluster.fw, Ceph heals correctly.

Code:
[OPTIONS]
enable: 1
policy_in: DROP
policy_out: ACCEPT
policy_forward: ACCEPT

# For SysAdmins
[IPSET management]
10.165.0.0/24

# SSH / GUI network
[IPSET cluster_mgmt]
10.250.42.0/24

# Corosync network
[IPSET cluster_sync]
10.244.31.128/25

# Ceph Public network including tie-breaker network on 3rd site
[IPSET cluster_fceph]
10.244.32.0/25
10.244.74.0/25

# Ceph Cluster network
[IPSET cluster_bceph]
10.244.32.128/25

[RULES]
IN ACCEPT -i lo
IN Ping(ACCEPT)
IN ACCEPT -m conntrack --ctstate ESTABLISHED,RELATED
# Proxmox VE management network to management network
IN ACCEPT -i vmbr0 -source +cluster_mgmt -destination +cluster_mgmt
# Corosync to Corosync network
IN ACCEPT -i vlan790@bond0 -source +cluster_sync -destination +cluster_sync
# Ceph Public to Ceph Public network (including tie-breaker)
IN ACCEPT -i vlan791@bond0 -source +cluster_fceph -destination +cluster_fceph
# Ceph Cluster network to Ceph Cluster network
IN ACCEPT -i vlan792@bond0 -source +cluster_bceph -destination +cluster_bceph
# Ceph Cluster to Ceph Public network (is this needed at all?)
IN ACCEPT -i vlan791@bond0 -source +cluster_bceph -destination +cluster_fceph
# Ceph Public to Ceph Cluster network (is this needed at all? Ceph's docs don't seem to imply so)
IN ACCEPT -i vlan792@bond0 -source +cluster_fceph -destination +cluster_bceph
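
For the Ceph ports specifically, pve-firewall ships a predefined Ceph macro (covering the monitor ports 3300/6789 and the OSD/MDS range 6800-7300, if I read the macro list correctly), so the public/cluster rules could also be sketched as below. This is untested on my side; note also that the reference documentation spells the destination option -dest, and uses the bare interface name without the @bond0 suffix:

Code:
# Sketch using the built-in Ceph macro instead of a blanket ACCEPT (untested)
IN Ceph(ACCEPT) -i vlan791 -source +cluster_fceph -dest +cluster_fceph
IN Ceph(ACCEPT) -i vlan792 -source +cluster_bceph -dest +cluster_bceph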

From reading the documentation, I believe pve-firewall automatically adds the rules needed for sysadmins to reach the GUI/SSH.

So what is missing that causes Ceph to stop working as soon as the firewall is activated?
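
One way to see what is actually being dropped, assuming I understand the host firewall options correctly, is to raise the input log level in the per-node host firewall file (/etc/pve/nodes/<node>/host.fw, not cluster.fw) and then watch /var/log/pve-firewall.log while Ceph tries to talk:

Code:
[OPTIONS]
# Log dropped input packets on this node (host.fw option)
log_level_in: info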
@dietmar Not sure; ip addr returns the names as above, so really I don't know for sure. I seem to remember having tried both, but I'll try again tomorrow. Thanks for your feedback.