Ceph mons fail with cluster firewall rules

ben90818532

Hey folks,

I'm seeing an issue when the cluster firewall is enabled with rules using the Ceph macro (both in and out).

1) Enable firewall

2) Ceph immediately starts reporting mons going down, and this continues until the firewall is disabled.

I've even tried setting the default input and output policies to ACCEPT (and disabling all other rules), and it still fails. Only when I disable the firewall entirely does it start working again.

Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 192.168.7.0/24
     fsid = 174a37a3-2363-41ee-904c-57735daee0fd
     keyring = /etc/pve/priv/$cluster.$name.keyring
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 2
     osd pool default size = 3
     public network = 192.168.7.0/24

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.server1]
     host = server1
     mon addr = 192.168.7.43:6789

[mon.server2]
     host = server2
     mon addr = 192.168.7.40:6789

[mon.server3]
     host = server3
     mon addr = 192.168.7.41:6789
 
Did you enable and configure the firewall at both the Datacenter and Node level? If yes, please show us the FW config.
 
You gave me a tip (thank you!): the firewall was enabled on all nodes except one.

I disabled and re-enabled the firewall on each node, and now for some reason it works with the firewall enabled using the Ceph macro.
 
With zero changes in the last week, I logged in today to find Ceph broken (the nodes couldn't communicate with each other). I disabled the firewall and it all started working again.

What is the best practice for firewalling Ceph on Proxmox hosts?

I even tried allowing all access through the firewall and it wouldn't work. It only worked with the firewall completely disabled.
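
For anyone debugging the same symptom, one way to narrow it down is to check from each node whether the monitor port of the other nodes is reachable, once with the firewall on and once with it off. Below is a minimal sketch, assuming the mons listen on TCP 6789 as in the config above; `mon_reachable` and `check_all` are just illustrative helper names, and a plain TCP connect only tells you whether the port is open, not whether Ceph itself is healthy.

```python
import socket


def mon_reachable(host: str, port: int = 6789, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (e.g. dropped packets)
        # and unreachable hosts.
        return False


def check_all(hosts, port: int = 6789) -> dict:
    """Map each host to its reachability on the given port."""
    return {host: mon_reachable(host, port) for host in hosts}
```

Running `check_all(["192.168.7.43", "192.168.7.40", "192.168.7.41"])` on each node with the firewall enabled should show which mon addresses are being blocked.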
 
Bingo, I think I solved it after seeing this:

https://forum.proxmox.com/threads/host-unreachable-with-firewall.49273/#post-230218

"If you remove the source address it should be any (0.0.0.0/0). If you specify one it is assumed to be /32."

In my ceph config:
cluster network = 192.168.7.0/24
public network = 192.168.7.0/24

In the Ceph firewall rule I had to set the range to 192.168.7.40/24-192.168.7.45/24.

From testing, I think that has solved the problem.
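
To illustrate why the missing mask matters, here is a small sketch using Python's standard `ipaddress` module. It only models the matching logic described in the quoted post (a bare address being treated as a single host, i.e. /32); it is not how the Proxmox firewall is implemented, and `allowed` is a made-up helper name. The mon addresses are the ones from my config above.

```python
import ipaddress

# Monitor addresses taken from the ceph.conf posted earlier in the thread.
MON_ADDRS = ["192.168.7.43", "192.168.7.40", "192.168.7.41"]


def allowed(source: str, peers):
    """Return the peers that fall inside the given source specification.

    A bare address without a mask is parsed as a single host (/32),
    mirroring the behaviour described in the quoted post.
    """
    net = ipaddress.ip_network(source, strict=False)
    return [p for p in peers if ipaddress.ip_address(p) in net]


single_host = allowed("192.168.7.43", MON_ADDRS)     # no mask given
whole_subnet = allowed("192.168.7.0/24", MON_ADDRS)  # full Ceph network

print(single_host)
print(whole_subnet)
```

With a bare address only that one mon matches, so traffic from the other two would not be covered by the rule; with the /24 all three mons match.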
 