How can I configure Ceph to completely disable msgr1 and use only msgr2?

patchouli
Hello, I want to configure Ceph to use msgr2 only and not msgr1, in order to encrypt Ceph traffic.

So I first set ms_bind_msgr1 = false and ms_bind_msgr2 = true in /etc/ceph/ceph.conf under the [global] section, and changed the monitor IP addresses to v2-only addresses.
The full configuration is:
Code:
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 192.168.0.0/20
    fsid = [REDACTED]
    mon_allow_pool_delete = true
    mon_host = v2:192.168.15.0 v2:192.168.15.2 v2:192.168.15.4
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    ms_bind_msgr1 = false
    ms_bind_msgr2 = true
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 192.168.0.0/20

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
    keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.node1]
    host = node1
    mds_standby_for_name = pve
    mds_standby_replay = true

[mds.node2]
    host = node2
    mds_standby_for_name = pve
    mds_standby_replay = true

[mds.node3]
    host = node3
    mds_standby_for_name = pve
    mds_standby_replay = true

[mon.node1]
    public_addr = v2:192.168.15.0

[mon.node2]
    public_addr = v2:192.168.15.2

[mon.node3]
    public_addr = v2:192.168.15.4
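
For reference: msgr2 listens on port 3300 by default and msgr1 on 6789. I left the ports off above; if fully qualified addresses turn out to be needed, I assume the entry would look like the sketch below, though I have not confirmed that explicit ports are required.
Code:
# assumed fully qualified v2-only form; 3300 is the msgr2 default port
mon_host = v2:192.168.15.0:3300 v2:192.168.15.2:3300 v2:192.168.15.4:3300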

I checked the health of the Ceph cluster, which showed HEALTH_OK, and then tried to create a CephFS.
But it failed, showing
TASK ERROR: adding storage for CephFS '[REDACTED]' failed, check log and add manually! create storage failed: mount error: Job failed. See "journalctl -xe" for details.
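
(For what it's worth, I assume the addresses the monitors actually advertise can be checked with ceph mon dump; I would expect only v2 entries there, roughly like the sketch below, but I may be misreading the output format.)
Code:
ceph mon dump
# expected, assuming the v2-only binding took effect:
# 0: [v2:192.168.15.0:3300/0] mon.node1
# 1: [v2:192.168.15.2:3300/0] mon.node2
# 2: [v2:192.168.15.4:3300/0] mon.node3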

I checked journalctl and found lines like these:
Code:
Jan 27 02:06:01 node3 kernel: libceph: mon0 (1)192.168.15.0:6789 socket closed (con state V1_BANNER)
Jan 27 02:06:02 node3 kernel: libceph: mon0 (1)192.168.15.0:6789 socket closed (con state V1_BANNER)
Jan 27 02:06:02 node3 kernel: libceph: mon0 (1)192.168.15.0:6789 socket closed (con state V1_BANNER)
Jan 27 02:06:03 node3 kernel: libceph: mon0 (1)192.168.15.0:6789 socket closed (con state V1_BANNER)
Jan 27 02:06:04 node3 kernel: libceph: mon2 (1)192.168.15.4:6789 socket error on write
Jan 27 02:06:05 node3 kernel: libceph: mon2 (1)192.168.15.4:6789 socket error on write
Jan 27 02:06:05 node3 kernel: libceph: mon2 (1)192.168.15.4:6789 socket error on write
Jan 27 02:06:06 node3 kernel: libceph: mon2 (1)192.168.15.4:6789 socket error on write

(skipped some repeated lines)

Jan 27 02:06:56 node3 mount[185820]: mount error: no mds (Metadata Server) is up. The cluster might be laggy, or you may not be authorized
Jan 27 02:06:56 node3 kernel: ceph: No mds server is up or the cluster is laggy
Jan 27 02:06:56 node3 systemd[1]: mnt-pve-[REDACTED].mount: Mount process exited, code=exited, status=32/n/a
Jan 27 02:06:56 node3 systemd[1]: mnt-pve-[REDACTED].mount: Failed with result 'exit-code'.
Jan 27 02:06:56 node3 systemd[1]: Failed to mount mnt-pve-[REDACTED].mount - /mnt/pve/[REDACTED].
Jan 27 02:06:56 node3 pvedaemon[185675]: adding storage for CephFS '[REDACTED]' failed, check log and add manually! create storage failed: mount error: 'file-storage_cfg'-locked command timed out - aborting
Jan 27 02:06:56 node3 pvedaemon[158927]: <root@pam> end task UPID:node3:0002D54B:00191EBA:67966B6C:cephfscreate:[REDACTED]:root@pam: adding storage for CephFS '[REDACTED]' failed, check log and add manually! create storage failed: mount error: 'file-storage_cfg'-locked command timed out - aborting
Considering the above logs, it seems the kernel client (libceph) is still using msgr1 to talk to the monitors: the connections go to port 6789 (the msgr1 port) and fail in the V1_BANNER state.
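
As far as I know, kernels since 5.11 accept an ms_mode mount option (legacy, crc, secure, prefer-crc, prefer-secure) that selects the msgr2 connection mode. A manual test mount forcing msgr2 might look like the sketch below (untested; /mnt/test and the admin credentials are placeholders):
Code:
# assumed manual CephFS mount over msgr2: port 3300 plus ms_mode selects the v2 protocol
mount -t ceph 192.168.15.0:3300,192.168.15.2:3300,192.168.15.4:3300:/ /mnt/test \
    -o name=admin,ms_mode=prefer-crc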

How can I solve this problem?
 