Moving CEPH to a separate network

On an old installation I had PVE and CEPH in the same network. To improve performance and security, I am now separating the networks more strictly.

The first step I'm trying is to separate the Ceph cluster network from the PVE network. I followed https://forum.proxmox.com/threads/how-to-change-ceph-internal-cluster-network.132513/post-583372 and ended up in the same situation as the original poster.
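
For context, the relevant change from that thread is the network split in /etc/pve/ceph.conf. A minimal sketch of what I think the separated setup should look like (the /24 masks and the exact public/cluster assignment are my assumptions):

Code:
# /etc/pve/ceph.conf (excerpt) - sketch only, masks and roles are assumptions
[global]
    public_network  = 10.0.20.0/24   # PVE-facing / client and monitor traffic
    cluster_network = 10.0.7.0/24    # OSD replication and heartbeat traffic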

Details: I migrated from 10.0.20.* to 10.0.7.* addresses. Running `ss -tulpn | grep ceph` I still see listeners on both address ranges (this is just an example of a few lines, covering osd, mds and mgr daemons):

Code:
tcp   LISTEN 0      512         10.0.7.1:6801       0.0.0.0:*    users:(("ceph-osd",pid=1276995,fd=21))
tcp   LISTEN 0      512       10.0.20.21:3300       0.0.0.0:*    users:(("ceph-mon",pid=6109,fd=25))
tcp   LISTEN 0      512       10.0.20.21:6826       0.0.0.0:*    users:(("ceph-osd",pid=1276995,fd=18))
tcp   LISTEN 0      512       10.0.20.21:6819       0.0.0.0:*    users:(("ceph-osd",pid=1277894,fd=19))
tcp   LISTEN 0      512       10.0.20.21:6816       0.0.0.0:*    users:(("ceph-mds",pid=371311,fd=18))
tcp   LISTEN 0      512       10.0.20.21:6817       0.0.0.0:*    users:(("ceph-mds",pid=371311,fd=19))
tcp   LISTEN 0      512       10.0.20.21:6820       0.0.0.0:*    users:(("ceph-osd",pid=1277894,fd=22))
tcp   LISTEN 0      512       10.0.20.21:6821       0.0.0.0:*    users:(("ceph-osd",pid=1277894,fd=23))
tcp   LISTEN 0      512       10.0.20.21:6834       0.0.0.0:*    users:(("ceph-mgr",pid=6293,fd=22))
tcp   LISTEN 0      512       10.0.20.21:6835       0.0.0.0:*    users:(("ceph-mgr",pid=6293,fd=23))
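
To cross-check which addresses the daemons actually advertise (not just which sockets they bind), I would look at the monmap and the OSD map. These are standard Ceph commands; the grep patterns are just my shortcut:

Code:
# Monitor addresses as registered in the monmap
ceph mon dump
# Public and cluster addresses registered per OSD
ceph osd dump | grep -E '^osd\.'
# Network settings from the cluster-wide config file
grep -E 'public_network|cluster_network' /etc/pve/ceph.conf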

I have not made any changes so far, and the Ceph status is OK:
Code:
# pveceph status
  cluster:
    id:     dcf54fb4-90fa-4981-8610-be1a05c9067a
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pm-1,pm-3,pm-2 (age 19h)
    mgr: pm-1(active, since 26h), standbys: pm-3, pm-2
    mds: 1/1 daemons up, 2 standby
    osd: 12 osds: 12 up (since 11m), 12 in (since 19h)
 
  data:
    volumes: 1/1 healthy
    pools:   8 pools, 201 pgs
    objects: 104.96k objects, 400 GiB
    usage:   1.1 TiB used, 7.6 TiB / 8.7 TiB avail
    pgs:     201 active+clean
 
  io:
    client:   0 B/s rd, 382 KiB/s wr, 0 op/s rd, 20 op/s wr

What I have noticed is that restarting the OSDs produces the following syslog entries:

Code:
Nov 08 13:40:39 pm-1 systemd[1]: Stopping ceph-osd@4.service - Ceph object storage daemon osd.4...
Nov 08 13:40:39 pm-1 ceph-osd[1278745]: 2024-11-08T13:40:39.317+0100 7547708006c0 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
Nov 08 13:40:39 pm-1 ceph-osd[1278745]: 2024-11-08T13:40:39.317+0100 7547708006c0 -1 osd.4 36817 *** Got signal Terminated ***
Nov 08 13:40:39 pm-1 ceph-osd[1278745]: 2024-11-08T13:40:39.317+0100 7547708006c0 -1 osd.4 36817 *** Immediate shutdown (osd_fast_shutdown=true) ***
Nov 08 13:40:40 pm-1 kernel: libceph (dcf54fb4-90fa-4981-8610-be1a05c9067a e36818): osd4 down
Nov 08 13:40:40 pm-1 systemd[1]: ceph-osd@4.service: Deactivated successfully.
Nov 08 13:40:40 pm-1 systemd[1]: Stopped ceph-osd@4.service - Ceph object storage daemon osd.4.
Nov 08 13:40:40 pm-1 systemd[1]: ceph-osd@4.service: Consumed 7.816s CPU time.
Nov 08 13:40:40 pm-1 systemd[1]: Starting ceph-osd@4.service - Ceph object storage daemon osd.4...
Nov 08 13:40:40 pm-1 systemd[1]: Started ceph-osd@4.service - Ceph object storage daemon osd.4.
Nov 08 13:40:40 pm-1 pvedaemon[1264263]: <root@pam> end task UPID:pm-1:0013B305:00E6B52B:672E06C7:srvrestart:osd.4:root@pam: OK
Nov 08 13:40:44 pm-1 ceph-osd[1291059]: 2024-11-08T13:40:44.330+0100 7209c44913c0 -1 osd.4 36817 log_to_monitors true
Nov 08 13:40:45 pm-1 ceph-osd[1291059]: 2024-11-08T13:40:45.200+0100 7209b54006c0 -1 osd.4 36817 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Nov 08 13:40:47 pm-1 kernel: libceph (dcf54fb4-90fa-4981-8610-be1a05c9067a e36821): osd4 up
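
The set_numa_affinity line about an empty public interface makes me suspect that the OSD cannot map its public address to a local NIC after the change. Two checks I would run (standard Ceph commands; interpreting the warning this way is my assumption):

Code:
# Effective network settings of the running OSD
ceph config show osd.4 | grep -E 'public_network|cluster_network|public_addr|cluster_addr'
# Front (public) and back (cluster) addresses the OSD reports about itself
ceph osd metadata 4 | grep addr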


Please help me with the following questions:
  • Is everything fine here?
  • Is it recommended to migrate the monitors, managers and metadata servers to the CEPH-only network, too? (A rough sketch of how I would attempt that is below.)
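
In case the answer to the second question is yes: my understanding (an assumption, I have not verified it) is that a monitor cannot change its IP in place, so I would destroy and recreate them one node at a time while keeping quorum, using the standard pveceph tooling (the new address and the daemon names are placeholders from my setup):

Code:
# One node at a time, keeping at least two monitors up for quorum
pveceph mon destroy pm-2
pveceph mon create --mon-address 10.0.7.2   # new address is an assumption
pveceph status                              # wait for HEALTH_OK before the next node

# mgr and mds keep no local state (as far as I know), so recreating should do
pveceph mgr destroy pm-2
pveceph mgr create
pveceph mds destroy pm-2
pveceph mds create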
 
