On an old installation I had PVE and Ceph in the same network. To improve performance and security, I am now separating the networks more strictly.
The first step I'm trying is to separate the Ceph cluster network from the PVE network. I followed https://forum.proxmox.com/threads/how-to-change-ceph-internal-cluster-network.132513/post-583372 and ended up in the same situation as the original poster.
Details: I migrated from 10.0.20.* to 10.0.7.* addresses. Running `ss -tulpn | grep ceph` I still see listeners on both address ranges (this is just an example of some lines showing osd, mds, and mgr processes):
Code:
tcp LISTEN 0 512 10.0.7.1:6801 0.0.0.0:* users:(("ceph-osd",pid=1276995,fd=21))
tcp LISTEN 0 512 10.0.20.21:3300 0.0.0.0:* users:(("ceph-mon",pid=6109,fd=25))
tcp LISTEN 0 512 10.0.20.21:6826 0.0.0.0:* users:(("ceph-osd",pid=1276995,fd=18))
tcp LISTEN 0 512 10.0.20.21:6819 0.0.0.0:* users:(("ceph-osd",pid=1277894,fd=19))
tcp LISTEN 0 512 10.0.20.21:6816 0.0.0.0:* users:(("ceph-mds",pid=371311,fd=18))
tcp LISTEN 0 512 10.0.20.21:6817 0.0.0.0:* users:(("ceph-mds",pid=371311,fd=19))
tcp LISTEN 0 512 10.0.20.21:6820 0.0.0.0:* users:(("ceph-osd",pid=1277894,fd=22))
tcp LISTEN 0 512 10.0.20.21:6821 0.0.0.0:* users:(("ceph-osd",pid=1277894,fd=23))
tcp LISTEN 0 512 10.0.20.21:6834 0.0.0.0:* users:(("ceph-mgr",pid=6293,fd=22))
tcp LISTEN 0 512 10.0.20.21:6835 0.0.0.0:* users:(("ceph-mgr",pid=6293,fd=23))
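Since both address ranges still show up, it may help to compare what is configured against what the daemons actually bind to. A minimal sketch, assuming the usual Proxmox location /etc/pve/ceph.conf; the sample config contents below are illustrative, not taken from this cluster:

```shell
# On a live node one would check the real file and the runtime config, e.g.:
#   grep -E 'public_network|cluster_network' /etc/pve/ceph.conf
#   ceph config dump | grep -i network
# Illustrative stand-in for such a config, to show what the grep extracts:
cat > /tmp/ceph.conf.sample <<'EOF'
[global]
    public_network = 10.0.20.0/24
    cluster_network = 10.0.7.0/24
EOF
grep -E 'public_network|cluster_network' /tmp/ceph.conf.sample
```

OSDs only pick up a changed cluster_network on restart, and monitors keep the address they were created with, so listeners on the old 10.0.20.* range can linger until every affected daemon has been restarted or recreated.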
I have not made any changes so far, and the Ceph status is OK:
Code:
# pveceph status
  cluster:
    id:     dcf54fb4-90fa-4981-8610-be1a05c9067a
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pm-1,pm-3,pm-2 (age 19h)
    mgr: pm-1(active, since 26h), standbys: pm-3, pm-2
    mds: 1/1 daemons up, 2 standby
    osd: 12 osds: 12 up (since 11m), 12 in (since 19h)

  data:
    volumes: 1/1 healthy
    pools:   8 pools, 201 pgs
    objects: 104.96k objects, 400 GiB
    usage:   1.1 TiB used, 7.6 TiB / 8.7 TiB avail
    pgs:     201 active+clean

  io:
    client: 0 B/s rd, 382 KiB/s wr, 0 op/s rd, 20 op/s wr
What I have noticed is that restarting the OSDs produces the following syslog entries:
Code:
Nov 08 13:40:39 pm-1 systemd[1]: Stopping ceph-osd@4.service - Ceph object storage daemon osd.4...
Nov 08 13:40:39 pm-1 ceph-osd[1278745]: 2024-11-08T13:40:39.317+0100 7547708006c0 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
Nov 08 13:40:39 pm-1 ceph-osd[1278745]: 2024-11-08T13:40:39.317+0100 7547708006c0 -1 osd.4 36817 *** Got signal Terminated ***
Nov 08 13:40:39 pm-1 ceph-osd[1278745]: 2024-11-08T13:40:39.317+0100 7547708006c0 -1 osd.4 36817 *** Immediate shutdown (osd_fast_shutdown=true) ***
Nov 08 13:40:40 pm-1 kernel: libceph (dcf54fb4-90fa-4981-8610-be1a05c9067a e36818): osd4 down
Nov 08 13:40:40 pm-1 systemd[1]: ceph-osd@4.service: Deactivated successfully.
Nov 08 13:40:40 pm-1 systemd[1]: Stopped ceph-osd@4.service - Ceph object storage daemon osd.4.
Nov 08 13:40:40 pm-1 systemd[1]: ceph-osd@4.service: Consumed 7.816s CPU time.
Nov 08 13:40:40 pm-1 systemd[1]: Starting ceph-osd@4.service - Ceph object storage daemon osd.4...
Nov 08 13:40:40 pm-1 systemd[1]: Started ceph-osd@4.service - Ceph object storage daemon osd.4.
Nov 08 13:40:40 pm-1 pvedaemon[1264263]: <root@pam> end task UPID:pm-1:0013B305:00E6B52B:672E06C7:srvrestart:osd.4:root@pam: OK
Nov 08 13:40:44 pm-1 ceph-osd[1291059]: 2024-11-08T13:40:44.330+0100 7209c44913c0 -1 osd.4 36817 log_to_monitors true
Nov 08 13:40:45 pm-1 ceph-osd[1291059]: 2024-11-08T13:40:45.200+0100 7209b54006c0 -1 osd.4 36817 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Nov 08 13:40:47 pm-1 kernel: libceph (dcf54fb4-90fa-4981-8610-be1a05c9067a e36821): osd4 up
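Regarding the `set_numa_affinity unable to identify public interface ''` warning: it may help to confirm which addresses the restarted OSD actually registered. On a live cluster that would be `ceph osd metadata 4` (osd id from the log above), whose output includes `front_addr` (public) and `back_addr` (cluster) fields. A minimal sketch; the JSON below is illustrative, not taken from this cluster:

```shell
# On a live cluster:
#   ceph osd metadata 4 | grep -E '"(front|back)_addr"'
# Illustrative stand-in for such metadata, to show what the grep would keep:
cat <<'EOF' | grep -E '"(front|back)_addr"'
{
    "id": 4,
    "back_addr": "[v2:10.0.7.1:6801/1291059,v1:10.0.7.1:6802/1291059]",
    "front_addr": "[v2:10.0.20.21:6826/1291059,v1:10.0.20.21:6827/1291059]"
}
EOF
```

If `front_addr` still points at the old range while `back_addr` uses the new one, that would match the mixed listeners seen in the `ss` output above.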
Please help me with the following questions:
- Is everything fine here?
- Is it recommended to migrate the monitors, managers, and metadata servers to the Ceph-only network, too?