New 3 Node Cluster Ceph Mon issue

Frosty81

Jun 26, 2024
I am new to Proxmox and have been watching videos to help me along the way. I have set up 3 nodes with identical hardware and successfully clustered them together. When I try to combine them into Ceph, they seem to be joined, but nodes 2 and 3 show status "stopped" and I cannot get them to start. The addresses show the correct IP, but they do not list a port number like node 1 does.

For networking, I have management on a 2.5G Intel port on all 3 nodes.
For clustering/migrating, I set up a "mesh" or "ring" network using the Proxmox guide for a Simple mesh network. This runs over 40 Gbps cards. Each node can ping the other 2 over the mesh IP addresses, so that part appears to be working. I have not set up any storage yet for VMs or Ceph.

What info can I provide, or where can I start looking to track down the issue? I have rebuilt the nodes from scratch numerous times, and each time I end up in this same spot.
 
It would be best to post the pvereport output from all the nodes if you can. Make sure sensitive information is removed if you don't want to show the report to everyone. Problems in Ceph usually have one of these causes:

  • The network is not correctly configured (not working, or the MTU is not the same on all interfaces).
  • NTP: you have not set an NTP server, or NTP cannot be used because access to the WAN is blocked (the default is Debian's WAN NTP server).
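
As a rough way to check the MTU point, here is a minimal Python sketch that lists the MTU of every interface on a node by reading `/sys/class/net`, plus a helper that compares values collected from several nodes. The function names and the node and interface names in the usage note are made up for illustration; nothing here is a Proxmox or Ceph tool.

```python
from pathlib import Path


def read_mtus(sysfs_root="/sys/class/net"):
    """Return {interface_name: mtu} for every interface under sysfs_root.

    On Linux each interface has a directory with an 'mtu' file.
    Returns an empty dict if sysfs_root does not exist.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    mtus = {}
    for iface in sorted(root.iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def mtu_mismatches(per_node):
    """Find interfaces whose MTU differs between nodes.

    per_node maps node name -> {interface: mtu}, e.g. the output of
    read_mtus() gathered on each node. Returns {interface: {node: mtu}}
    for every interface that does not have one single MTU everywhere.
    """
    by_iface = {}
    for node, mtus in per_node.items():
        for iface, mtu in mtus.items():
            by_iface.setdefault(iface, {})[node] = mtu
    return {
        iface: nodes
        for iface, nodes in by_iface.items()
        if len(set(nodes.values())) > 1
    }


if __name__ == "__main__":
    # Print the MTUs of the node this script runs on.
    for name, mtu in read_mtus().items():
        print(f"{name}: mtu {mtu}")
```

Run it on each node and compare the output by hand, or feed the collected values to `mtu_mismatches`, for example `mtu_mismatches({"node1": {"mesh0": 9000}, "node2": {"mesh0": 1500}})` (hypothetical names), which reports `mesh0` as mismatched. This only compares configured values; it does not prove that large packets actually get through the link.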
 
I deleted my Ceph setup last night. Let me know if I need to recreate it and run a new report.

I looked into the NTP server stuff but never really came to a conclusion on whether I should change it from the default and, if so, what to change it to.
 
I reconfigured Ceph and it still won't start on nodes 2 and 3. I took a stab at configuring a local NTP server, but I'm unsure how to tell if that's actually working. When I click from node to node on the Time tab, the clocks seem to agree down to the second, if that tab can be trusted. I have run a new pvereport after configuring Ceph again in case that offers more insight.

I see this error message spamming my System Log page on nodes 2 and 3.
"CF03 ceph-mon[26892]: 2024-06-28T23:56:37.570-0500 79d7378006c0 -1 mon.CF03@1(probing) e0 handle_probe require release 18 > 17, or missing features (have 4540138320759226367, required 0, missing 0)"
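
To make the numbers in that message easier to read, here is a small Python sketch that pulls the two release numbers out of a `handle_probe require release` line and maps them to Ceph release names (17 is Quincy, 18 is Reef). The function name and returned keys are made up for illustration. Reading the message as "the monitors do not all run the same Ceph release" is an assumption from the wording, not something the log states outright; comparing the output of `ceph --version` on each node would confirm or rule it out.

```python
import re

# Ceph major release numbers and their code names.
RELEASE_NAMES = {16: "Pacific", 17: "Quincy", 18: "Reef", 19: "Squid"}

# Matches e.g. "mon.CF03@1(probing) e0 handle_probe require release 18 > 17"
PROBE_RE = re.compile(
    r"mon\.(?P<mon>[^@\s]+)@.*?"
    r"handle_probe require release (?P<required>\d+) > (?P<have>\d+)"
)


def parse_probe_error(line):
    """Extract the monitor name and both release numbers from a log line.

    Returns None if the line is not a 'handle_probe require release' message.
    """
    match = PROBE_RE.search(line)
    if match is None:
        return None
    required = int(match.group("required"))
    have = int(match.group("have"))
    return {
        "mon": match.group("mon"),
        "required": required,
        "have": have,
        # Unknown numbers fall back to a generic label instead of guessing.
        "required_name": RELEASE_NAMES.get(required, f"release {required}"),
        "have_name": RELEASE_NAMES.get(have, f"release {have}"),
    }


if __name__ == "__main__":
    sample = (
        "CF03 ceph-mon[26892]: 2024-06-28T23:56:37.570-0500 79d7378006c0 -1 "
        "mon.CF03@1(probing) e0 handle_probe require release 18 > 17, or "
        "missing features (have 4540138320759226367, required 0, missing 0)"
    )
    info = parse_probe_error(sample)
    print(f"{info['mon']}: requires {info['required_name']} "
          f"({info['required']}), has {info['have_name']} ({info['have']})")
```

The `missing 0` at the end of the message suggests that feature flags are not the problem, which leaves the release comparison (18 versus 17) as the part worth chasing first.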
 