New 3 Node Cluster Ceph Mon issue

Frosty81

New Member
Jun 26, 2024
I am new to Proxmox and have been watching videos to help me along the way. I have set up 3 nodes with identical hardware and successfully clustered them together. When I try to set up Ceph on them, I run into an issue where the monitors seem to be joined, but nodes 2 and 3 show status "stopped" and I cannot get them to start. Their addresses show the correct IP, but they do not list a port number like node 1 does.

For networking, I have management on a 2.5G Intel port on all 3 nodes.
For clustering/migration, I set up a "mesh" or "ring" network using the Proxmox guide for a Simple mesh network. This runs over 40 Gbps cards. Each node can ping the other 2 over the mesh IP addresses, so that part appears to be working. I have not set up any storage yet, either for VMs or for Ceph.

What info can I provide, or where can I start looking to track down the issue? I have rebuilt the nodes from scratch numerous times, and each time I end up in this same spot.
 
It would be best to post the pvereport of all the nodes if you can. Make sure sensitive information is removed if you don't want to show the report to everyone. Problems in Ceph usually have one of these causes:

  • the network is not configured correctly (not working, or not the same MTU on all interfaces)
  • NTP: you have not set an NTP server, or NTP cannot be used because access to the WAN is blocked (the default is a WAN NTP server from Debian)
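
To make the MTU point concrete, here is a minimal Python sketch, not an official Proxmox tool. It assumes a Linux node where each interface exposes its MTU under /sys/class/net; the helper names local_mtus and mtu_outliers are made up for illustration. Run it on each node, then compare the values reported for the Ceph/mesh interfaces.

```python
from collections import Counter
from pathlib import Path


def local_mtus(sys_net="/sys/class/net"):
    """Return {interface: mtu} for this node, read from sysfs (Linux only)."""
    result = {}
    base = Path(sys_net)
    if not base.is_dir():
        return result
    for iface in sorted(base.iterdir()):
        mtu_file = iface / "mtu"
        if mtu_file.is_file():
            result[iface.name] = int(mtu_file.read_text().strip())
    return result


def mtu_outliers(mtu_by_node):
    """Given {node: mtu} for the same logical network, return the nodes
    whose MTU differs from the most common value."""
    if not mtu_by_node:
        return {}
    common, _count = Counter(mtu_by_node.values()).most_common(1)[0]
    return {node: mtu for node, mtu in mtu_by_node.items() if mtu != common}


if __name__ == "__main__":
    # Print the MTU of every interface on this node.
    for name, mtu in local_mtus().items():
        print(f"{name}: {mtu}")
```

Once you have collected the mesh interface MTU from each node by hand, mtu_outliers({"CF01": 9000, "CF02": 9000, "CF03": 1500}) would flag CF03 (the numbers here are invented, not taken from the reports).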
 
I deleted my Ceph setup last night. Let me know if I need to recreate it and run a new report.

I looked into the NTP server stuff but never really came to a conclusion on whether I should change it from the default and, if so, what to change it to.
 

Attachments

  • CF01Report.txt
    56.4 KB · Views: 0
  • CF02Report.txt
    62.3 KB · Views: 0
  • CF03Report.txt
    62.1 KB · Views: 0
I reconfigured Ceph and the monitors on nodes 2 and 3 still won't start. I took a stab at configuring a local NTP server, but I am unsure how to tell if that's actually working. When I click from node to node on the Time tab, the clocks seem to be accurate down to the second, if that tab can be trusted. I have run a new pvereport after configuring Ceph again in case that offers more insight.
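
One way to check whether an NTP server answers, and roughly how far a node's clock is from it, is a bare SNTP query. The sketch below is only an illustration using the Python standard library: the function names are made up, the server address is whatever you pass on the command line, and the offset is approximate because it ignores asymmetric network delay. For scale, Ceph monitors warn about clock skew at around 0.05 s by default, which is finer than the Time tab shows.

```python
import socket
import struct
import sys
import time

NTP_UNIX_DELTA = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def parse_transmit_time(packet):
    """Extract the server transmit timestamp (as Unix seconds) from an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def clock_offset(server, timeout=2.0):
    """Approximate offset in seconds (server clock minus local clock)."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(request, (server, 123))
        reply, _addr = sock.recvfrom(512)
        received = time.time()
    # Compare the server time with the midpoint of the request round trip.
    return parse_transmit_time(reply) - (sent + received) / 2


if __name__ == "__main__":
    # Usage: python3 ntp_check.py <ntp-server> [<ntp-server> ...]
    for host in sys.argv[1:]:
        try:
            print(f"{host}: offset {clock_offset(host):+.4f} s")
        except OSError as exc:
            print(f"{host}: no usable reply ({exc})")
```

Running this on all 3 nodes against the same server should give offsets within a few milliseconds of each other if time sync is working.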

I see this error message spamming my System Log page on nodes 2 and 3.
"CF03 ceph-mon[26892]: 2024-06-28T23:56:37.570-0500 79d7378006c0 -1 mon.CF03@1(probing) e0 handle_probe require release 18 > 17, or missing features (have 4540138320759226367, required 0, missing 0)"
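
The two numbers in "require release 18 > 17" look like Ceph major release numbers (17 is Quincy, 18 is Reef). The following Python sketch pulls them out of such a log line. Treat the interpretation as an assumption, not a confirmed diagnosis: it reads the message as "a peer monitor requires release 18, but this daemon is release 17". The function name explain_probe_error is made up for illustration.

```python
import re

# Ceph major release numbers and their code names.
CEPH_RELEASES = {16: "Pacific", 17: "Quincy", 18: "Reef", 19: "Squid"}

PROBE_RE = re.compile(r"handle_probe require release (\d+) > (\d+)")


def explain_probe_error(line):
    """Return the two release numbers from a handle_probe error line,
    or None if the line does not match.

    Assumption: the first number is the release required by the peer
    monitor and the second is the release of the logging daemon."""
    match = PROBE_RE.search(line)
    if match is None:
        return None
    required, local = (int(n) for n in match.groups())
    return {
        "required": required,
        "required_name": CEPH_RELEASES.get(required, "unknown"),
        "local": local,
        "local_name": CEPH_RELEASES.get(local, "unknown"),
    }


if __name__ == "__main__":
    line = ("mon.CF03@1(probing) e0 handle_probe require release 18 > 17, "
            "or missing features (have 4540138320759226367, required 0, missing 0)")
    print(explain_probe_error(line))
```

If that reading is right, the nodes would be running different Ceph releases; comparing the output of `ceph --version` on each node would confirm or rule this out.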
 

Attachments

  • CF01Report_6_28.txt
    45.1 KB · Views: 0