PVE 5.4: Ceph monitors all show unknown

parker0909

Aug 5, 2019

Hello All,

I have a problem on the Ceph page in PVE 5.4. On one host, the Ceph monitor page shows the monitors as unknown, while the other hosts show the correct values.
The problem also seems to make the Datacenter Summary page show an incorrect storage quota.

Could anyone suggest how to fix this?

Thank you.
Parker
 

Attachments

  • correct_ceph.png
  • unknown_incorrect_ceph.png
  • Ceph_size.png
  • Datacenter_Summary_page.png
Hi,

please reduce the MON count to three.
More than three MONs is a waste in such a small setup.
Even huge setups like CERN's run five monitors per cluster.
The problem is that the cost per MON grows exponentially.
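To make the "three is enough" advice concrete: Ceph monitors agree through a majority quorum, so a monmap of n monitors needs floor(n/2)+1 of them up and talking to each other. The short Python sketch below is plain arithmetic (no Ceph API involved) that tabulates this; it shows that a fourth monitor raises the quorum size without tolerating any additional failure.

```python
def quorum_size(n_mons: int) -> int:
    """Smallest majority of a monmap with n_mons members."""
    return n_mons // 2 + 1


def tolerated_failures(n_mons: int) -> int:
    """How many monitors can be down while a majority is still up."""
    return n_mons - quorum_size(n_mons)


# Print the table for small clusters: even counts add no extra tolerance.
for n in range(1, 7):
    print(f"{n} mons: quorum needs {quorum_size(n)}, survives {tolerated_failures(n)} down")
```

The same rule explains why removing or losing monitors can suddenly take the whole cluster down: once fewer than a majority of the monmap entries are reachable, no quorum forms and every ceph command blocks.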

The storage utilization shown in the Datacenter Summary can be configured by clicking the gear icon to the left of the login button.
 
Thank you.

The problem started when I tried to remove one host on the Ceph monitor page. All hosts then changed to unknown, and I cannot perform any action.

I get the message "entry has no host" whenever I press the Stop or Remove button.

Parker
 

Attachments

  • Entry has no host.png
Dear All,

Sorry, this has become urgent. We found that we cannot add the host back after running the command ceph mon rm xxx.
We can't get any Ceph storage information now, and I get the message 'monitor "mon.xxxx" already exists'. Could anyone suggest how we can fix the problem?

Parker
 
This sounds more like a network issue. Check your network; every node needs to be able to reach the others.
 
Yes, all nodes can reach each other. The Ceph OSDs are still usable by all VMs, but I can't load the Ceph page, and I can't add the node back as a monitor because the UI says "monitor 'mon.cccs01' already exists (500)". I think this is because we used the ceph mon rm command to remove all the nodes. Is there any way to add the node back? Thank you.

# ceph status
ERROR: command 'ceph status' failed: got timeout

# ceph osd status
ERROR: command 'ceph osd status' failed: got timeout

# ceph df
ERROR: command 'ceph df' failed: got timeout

# pveceph status
got timeout

# pveceph lspools
got timeout
 
The problem is really urgent now. None of the nodes can get the Ceph status, and I'm not sure how to add a node back as a monitor.

Thank you.

Parker
 
We get no response when I run ceph commands; they fail with the error "Cluster connection interrupted or timed out".
Is there anything we can do to fix the problem without losing data?

Thank you.
 
As said before, there is a network problem. If no MON is available, Ceph-related commands won't work. At least one MON needs to be working; otherwise the data is lost, and recovery will be almost impossible.
 
Thank you. We would like to know what we can do as the next step.

I can confirm that the network is working normally; all hosts can connect to each other. But you said that the data is lost. What exactly do you mean by that, and is there any other way to back up the current data?

I found that one MON is still running, but its IP address is not the correct public IP address. I have attached two screenshots of this remaining MON node.
Is it possible to use this MON for recovery?
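On the node where that last MON still runs, its admin socket answers even without quorum ("ceph daemon mon.<id> mon_status"), and the output includes the monmap that MON believes in. The Python sketch below assumes that JSON shape (fields "state", "quorum" and "monmap" → "mons" → "name"/"addr", as Luminous-era monitors print them; please verify against your own output) and checks each monmap address against the public_network from ceph.conf. The sample JSON, the second monitor name cccs03, all addresses and the 10.10.10.0/24 network are hypothetical.

```python
import ipaddress
import json

# Hypothetical output shaped like "ceph daemon mon.<id> mon_status";
# the name cccs03 and every address here are made up for illustration.
SAMPLE_MON_STATUS = """
{
  "name": "cccs01",
  "rank": 0,
  "state": "probing",
  "quorum": [],
  "monmap": {
    "mons": [
      {"rank": 0, "name": "cccs01", "addr": "192.168.50.11:6789/0"},
      {"rank": 1, "name": "cccs03", "addr": "10.10.10.13:6789/0"}
    ]
  }
}
"""


def mon_host(addr):
    """Strip port and nonce from 'IP:port/nonce' (and brackets for IPv6)."""
    host = addr.split("/", 1)[0].rsplit(":", 1)[0].strip("[]")
    return ipaddress.ip_address(host)


def check_monmap(status, public_network):
    """Return (name, ip, inside_public_network) for every monmap entry."""
    net = ipaddress.ip_network(public_network)
    rows = []
    for mon in status["monmap"]["mons"]:
        ip = mon_host(mon["addr"])
        rows.append((mon["name"], str(ip), ip in net))
    return rows


status = json.loads(SAMPLE_MON_STATUS)
print("local mon:", status["name"], "state:", status["state"], "quorum:", status["quorum"])
for name, ip, ok in check_monmap(status, "10.10.10.0/24"):
    print(f"{name:8} {ip:15} {'ok' if ok else 'NOT in public_network'}")
```

A MON that stays in "probing" or "electing" with an empty quorum list is waiting for peers from its monmap; an address outside public_network is one reason the others may never find it.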

If we can't recover any data, could you share the procedure for reinstalling the Ceph cluster?

Sorry to bother you again with this urgent case. Thank you.

Regards,
Parker
 

Attachments

  • conf.png
  • mon.png
Maybe I should give the full picture of what I did before:

1. I found that cccs02 (node 2) showed all nodes as unknown, so we tried to remove this MON on the monitor page.
2. After I removed node 2, all other nodes also showed unknown, and I couldn't stop or remove them in the CLI.
3. I tried to add node 2 back, but its IP address did not seem correct on the monitor page.
4. After that, I tried to use the "ceph mon rm" command to remove the other nodes, but this did not solve the problem.
5. Finally, I can't connect to the Ceph cluster, and all commands time out.

I hope the problem can be solved, or that an alternative solution can be applied. Thank you for your help.

Parker
 
If we need to restore the vzdump backups from the NFS server, is it possible to restore them outside this cluster, on a PVE host running a different version?

We are currently using PVE 5.4, so we would like to know whether it is possible to restore to PVE 6.1. Thank you.

Parker
 
Hi sir, this is quite urgent. Is there anything we can do now? Thanks a lot.
Can anyone help?
 

Yes, you can restore vzdump backups to a current Proxmox VE 6.1.
 
Hi Tom,

Is that the only way?

Can't we get Ceph up and running again? The VMs are still running for now, so is there any way to make the Ceph cluster work again?
All we did before was remove the MON that showed unknown in the status. Why did all MONs suddenly become unknown?

Also, if we restore from backup, does that mean we will lose all our current data? Thanks a lot.
 

I did not go through all your posts; I just answered a single question.

Fixing Ceph should be possible. If you cannot get it running by yourself, I suggest you get in touch with our Enterprise Support team so they can analyse the issues.
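For reference, the upstream Ceph documentation ("Removing Monitors from an Unhealthy Cluster") describes shrinking the monmap of a surviving monitor so that it can form a quorum on its own: stop it, extract its monmap, remove the dead entries with monmaptool, inject the map back, and start it again. The Python sketch below only builds and prints such a plan for review; it executes nothing. It assumes the MON id equals the node name (the PVE default), the usual /var/lib/ceph/mon/ceph-<id> store path, and a hypothetical working directory /root/mon-recovery; cccs03 is a made-up name. It is not a verified recovery for this particular cluster.

```python
import shlex


def monmap_surgery_plan(survivor, stale_mons, workdir="/root/mon-recovery"):
    """Return (but do not run) shell commands that shrink the monmap of the
    surviving monitor so it can form quorum alone.

    Assumption: the MON id equals the node name, and the store lives at
    /var/lib/ceph/mon/ceph-<id>; workdir is a hypothetical scratch path.
    """
    monmap = f"{workdir}/monmap"
    store = f"/var/lib/ceph/mon/ceph-{survivor}"
    plan = [
        ["systemctl", "stop", f"ceph-mon@{survivor}"],
        ["mkdir", "-p", workdir],
        # Back up the monitor store before modifying anything.
        ["cp", "-a", store, f"{workdir}/ceph-{survivor}.bak"],
        ["ceph-mon", "-i", survivor, "--extract-monmap", monmap],
        ["monmaptool", "--print", monmap],
    ]
    # Drop every monmap entry except the surviving monitor itself.
    for name in stale_mons:
        if name != survivor:
            plan.append(["monmaptool", monmap, "--rm", name])
    plan += [
        ["ceph-mon", "-i", survivor, "--inject-monmap", monmap],
        ["systemctl", "start", f"ceph-mon@{survivor}"],
    ]
    return [shlex.join(cmd) for cmd in plan]


for line in monmap_surgery_plan("cccs01", ["cccs02", "cccs03"]):
    print(line)
```

After such a change, the monitor entries in /etc/pve/ceph.conf would also have to match the shrunken monmap before re-creating the other monitors; given the risk, the support route suggested above is the safer option.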
 
Thank you for your reply. I have been told that the Ceph cluster can be recovered even when all MONs have been removed, so any method for fixing Ceph would be a great help to us. Could you share some information on how to fix it?
we can furt
 
