Ceph completely broken - Error got timeout (500)

spamsam

I have a test cluster of 3 VMs in VirtualBox on my PC.

Their IPs changed as follows:
A: 172.16.0.150 > 192.168.100.1 (NOT a gateway)
B: 172.16.0.151 > 192.168.100.2
C: 172.16.0.152 > 192.168.100.3

Now I get the following message when I run 'service ceph-mon@local status':
Code:
Processor -- bind unable to bind to v2:172.16.0.150:3300/0: (99) Cannot assign requested address.

And this is driving me mad, as it downed Ceph on all 3 nodes with the same error.

I updated the address in /etc/ceph/ceph.conf (which is just a symlink to /etc/pve/ceph.conf). It is as follows:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.100.1/24
     fsid = bbc8efc5-af69-4460-8e18-c5d5e76d0c9e
     mon_allow_pool_delete = true
     mon_host =  192.168.100.1
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.100.1/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.localhost]
     public_addr = 192.168.100.1

I rebooted the node multiple times, same error. I scrubbed the whole system of every instance of 172.16.0.150, same error.

It is still a fresh cluster that I am using to test Ceph Quincy on Proxmox 7.3 prior to going into production, though at the moment I am wary.

Any advice here?
 
Just as further info, commands like 'ceph -s' hang indefinitely. All pveceph commands time out.
 
Hi,

Is the above Ceph config from /etc/pve/ceph.conf? If yes, can you compare it with /etc/ceph/ceph.conf as well?
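
For example, something like this would show whether the two diverge (a quick sketch; on PVE, /etc/ceph/ceph.conf is normally just a symlink to /etc/pve/ceph.conf):
Code:
readlink -f /etc/ceph/ceph.conf              # should resolve to /etc/pve/ceph.conf
diff /etc/ceph/ceph.conf /etc/pve/ceph.conf  # no output means they match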

Did you see anything interesting in the ceph monitor logs?
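
Also note that a monitor binds to the address recorded in its monmap, not the one in ceph.conf, so updating the config alone will not move it off 172.16.0.150. A sketch of rewriting the monmap in place (assuming the mon ID is `localhost`, matching your [mon.localhost] section; stop the mon first and keep a backup of the extracted map):
Code:
systemctl stop ceph-mon@localhost
ceph-mon -i localhost --extract-monmap /tmp/monmap
monmaptool --print /tmp/monmap                        # should still list 172.16.0.150
monmaptool --rm localhost /tmp/monmap                 # drop the stale entry
monmaptool --add localhost 192.168.100.1 /tmp/monmap  # newer releases may want --addv with explicit v1/v2 addrs
ceph-mon -i localhost --inject-monmap /tmp/monmap
systemctl start ceph-mon@localhost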
 
Did you find any resolution for this? I am in the same boat and getting help from folks here, but I am wondering what fixed it for you, if anything did.
 
I am using Proxmox version 8.2.4. I installed 3 nodes and installed Ceph, and it was working.
I then purchased three physical machines with the same specs and installed them the same way, but I can't install Ceph.
I have tried multiple times, still the same.

Does anyone have a solution?

Screenshot 2024-07-31 at 1.00.28 PM.png
 

Attachments

  • Screenshot 2024-07-31 at 1.00.06 PM.png
In the screenshot you attached it looks like Ceph Reef is installed! Did you refresh your browser? What is the output of `pveceph status`?
 
In the screenshot you attached it looks like Ceph Reef is installed! Did you refresh your browser? What is the output of `pveceph status`?
root@pve1:~# pveceph status
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')
command 'ceph -s' failed: exit code 1
root@pve1:~#

Did you refresh your browser?
I have tried Safari and Chrome, and it is all still the same.
 
Does anyone have a solution?
I feel it was such a waste buying three nodes with the same specs.
 
Node 1
16 x AMD Ryzen 7 5800U with Radeon Graphics
Kernel Version: Linux 6.8.8-4-pve (2024-07-26T11:15Z)
Manager Version: pve-manager/8.2.4/faa83925c9641325
HDD1: 256GB M.2 SSD
HDD2: 1TB SSD

Node 2
16 x AMD Ryzen 7 5800U with Radeon Graphics
Kernel Version: Linux 6.8.8-4-pve (2024-07-26T11:15Z)
Manager Version: pve-manager/8.2.4/faa83925c9641325
HDD1: 256GB M.2 SSD
HDD2: 1TB SSD

Node 3
16 x AMD Ryzen 7 5800U with Radeon Graphics
Kernel Version: Linux 6.8.8-4-pve (2024-07-26T11:15Z)
Manager Version: pve-manager/8.2.4/faa83925c9641325
HDD1: 256GB M.2 SSD
HDD2: 1TB SSD


root@pve1:~# pveceph status
command 'ceph -s' failed: got timeout
root@pve1:~#


Screenshot 2024-08-01 at 4.09.15 PM.png
Screenshot 2024-08-01 at 4.10.43 PM.png
 
Hello,

What is the output of

- `pvecm status`
- `cat /etc/ceph/ceph.conf`
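
As a side note, the `Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')` you posted earlier usually means /etc/ceph/ceph.conf is missing or unreadable on that node. A quick check (a sketch):
Code:
ls -l /etc/ceph/ceph.conf    # on PVE this should be a symlink into /etc/pve
stat /etc/pve/ceph.conf      # confirms the cluster-wide config exists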
 
root@pve1:~# pvecm status
Cluster information
-------------------
Name: Sync-Cluster
Config Version: 5
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Thu Aug 1 16:41:29 2024
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.1ac
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.201.50 (local)
0x00000002 1 192.168.201.60
0x00000003 1 192.168.201.70
root@pve1:~#
root@pve1:~# `cat /etc/ceph/ceph.conf`
-bash: [global]: command not found
root@pve1:~# cat /etc/ceph/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.202.50/24
fsid = f290bf78-5b29-4109-8ae7-0d88e52785d2
mon_allow_pool_delete = true
mon_host = 192.168.201.50,192.168.201.60,192.168.201.70
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.202.50/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

Screenshot 2024-08-01 at 4.43.56 PM.png
 
Both `public_network` and `cluster_network` are set to `192.168.202.X/24` while the MONs are at

Code:
192.168.201.50,192.168.201.60,192.168.201.70

(201 vs 202). Do you have any valuable data in the Ceph cluster already? If not, I would suggest destroying the MONs and creating new ones using IPs that are part of the `public_network` subnet.
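
If the cluster is still empty, that could look roughly like this after correcting `public_network`/`cluster_network` in /etc/pve/ceph.conf (e.g. `192.168.201.0/24` and `192.168.202.0/24`). A sketch; `pve1` and the address are taken from your outputs, repeat per node, and you may need `pveceph purge` instead if the MONs no longer respond at all:
Code:
pveceph mon destroy pve1
pveceph mon create --mon-address 192.168.201.50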
 
Could you please share the new config? What's the state of Ceph (`ceph -s`)? Are the mon services running (`systemctl status ceph-mon@NODE_HOSTNAME.service`)?
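
If a mon service is failing, its journal usually says why; for example (a sketch):
Code:
journalctl -b -u ceph-mon@$(hostname).service | tail -n 50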
 
Hello, same problem for me.
PVE 8.2.2 with 3 identical servers in a cluster.
Ceph installed on the first node, and the two others got a timeout.
I tried to purge the config many times and reinstall via GUI or CLI, with the same result.
Did someone find something?
 
I gave up troubleshooting and uninstalled Ceph.
 
