Proxmox Ceph cluster: node1 unable to access Ceph storage

gkouros50

New Member
Mar 3, 2025
I've configured Ceph Reef on 3 nodes and connected that cluster to an SMB share with VM backups on it. When I attempt to restore VMs to the Ceph storage from the SMB share on nodes Node2 and Node3 I have no issue, but on Node1, the manager of the Ceph cluster, I get this issue:
[screenshot of the error attached]
I would like to add that I have encountered SSH errors on my nodes lately when I try to build a cluster. It seems that one of my nodes is always unable to reach the console of the other nodes; in my case Node3 (when signed in via that node) is unable to connect to the Node1 console. I have tried just about every fix I have read and none of them have solved it. I'm not sure if it is relevant to this issue.

[screenshot attached]
 
Hi,

first of all, you are in the wrong sub-forum: this is Proxmox Datacenter Manager (PDM), but you need Proxmox VE: Installation and configuration.

For SSH: have you updated your hosts file? If not, please enter all of your nodes there, on every node.

The format is IP FQDN Hostname, see this example:
Code:
127.0.0.1 localhost.localdomain localhost
10.22.42.9 PMX9.intern.thomas-krenn.com PMX9
10.22.42.8 PMX8.intern.thomas-krenn.com PMX8
10.22.42.7 PMX7.intern.thomas-krenn.com PMX7

Then you need to add this on all 3 of your nodes.
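
A minimal check one could run afterwards (this is a sketch, not part of the original reply; the node names are just the ones used elsewhere in this thread): confirm that each name resolves via /etc/hosts and that root SSH between the nodes works.
Code:
# run on every node; node1/node2/node3 are the hostnames assumed from this thread
getent hosts node1 node2 node3

# from node1, confirm root SSH reaches the other two nodes
ssh root@node2 hostname
ssh root@node3 hostname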
 
I moved the thread.

Can node1 access the RBD pool via the web UI without any errors? Can you post some more info? For example:
Code:
ceph -s
cat /etc/pve/ceph.conf
ceph osd pool ls
pveceph pool ls --noborder

On node 1.

Please paste the outputs inside [code][/code] tags (or use the </> button at the top of the editor). No screenshots please, as those are hard to read and copy from ;)
 
I am very new to this forum, my apologies. @aaron, here is the output of those commands:


Code:
root@node1:~# ceph -s
  cluster:
    id:     ad915e92-66b5-4706-b0d7-5f2fe848f164
    health: HEALTH_WARN
            4 mgr modules have recently crashed
 
  services:
    mon: 3 daemons, quorum node1,node2,node3 (age 16h)
    mgr: node2(active, since 16h), standbys: node1
    osd: 3 osds: 3 up (since 16h), 3 in (since 17h)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 15.95k objects, 62 GiB
    usage:   187 GiB used, 22 TiB / 22 TiB avail
    pgs:     33 active+clean
 
  io:
    client:   0 B/s rd, 43 KiB/s wr, 0 op/s rd, 2 op/s wr
 
root@node1:~# cat /etc/pve/ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.101.11.200/24
        fsid = ad915e92-66b5-4706-b0d7-5f2fe848f164
        mon_allow_pool_delete = true
        mon_host = 10.101.11.200 10.101.11.201 10.101.11.202
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.101.11.200/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.node1]
        public_addr = 10.101.11.200

[mon.node2]
        public_addr = 10.101.11.201

[mon.node3]
        public_addr = 10.101.11.202

root@node1:~# ceph osd pool ls
.mgr
cephpool
root@node1:~# pveceph pool ls --noborder
Name     Size Min Size PG Num min. PG Num Optimal PG Num PG Autoscale Mode PG Autoscale Target Size PG Autoscale Target Ratio Crush Rule Name               %-Used Used
.mgr        3        2      1           1              1 on                                                                   replicated_rule 1.22319960382811e-07 2764800
cephpool    3        2     32                         32 on                                                                   replicated_rule  0.00854615960270166 194834045586
root@node1:~#





@smueller To clarify: on all nodes I should add the other nodes to the hosts file?


Example:
Current hosts file of node1:
Code:
127.0.0.1 localhost.localdomain localhost
10.101.11.200 node1.sdr.net node1

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Updated file:

Code:
127.0.0.1 localhost.localdomain localhost
10.101.11.200 node1.sdr.net node1
10.101.11.201 node2.sdr.net node2
10.101.11.202 node3.sdr.net node3
# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
 
All that looks good. Does the UI show any errors if you go to the `cephpool` storage and then to the `VM Disks` submenu?

Any differences between nodes?
Or to do it on the CLI: pvesm list cephpool?
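
For reference, a hedged way to compare this across the nodes on the CLI (the storage ID cephpool is the one from this thread):
Code:
# run on each node and compare the output
pvesm list cephpool
pvesm status --storage cephpool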
 
Ah yes! @aaron, when I go to cephpool --> VM Disks I get the error
Code:
rbd error: rbd: listing images failed: (108) Cannot send after transport endpoint shutdown (500)
but only on node1.
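
A hedged follow-up idea (not from the thread): to see whether the failure is specific to the PVE storage layer or to Ceph access from node1 in general, one could run the same RBD listing directly on node1 and on a working node and compare; the pool name is taken from this thread.
Code:
# on node1 and, for comparison, on a working node such as node2;
# assumes the default admin keyring created by pveceph is in place
rbd ls -p cephpool
ceph -s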