Hyperconverged Ceph cluster on Proxmox with external Ceph cluster already connected

nejcsuhadolc

Hi,

I'm trying to set up a local (hyperconverged) Ceph cluster on a system that is already using an external Ceph cluster.
When I clicked Install Ceph on one of the nodes, that node started to experience problems with the external Ceph: I can no longer read files, but the VMs still run on the external Ceph.

One of the errors I get is this one when trying to move a VM:


Code:
2020-09-20 03:23:14 starting migration of VM 118 to node 'prox-02' (10.20.0.102)
2020-09-20 03:23:18 found local disk 'local-zfs:vm-118-disk-0' (via storage)
2020-09-20 03:23:19 ERROR: Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
2020-09-20 03:23:19 aborting phase 1 - cleanup resources
2020-09-20 03:23:19 ERROR: migration aborted (duration 00:00:05): Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
TASK ERROR: migration aborted

I also cannot configure the local Ceph; it seems to pick up the config of the external Ceph.

Error in syslog:
Code:
Sep 20 02:54:17 prox-06 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 20 02:57:07 prox-06 systemd[1]: Started Ceph cluster monitor daemon.
Sep 20 02:57:33 prox-06 systemd[1]: Stopping Ceph cluster monitor daemon...
Sep 20 02:57:33 prox-06 ceph-mon[8124]: 2020-09-20 02:57:33.444 7f34c33a7700 -1 received signal: Terminated from /sbin/init (PID: 1) UID: 0
Sep 20 02:57:33 prox-06 ceph-mon[8124]: 2020-09-20 02:57:33.444 7f34c33a7700 -1 mon.prox-06@0(leader) e1 *** Got Signal Terminated ***

Do you have any idea how to solve this?
 
ERROR: Failed to sync data - rbd error: rbd: listing images failed: (2) No such file or directory
Does the listing work from the CLI, rbd -p <pool> ls?

Also I can not configure the local Ceph, it seems to pick up the config of the external ceph.
The local Ceph services need /etc/ceph/ceph.conf, which is usually a symlink to /etc/pve/ceph.conf.
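
For reference, this is how the link can be checked and, if missing, recreated on a PVE node (a sketch with standard paths; back up any existing file first):
Code:
# verify that /etc/ceph/ceph.conf is a symlink to the PVE-managed config
ls -l /etc/ceph/ceph.conf
# if it is missing or a stale regular file, recreate the link
ln -sf /etc/pve/ceph.conf /etc/ceph/ceph.conf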
 
Hi Alwin


rbd returns an error:
Code:
rbd -p xenCeph ls
-bash: /usr/bin/rbd: Input/output error

/etc/ceph/ceph.conf is already a symlink to /etc/pve/ceph.conf.

The content of that file is:

Code:
root@prox-06:~# cat /etc/pve/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         fsid = 7b7357f8-a718-4f81-8f66-39aeb6cbf4dc
         # keyring = /etc/pve/priv/$cluster.$name.keyring
         keyring = /var/lib/ceph/mgr/ceph-$id/keyring
         mon_allow_pool_delete = true
         mon_host = 10.50.0.106
         osd_journal_size = 5120
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.prox-06]
         public_addr = 10.50.0.106
 
rbd -p xenCeph ls
That command alone will query the cluster found in /etc/ceph/ceph.conf. Did you run it on the Ceph cluster directly?
What does your /etc/pve/storage.cfg look like?
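
For comparison, an external RBD storage entry in /etc/pve/storage.cfg usually looks something like this sketch; the storage ID, MON addresses, and username are placeholders, and only the pool name is taken from this thread. The matching keyring would live at /etc/pve/priv/ceph/<storage-id>.keyring.
Code:
rbd: external-ceph
        content images
        krbd 0
        monhost 192.0.2.1 192.0.2.2 192.0.2.3
        pool xenCeph
        username admin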
 
Hi,

I have no direct access to the remote Ceph, so I can only run it on my servers. If I do, I get:

Code:
root@prox-05:~# rbd -p xenCeph ls
unable to get monitor info from DNS SRV with service name: ceph-mon
rbd: couldn't connect to the cluster!
2020-09-21 13:02:11.455 7f1c268ce0c0 -1 failed for service _ceph-mon._tcp
2020-09-21 13:02:11.455 7f1c268ce0c0 -1 monclient: get_monmap_and_config cannot identify monitors to contact
rbd: listing images failed: (2) No such file or directory

If I run it on a working node (one I haven't run pveceph on), I get:

Code:
root@prox-02:~# rbd -p xenCeph ls
unable to get monitor info from DNS SRV with service name: ceph-mon
no monitors specified to connect to.
rbd: couldn't connect to the cluster!
2020-09-21 13:04:54.230309 7f54f701c0c0 -1 failed for service _ceph-mon._tcp
rbd: list: (2) No such file or directory
 
In both outputs, the node can't connect to the Ceph cluster.

root@prox-06:~# cat /etc/pve/ceph.conf
For the local Ceph cluster, you will need to specify cluster_network and public_network in the ceph.conf. Otherwise, the Ceph daemons try to bind to the IP resolved from the hostname.
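
As a sketch, with the subnet guessed from the MON address in this thread (adjust both networks to your actual setup; they may also be two different subnets):
Code:
[global]
         cluster_network = 10.50.0.0/24
         public_network = 10.50.0.0/24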
mon_host = 10.50.0.106
Does this MON exist and can every node connect to it?
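
A quick reachability check from each node could look like this (6789 is the default MON v1 port, 3300 the msgr2 port):
Code:
# basic connectivity to the MON host
ping -c 3 10.50.0.106
# is a MON listening on the default ports?
nc -zv 10.50.0.106 6789
nc -zv 10.50.0.106 3300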

I have no direct access to the remote Ceph, so I can only run it on my servers.
To run the CLI command against an external Ceph cluster, you will need to specify the MON address, username, and keyring.
Code:
ceph -p <pool> -m <MON-IP> -n client.<username> --keyring /<path>/<to>/<file>.keyring --auth_supported cephx ls
 
To run the CLI command against an external Ceph cluster, you will need to specify the MON address, username, and keyring.
Code:
ceph -p <pool> -m <MON-IP> -n client.<username> --keyring /<path>/<to>/<file>.keyring --auth_supported cephx ls

The above command seems to have some semantic problems, but this one lists the content of the RBD storage:

Code:
rbd list [pool]  -m 1.1.1.1 -k /etc/pve/priv/ceph/Ceph.keyring --user [username]
 
I also found this error:

Code:
root@ve-02:~# service pvestatd status
● pvestatd.service - PVE Status Daemon
   Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-09-28 23:45:37 CEST; 29min ago
  Process: 182719 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
 Main PID: 182728 (pvestatd)
    Tasks: 1 (limit: 9830)
   Memory: 103.4M
   CGroup: /system.slice/pvestatd.service
           └─182728 pvestatd

Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $avail in int at /usr/share/perl5
Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in int at /usr/share/perl5/
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $free in addition (+) at /usr/sha
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $avail in int at /usr/share/perl5
Sep 29 00:14:52 ve-02 pvestatd[182728]: Use of uninitialized value $used in int at /usr/share/perl5/
Sep 29 00:15:24 ve-02 pvestatd[182728]: status update time (32.127 seconds)
Sep 29 00:15:29 ve-02 pvestatd[182728]: got timeout
Sep 29 00:15:30 ve-02 pvestatd[182728]: status update time (5.479 seconds)
 
The above command seems to have some semantic problems, but this one lists the content of the RBD storage:
Which are?

rbd list [pool] -m 1.1.1.1 -k /etc/pve/priv/ceph/Ceph.keyring --user [username]
What was the output?

Sep 29 00:14:43 ve-02 pvestatd[182728]: Use of uninitialized value $used in addition (+) at /usr/sha
It's a side-effect. The values for the usage calculation are uninitialized but shouldn't be.
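
To narrow down which storage the timeouts come from, you could query the storage status directly; pvesm is the standard PVE storage CLI, and an entry that hangs or shows as inactive usually points at the misconfigured Ceph storage:
Code:
pvesm status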