[SOLVED] Problem with PVE 8 + CEPH + remote cluster node (rare setup)

Whatever

Renowned Member
Nov 19, 2012
I had a 3-node cluster with Ceph installed (several OSDs on each node), using network 10.63.210.0/24 for PVE (1 GbE) and 10.10.10.0/24 for Ceph (10 GbE).

Everything was OK until I added a 4th node, to the PVE cluster only, from another network, 10.63.200.0/24 (no Ceph/OSD on that node). The PVE cluster is happy, and the Ceph cluster is HEALTH_OK and happy as well.
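For context, the join and the subsequent sanity checks looked roughly like this (a sketch rather than the exact commands used; pvecm add runs on the node being added and points at an existing member, and --link0 sets the new node's own corosync address):

Code:
# On the new node 063-pve-04355 (10.63.200.200): join the existing PVE cluster
# via one of the original members; --link0 is this node's own corosync address.
root@063-pve-04355:~# pvecm add 10.63.210.200 --link0 10.63.200.200

# Afterwards, verify quorum and Ceph health from one of the original Ceph nodes:
root@063-pve-04347:~# pvecm status
root@063-pve-04347:~# ceph -s          # reports HEALTH_OK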

But the PVE GUI now goes crazy:
(Screenshots of the PVE GUI errors attached: 1692293577496.png, 1692293658898.png, 1692293699466.png, 1692293760977.png)


Code:
root@063-pve-04347:~# cat /etc/pve/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.10.10.200/24
         fsid = 72f45d1a-6561-4419-ae03-ca6c5679563a
         mon_allow_pool_delete = true
         mon_host = 10.10.10.200 10.10.10.201 10.10.10.202
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.10.10.200/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.063-pve-04347]
         public_addr = 10.10.10.200

[mon.063-pve-04369]
         public_addr = 10.10.10.201

[mon.063-pve-04370]
         public_addr = 10.10.10.202
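
For completeness, the view from the new node (a sketch; it assumes, per the setup above, that 063-pve-04355 has neither the Ceph packages nor an interface in the Ceph public network 10.10.10.0/24):

Code:
root@063-pve-04355:~# which ceph || echo "no ceph binaries on this node"
root@063-pve-04355:~# ping -c1 -W2 10.10.10.200    # Ceph MON public address; no reply expected here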

Code:
root@063-pve-04347:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: 063-pve-04347
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.63.210.200
  }
  node {
    name: 063-pve-04355
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.63.200.200
  }
  node {
    name: 063-pve-04369
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.63.210.201
  }
  node {
    name: 063-pve-04370
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.63.210.202
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-cluster-063
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
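
Since node 4 uses a different subnet for link0, knet connectivity can be double-checked with corosync-cfgtool (a sketch; run on any cluster member):

Code:
root@063-pve-04347:~# corosync-cfgtool -s    # per-link status; every node should show "connected"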

Code:
root@063-pve-04347:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster-063
Config Version:   4
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Aug 17 20:37:55 2023
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1.70
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.63.210.200 (local)
0x00000002          1 10.63.210.201
0x00000003          1 10.63.210.202
0x00000004          1 10.63.200.200

Any ideas what can be done?
 
While there may be an assumption/bug in the GUI that all nodes should contain the Ceph configuration, it seems the easy way to avoid the "crazy" GUI is to manage the cluster's Ceph configuration from one of the original 3 nodes that actually contain the Ceph binaries, rather than from the new node that doesn't.
As you pointed out in the title of the thread, the setup is rare, and the cost of accommodating every rare setup variation in the GUI is huge and impractical.
Think of it this way: if you were a CLI person, would you be viewing/changing/adding Ceph configuration from this particular node?

P.S. Creating a "rare" setup requires the admin to be keenly aware of its implications. You, for example, provided GUI screenshots taken while managing the cluster from the new non-Ceph node, yet the CLI output is from one of the original nodes.
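In practice that means running the Ceph tooling, or pointing the GUI, at one of the three original nodes, for example (a sketch; hostnames are taken from your configs, and pveceph install on the new node is optional and would additionally need a leg in the Ceph public network 10.10.10.0/24 to be useful):

Code:
# On one of the original Ceph nodes the full tooling is available:
root@063-pve-04347:~# pveceph status      # PVE wrapper around the Ceph status call
root@063-pve-04347:~# ceph osd tree       # OSD layout per node

# The new node carries no Ceph packages; if you ever want it to act as a Ceph
# client node, the packages could be pulled in with:
root@063-pve-04355:~# pveceph install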


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
While there may be an assumption/bug in the GUI that all nodes should contain the Ceph configuration, it seems the easy way to avoid the "crazy" GUI is to manage the cluster's Ceph configuration from one of the original 3 nodes that actually contain the Ceph binaries, rather than from the new node that doesn't.
Sure, all the screenshots were made from a node WITH the Ceph binary installed (a Ceph cluster node)
 
I had a 3-node cluster with Ceph installed (several OSDs on each node), using network 10.63.210.0/24
I added a 4th node, to the PVE cluster only, from another network, 10.63.200.0/24
Sure, all the screenshots were made from a node WITH the Ceph binary installed (a Ceph cluster node)
Every screenshot shows 10.63.200.200 in the URL field, except one where it was cut off.
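In other words, for the Ceph panels point the browser at one of the original nodes' management addresses rather than at the new node (the standard PVE web port 8006 is assumed here):

Code:
https://10.63.210.200:8006   # 063-pve-04347 - Ceph binaries and a MON present
https://10.63.200.200:8006   # 063-pve-04355 - new node, no Ceph installed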


 
