[SOLVED] Problem with PVE 8 + CEPH + remote cluster node (rare setup)

Whatever

Renowned Member
Nov 19, 2012
I had a 3-node cluster with Ceph installed (several OSDs on each node), using network 10.63.210.0/24 for PVE (1 GbE) and 10.10.10.0/24 for Ceph (10 GbE).

Everything was OK until I added a 4th node, to the PVE cluster only, from another network, 10.63.200.0/24 (no Ceph/OSD on that node). The PVE cluster is happy, and the Ceph cluster is HEALTH_OK and happy as well.
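For context, the join and the subsequent sanity checks looked roughly like this (a sketch rather than the exact commands used; pvecm add runs on the node being added and points at an existing member, and --link0 sets the new node's own corosync address):

Code:
# On the new node 063-pve-04355 (10.63.200.200): join the existing PVE cluster
# via one of the original members; --link0 is this node's own corosync address.
root@063-pve-04355:~# pvecm add 10.63.210.200 --link0 10.63.200.200

# Afterwards, verify quorum and Ceph health from one of the original Ceph nodes:
root@063-pve-04347:~# pvecm status
root@063-pve-04347:~# ceph -s          # reports HEALTH_OK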

But the PVE GUI now goes crazy:
(Screenshots of the PVE GUI errors attached: 1692293577496.png, 1692293658898.png, 1692293699466.png, 1692293760977.png)


Code:
root@063-pve-04347:~# cat /etc/pve/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.10.10.200/24
         fsid = 72f45d1a-6561-4419-ae03-ca6c5679563a
         mon_allow_pool_delete = true
         mon_host = 10.10.10.200 10.10.10.201 10.10.10.202
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.10.10.200/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.063-pve-04347]
         public_addr = 10.10.10.200

[mon.063-pve-04369]
         public_addr = 10.10.10.201

[mon.063-pve-04370]
         public_addr = 10.10.10.202
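
For completeness, the view from the new node (a sketch; it assumes, per the setup above, that 063-pve-04355 has neither the Ceph packages nor an interface in the Ceph public network 10.10.10.0/24):

Code:
root@063-pve-04355:~# which ceph || echo "no ceph binaries on this node"
root@063-pve-04355:~# ping -c1 -W2 10.10.10.200    # Ceph MON public address; no reply expected here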

Code:
root@063-pve-04347:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: 063-pve-04347
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.63.210.200
  }
  node {
    name: 063-pve-04355
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.63.200.200
  }
  node {
    name: 063-pve-04369
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.63.210.201
  }
  node {
    name: 063-pve-04370
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.63.210.202
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-cluster-063
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
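
Since node 4 uses a different subnet for link0, knet connectivity can be double-checked with corosync-cfgtool (a sketch; run on any cluster member):

Code:
root@063-pve-04347:~# corosync-cfgtool -s    # per-link status; every node should show "connected"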

Code:
root@063-pve-04347:~# pvecm status
Cluster information
-------------------
Name:             pve-cluster-063
Config Version:   4
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Thu Aug 17 20:37:55 2023
Quorum provider:  corosync_votequorum
Nodes:            4
Node ID:          0x00000001
Ring ID:          1.70
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   4
Highest expected: 4
Total votes:      4
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.63.210.200 (local)
0x00000002          1 10.63.210.201
0x00000003          1 10.63.210.202
0x00000004          1 10.63.200.200

Any ideas what can be done?
 
While there may be an assumption/bug in the GUI that all nodes should contain the Ceph configuration, it seems the easy way to avoid the "crazy" GUI is to manage the cluster's Ceph configuration from one of the original 3 nodes that actually contain the Ceph binaries, rather than from the new node that doesn't.
As you pointed out in the title of the thread, the setup is rare, and the cost of accommodating every rare setup variation in the GUI is huge and impractical.
Think of it this way: if you were a CLI person, would you be viewing/changing/adding Ceph configuration from this particular node?

P.S. Creating a "rare" setup requires the admin to be keenly aware of its implications. You, for example, provided GUI screenshots taken while managing the cluster from the new non-Ceph node, yet the CLI output is from one of the original nodes.
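In practice that means running the Ceph tooling, or pointing the GUI, at one of the three original nodes, for example (a sketch; hostnames are taken from your configs, and pveceph install on the new node is optional and would additionally need a leg in the Ceph public network 10.10.10.0/24 to be useful):

Code:
# On one of the original Ceph nodes the full tooling is available:
root@063-pve-04347:~# pveceph status      # PVE wrapper around the Ceph status call
root@063-pve-04347:~# ceph osd tree       # OSD layout per node

# The new node carries no Ceph packages; if you ever want it to act as a Ceph
# client node, the packages could be pulled in with:
root@063-pve-04355:~# pveceph install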


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
While there may be an assumption/bug in the GUI that all nodes should contain the Ceph configuration, it seems the easy way to avoid the "crazy" GUI is to manage the cluster's Ceph configuration from one of the original 3 nodes that actually contain the Ceph binaries, rather than from the new node that doesn't.
Sure, all the screenshots were made from a node WITH the Ceph binary installed (a Ceph cluster node)
 
I had a 3-node cluster with Ceph installed (several OSDs on each node), using network 10.63.210.0/24
I added a 4th node, to the PVE cluster only, from another network, 10.63.200.0/24
Sure, all the screenshots were made from a node WITH the Ceph binary installed (a Ceph cluster node)
Every screenshot shows 10.63.200.200 in the URL field, except one where it was cut off.
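In other words, for the Ceph panels point the browser at one of the original nodes' management addresses rather than at the new node (the standard PVE web port 8006 is assumed here):

Code:
https://10.63.210.200:8006   # 063-pve-04347 - Ceph binaries and a MON present
https://10.63.200.200:8006   # 063-pve-04355 - new node, no Ceph installed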


 
