Proxmox Ceph Networking

ScottDavis

May 23, 2024
I have configured an OSPFv3 (ospf6) high-speed ring network for three nodes.

I now want Ceph to use the IPv6 network rather than the existing management network. How do I change this for each node?
 
Thank you. I modified the conf file and also removed the public network entry, as it was causing an issue with the monitors having multiple public networks.

Just need to sort out a permission issue on one of the other nodes now.
 
Looks like destroying the monitors after doing the final step of changing the monitor IP was a bad idea. The aim was to create new monitors using the new IPv6 cluster network.

I now get the error 'Could not connect to ceph cluster despite configured monitors (500)' ... so clearly I've broken something somewhere.

With the amount of time spent moving the cluster to a new network, I could have rebuilt this cluster!
 
Is there a guide around to reinstall / clean up Ceph to save me reinstalling from scratch?
Hey Scott

I do not know of any guide for re-installation. There might be. This might give you what you need to remove Ceph: https://forum.proxmox.com/threads/removing-ceph-completely.62818/

If you are just setting up your cluster, starting fresh is a solid way to go.

If you want to try to recover (even just for learning), please post your current /etc/pve/ceph.conf and the output from ceph -s.
 
Would be good to learn why, for sure ... :)

[Attachment: ceph_conf.png]

It is a test cluster, so it's not an issue breaking things, but I would certainly like to understand why and try to fix it if I can.

No output from ceph -s ... I guess that isn't a good thing. :)
 
Thanks for posting your ceph.conf.

I think you will have trouble routing fe80::/64, as those are link-local addresses. Since I am a fan of IPv6, let's keep it IPv6.

Please change your cluster network to something in fd00::/8 (the locally assigned half of fc00::/7). The prefix should be randomly generated, but something in the style you are already using, e.g. fd00:baaa:aaaa:aaad::/64, will be fine for testing.

Please change that on all three nodes' ceph.conf and give each of them an address in the new cluster network. Then, reboot each node.

After rebooting, confirm that you can ping from/to each node using its public and cluster network IP addresses.

Then report ceph -s output.
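
For reference, the relevant lines in /etc/pve/ceph.conf would end up looking something like the sketch below. This is illustrative only: the fd00:baaa:aaaa:aaad:: addresses are placeholders for whatever prefix you pick, and whether you also move public_network and mon_host to IPv6 depends on where you want monitor and client traffic to live.

Code:
[global]
    # OSD replication / heartbeat traffic on the new ULA prefix
    cluster_network = fd00:baaa:aaaa:aaad::/64
    # Monitor and client traffic; can stay on the existing network
    # or also move to the same IPv6 prefix
    public_network = fd00:baaa:aaaa:aaad::/64
    mon_host = fd00:baaa:aaaa:aaad::1 fd00:baaa:aaaa:aaad::2 fd00:baaa:aaaa:aaad::3
    # If everything is IPv6-only, bind the messenger accordingly
    ms_bind_ipv6 = true
    ms_bind_ipv4 = false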
 

Thank you. That is helpful.

Just reviewed the FRR config, which is as shown:

Code:
#NODE1
#Enable IPv6 forwarding since we are using IPv6
ipv6 forwarding

#Add our router's private address on lo (loopback)
#This address is a single address (/128) out of the subnet (/64)
#of our 'cluster' network, of which routes to individual /128s are
#distributed using OSPF
!
interface lo
        ipv6 address fd69:beef:cafe::551/128
        ipv6 ospf6 area 0.0.0.0
        ipv6 ospf6 passive
#Backup links via primary gigabit link (vmbr0)
#Cost for 1G assumptions (100 gig reference / 1 gig = 100 cost)
!
interface vmbr0
        ipv6 ospf6 area 0.0.0.0
        ipv6 ospf6 network broadcast
        ipv6 ospf6 cost 100
#Two p2p links eno3 and eno4
#Since we are using IPv6 we do not need to assign
#addresses on these links, relying on link-local addresses
#Cost for 10G assumptions (100 gig reference / 10 gig = 10 cost)
#Feel free to edit your cost as appropriate
#You can tweak these cost values to change the traffic flow
!
interface eno3
        ipv6 ospf6 area 0.0.0.0
        ipv6 ospf6 network point-to-point
        ipv6 ospf6 cost 10
!
interface eno4
        ipv6 ospf6 area 0.0.0.0
        ipv6 ospf6 network point-to-point
        ipv6 ospf6 cost 10
#OSPF router settings (unique router ID required for each router)
!
router ospf6
        ospf6 router-id 0.5.5.1
        redistribute connected
        auto-cost reference-bandwidth 100000

Three ring mesh nodes, using eno3/eno4 on each node for Ceph:
Code:
fd69:beef:cafe::551/128
fd69:beef:cafe::552/128
fd69:beef:cafe::553/128

Private IPv6 network prefix:
Code:
fd69:beef:cafe::/64

Management IPv4 network, using eno1/eno2 on each node:
Code:
192.168.41.0/24
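
For completeness, the OSPFv3 adjacencies and the learned /128 routes can be sanity-checked from FRR's vtysh on each node, roughly along these lines (a sketch; neighbour and route output will differ per node):

Code:
# Show OSPFv3 neighbours on the point-to-point links (eno3/eno4)
vtysh -c "show ipv6 ospf6 neighbor"
# Show the IPv6 routes learned via OSPFv3 (the other nodes' /128 loopbacks)
vtysh -c "show ipv6 route ospf6"
# Confirm reachability of a peer loopback over the ring
ping -c 3 fd69:beef:cafe::552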


So I have confirmed that all nodes can ping each other over IPv6 on the mesh ring network I want to use for the cluster, as well as over the management IPv4 network.

My intention is to simply use eno3 and eno4 for the cluster, which would be easy to do if the cluster were not already built. That said, I would not see the network via the GUI, because the interfaces only carry IPv6 link-local addresses, so I won't be able to select them in the Ceph configuration.

What is best practice here to get Ceph to use the ring IPv6 network?
 
I think it is time to go back to the drawing board with the Ceph network. Has anyone used OpenFabric, as per the link below?

https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
Hey Scott

I have not used a full mesh for Ceph. They have always been switched (physical external) networks. But, if the networking is good, Ceph should not care.

Now that you have connectivity on the Ceph networks, does ceph -s give you any output? Or are all your monitors gone?
 

Still have an issue with primary node and timeouts, but getting there!

ceph -s

Code:
root@pmox01-scan-hq:~# ceph -s
  cluster:
    id:     7363a620-944a-4321-ad70-d12dd688bac7
    health: HEALTH_WARN
            clock skew detected on mon.pmox03-scan-hq, mon.pmox01-scan-hq
            Degraded data redundancy: 128 pgs undersized
            17304 slow ops, oldest one blocked for 80760 sec, mon.pmox01-scan-hq has slow ops
 
  services:
    mon: 3 daemons, quorum pmox02-scan-hq,pmox03-scan-hq,pmox01-scan-hq (age 10h)
    mgr: pmox02-scan-hq(active, since 23h), standbys: pmox03-scan-hq
    osd: 4 osds: 4 up (since 22h), 4 in (since 22h); 1 remapped pgs
 
  data:
    pools:   2 pools, 129 pgs
    objects: 2 objects, 1.0 MiB
    usage:   110 MiB used, 7.0 TiB / 7.0 TiB avail
    pgs:     2/6 objects misplaced (33.333%)
             128 active+undersized
             1   active+clean+remapped
 
Hey Scott

You will often see a clock skew after a reboot. If you are running PVE 7.x or earlier, check to see if you are running chrony; it is much better than the older NTP package.

Did you have these powered off for a bit? If you did, leave them for a bit to see if they get themselves straightened out.
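
To see which time-sync daemon a node is actually using, something along these lines should do it (a rough sketch; PVE 7 and later ship chrony by default):

Code:
# Which time-sync packages are installed?
dpkg -l | grep -Ei 'chrony|ntp'
# Is chrony running and tracking a source?
systemctl status chrony
chronyc tracking
# Overall clock / NTP sync state
timedatectl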
 
I'm running 8.2.2 :)
 
Cool.

Did you have these powered off for a bit? Did ceph -s change after being online for a bit?

Where are you at now?
Nope, all been online.

Code:
root@pmox03-scan-hq:~# ceph -s
  cluster:
    id:     7363a620-944a-4321-ad70-d12dd688bac7
    health: HEALTH_WARN
            clock skew detected on mon.pmox01-scan-hq
            1/3 mons down, quorum pmox03-scan-hq,pmox01-scan-hq
            Degraded data redundancy: 2/6 objects degraded (33.333%), 1 pg degraded, 74 pgs undersized
            30053 slow ops, oldest one blocked for 89745 sec, mon.pmox01-scan-hq has slow ops
 
  services:
    mon: 3 daemons, quorum pmox03-scan-hq,pmox01-scan-hq (age 47m), out of quorum: pmox02-scan-hq
    mgr: pmox03-scan-hq(active, since 49m), standbys: pmox02-scan-hq
    osd: 4 osds: 3 up (since 17m), 3 in (since 7m); 56 remapped pgs
 
  data:
    pools:   2 pools, 129 pgs
    objects: 2 objects, 2.0 MiB
    usage:   90 MiB used, 5.2 TiB / 5.2 TiB avail
    pgs:     2/6 objects degraded (33.333%)
             73 active+undersized
             49 active+clean+remapped
             6  active+clean
             1  active+undersized+degraded

Still not great. :(
 
Scott

Looks like something changed. Is pmox02-scan-hq offline?

Please post the current version of /etc/pve/ceph.conf

Please also post the output of these commands:
  • ceph mon stat
  • ceph config show mon.pmox03
  • ceph config show mon.pmox01-scan-hq
  • ceph config show mon.pmox02-scan-hq
  • ceph config show mon.pmox03-scan-hq
  • ceph df
What physical NICs do you have on each node that are in use?
 

pmox02 is not offline. Pings, etc. fine.

Code:
root@pmox02-scan-hq:~# ceph mon stat
e5: 3 mons at {pmox01-scan-hq=[v2:10.15.15.10:3300/0,v1:10.15.15.10:6789/0],pmox02-scan-hq=[v2:10.15.15.20:3300/0,v1:10.15.15.20:6789/0],pmox03-scan-hq=[v2:10.15.15.30:3300/0,v1:10.15.15.30:6789/0]} removed_ranks: {} disallowed_leaders: {}, election epoch 266, leader 1 pmox03-scan-hq, quorum 1,2 pmox03-scan-hq,pmox01-scan-hq

Code:
root@pmox01-scan-hq:~# ceph config show mon.pmox01-scan-hq
NAME                                   VALUE                                SOURCE    OVERRIDES  IGNORES
auth_allow_insecure_global_id_reclaim  false                                mon                        
auth_client_required                   cephx                                file                      
auth_cluster_required                  cephx                                file                      
auth_service_required                  cephx                                file                      
cluster_network                        10.15.15.10/24                       file                      
daemonize                              false                                override                  
keyring                                $mon_data/keyring                    default                    
leveldb_block_size                     65536                                default                    
leveldb_cache_size                     536870912                            default                    
leveldb_compression                    false                                default                    
leveldb_log                                                                 default                    
leveldb_write_buffer_size              33554432                             default                    
mon_allow_pool_delete                  true                                 file                      
mon_host                               10.15.15.20 10.15.15.30 10.15.15.10  file                      
ms_bind_ipv4                           true                                 file                      
ms_bind_ipv6                           false                                file                      
no_config_file                         false                                override                  
osd_pool_default_min_size              2                                    file                      
osd_pool_default_size                  3                                    file                      
public_addr                            v2:10.15.15.10:0/0                   file                      
public_network                         10.15.15.10/24                       file                      
rbd_default_features                   61                                   default                    
rbd_qos_exclude_ops                    0                                    default                    
setgroup                               ceph                                 cmdline                    
setuser                                ceph                                 cmdline


No output for pmox02 monitor.

Code:
root@pmox02-scan-hq:~# ceph config show mon.pmox03-scan-hq
NAME                                   VALUE                                SOURCE    OVERRIDES  IGNORES
auth_allow_insecure_global_id_reclaim  false                                mon                        
auth_client_required                   cephx                                file                      
auth_cluster_required                  cephx                                file                      
auth_service_required                  cephx                                file                      
cluster_network                        10.15.15.10/24                       file                      
daemonize                              false                                override                  
keyring                                $mon_data/keyring                    default                    
leveldb_block_size                     65536                                default                    
leveldb_cache_size                     536870912                            default                    
leveldb_compression                    false                                default                    
leveldb_log                                                                 default                    
leveldb_write_buffer_size              33554432                             default                    
mon_allow_pool_delete                  true                                 file                      
mon_host                               10.15.15.20 10.15.15.30 10.15.15.10  file                      
ms_bind_ipv4                           true                                 file                      
ms_bind_ipv6                           false                                file                      
no_config_file                         false                                override                  
osd_pool_default_min_size              2                                    file                      
osd_pool_default_size                  3                                    file                      
public_addr                            v2:10.15.15.30:0/0                   file                      
public_network                         10.15.15.10/24                       file                      
rbd_default_features                   61                                   default                    
rbd_qos_exclude_ops                    0                                    default                    
setgroup                               ceph                                 cmdline                    
setuser                                ceph                                 cmdline

Code:
--- RAW STORAGE ---
CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    7.0 TiB  7.0 TiB  118 MiB   118 MiB          0
TOTAL  7.0 TiB  7.0 TiB  118 MiB   118 MiB          0
 
--- POOLS ---
POOL  ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr   1    1  2.3 MiB        2  6.8 MiB      0    2.2 TiB
ceph   2  128      0 B        0      0 B      0    2.2 TiB


NICs are all 1GbE for both cluster and management. (This is a test cluster.)
 
Without looking at your current /etc/pve/ceph.conf, it would appear you need to remove the monitor mon.pmox02-scan-hq.

Here is a guide on how to do it manually:

https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/

After you remove that monitor, check Ceph's health and make sure the monitor is not showing up anywhere. Then, re-add a monitor on 02.
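
For the Proxmox side of it, the commands would look roughly like this (a sketch only, assuming the mon ID matches the node name as shown in your ceph mon stat output):

Code:
# On pmox02-scan-hq: stop the broken monitor daemon
systemctl stop ceph-mon@pmox02-scan-hq.service
# Remove it using the Proxmox tooling...
pveceph mon destroy pmox02-scan-hq
# ...or, if that refuses because the mon is unreachable, drop it from the monmap directly
ceph mon remove pmox02-scan-hq
# Verify only two monitors remain and check overall health
ceph mon stat
ceph -s
# Then re-create a monitor on this node
pveceph mon create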
 
