CephFS MDS Failover

Mihai

Hello everyone,

I have a fully functional CephFS running on a 3-node cluster.

It was created very simply; here is the part of ceph.conf related to the MDS:

Code:
[mds]
         keyring = /var/lib/ceph/mds/54da8900-a9db-4a57-923c-a62dbec8c82a/keyring
         mds data = /var/lib/ceph/mds/54da8900-a9db-4a57-923c-a62dbec8c82a

[mds.VMHost2]
         host = VMHost2

[mds.VMHost4]
         host = VMHost4

[mds.VMHost3]
         host = VMHost3

And this is what ceph mds stat shows:

Code:
cephfs-1/1/1 up  {0=54da8900-a9db-4a57-923c-a62dbec8c82a=up:active}

Those of you experienced with this kind of thing can probably guess my problem: when one of my nodes goes down, so does the CephFS.

I am not necessarily interested in making every MDS active; I would at least like failover.

However, the MDS is not failing over.

What settings do I need to add to make this work?

I have tried to read up on CephFS regarding this issue but I can't make sense of it =(
 
Strange, MDS failover is automatic for me without any tuning.

my ceph -w output:

"mds: cephfs-1/1/1 up {0=myhost1.lan=up:active}, 2 up:standby"

Note the 2 standby nodes.

Are you sure that all MDS daemons are running on your cluster?
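
If you want to double-check, something along these lines should show it (the systemd instance name is the MDS id you created, so adjust it to your own ids):

Code:
# cluster-wide view: the mds line should show one active plus N up:standby
ceph -s
ceph mds stat

# on each node: state of the local MDS unit(s) and their recent log output
systemctl status 'ceph-mds@*'
journalctl -u 'ceph-mds@*' --since today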
 
Oh wow... okay... I am only running one MDS; the other two failed to start, and I have no idea why.

This is the status:

Code:
ceph-mds@54da8900-a9db-4a57-923c-a62dbec8c82a.service - Ceph metadata server daemon
   Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mds@.service.d
           └─ceph-after-pve-cluster.conf
   Active: failed (Result: exit-code) since Mon 2017-12-18 14:10:05 CST; 1 weeks 1 days ago
 Main PID: 4228 (code=exited, status=1/FAILURE)

Dec 18 14:10:04 VMHost3 systemd[1]: ceph-mds@54da8900-a9db-4a57-923c-a62dbec8c82a.service: Failed with result 'exit-code'.
Dec 18 14:10:05 VMHost3 systemd[1]: ceph-mds@54da8900-a9db-4a57-923c-a62dbec8c82a.service: Service hold-off time over, scheduling restart.
Dec 18 14:10:05 VMHost3 systemd[1]: Stopped Ceph metadata server daemon.
Dec 18 14:10:05 VMHost3 systemd[1]: ceph-mds@54da8900-a9db-4a57-923c-a62dbec8c82a.service: Start request repeated too quickly.
Dec 18 14:10:05 VMHost3 systemd[1]: Failed to start Ceph metadata server daemon.
Dec 18 14:10:05 VMHost3 systemd[1]: ceph-mds@54da8900-a9db-4a57-923c-a62dbec8c82a.service: Unit entered failed state.
Dec 18 14:10:05 VMHost3 systemd[1]: ceph-mds@54da8900-a9db-4a57-923c-a62dbec8c82a.service: Failed with result 'exit-code'.


Any ideas what I can do? Can I somehow add MDS daemons? I thought they would be created automatically based on the config I wrote.

Okay, one more thing: it almost looks like every host is trying to start the same MDS, but I'm not sure.
 

Would you be willing to share the config file with the mds section so I can compare?
 
OK, I realized that I had not created the metadata servers I wanted. I must have created only a single metadata server, which has the same ID as the cluster, so I'll have to remove that one and run the properly named ones instead.

I used this forum post as an example to create them. Be warned that there were a few other things I had to do along the way.
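
For anyone who finds this later, the steps were roughly the following, repeated on each node with its own id (just a sketch of what I did; it assumes the default cluster name "ceph", and the caps are the ones from the upstream manual-deployment docs, so double-check them against your release):

Code:
# create the data directory for this MDS id (VMHost3 in this example)
mkdir -p /var/lib/ceph/mds/ceph-VMHost3

# create an auth key for mds.VMHost3 and write it into that keyring
ceph auth get-or-create mds.VMHost3 mon 'allow profile mds' mds 'allow *' osd 'allow rwx' \
    -o /var/lib/ceph/mds/ceph-VMHost3/keyring

# the daemon runs as the ceph user
chown -R ceph:ceph /var/lib/ceph/mds/ceph-VMHost3

# enable and start the per-id systemd unit
systemctl enable --now ceph-mds@VMHost3

Presumably the hard-coded keyring and mds data lines in the [mds] section also have to come out (or be changed to the $cluster-$id defaults), otherwise every daemon keeps pointing at that one fsid-named directory.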

I originally didn't know what was down, but I found this useful command:

Code:
sudo systemctl status ceph\*.service ceph\*.target

Thanks so much for your help.

Now I have to figure out how to remove that MDS whose name starts with numbers...
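
I think it boils down to something like this on whichever node that daemon lives, once one of the properly named MDS daemons has taken over as active (again just a sketch; double-check before running it):

Code:
# stop and disable the fsid-named MDS unit
systemctl disable --now ceph-mds@54da8900-a9db-4a57-923c-a62dbec8c82a

# remove its auth key from the cluster
ceph auth del mds.54da8900-a9db-4a57-923c-a62dbec8c82a

# remove its data directory
rm -rf /var/lib/ceph/mds/54da8900-a9db-4a57-923c-a62dbec8c82a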
 
