[SOLVED] Speeding up the failover

RobertCyberS · Feb 7, 2024

Hello,

Is there any way to speed up the failover? I have set up a working Ceph cluster, so if one node fails, the workload fails over to another node. I need high reliability and low downtime, so I would like to know whether the failover can be made faster.

The cluster consists of three ZimaBoard nodes.

LnxBil · Feb 7, 2024

How "slow" is it up to now?

RobertCyberS · Feb 7, 2024

LnxBil said:
How "slow" is it up to now?

About two and a half minutes (2:30).

czechsys · Feb 7, 2024

https://docs.ceph.com/en/quincy/rados/configuration/mon-osd-interaction/

fabian · Feb 7, 2024

Not really. We need a certain grace period to ensure that the failed node's watchdog has expired and triggered the fence. Shortening that grace period would require shortening the watchdog timeout, which would make the watchdog more trigger-happy, and that is not desired.
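The reasoning above can be made concrete with a toy model. The minimal Python sketch below assumes that the failed node's watchdog and the survivors' grace period both start counting at the same instant (the moment the node stops responding), and that downtime is simply the grace period plus the time needed to restart the guest elsewhere. The class, its field names, and all numbers are hypothetical illustrations, not PVE's actual implementation or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverModel:
    """Toy model of self-fencing HA recovery; all times in seconds.

    Field names are hypothetical and do not map to real PVE settings.
    Assumes the watchdog and the grace period start at the same instant.
    """

    watchdog_timeout: float  # failed node's watchdog resets it after this long
    grace: float             # survivors wait this long before recovering guests
    restart: float           # time to start the guest on a surviving node

    def is_safe(self) -> bool:
        # A guest may only be recovered once the failed node is guaranteed to
        # have been reset by its watchdog; otherwise the guest could still be
        # running there and writing to shared storage at the same time.
        return self.grace >= self.watchdog_timeout

    def failover_time(self) -> float:
        """Approximate downtime: wait out the grace period, then restart."""
        if not self.is_safe():
            raise ValueError("grace period is shorter than the watchdog timeout")
        return self.grace + self.restart


if __name__ == "__main__":
    # Illustrative numbers only, not Proxmox defaults.
    current = FailoverModel(watchdog_timeout=50, grace=70, restart=15)
    print(current.failover_time())  # 85

    # Shortening only the grace period is unsafe ...
    too_eager = FailoverModel(watchdog_timeout=50, grace=30, restart=15)
    print(too_eager.is_safe())  # False

    # ... so a faster failover also needs a shorter watchdog timeout.
    faster = FailoverModel(watchdog_timeout=20, grace=30, restart=15)
    print(faster.failover_time())  # 45
```

In this model the only way to lower `failover_time()` without violating `is_safe()` is to lower `watchdog_timeout` too, which is the trade-off described: a shorter watchdog also resets nodes that are merely slow or unresponsive for a short while.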

RobertCyberS · Feb 15, 2024

fabian said:
Not really. We need a certain grace period to ensure that the failed node's watchdog has expired and triggered the fence. Shortening that grace period would require shortening the watchdog timeout, which would make the watchdog more trigger-happy, and that is not desired.

How do I do it anyway? I just want to learn what it does and how it works.

fabian · Feb 15, 2024

You need to adapt the places in the code where those values are currently hard-coded (see git.proxmox.com / https://pve.proxmox.com/wiki/Developer_Documentation).

esi_y · Feb 15, 2024

fabian said:
You need to adapt the places in the code where those values are currently hard-coded (see git.proxmox.com / https://pve.proxmox.com/wiki/Developer_Documentation).

Shouldn't there be guidelines, similar to a coding style, under which such values are at the very least always constants obtained through a common helper? Ideally that helper would supply a default unless the value is redefined in a config. Even when one is pointed to the developer docs, finding these values should not be a game of hide and seek (especially since the same value might be relied upon elsewhere). Having them in a helper makes things so much easier; to me this is really more important than coding style.
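The suggestion above could look roughly like this minimal Python sketch: each timing value is declared once in a central table together with its default, and a single helper returns either that default or an override read from a simple config. The tunable names, the placeholder defaults, and the `key: value` config format are all hypothetical; this is not how PVE is organized (per the reply above, the real values are hard-coded in the source).

```python
from __future__ import annotations

from collections.abc import Mapping

# Hypothetical central table: every timing constant is declared exactly once,
# together with its default. Names and values are placeholders for illustration.
DEFAULTS: dict[str, int] = {
    "watchdog_timeout": 50,
    "fence_grace": 70,
}


def parse_overrides(text: str) -> dict[str, int]:
    """Parse 'key: value' lines, ignoring blank lines and '#' comments."""
    overrides: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        key = key.strip()
        if key not in DEFAULTS:
            # Reject typos instead of silently ignoring them.
            raise KeyError(f"unknown tunable: {key}")
        overrides[key] = int(value.strip())
    return overrides


def tunable(name: str, overrides: Mapping[str, int] | None = None) -> int:
    """Single lookup point: the config override if present, else the default."""
    if name not in DEFAULTS:
        raise KeyError(f"unknown tunable: {name}")
    if overrides is not None and name in overrides:
        return overrides[name]
    return DEFAULTS[name]


if __name__ == "__main__":
    cfg = parse_overrides("# local tuning\nfence_grace: 90\n")
    print(tunable("fence_grace", cfg))       # 90 (overridden in the config)
    print(tunable("watchdog_timeout", cfg))  # 50 (falls back to the default)
```

Because every caller goes through `tunable()`, a value that is relied upon in several places can only ever be changed in one spot, which is the "no hide and seek" property argued for above.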

RobertCyberS · Feb 15, 2024

fabian said:
You need to adapt the places in the code where those values are currently hard-coded (see git.proxmox.com / https://pve.proxmox.com/wiki/Developer_Documentation).

Would this also work for Reef? https://docs.ceph.com/en/quincy/rados/configuration/mon-osd-interaction/

fabian · Feb 15, 2024

I am not going to give more concrete pointers here. If you understand the implications well enough to play around with it, you will also find the constants; if not, you should definitely not tinker with them anyway.

HA failover in PVE is not related to Ceph heartbeats at all.

RobertCyberS · Feb 15, 2024

fabian said:
I am not going to give more concrete pointers here. If you understand the implications well enough to play around with it, you will also find the constants; if not, you should definitely not tinker with them anyway.

HA failover in PVE is not related to Ceph heartbeats at all.

OK, thanks for the help. I will play with the settings a bit.

esi_y · Feb 15, 2024

fabian said:
I am not going to give more concrete pointers here

I just want to say that the (unsolicited, general) question in between was mine, not the OP's. I hope I have not made it harder for the OP to get a bit of extra help.

(I can't believe Ceph has it hard-coded as well.)

esi_y · Feb 15, 2024

RobertCyberS said:
Would this also work for Reef? https://docs.ceph.com/en/quincy/rados/configuration/mon-osd-interaction/

https://docs.ceph.com/en/reef/rados/configuration/mon-osd-interaction/
