[SOLVED] Speeding up the failover

RobertCyberS

New Member
Feb 7, 2024
Hello,

Is there any way of speeding up the failover? I have set up a working Ceph cluster, so if one node fails, the workload moves to another node. I need high reliability and low downtime, so I would like to know whether the failover itself can be made faster.

The cluster is made up of three ZimaBoard nodes.
 
Not really. We need a certain grace period to ensure that the failed node's watchdog has expired and triggered the fence. Shortening that grace period would require shortening the watchdog timeout, which would make the watchdog more trigger-happy, and that is not desired.
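
To make that dependency concrete, here is a minimal Python sketch of the reasoning. The function names and the 60-second values are assumptions chosen for illustration only; they are not the constants or the code used in PVE.

```python
# Illustrative only: these numbers are assumed, not taken from the PVE source.
WATCHDOG_TIMEOUT_S = 60  # assumed time until an isolated node's watchdog fires
SAFETY_MARGIN_S = 60     # assumed extra slack on top of the watchdog timeout


def earliest_safe_recovery(last_seen_s,
                           watchdog_timeout_s=WATCHDOG_TIMEOUT_S,
                           safety_margin_s=SAFETY_MARGIN_S):
    """Earliest time at which the failed node can be assumed to be fenced."""
    return last_seen_s + watchdog_timeout_s + safety_margin_s


def can_recover(now_s, last_seen_s,
                watchdog_timeout_s=WATCHDOG_TIMEOUT_S,
                safety_margin_s=SAFETY_MARGIN_S):
    """True once the grace period has fully elapsed."""
    return now_s >= earliest_safe_recovery(
        last_seen_s, watchdog_timeout_s, safety_margin_s)
```

In this model the grace period can never be shorter than the watchdog timeout, so the only way to recover services sooner is to lower the watchdog timeout itself, with the side effect described above.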
 
How do I do it anyway? I will just learn what it does and how it does it.
 
You need to adapt the places in the code where those values are currently hard-coded (see git.proxmox.com / https://pve.proxmox.com/wiki/Developer_Documentation).

Shouldn't there be guidelines, similar to a coding style, under which such values are at the very least always constants obtained through a common helper? Ideally the helper would return a default unless the value is redefined in a config. Even if one is pointed to the dev docs, finding a value should not be a game of hide and seek (all the more so since it might be relied upon elsewhere). Having it in a helper makes this so much easier, and to me that is more important than coding style.
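
A minimal sketch of that idea, written in Python for brevity. Every name and default value here is hypothetical and does not correspond to anything in the PVE or Ceph code bases.

```python
# Hypothetical defaults; the keys and numbers are made up for illustration.
DEFAULTS = {
    "watchdog_timeout_s": 60,
    "fence_grace_s": 120,
}


def get_timing(name, config=None):
    """Return a timing value from config if set there, else the default."""
    if name not in DEFAULTS:
        raise KeyError(f"unknown timing parameter: {name}")
    overrides = config or {}
    value = overrides.get(name, DEFAULTS[name])
    # Reject non-integer or non-positive overrides instead of using them.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return value
```

Every caller would then use `get_timing("fence_grace_s", config)` instead of a literal, so there is exactly one place to look up or change a value.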
 
I am not going to give more concrete pointers here. If you understand the implications well enough to play around with it, you will also find the constants; if not, you should definitely not tinker with them anyway.

HA failover in PVE is not related to Ceph heartbeats at all.
 
OK, thanks for the help. I will play with the settings a bit.
 
I am not going to give more concrete pointers here

Just want to say that the (unsolicited, general) question in between was mine, not the OP's. I hope I have not made it harder for the OP to get a bit of extra help. :)

(Can't believe Ceph has it hard-coded as well.)
 
