Is anybody doing HA between two geographical locations?

Discussion in 'Proxmox VE: Installation and configuration' started by brucexx, Jun 12, 2018 at 23:49.

  1. brucexx

    brucexx Member

    Joined:
    Mar 19, 2015
    Messages:
    101
    Likes Received:
    4
    I wonder whether it is possible with Proxmox to do HA with Storage Replication across nodes in two different data centers. Has anyone tried that? Any other ideas?

    We can get a fast and reliable link between data centers from the same data center provider.

    Thank you
     
  2. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    3,356
    Likes Received:
    191
    Hi,

    no, this does not work; you would create an HF (High Failure) solution.
    HA is based on corosync, which needs low network latency.
    Corosync is made for local networks.
    A fast link will not be enough,
    because latency increases with the distance between the hosts.
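
A rough way to sanity-check the distance argument above is to compute the physical lower bound on round-trip time. This is only a sketch: it assumes light travels at about 200,000 km/s in optical fiber, and the function name `min_round_trip_ms` is made up for illustration. Real links add switching, routing and non-straight cable paths on top of this, and a bigger pipe (10G vs 100G) does not reduce this delay.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s
# (about two thirds of the speed of light in vacuum).
FIBER_SPEED_KM_PER_S = 200_000.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over a fiber path of the given length."""
    one_way_s = distance_km / FIBER_SPEED_KM_PER_S
    return 2 * one_way_s * 1000.0


for km in (1, 20, 200, 1000):
    print(f"{km:>5} km -> at least {min_round_trip_ms(km):.2f} ms round trip")
```

Under this assumption, propagation alone costs about 0.2 ms at 20 km and about 2 ms at 200 km, before any equipment delay is counted.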
     
  3. guletz

    guletz Active Member

    Joined:
    Apr 19, 2017
    Messages:
    448
    Likes Received:
    69
    Hi, I think I have an idea about this (it is only an idea, so be aware). The problem is disk access performance and network capacity.

    Let's say you have data center A and data center B, each with many VMs/containers, and your goal is to replicate only some of them from A to B. The Proxmox cluster in A can be separate from the one in B. You can set up LizardFS (a cluster file system with replication and distribution options) and put all your critical VMs/containers on it (as a separate directory/folder under the datacenter storage).
    In LizardFS you define a goal, such as: for any chunk I want N replicas (like mirror, RAID 5, RAID 6), but in any case at least one must be in data center B. In the end, you will have the same VMs/containers in both data centers (A and B). That is the main idea! For HA, you will need to use other tools ...
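
To make the "N replicas, at least one in data center B" goal concrete, here is a toy placement sketch. It is not LizardFS code or its real placement algorithm; the function `place_replicas`, the server names and the site labels are all hypothetical.

```python
from typing import Dict, List


def place_replicas(servers: Dict[str, str], copies: int, required_site: str) -> List[str]:
    """Pick `copies` servers so that at least one is in `required_site`.

    `servers` maps a (hypothetical) server name to its site label.
    """
    if copies < 1 or copies > len(servers):
        raise ValueError("copies must be between 1 and the number of servers")
    in_site = [name for name, site in servers.items() if site == required_site]
    if not in_site:
        raise ValueError(f"no server available in site {required_site!r}")
    chosen = [in_site[0]]  # guarantee one copy at the required site
    others = [name for name in servers if name != in_site[0]]
    chosen.extend(others[: copies - 1])  # fill the rest from the remaining servers
    return chosen


servers = {"a1": "A", "a2": "A", "b1": "B"}
print(place_replicas(servers, 2, "B"))
```

The point is only that the constraint is checked per chunk: whatever else is chosen, one copy always lands on a server labelled with the remote site.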
     
  4. brucexx

    brucexx Member

    What about other solutions? Can VMware, for example, do this reliably (two geographical locations)? Does anybody know? ...or does it need some crazy requirements to work?

    Thank you
     
  5. guletz

    guletz Active Member

    Hi,
    I don't think it is possible to have 100% safety. When you write a block to disk in data center A, that block needs to be replicated to B. Suppose A breaks before the block has arrived at B. In this case B does not know that a block was sent from A. B will then need to wait 2-10 seconds before it can start the data center fail-over.
    I think that if you want data center replication you have only two options: asynchronous replication (like zfs send/receive) and/or synchronous replication (like LizardFS). Each has its own pros and cons.
    And finally, be aware of the split-brain scenario ;) This is the most dangerous problem.
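
The difference between the two replication modes mentioned above can be shown with a small in-memory model. This is a simplified sketch, not how zfs send/receive or LizardFS are implemented; the `Site` class and the helper functions are invented for illustration.

```python
class Site:
    """A data center that stores a list of written blocks."""

    def __init__(self, name):
        self.name = name
        self.blocks = []


def write_sync(primary, secondary, block):
    """Acknowledge only after both sites hold the block."""
    primary.blocks.append(block)
    secondary.blocks.append(block)  # the writer waits for this remote commit
    return "ack"


def write_async(primary, pending, block):
    """Acknowledge after the local write; ship the block later."""
    primary.blocks.append(block)
    pending.append(block)  # queued for the next replication run
    return "ack"


def replicate(pending, secondary):
    """Drain the queue, as a periodic replication job would."""
    secondary.blocks.extend(pending)
    pending.clear()


a, b, queue = Site("A"), Site("B"), []
write_async(a, queue, "block-1")
replicate(queue, b)
write_async(a, queue, "block-2")  # acknowledged, but not yet on B
# Site A fails here: B only has what was replicated so far.
lost = [blk for blk in a.blocks if blk not in b.blocks]
print("acknowledged writes missing on B:", lost)
```

In the asynchronous case the application was told "ack" for a block that B never received. In the synchronous case that cannot happen for an acknowledged write, but every write then pays the inter-site round trip.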
     
  6. brucexx

    brucexx Member

    Thank you all!!!
     
  7. hec

    hec Member

    Joined:
    Jan 8, 2009
    Messages:
    190
    Likes Received:
    5
    We have such a setup.

    This is no problem at all. You need a storage system that can do synchronous commits. We are working with NetApp MetroCluster.
    When you write a block, it is written to the NVRAM on both sides. The client gets the OK back as soon as both NVRAMs have the data.
    You can switch over to the other data center in less than 1 min. Then you only need to restart the VMs, or HA restarts them.

    You can do such a setup over up to 200 km. I think Proxmox will need a shorter distance for corosync.

    The distance between our data centers is not that big. It should be about 10 to 20 km.

    Corosync is working without any problems. The storage does sync commits, so everything is HA and the data is written on both sides.

    I don't understand why it should be impossible to have a cluster stretched between two data centers.

    10G or 100G between 2 DCs should be enough. The storage has 8 links with 8G.

    As far as I know, you need less than 2 ms for corosync. Can somebody from the Proxmox team confirm this?

    2 ms is no problem with 10G.

    Of course you need to think about split brain. This can be handled with a monitoring node in a different location. An LTE connection is also useful for checking for split brain. You can also do this with storage fencing, as VMware does.
     
  8. guletz

    guletz Active Member

    ... and what if you have only one commit, and then your inter-data-center link breaks before the 2nd commit can be done? As I said, this is the problem, because in such a case you can lose some data.
    So any synchronous replication tool across 2 data centers is not fail-safe. In some cases small data losses could be acceptable, but certainly not in every case. As a dummy example, let's say you have an ERP DB and a client who makes an important deal ... he buys 1000 of product X. Then, before you can finish the DB transactions for him, your primary data center goes offline. After less than 0.5 min (as you say) the backup data center is now the primary. The DB will see one unfinished transaction -> rollback and .... I guess you have a big problem.

    So, as I learned from my own mistakes, you cannot have a fail-safe tool with only 2 hosts/data centers. Only 3 hosts/data centers can be OK with any synchronization tool.
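
The 2-versus-3 argument above comes down to majority voting, which a few lines can illustrate. This is a generic sketch of the strict-majority rule, assuming one vote per site; `has_quorum` is a made-up helper, not a corosync API.

```python
def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """A partition may keep running only with a strict majority of all votes."""
    return visible_votes > total_votes // 2


# Two sites, one vote each: when the link breaks, each side sees only itself.
print(has_quorum(1, 2))  # False: neither side has a majority

# Three sites, one vote each: the two that still see each other can continue.
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False: the isolated site must stop
```

With two sites, a broken link leaves no side able to tell "the other site is down" from "the link is down", so either both stop or both risk continuing. A third vote in a separate location breaks that tie.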
     