Proxmox 4.2 DRBD: Node does not reconnect after reboot/connection loss

Discussion in 'Proxmox VE: Installation and configuration' started by Jospeh Huber, Sep 30, 2016.

  1. Jospeh Huber

    Jospeh Huber Member

    Joined:
    Apr 18, 2016
    Messages:
    52
    Likes Received:
    2
    Hello all,

    I am new to DRBD, but not new to Proxmox ;-)
    We have a 3-node DRBD9 cluster with Proxmox 4.2, set up as described in the wiki article here: https://pve.proxmox.com/wiki/DRBD9.
    My versions: proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve), drbdmanage: 0.97-1

    The DRBD9 storage is available and I have two LXC containers with HA on it. HA migration and fail-over work as expected.
    But if one node gets restarted or a connection loss happens, it never connects to DRBD again.
    I have also set up "post-up drbdadm adjust all" in /etc/network/interfaces.
    The wiki says that a "drbdadm adjust all" or "drbdadm adjust-with-progress all" should do the job... but not for me. It does nothing, even when invoked manually.
    I also did not find anything helpful here: https://www.drbd.org/en/doc/users-guide-90/s-node-failure.
    If I recreate the VMs from a backup everything is fine again, but I think this is not the way to solve the problem ;-)
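    For reference, the post-up hook mentioned above would sit in the stanza for the replication interface. A minimal sketch of /etc/network/interfaces, assuming a dedicated DRBD link; the interface name and addresses are placeholders:

    ```shell
    # /etc/network/interfaces (excerpt) -- sketch only; eth1 and the
    # addresses are placeholders for the dedicated DRBD replication link
    auto eth1
    iface eth1 inet static
        address 10.0.0.2
        netmask 255.255.255.0
        # try to re-establish all DRBD connections once the link is up
        post-up drbdadm adjust all
    ```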

    Any ideas?

    P.S. My plan is, once this problem is solved, to run some smaller test systems first, and if that works out, to use it in production.

    Here some data:

    Code:
    root@vmhost2:~# drbd-overview
      0:.drbdctrl/0      Connected(3*)                       Secondary(3*)                             UpTo(vmhost2)/UpTo(vmhost5,vmhost1)
      1:.drbdctrl/1      Connected(3*)                       Secondary(3*)                             UpTo(vmhost2)/UpTo(vmhost5,vmhost1)
    100:vm-108-disk-1/0  Conn(vmhost5,vmhost2)/C'ng(vmhost1) Prim(vmhost2)/Unkn(vmhost1)/Seco(vmhost5) UpTo(vmhost2)/Inco(vmhost1)/UpTo(vmhost5)
    101:vm-132-disk-1/0  Conn(vmhost2,vmhost5)/C'ng(vmhost1) Seco(vmhost2)/Unkn(vmhost1)/Prim(vmhost5) UpTo(vmhost2)/Inco(vmhost1)/UpTo(vmhost5)
    
    root@vmhost1:~#  drbdmanage list-nodes
    +---------------------------------------------------------------------------------------------------------+
    | Name    | Pool Size | Pool Free |                                                               | State |
    |---------------------------------------------------------------------------------------------------------|
    | vmhost1 |    510976 |    500756 |                                                               |    ok |
    | vmhost2 |    510976 |    506734 |                                                               |    ok |
    | vmhost5 |    510976 |    500756 |                                                               |    ok |
    +---------------------------------------------------------------------------------------------------------+
    
    A) The disconnected node:
    drbdsetup status
    .drbdctrl role:Secondary
      volume:0 disk:UpToDate
      volume:1 disk:UpToDate
      vmhost2 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
      vmhost5 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
    
    vm-108-disk-1 role:Secondary
      disk:Inconsistent
      vmhost2 connection:StandAlone
      vmhost5 connection:StandAlone
    
    vm-132-disk-1 role:Secondary
      disk:Outdated
      vmhost2 connection:StandAlone
      vmhost5 connection:StandAlone
    
    
    
    B) The connected Node
    root@vmhost2:~# drbdsetup status
    .drbdctrl role:Secondary
      volume:0 disk:UpToDate
      volume:1 disk:UpToDate
      vmhost1 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
      vmhost5 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
    
    vm-108-disk-1 role:Primary
      disk:UpToDate
      vmhost1 connection:Connecting
      vmhost5 role:Secondary
        peer-disk:UpToDate
    
    vm-132-disk-1 role:Secondary
      disk:UpToDate
      vmhost1 connection:Connecting
      vmhost5 role:Primary
        peer-disk:UpToDate
    
     
  3. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,045
    Likes Received:
    459
    If you update to the current version, this should no longer happen.
     
  4. Jospeh Huber

    OK, I will try.

    But in Bugzilla it is not marked as resolved ...
     
  5. Jospeh Huber

    Confirmed: after upgrading to 4.3, the reconnect works after a reboot.
    Solved :)!
     
  6. Jospeh Huber

    The problem occurred again after a reboot. The system with DRBD had been up for 65 days.

    proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
    ...
    drbdmanage: 0.97.3-1


    I have to execute the following on all nodes:
    Code:
    drbdmanage export-res "*"; drbdadm adjust all

    Then it reconnects again.
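    Since the commands have to run on every node, they can be wrapped in one loop. A sketch with the node names from this thread; `echo` makes it a dry run, and swapping in `ssh` (assuming root SSH access between the nodes) would actually execute it:

    ```shell
    # Run the fix from this post on all nodes. Dry run by default:
    # replace `echo "root@$node:"` with `ssh "root@$node"` to execute.
    NODES="vmhost1 vmhost2 vmhost5"   # node names as used in this thread
    CMD='drbdmanage export-res "*" && drbdadm adjust all'
    for node in $NODES; do
        echo "root@$node: $CMD"
    done
    ```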

    => Fortunately, it was not reproducible after several reboots...
     
    #6 Jospeh Huber, Dec 5, 2016
    Last edited: Dec 6, 2016
  7. Jospeh Huber

    After several reboots and upgrades, I cannot get some disks of my 3-node cluster to connect and sync again.

    I have tried several different approaches, but nothing helps:
    Code:
    drbdmanage list-nodes
    +--------------------------------------------------------------------------------------------------+
    | Name    | Pool Size | Pool Free |                                                        | State |
    |--------------------------------------------------------------------------------------------------|
    | vmhost1 |    510976 |    366727 |                                                        |    ok |
    | vmhost2 |    510976 |    365858 |                                                        |    ok |
    | vmhost5 |    510976 |    370917 |                                                        |    ok |
    +--------------------------------------------------------------------------------------------------+
    
    Node 1
    drbdsetup status vm-103-disk-1
    vm-103-disk-1 role:Secondary
      disk:Inconsistent
      vmhost2 connection:Connecting
      vmhost5 connection:Connecting
    
    Node 2
    vm-103-disk-1 role:Secondary
      disk:UpToDate
      vmhost1 connection:StandAlone
      vmhost5 role:Primary
        peer-disk:UpToDate
    
    Node 3
    vm-103-disk-1 role:Primary
      disk:UpToDate
      vmhost1 connection:StandAlone
      vmhost2 role:Secondary
        peer-disk:UpToDate
    
    I tried a manual split-brain recovery (drbdmanage export-res "*"; drbdadm adjust all), but nothing helps:
    Code:
    # on the stale node:
    drbdadm disconnect vm-103-disk-1
    drbdadm connect --discard-my-data vm-103-disk-1
    # on the good node:
    drbdadm connect vm-103-disk-1

    Any ideas?

    It also seems that I have some stale data in my configuration which I can't fix:
    Code:
    # executed on all three nodes ...
    drbdmanage remove-resource vm-107-disk-1 --force
    drbdmanage export-res "*"; drbdadm adjust-with-progress all
    WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
    Operation completed successfully
    /var/lib/drbd.d/drbdmanage_vm-107-disk-1.res:2: in resource vm-107-disk-1:
    There is no 'on' section for hostname 'vmhost1' named in the connection-mesh
     
    #7 Jospeh Huber, Dec 13, 2016
    Last edited: Dec 13, 2016
  8. titux

    titux New Member

    Joined:
    Jan 30, 2015
    Messages:
    16
    Likes Received:
    0
    I have the same problem on the same version, 4.3: the connection is lost, the status stays at Connecting/Outdated, and it never reconnects.
    I will try to upgrade to version 4.4 and see what happens...
     
  9. titux

    Sad to say, the problem still exists: my resource is in StandAlone even after upgrading my nodes to Proxmox 4.4-12. DRBDmanage is back under the GPL license, so please, Proxmox, help...
    One thing I can see is that drbdmanage was not updated to the latest version.
     