
Proxmox 4.2 DRBD: Node does not reconnect after reboot/connection loss

Discussion in 'Proxmox VE: Installation and configuration' started by Jospeh Huber, Sep 30, 2016.

  1. Jospeh Huber

    Jospeh Huber New Member

    Joined:
    Apr 18, 2016
    Messages:
    27
    Likes Received:
    2
    Hello all,

    I am new to DRBD but not new to Proxmox ;-)
    We have a three-node DRBD9 cluster with Proxmox 4.2, set up as described in the wiki article here: https://pve.proxmox.com/wiki/DRBD9.
    My versions: proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve), drbdmanage: 0.97-1

    The DRBD9 storage is available and I have two LXC containers with HA on it. HA migration and fail-over work as expected.
    But if one node gets restarted or a connection loss happens, it never connects to DRBD again.
    I have also set up the "post-up drbdadm adjust all" hook in /etc/network/interfaces.
    The wiki says that a "drbdadm adjust all" or "drbdadm adjust-with-progress all" should do the job... but not for me. It does nothing, even when invoked manually.
    I also did not find anything here: https://www.drbd.org/en/doc/users-guide-90/s-node-failure.
    If I recreate the VMs from a backup everything is fine again, but I think this is not the way to solve the problem ;-)

    Any ideas?

    P.S. My plan, once this problem is solved, is to run some smaller test systems first and, if that works, to use it in production.
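
    For reference, the "post-up" hook mentioned above sits in /etc/network/interfaces roughly like this (interface name and addresses are placeholders, not my actual setup):

    ```shell
    # Sketch of /etc/network/interfaces; eth1 / 10.10.10.x are placeholders
    auto eth1
    iface eth1 inet static
        address 10.10.10.2
        netmask 255.255.255.0
        # re-adjust all DRBD resources once the replication link is up
        post-up drbdadm adjust all
    ```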

    Here is some data:

    Code:
    root@vmhost2:~# drbd-overview
      0:.drbdctrl/0      Connected(3*)                       Secondary(3*)                             UpTo(vmhost2)/UpTo(vmhost5,vmhost1)
      1:.drbdctrl/1      Connected(3*)                       Secondary(3*)                             UpTo(vmhost2)/UpTo(vmhost5,vmhost1)
    100:vm-108-disk-1/0  Conn(vmhost5,vmhost2)/C'ng(vmhost1) Prim(vmhost2)/Unkn(vmhost1)/Seco(vmhost5) UpTo(vmhost2)/Inco(vmhost1)/UpTo(vmhost5)
    101:vm-132-disk-1/0  Conn(vmhost2,vmhost5)/C'ng(vmhost1) Seco(vmhost2)/Unkn(vmhost1)/Prim(vmhost5) UpTo(vmhost2)/Inco(vmhost1)/UpTo(vmhost5)
    
    root@vmhost1:~#  drbdmanage list-nodes
    +---------------------------------------------------------------------------------------------------------+
    | Name    | Pool Size | Pool Free |                                                               | State |
    |---------------------------------------------------------------------------------------------------------|
    | vmhost1 |    510976 |    500756 |                                                               |    ok |
    | vmhost2 |    510976 |    506734 |                                                               |    ok |
    | vmhost5 |    510976 |    500756 |                                                               |    ok |
    +---------------------------------------------------------------------------------------------------------+
    
    A) The disconnected node:
    drbdsetup status
    .drbdctrl role:Secondary
      volume:0 disk:UpToDate
      volume:1 disk:UpToDate
      vmhost2 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
      vmhost5 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
    
    vm-108-disk-1 role:Secondary
      disk:Inconsistent
      vmhost2 connection:StandAlone
      vmhost5 connection:StandAlone
    
    vm-132-disk-1 role:Secondary
      disk:Outdated
      vmhost2 connection:StandAlone
      vmhost5 connection:StandAlone
    
    
    
    B) The connected Node
    root@vmhost2:~# drbdsetup status
    .drbdctrl role:Secondary
      volume:0 disk:UpToDate
      volume:1 disk:UpToDate
      vmhost1 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
      vmhost5 role:Secondary
        volume:0 peer-disk:UpToDate
        volume:1 peer-disk:UpToDate
    
    vm-108-disk-1 role:Primary
      disk:UpToDate
      vmhost1 connection:Connecting
      vmhost5 role:Secondary
        peer-disk:UpToDate
    
    vm-132-disk-1 role:Secondary
      disk:UpToDate
      vmhost1 connection:Connecting
      vmhost5 role:Primary
        peer-disk:UpToDate
    
     
  3. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    2,366
    Likes Received:
    302
    If you update to the current version, this should no longer happen.
     
  4. Jospeh Huber

    OK, I will try.

    But in Bugzilla it is not marked as resolved ...
     
    titux likes this.
  5. Jospeh Huber

    Confirmed: after upgrading to 4.3, the reconnect works after a reboot.
    Solved :)!
     
  6. Jospeh Huber

    The problem occurred again after a reboot. The system with DRBD had been up for 65 days.

    proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
    ...
    drbdmanage: 0.97.3-1


    I have to execute this on all nodes:
    Code:
    drbdmanage export-res "*";drbdadm adjust all
    
    Then it reconnects again.
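
    For completeness, the workaround above can be sketched as a small script run from one node, assuming passwordless root SSH to all three cluster nodes (hostnames as in this cluster):

    ```shell
    #!/bin/sh
    # Sketch: re-export the drbdmanage resource files and re-adjust all
    # DRBD resources on every node; assumes root SSH access (hypothetical).
    for node in vmhost1 vmhost2 vmhost5; do
        ssh root@"$node" 'drbdmanage export-res "*" && drbdadm adjust all'
    done
    ```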

    => Fortunately, it was not reproducible after several further reboots...
     
    #6 Jospeh Huber, Dec 5, 2016
    Last edited: Dec 6, 2016
  7. Jospeh Huber

    After several reboots and upgrades, I cannot get some disks of my 3-node cluster to connect and sync again.

    I have tried several different approaches, but nothing helps:
    Code:
    drbdmanage list-nodes
    +--------------------------------------------------------------------------------------------------+
    | Name    | Pool Size | Pool Free |                                                        | State |
    |--------------------------------------------------------------------------------------------------|
    | vmhost1 |    510976 |    366727 |                                                        |    ok |
    | vmhost2 |    510976 |    365858 |                                                        |    ok |
    | vmhost5 |    510976 |    370917 |                                                        |    ok |
    +--------------------------------------------------------------------------------------------------+
    
    Node 1
    drbdsetup status vm-103-disk-1
    vm-103-disk-1 role:Secondary
      disk:Inconsistent
      vmhost2 connection:Connecting
      vmhost5 connection:Connecting
    
    Node 2
    vm-103-disk-1 role:Secondary
      disk:UpToDate
      vmhost1 connection:StandAlone
      vmhost5 role:Primary
        peer-disk:UpToDate
    
    Node 3
    vm-103-disk-1 role:Primary
      disk:UpToDate
      vmhost1 connection:StandAlone
      vmhost2 role:Secondary
        peer-disk:UpToDate
    
    I tried a manual split-brain recovery, but nothing helps (drbdmanage export-res "*";drbdadm adjust all).
    Any ideas?
    Code:
    # on the stale node:
    drbdadm disconnect vm-103-disk-1
    drbdadm connect --discard-my-data vm-103-disk-1
    # on the good node:
    drbdadm connect vm-103-disk-1
    
    It also seems that I have some stale data in my configuration... I can't fix this!
    Code:
    /var/lib/drbd.d/drbdmanage_vm-107-disk-1.res:2: in resource vm-107-disk-1:
    # executed on all three nodes ...
    drbdmanage remove-resource vm-107-disk-1 --force
    drbdmanage export-res "*";drbdadm adjust-with-progress all
    WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
    Operation completed successfully
    /var/lib/drbd.d/drbdmanage_vm-107-disk-1.res:2: in resource vm-107-disk-1:
    There is no 'on' section for hostname 'vmhost1' named in the connection-mesh
     
    #7 Jospeh Huber, Dec 13, 2016
    Last edited: Dec 13, 2016
  8. titux

    titux New Member

    Joined:
    Jan 30, 2015
    Messages:
    16
    Likes Received:
    0
    I have the same problem on the same version, 4.3: connection lost, status stuck in Connecting/Outdated, and it never reconnects.
    I will try to upgrade to version 4.4 and see what happens...
     
  9. titux

    Sad to say, the problem still exists: my resource is in StandAlone even after upgrading my nodes to Proxmox 4.4-12. The drbdmanage license is back to GPL status, so please, Proxmox, help...
    One thing I can see is that drbdmanage was not updated to the latest version.
     