Kernel panic with Proxmox 4.1-13 + DRBD 9.0.0

Discussion in 'Proxmox VE: Installation and configuration' started by nicorac, Feb 23, 2016.

  1. nicorac

    nicorac New Member

    Joined:
    Apr 28, 2011
    Messages:
    12
    Likes Received:
    1
    Hello everyone.
    I've set up a two-node Proxmox environment with DRBD-backed storage.

    Code:
    # pveversion -v
    proxmox-ve: 4.1-37 (running kernel: 4.2.8-1-pve)
    pve-manager: 4.1-13 (running version: 4.1-13/cfb599fb)
    pve-kernel-4.2.6-1-pve: 4.2.6-36
    pve-kernel-4.2.8-1-pve: 4.2.8-37
    lvm2: 2.02.116-pve2
    corosync-pve: 2.3.5-2
    libqb0: 1.0-1
    pve-cluster: 4.0-32
    qemu-server: 4.0-55
    pve-firmware: 1.1-7
    libpve-common-perl: 4.0-48
    libpve-access-control: 4.0-11
    libpve-storage-perl: 4.0-40
    pve-libspice-server1: 0.12.5-2
    vncterm: 1.2-1
    pve-qemu-kvm: 2.5-5
    pve-container: 1.0-44
    pve-firewall: 2.0-17
    pve-ha-manager: 1.0-21
    ksm-control-daemon: 1.2-1
    glusterfs-client: 3.5.2-2+deb8u1
    lxc-pve: 1.1.5-7
    lxcfs: 0.13-pve3
    cgmanager: 0.39-pve1
    criu: 1.6.0-1
    zfsutils: 0.6.5-pve7~jessie
    drbdmanage: 0.91-1
    DRBD is not integrated with Proxmox (it's still experimental, as I understand it), so I configured it manually following these docs:
    https://pve.proxmox.com/wiki/DRBD
    https://pve.proxmox.com/wiki/DRBD9

    NOTE: I came from an old Proxmox 1.9 setup and jumped to 4.1 by rebuilding everything from scratch (no DRBD metadata upgrade, everything was recreated). I only used old config files as a reference, adapting them to the new DRBD9 style.

    Well, it worked great for about 3 weeks, then one of the nodes crashed with a kernel panic.
    After restarting it and rebuilding the DRBD resource, I upgraded both the kernel (now 4.2.8) and the server BIOS. A week later the other node crashed, so it's not hardware or VM related.

    After some trial and error I found that heavy I/O traffic on the DRBD device quickly leads to a kernel panic. I moved all the VMs to node A, attached a serial console to node B, and made it crash:
    (full log here: http://pastebin.com/8HEvR42w)
    Code:
    
    [ 2678.632647] drbd r0/0 drbd0: LOGIC BUG for enr=98347
    [ 2678.637678] drbd r0/0 drbd0: LOGIC BUG for enr=98347
    [ 2679.045598] ------------[ cut here ]------------
    [ 2679.050273] kernel BUG at /home/dietmar/pve4-devel/pve-kernel/drbd-9.0.0/drbd/lru_cache.c:571!
    [ 2679.058981] invalid opcode: 0000 [#1] SMP
    [ 2679.063148] Modules linked in: ip_set ip6table_filter ip6_tables drbd_transport_tcp(O) softdog drbd(O) libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon gpio_ich ttm snd_pcm coretemp hpilo snd_timer drm_kms_helper snd drm kvm_intel soundcore i2c_algo_bit kvm psmouse 8250_fintek input_leds ipmi_si shpchp pcspkr ipmi_msghandler serio_raw acpi_power_meter lpc_ich i7core_edac edac_core mac_hid vhost_net vhost macvtap macvlan autofs4 hid_generic usbmouse usbkbd usbhid hid pata_acpi tg3 e1000e(O) ptp pps_core hpsa
    [ 2679.149400] CPU: 0 PID: 2171 Comm: drbd_a_r0 Tainted: P  IO  4.2.8-1-pve #1
    [ 2679.157313] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
    [ 2679.163733] task: ffff8800dcb9a580 ti: ffff8801fccd8000 task.ti: ffff8801fccd8000
    [ 2679.171295] RIP: 0010:[<ffffffffc0ad25a0>]  [<ffffffffc0ad25a0>] lc_put+0x90/0xa0 [drbd]
    [ 2679.179499] RSP: 0018:ffff8801fccdbb08  EFLAGS: 00010046
    [ 2679.184867] RAX: 0000000000000000 RBX: 000000000001802b RCX: ffff880207770f90
    [ 2679.192078] RDX: ffff88020b8d4000 RSI: ffff880207770f90 RDI: ffff8800e20eb680
    [ 2679.199288] RBP: ffff8801fccdbb08 R08: 0000000000000270 R09: 0000000000000000
    [ 2679.206499] R10: ffff8800b0d8d460 R11: 0000000000000000 R12: ffff8800362a9800
    [ 2679.213711] R13: 0000000000000000 R14: 000000000001802b R15: 0000000000000001
    [ 2679.220923] FS:  0000000000000000(0000) GS:ffff880217400000(0000) knlGS:0000000000000000
    [ 2679.229100] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 2679.234906] CR2: 00007f506e540123 CR3: 0000000001e0d000 CR4: 00000000000026f0
    [ 2679.242117] Stack:
    [ 2679.244147]  ffff8801fccdbb58 ffffffffc0acf2ea 0000000000000046 ffff8800362a9ab0
    [ 2679.251682]  ffff8801fccdbb68 ffff8800b0d8d028 ffff8800362a9800 ffff8800b0d8d038
    [ 2679.259216]  0000000000000800 0000000000001000 ffff8801fccdbb68 ffffffffc0acf7f0
    [ 2679.266751] Call Trace:
    [ 2679.269229]  [<ffffffffc0acf2ea>] put_actlog+0x6a/0x120 [drbd]
    [ 2679.275131]  [<ffffffffc0acf7f0>] drbd_al_complete_io+0x30/0x40 [drbd]
    [ 2679.281735]  [<ffffffffc0ac9a42>] drbd_req_destroy+0x442/0x880 [drbd]
    [ 2679.288246]  [<ffffffff81735350>] ? tcp_recvmsg+0x390/0xb90
    [ 2679.293881]  [<ffffffffc0aca385>] mod_rq_state+0x505/0x7c0 [drbd]
    [ 2679.300131]  [<ffffffffc0aca934>] __req_mod+0x214/0x8d0 [drbd]
    [ 2679.310587]  [<ffffffffc0ad441b>] tl_release+0x1db/0x320 [drbd]
    [ 2679.321185]  [<ffffffffc0ab8232>] got_BarrierAck+0x32/0xc0 [drbd]
    [ 2679.331962]  [<ffffffffc0ac8670>] drbd_ack_receiver+0x160/0x5c0 [drbd]
    [ 2679.343127]  [<ffffffffc0ad29a0>] ? w_complete+0x20/0x20 [drbd]
    [ 2679.353619]  [<ffffffffc0ad2a04>] drbd_thread_setup+0x64/0x120 [drbd]
    [ 2679.364599]  [<ffffffffc0ad29a0>] ? w_complete+0x20/0x20 [drbd]
    [ 2679.375114]  [<ffffffff8109b1fa>] kthread+0xea/0x100
    [ 2679.384655]  [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
    [ 2679.395851]  [<ffffffff81809e5f>] ret_from_fork+0x3f/0x70
    [ 2679.405802]  [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
    [ 2679.416870] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83 6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
    [ 2679.446124] RIP  [<ffffffffc0ad25a0>] lc_put+0x90/0xa0 [drbd]
    [ 2679.456617]  RSP <ffff8801fccdbb08>
    [ 2679.464764] ---[ end trace b1f10fd6ac931718 ]---
    [ 2693.529503] ------------[ cut here ]------------
    [ 2693.538650] WARNING: CPU: 7 PID: 0 at kernel/watchdog.c:311 watchdog_overflow_callback+0x84/0xa0()
    [ 2693.552118] Watchdog detected hard LOCKUP on cpu 7
    [ 2693.556788] Modules linked in: ip_set ip6table_filter ip6_tables drbd_transport_tcp(O) softdog drbd(O) libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon gpio_ich ttm snd_pcm coretemp hpilo snd_timer[ 2693.637982] ------------[ cut here ]------------
    
    The first four lines show that the failure is inside DRBD: two "LOGIC BUG" messages, then a kernel BUG at lru_cache.c:571.
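
    To pick these markers out of a long serial-console capture, a small script can help. This is only a minimal sketch: the patterns are taken from the log lines quoted in this post, and the function name `find_drbd_failures` is made up for illustration.

```python
import re

# Patterns modelled on the panic log quoted above (illustrative, not exhaustive).
PATTERNS = {
    "logic_bug": re.compile(r"drbd .*LOGIC BUG for enr=(\d+)"),
    "kernel_bug": re.compile(r"kernel BUG at (\S+):(\d+)!"),
    "rip": re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+(\S+)\s+\[(\w+)\]"),
}


def find_drbd_failures(lines):
    """Return a list of (kind, captured groups) for every matching log line."""
    hits = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                hits.append((kind, match.groups()))
    return hits


sample = [
    "[ 2678.632647] drbd r0/0 drbd0: LOGIC BUG for enr=98347",
    "[ 2679.050273] kernel BUG at /home/dietmar/pve4-devel/pve-kernel/drbd-9.0.0/drbd/lru_cache.c:571!",
    "[ 2679.446124] RIP  [<ffffffffc0ad25a0>] lc_put+0x90/0xa0 [drbd]",
]

for kind, groups in find_drbd_failures(sample):
    print(kind, groups)
```

    Feeding it the saved console log (for example `find_drbd_failures(open("console.log"))`, where the file name is an assumption) lists the extent number, the source file and line of the BUG, and the faulting function with its module.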

    This is my DRBD config:
    Code:
    global {
      usage-count yes;
    }

    common {
      handlers {
      }

      startup {
      }

      options {
      }

      disk {
        resync-rate 50M;
        c-max-rate 50M;
      }

      net {
      }
    }

    Code:
    resource r0 {
      protocol C;
      startup {
        wfc-timeout  0;
        degr-wfc-timeout 60;
        become-primary-on both;
      }
      net {
        cram-hmac-alg sha1;
        shared-secret "myPassword4Drbd";
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        sndbuf-size 0;
        rcvbuf-size 0;
        max-buffers  8000;
        max-epoch-size 8000;
      }
      on A {
        node-id 0;
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.0.0.1:7788;
        meta-disk internal;
      }
      on B {
        node-id 1;
        device /dev/drbd0;
        disk /dev/sdb1;
        address 10.0.0.2:7788;
        meta-disk internal;
      }
      disk {
        # no-disk-barrier and no-disk-flushes should be applied only to systems with non-volatile (battery backed) controller caches.
        # Follow links for more information:
        # http://www.drbd.org/users-guide-8.3/s-throughput-tuning.html#s-tune-disable-barriers
        # http://www.drbd.org/users-guide/s-throughput-tuning.html#s-tune-disable-barriers
        no-disk-barrier;
        no-disk-flushes;
      }
    }
    
    

    As a workaround I downgraded host B to kernel 4.2.2-1-pve, then started the I/O traffic again.
    It has been running for 30 minutes without a crash (kernel 4.2.8-1-pve crashed after 3 to 4 minutes).

    Does anyone have an idea of what's wrong?
     
  2. nicorac

    Relevant PANIC log lines (DRBD 9.0.0 with kernel 4.2.8-1-pve):
    Code:
    [ 2678.632647] drbd r0/0 drbd0: LOGIC BUG for enr=98347
    [ 2678.637678] drbd r0/0 drbd0: LOGIC BUG for enr=98347
    [ 2679.045598] ------------[ cut here ]------------
    [ 2679.050273] kernel BUG at /home/dietmar/pve4-devel/pve-kernel/drbd-9.0.0/drbd/lru_cache.c:571!
    [ 2679.058981] invalid opcode: 0000 [#1] SMP
    [ 2679.063148] Modules linked in: ip_set ip6table_filter ip6_tables drbd_transport_tcp(O) softdog drbd(O) libcrc32c nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_comment xt_conntrack xt_multiport iptable_filter iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) ipmi_ssif amdkfd amd_iommu_v2 radeon gpio_ich ttm snd_pcm coretemp hpilo snd_timer drm_kms_helper snd drm kvm_intel soundcore i2c_algo_bit kvm psmouse 8250_fintek input_leds ipmi_si shpchp pcspkr ipmi_msghandler serio_raw acpi_power_meter lpc_ich i7core_edac edac_core mac_hid vhost_net vhost macvtap macvlan autofs4 hid_generic usbmouse usbkbd usbhid hid pata_acpi tg3 e1000e(O) ptp pps_core hpsa
    [ 2679.149400] CPU: 0 PID: 2171 Comm: drbd_a_r0 Tainted: P          IO    4.2.8-1-pve #1
    [ 2679.157313] Hardware name: HP ProLiant ML350 G6, BIOS D22 08/16/2015
    [ 2679.163733] task: ffff8800dcb9a580 ti: ffff8801fccd8000 task.ti: ffff8801fccd8000
    [ 2679.171295] RIP: 0010:[<ffffffffc0ad25a0>]  [<ffffffffc0ad25a0>] lc_put+0x90/0xa0 [drbd]
    [ 2679.179499] RSP: 0018:ffff8801fccdbb08  EFLAGS: 00010046
    [ 2679.184867] RAX: 0000000000000000 RBX: 000000000001802b RCX: ffff880207770f90
    [ 2679.192078] RDX: ffff88020b8d4000 RSI: ffff880207770f90 RDI: ffff8800e20eb680
    [ 2679.199288] RBP: ffff8801fccdbb08 R08: 0000000000000270 R09: 0000000000000000
    [ 2679.206499] R10: ffff8800b0d8d460 R11: 0000000000000000 R12: ffff8800362a9800
    [ 2679.213711] R13: 0000000000000000 R14: 000000000001802b R15: 0000000000000001
    [ 2679.220923] FS:  0000000000000000(0000) GS:ffff880217400000(0000) knlGS:0000000000000000
    [ 2679.229100] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    [ 2679.234906] CR2: 00007f506e540123 CR3: 0000000001e0d000 CR4: 00000000000026f0
    [ 2679.242117] Stack:
    [ 2679.244147]  ffff8801fccdbb58 ffffffffc0acf2ea 0000000000000046 ffff8800362a9ab0
    [ 2679.251682]  ffff8801fccdbb68 ffff8800b0d8d028 ffff8800362a9800 ffff8800b0d8d038
    [ 2679.259216]  0000000000000800 0000000000001000 ffff8801fccdbb68 ffffffffc0acf7f0
    [ 2679.266751] Call Trace:
    [ 2679.269229]  [<ffffffffc0acf2ea>] put_actlog+0x6a/0x120 [drbd]
    [ 2679.275131]  [<ffffffffc0acf7f0>] drbd_al_complete_io+0x30/0x40 [drbd]
    [ 2679.281735]  [<ffffffffc0ac9a42>] drbd_req_destroy+0x442/0x880 [drbd]
    [ 2679.288246]  [<ffffffff81735350>] ? tcp_recvmsg+0x390/0xb90
    [ 2679.293881]  [<ffffffffc0aca385>] mod_rq_state+0x505/0x7c0 [drbd]
    [ 2679.300131]  [<ffffffffc0aca934>] __req_mod+0x214/0x8d0 [drbd]
    [ 2679.310587]  [<ffffffffc0ad441b>] tl_release+0x1db/0x320 [drbd]
    [ 2679.321185]  [<ffffffffc0ab8232>] got_BarrierAck+0x32/0xc0 [drbd]
    [ 2679.331962]  [<ffffffffc0ac8670>] drbd_ack_receiver+0x160/0x5c0 [drbd]
    [ 2679.343127]  [<ffffffffc0ad29a0>] ? w_complete+0x20/0x20 [drbd]
    [ 2679.353619]  [<ffffffffc0ad2a04>] drbd_thread_setup+0x64/0x120 [drbd]
    [ 2679.364599]  [<ffffffffc0ad29a0>] ? w_complete+0x20/0x20 [drbd]
    [ 2679.375114]  [<ffffffff8109b1fa>] kthread+0xea/0x100
    [ 2679.384655]  [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
    [ 2679.395851]  [<ffffffff81809e5f>] ret_from_fork+0x3f/0x70
    [ 2679.405802]  [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
    [ 2679.416870] Code: 89 42 08 48 89 56 10 48 89 7e 18 48 89 07 83 6f 64 01 f0 80 a7 90 00 00 00 f7 f0 80 a7 90 00 00 00 fe 8b 46 20 5d c3 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
    [ 2679.446124] RIP  [<ffffffffc0ad25a0>] lc_put+0x90/0xa0 [drbd]
    [ 2679.456617]  RSP <ffff8801fccdbb08>
    [ 2679.464764] ---[ end trace b1f10fd6ac931718 ]---
    [ 2693.529503] ------------[ cut here ]------------
     
    #2 nicorac, Feb 23, 2016
    Last edited: Feb 23, 2016
  3. nicorac

    Update: sadly, the kernel panic happens with kernel 4.2.2-1-pve too.
    Host B, which was running the VM with the heavy I/O test, just crashed...

    I'd like to try DRBD 9.0.1, released 2 weeks ago.
    Is there any documentation available on how to build my custom modules?
     
  4. nicorac

    Dietmar has built an updated kernel with DRBD 9.0.1 (thanks!), and it's available in the pvetest repo.

    Sadly, it still crashes with a kernel panic at the same source line... :(
     
  5. nicorac

    Since I've had next to no answers about this kernel panic, I've downgraded my DRBD resources to 8.4, and they have been working flawlessly for 4 days.

    For future reference, I've written up my downgrade notes in a blog post:
    http://coolsoft.altervista.org/en/b...rnel-panic-downgrade-drbd-resources-drbd-9-84

    PS: doesn't a kernel panic need some further investigation?
    I had hoped for a bit more interest in it from the Proxmox team...
     
  6. jeanlau

    jeanlau Member

    Joined:
    May 2, 2014
    Messages:
    45
    Likes Received:
    4
    Hello :)

    +1 for me

    So I'm very disappointed, because I have two choices:
    1. Downgrade DRBD to version 8.4 and live in fear that one day an update breaks everything
    2. Downgrade Proxmox to 3.4, knowing that my system is not up to date and that I won't get the new features

    I haven't decided yet.
     
  7. cesarpk

    cesarpk Member

    Joined:
    Mar 31, 2012
    Messages:
    770
    Likes Received:
    2
    Hi

    I am interested in knowing whether the DRBD 8.4.7-1 build is stable with the latest PVE kernel version.
     
  8. nicorac

    My hosts have not crashed since then (28 Feb 2016) :)
     
  9. cesarpk

    Many thanks nicorac.

    Let me ask one question:
    Do you have the kernel pve-kernel-4.2.8-1-pve_4.2.8-41 installed?

    Best regards
     
  10. nicorac

    Code:
    # pveversion -v
    proxmox-ve: 4.1-37 (running kernel: 4.2.8-1-pve)
    pve-manager: 4.1-13 (running version: 4.1-13/cfb599fb)
    pve-kernel-4.2.8-1-pve: 4.2.8-38
    lvm2: 2.02.116-pve2
    corosync-pve: 2.3.5-2
    libqb0: 1.0-1
    pve-cluster: 4.0-32
    qemu-server: 4.0-55
    pve-firmware: 1.1-7
    libpve-common-perl: 4.0-48
    libpve-access-control: 4.0-11
    libpve-storage-perl: 4.0-40
    pve-libspice-server1: 0.12.5-2
    vncterm: 1.2-1
    pve-qemu-kvm: 2.5-5
    pve-container: 1.0-44
    pve-firewall: 2.0-17
    pve-ha-manager: 1.0-21
    ksm-control-daemon: 1.2-1
    glusterfs-client: 3.5.2-2+deb8u1
    lxc-pve: 1.1.5-7
    lxcfs: 0.13-pve3
    cgmanager: 0.39-pve1
    criu: 1.6.0-1
    zfsutils: 0.6.5-pve7~jessie
    drbdmanage: 0.91-1
    Code:
    # modinfo /lib/modules/4.2.8-1-pve/kernel/drivers/block/drbd/drbd.ko
    filename:  /lib/modules/4.2.8-1-pve/kernel/drivers/block/drbd/drbd.ko
    alias:  block-major-147-*
    license:  GPL
    version:  8.4.7-1
    description:  drbd - Distributed Replicated Block Device v8.4.7-1
    author:  Philipp Reisner <phil@linbit.com>, Lars Ellenberg <lars@linbit.com>
    srcversion:  E0AC696C0098FEFA7D9977C
    depends:  libcrc32c
    vermagic:  4.2.8-1-pve SMP mod_unload modversions
    parm:  minor_count:Approximate number of drbd devices (1-255) (uint)
    parm:  disable_sendpage:bool
    parm:  allow_oos:DONT USE! (bool)
    parm:  proc_details:int
    parm:  enable_faults:int
    parm:  fault_rate:int
    parm:  fault_count:int
    parm:  fault_devs:int
    parm:  usermode_helper:string
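
    When switching between DRBD 8.4 and 9.x modules it is easy to lose track of which one a given kernel ships. Below is a minimal sketch of checking that from `modinfo` output like the listing above. The helper names `parse_modinfo` and `drbd_major_minor` are hypothetical, and the sample text is a shortened copy of the output in this post.

```python
def parse_modinfo(text):
    """Parse `modinfo` output into a dict of lists (keys such as parm repeat)."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        info.setdefault(key.strip(), []).append(value.strip())
    return info


def drbd_major_minor(info):
    """Return (major, minor) from a version string such as '8.4.7-1'."""
    major, minor = info["version"][0].split(".")[:2]
    return int(major), int(minor)


SAMPLE = """\
filename:  /lib/modules/4.2.8-1-pve/kernel/drivers/block/drbd/drbd.ko
version:  8.4.7-1
depends:  libcrc32c
parm:  minor_count:Approximate number of drbd devices (1-255) (uint)
parm:  disable_sendpage:bool
"""

info = parse_modinfo(SAMPLE)
print(drbd_major_minor(info))
```

    On a live host the text could come from `subprocess.run(["modinfo", "drbd"], capture_output=True, text=True).stdout` instead of the hard-coded sample.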
     
  11. cesarpk

    Mmm... that's excellent! Thanks, nicorac :)

    As a precaution for when I use DRBD 9.1 or later, let me ask a couple of questions:

    Knowing that DRBD 9.x isn't stable today, do you know whether "PVE 4.x" with "DRBD 9.x" allows a "DRBD 8.x"-style configuration across several PVE nodes, i.e. with NIC-to-NIC connections (also known as crossover-cable connections)?

    This question is important to me because I have several PVE nodes in a PVE cluster with NICs dedicated exclusively to DRBD, but the NICs between the PVE nodes are a mix of 1 and 10 Gbps.

    And the last question: as I have no hands-on experience with PVE 4.x, I guess the initial configuration can be done in the PVE GUI and the final tuning with a text editor, right?

    Anyway, many thanks again for the guide you have shared and for helping me.

    Best regards
     
  12. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,227
    Likes Received:
    23
    The main advantage to drbd9 with Proxmox is each VM disk you create is an independent DRBD resource.

    You can still manually configure drbd9 just like drbd8 if you want. Drbdmanage is just a new tool that makes managing drbd resources easier (well, easier once they get all the bugs shaken out of it).

    A better solution, IMHO, would be to set up multiple drbdmanage clusters within your Proxmox cluster. For example, if you had four servers A, B, C and D, you could make A and B one drbdmanage cluster and C and D another. That way you can still take advantage of the Proxmox integration.

    You can use whatever NIC you want for drbd communication; Drbdmanage even has a command to change the IP of a node should you decide to switch to another NIC.

    I was under the impression that drbd 8.x did not work in newer Linux kernels. Great to know that's at least not entirely true. I think Proxmox is planning to move to a 4.4 kernel soon; not sure if that will be a problem for drbd 8.x.

    It's also worth mentioning: DRBD9 seems to work just fine. The problems I've had have been with Drbdmanage and known kernel bugs with Infiniband.
     
  13. nicorac

    My setup is a two-node cluster with whole disks mirrored by DRBD, connected through dedicated NICs and a crossover cable.
    I have next to no experience with multi-node DRBD, sorry.

    Same for me, except for the kernel panics ;)
     
  14. cesarpk

    e100, first of all, it is a pleasure to greet you after all this time.

    Oh, so if I extend a virtual disk through the PVE GUI, PVE first extends the DRBD resource and then resizes the virtual disk?

    ... If I am not wrong, when a DRBD resource is created through the PVE GUI we get only one DRBD resource, and inside it several logical volumes, where each logical volume is equivalent to a virtual hard disk.

    On the other hand, if I am right: I like to run "drbdadm verify ..." once a week, and since PVE creates one large DRBD resource, the verification may take a long time. For this reason I would like to create the DRBD resource manually, with only the size of the virtual disk I need. That way, verifying the DRBD resource will not take long.

    Anyway, part of my strategy is to create at least two DRBD resources per pair of PVE nodes (on different partitions), with each PVE node storing its virtual disks in its own DRBD resource. That way I will be able to solve any problem that comes up easily.

    Oh, OK, that is good to know.

    Many thanks for the clarification. I guess that with my strategy I will need to create the DRBD resources by hand and not through the PVE GUI, right?

    Oh... I like that.

    I hope that you are wrong... :-(

    For this reason I prefer to use 10 Gbps Intel or Broadcom NICs for replicating the DRBD resources. With dual NICs on one board I can configure bonding in balance-rr mode with jumbo frames and get a real speed of 20 Gbps on these connections, all with a configuration that is very easy to set up.
     
    #14 cesarpk, Apr 13, 2016
    Last edited: Apr 13, 2016
  15. cesarpk

    Oh, OK, sorry. It was just a question for my own knowledge.

    Anyway, thank you very much for your kind attention.
     
  16. e100

    No.
    You would create the drbdmanage clusters, then do everything else in the GUI.

    Proxmox creates a drbdmanage resource with a single volume for each individual virtual disk you create.
     
  17. cesarpk

    Oh, OK, thanks. I believe this will suit my strategy.

    So, will the DRBD resource be the same size as the volume used for the virtual disk? If so, when I resize the virtual disk through the PVE GUI, the size of the DRBD resource will change too, right? (And all of it online?)

    As far as I know, it isn't possible to change the size of a DRBD resource online, at least in DRBD 8.x versions, right?

    Best regards
     
  18. e100

    I'm not sure whether it's working, as I've not tried it, but I think in DRBD9 with drbdmanage it can be done online.
    There is limited info, but see:
    man drbdmanage-resize-volume
     
  19. cesarpk

    In my experience with DRBD 8.x, it is possible when all LVs (logical volumes) are unmounted, provided that LVM is on top of DRBD.

    But I believe that when the PVE GUI is used to create a DRBD resource, "DRBD will be on top of LVM". If that is correct (according to the DRBD9 Proxmox wiki introduction: http://pve.proxmox.com/wiki/DRBD9#Introduction), then:

    1- The DRBD resource will span the whole disk or partition.
    2- If the previous point is correct, resizing an LV (logical volume), i.e. the virtual disk, through the PVE GUI will be very easy.
    3- The disadvantage of this GUI-created configuration is that verifying the DRBD resources will take a long time.
    4- We have several scheduled tasks, such as:

    a) The backups of our VMs, which should be done at least every day (so far without incremental or differential options).
    b) The verification of all DRBD resources (at least once a week if the information is critical).
    c) The hardware block-level verification of the disks in the RAID arrays (at least once a week if the information is critical; besides, RAID controllers from LSI, Adaptec, etc. offer this option, and their manufacturers recommend using it periodically).

    As all these verification tasks take a lot of time, I think the way the PVE GUI creates DRBD resources should be changed so that each DRBD resource has the same size as its virtual disk. That way, verifying all DRBD resources will take less time.
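
    The size argument can be made concrete with a back-of-the-envelope estimate. This is only a rough sketch: it assumes a full verify pass reads the whole resource at a steady rate, and the sizes and the 100 MiB/s rate below are hypothetical figures, not measurements from this thread.

```python
def verify_hours(size_gib, rate_mib_per_s):
    """Rough duration of one full verify pass: resource size / sustained rate."""
    seconds = size_gib * 1024 / rate_mib_per_s
    return seconds / 3600


# Hypothetical sizes: one resource covering a whole 2 TiB partition
# versus a resource sized to a single 100 GiB virtual disk.
for label, size_gib in [("whole-partition resource", 2048),
                        ("per-virtual-disk resource", 100)]:
    print(f"{label}: {verify_hours(size_gib, 100):.1f} h")
```

    The estimate scales linearly with size, so a resource sized to the virtual disk is verified proportionally faster than one spanning the whole partition.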

    The disadvantage of this last point is that it will be impossible to resize a virtual disk online, because the LV (logical volume) must be unmounted (talking within the context of DRBD), but I prefer to lose this feature to save time on verifications (as I can always resize the DRBD resource offline, it isn't a problem for me).

    I will ask the development team whether they can change the way DRBD resources are created.

    Best regards
     
    #19 cesarpk, Apr 15, 2016
    Last edited: Apr 15, 2016
  20. cesarpk

    I recently read that drbdmanage will be able to grow a DRBD resource once the developers finish the code, and only in the secondary role. At the moment the option is present but not working (according to a forum post dated Dec 2015), so I don't know whether the function really works now.

    But if the function only works in the secondary role, it can't be used while the VM is running... :-(

    Best regards
     