So to start off: I have a two-node Proxmox cluster at home running on a couple of HP G7 servers, PX1 and PX2. Both have identical setups: two SAS drives for the system in RAID1 and five more drives in RAID5. All of this is configured on the RAID controller, so nothing weird going on.
When I configured this cluster I followed the directions on the old DRBD9 page on the Proxmox wiki. It has since been taken down, but a cached version is available here: https://web.archive.org/web/20160118181917/https://pve.proxmox.com/wiki/DRBD9. I have not deviated from that setup at all.
My problems started when I upgraded to 5.0 a bit hastily; there were no real mentions of DRBD changes, so I assumed it would not be a problem. I upgraded PX1, rebooted, and pvecm was showing all good signs. I tried to migrate the machines over to that node from PX2, but then noticed that DRBD was not showing up on PX1. I tried to fix it by running drbdmanage init, but to no avail. Since I didn't want to be left with a broken system, I decided to just reinstall 4.4 on PX1 and get it back online with DRBD. However, this was easier said than done.
PX1 = 192.168.2.8
PX2 = 192.168.2.9
Code:
root@px2:~# drbdmanage init 192.168.2.8
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
You are going to initialize a new drbdmanage cluster.
CAUTION! Note that:
* Any previous drbdmanage cluster information may be removed
* Any remaining resources managed by a previous drbdmanage installation
that still exist on this system will no longer be managed by drbdmanage
Confirm:
yes/no: yes
Empty drbdmanage control volume initialized on '/dev/drbd0'.
Empty drbdmanage control volume initialized on '/dev/drbd1'.
Operation completed successfully
root@px2:~# drbdmanage add-node px1 192.168.2.8
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
Operation completed successfully
Operation completed successfully
Executing join command using ssh.
IMPORTANT: The output you see comes from px1
IMPORTANT: Your input is executed on px1
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
You are going to join an existing drbdmanage cluster.
CAUTION! Note that:
* Any previous drbdmanage cluster information may be removed
* Any remaining resources managed by a previous drbdmanage installation
that still exist on this system will no longer be managed by drbdmanage
Confirm:
yes/no: yes
Error: Cannot connect to the drbdmanaged process using DBus
The DBus subsystem returned the following error description:
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Error: Attempt to execute the join command remotelyfailed
Join command for node px1:
drbdmanage join -p 6999 192.168.2.8 1 px2 192.168.2.8 0 UahfZH2v8ZdpLZzEWWqU
I've seen this error on several forums, but no concrete explanation of what is actually wrong, only that "D-Bus is the problem".
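For what it's worth, this is roughly how I have been poking at the D-Bus side on px1. The service-activation path and the ping/restart subcommands are things I picked up from other threads rather than any documentation, so treat this as a sketch:
Code:
# is the D-Bus activation file for drbdmanaged in place? (path is a guess on my part)
ls /usr/share/dbus-1/system-services/ | grep -i drbd
# does the drbdmanage server answer over the bus at all?
drbdmanage ping
# restart the drbdmanage server, then look at the system bus for complaints
drbdmanage restart
systemctl status dbus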
Code:
root@px2:~# drbdadm status .drbdctrl
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
px1 connection:StandAlone
root@px1:~# drbdadm status .drbdctrl
.drbdctrl role:Secondary
volume:0 disk:Inconsistent
volume:1 disk:Inconsistent
px2 connection:Connecting
So this shows that they are apparently not connected, how weird. If I run "service drbd start" on both nodes this changes:
Code:
root@px2:~# drbdadm status .drbdctrl
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
px1 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate
root@px1:~# drbdadm status
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
px2 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate
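(Side note: rather than restarting the whole drbd service, I believe the same reconnect can be done for just the control volume with drbdadm; I have not tested whether it behaves any differently here.)
Code:
# re-read the resource definition and bring the connection back up
drbdadm adjust .drbdctrl
# or tear down and re-establish just the network connection explicitly
drbdadm disconnect .drbdctrl
drbdadm connect .drbdctrl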
Example of a VM:
Code:
root@px2:~# drbdadm status
....
vm-112-disk-1 role:Secondary
disk:UpToDate
px1 connection:Connecting
....
root@px2:~# drbdadm status vm-112-disk-1
'vm-112-disk-1' not defined in your config (for this host).
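As far as I understand, drbdmanage generates the per-VM resource definitions itself rather than putting them in /etc/drbd.d, supposedly under /var/lib/drbd.d (that path is an assumption on my part), so this is what I would check to see whether those files still exist:
Code:
# auto-generated per-resource files from drbdmanage (path is an assumption)
ls -l /var/lib/drbd.d/
# the hand-written control volume config lives here
ls -l /etc/drbd.d/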
So I'm not sure what to do next. Obviously the nodes are connected, but for whatever reason the drbdpool LVM is not showing up, or at least its contents are not being synced. I'm a bit confused about that, and looking at drbdmanage list-nodes we see this:
Code:
root@px2:~# drbdmanage list-nodes
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| px1 | unknown | unknown | | ok |
| px2 | 409600 | 187023 | | ok |
+------------------------------------------------------------------------------------------------------------+
Code:
root@px1:~# drbdmanage list-nodes
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| px1 | unknown | unknown | | ok |
| px2 | 409600 | 187023 | | ok |
+------------------------------------------------------------------------------------------------------------+
This has been flip-flopping a bit between "offline" and just "unknown". I'm fine with it showing as online, but I don't like that the drbdpool LVM is not showing up. My guess here is that .drbdctrl needs to be updated with whatever info PX2 has about the pool? No idea how to force that, though.
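One thing I am considering for the "unknown" pool figures is asking drbdmanage to re-scan the LVM pool on each node. update-pool is a subcommand I have only seen mentioned in other threads, so this is untested on my side:
Code:
# ask drbdmanage to re-read the pool size/free for this node
drbdmanage update-pool
# ...then see whether the numbers turn up
drbdmanage list-nodes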
Moving on to the effects this has on the Proxmox virtualization side: all my VMs are happily working and writing data as normal. However, I cannot perform any new backups, install new VMs, or stop/start VMs. If I reboot a VM it just reboots, but if I shut it down and start it again, Proxmox seems to get confused and moves the machine over to PX1, and I can't do much more with it.
Here's the output from a backup:
Code:
INFO: starting new backup job: vzdump 111 --compress lzo --storage samba --node px2 --remove 0 --mode snapshot
INFO: Starting Backup of VM 111 (qemu)
INFO: status = running
INFO: update VM 111: -lock backup
INFO: VM Name: mail
INFO: include disk 'scsi0' 'drbd1:vm-111-disk-1' 100G
ERROR: Backup of VM 111 failed - drbd error: Object not found
INFO: Backup job finished with errors
TASK ERROR: job errors
It also appears as if the volumes are no longer visible to DRBD. This is from PX2, which is where all my VMs are running right now.
Code:
root@px2:~# drbdmanage list-volumes
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
No resources defined
root@px2:~# drbdmanage list-resources
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
No resources defined
Proxmox shows drbd1 on both nodes because /etc/pve/storage.cfg is still the same, but it reports the wrong info, saying 400 GB used out of 400 GB.
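For reference, the storage definition is just the stock entry from the old wiki howto (quoted from memory below, so take the exact lines with a grain of salt):
Code:
# the DRBD entry in /etc/pve/storage.cfg as the wiki howto sets it up (from memory)
drbd: drbd1
        content images,rootdir
        redundancy 2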
So where do I start in this mess? I'm hesitant to reboot PX2, because if something goes wrong there my only resort is 2-3 day old backups from before I did the initial upgrade. One of the servers is a personal mail server, so I could of course imapsync everything out of there, but I'd rather be safe than sorry.
Finally, here's some LVM output from both servers.
PX1
Code:
root@px1:~# pvscan ; lvscan
PV /dev/sdb1 VG drbdpool lvm2 [410.10 GiB / 9.80 GiB free]
PV /dev/sda3 VG pve lvm2 [68.08 GiB / 8.43 GiB free]
Total: 2 [478.18 GiB] / in use: 2 [478.18 GiB] / in no VG: 0 [0 ]
ACTIVE '/dev/drbdpool/lvol0' [104.00 MiB] inherit
ACTIVE '/dev/drbdpool/drbdthinpool' [400.00 GiB] inherit
ACTIVE '/dev/drbdpool/vm-100-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-101-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-102-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-103-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-104-disk-1_00' [64.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-105-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-108-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-110-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-111-disk-1_00' [100.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-112-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-109-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-113-disk-2_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-106-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-107-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-115-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_0' [4.00 MiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_1' [4.00 MiB] inherit
ACTIVE '/dev/pve/swap' [8.00 GiB] inherit
ACTIVE '/dev/pve/root' [17.00 GiB] inherit
ACTIVE '/dev/pve/data' [34.58 GiB] inherit
PX2
Code:
root@px2:~# pvscan ; lvscan
PV /dev/sdb1 VG drbdpool lvm2 [410.10 GiB / 9.90 GiB free]
PV /dev/sda3 VG pve lvm2 [68.21 GiB / 8.43 GiB free]
Total: 2 [478.31 GiB] / in use: 2 [478.31 GiB] / in no VG: 0 [0 ]
ACTIVE '/dev/drbdpool/drbdthinpool' [400.00 GiB] inherit
ACTIVE '/dev/drbdpool/vm-100-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-101-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-102-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-103-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-104-disk-1_00' [64.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-105-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-108-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-110-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-111-disk-1_00' [100.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-112-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-109-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-113-disk-2_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-106-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-107-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-115-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_0' [4.00 MiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_1' [4.00 MiB] inherit
ACTIVE '/dev/pve/swap' [8.00 GiB] inherit
ACTIVE '/dev/pve/root' [17.00 GiB] inherit
ACTIVE '/dev/pve/data' [34.71 GiB] inherit
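(To compare the two LV lists without eyeballing them, something like this should work from PX2; the only difference I can actually spot is the extra lvol0 volume on PX1 and the slightly different free space.)
Code:
# read-only: diff the LV names in drbdpool between px1 and the local node
diff <(ssh root@192.168.2.8 lvs --noheadings -o lv_name drbdpool) \
     <(lvs --noheadings -o lv_name drbdpool)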
--- edit ---
I forgot to include the configuration file, drbdctrl.res:
Code:
root@px2:~# cat /etc/drbd.d/drbdctrl.res
resource .drbdctrl {
    net {
        cram-hmac-alg   sha256;
        shared-secret   "UahfZH2v8ZdpLZzEWWqU";
        allow-two-primaries no;
    }
    volume 0 {
        device      minor 0;
        disk        /dev/drbdpool/.drbdctrl_0;
        meta-disk   internal;
    }
    volume 1 {
        device      minor 1;
        disk        /dev/drbdpool/.drbdctrl_1;
        meta-disk   internal;
    }
    on px2 {
        node-id     0;
        address     ipv4 192.168.2.9:6999;
    }
    on px1 {
        node-id     1;
        address     ipv4 192.168.2.8:6999;
    }
    connection-mesh {
        hosts px2 px1;
        net {
            protocol C;
        }
    }
}
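One sanity check I can run on both nodes is letting drbdadm parse the file and dump it back, to confirm both sides agree on the addresses, node-ids and shared secret:
Code:
# print the resource config as drbdadm actually parses it; run on px1 and px2 and compare
drbdadm dump .drbdctrl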