So to start off: I have a two-node Proxmox cluster at home running on a couple of HP G7 servers, PX1 and PX2. Both have identical setups: two SAS drives for the system in RAID1 and five more drives in RAID5. All of this is configured on the RAID controller, so nothing weird going on.
When I configured this cluster I followed the directions on the old DRBD9 page on the Proxmox wiki. It has since been taken down, but a cached version is available here: https://web.archive.org/web/20160118181917/https://pve.proxmox.com/wiki/DRBD9. I have not deviated from that setup at all.
My problems started when I upgraded to 5.0 a bit hastily; there were no real mentions of DRBD changes, so I assumed it would not be a problem. I upgraded PX1, rebooted, and pvecm was showing all good signs. I tried to migrate the machines over to that node from PX2, but then noticed that DRBD was not showing up on PX1. I tried to fix it by running drbdmanage init, but to no avail. Since I didn't want to be left with a broken system, I decided to just reinstall 4.4 on PX1 and get it back online with DRBD. However, this was easier said than done.
PX1 = 192.168.2.8
PX2 = 192.168.2.9
Code:
root@px2:~# drbdmanage init 192.168.2.8
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
You are going to initialize a new drbdmanage cluster.
CAUTION! Note that:
* Any previous drbdmanage cluster information may be removed
* Any remaining resources managed by a previous drbdmanage installation
that still exist on this system will no longer be managed by drbdmanage
Confirm:
yes/no: yes
Empty drbdmanage control volume initialized on '/dev/drbd0'.
Empty drbdmanage control volume initialized on '/dev/drbd1'.
Operation completed successfully
root@px2:~# drbdmanage add-node px1 192.168.2.8
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
Operation completed successfully
Operation completed successfully
Executing join command using ssh.
IMPORTANT: The output you see comes from px1
IMPORTANT: Your input is executed on px1
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
You are going to join an existing drbdmanage cluster.
CAUTION! Note that:
* Any previous drbdmanage cluster information may be removed
* Any remaining resources managed by a previous drbdmanage installation
that still exist on this system will no longer be managed by drbdmanage
Confirm:
yes/no: yes
Error: Cannot connect to the drbdmanaged process using DBus
The DBus subsystem returned the following error description:
org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Error: Attempt to execute the join command remotelyfailed
Join command for node px1:
drbdmanage join -p 6999 192.168.2.8 1 px2 192.168.2.8 0 UahfZH2v8ZdpLZzEWWqU
I've seen this error on several forums, but no concrete explanation of what is actually wrong, only that "D-Bus is the problem".
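For what it's worth, this is roughly how I have been poking at the D-Bus side on px1. The service-activation path and the ping/restart subcommands are things I picked up from other threads rather than any documentation, so treat this as a sketch:
Code:
# is the D-Bus activation file for drbdmanaged in place? (path is a guess on my part)
ls /usr/share/dbus-1/system-services/ | grep -i drbd
# does the drbdmanage server answer over the bus at all?
drbdmanage ping
# restart the drbdmanage server, then look at the system bus for complaints
drbdmanage restart
systemctl status dbus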
Code:
root@px2:~# drbdadm status .drbdctrl
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
px1 connection:StandAlone
root@px1:~# drbdadm status .drbdctrl
.drbdctrl role:Secondary
volume:0 disk:Inconsistent
volume:1 disk:Inconsistent
px2 connection:Connecting
So this shows that they are apparently not connected, how weird. If I run "service drbd start" on both nodes this changes:
Code:
root@px2:~# drbdadm status .drbdctrl
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
px1 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate
root@px1:~# drbdadm status
.drbdctrl role:Secondary
volume:0 disk:UpToDate
volume:1 disk:UpToDate
px2 role:Secondary
volume:0 peer-disk:UpToDate
volume:1 peer-disk:UpToDate
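(Side note: rather than restarting the whole drbd service, I believe the same reconnect can be done for just the control volume with drbdadm; I have not tested whether it behaves any differently here.)
Code:
# re-read the resource definition and bring the connection back up
drbdadm adjust .drbdctrl
# or tear down and re-establish just the network connection explicitly
drbdadm disconnect .drbdctrl
drbdadm connect .drbdctrl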
Example of a VM:
Code:
root@px2:~# drbdadm status
....
vm-112-disk-1 role:Secondary
disk:UpToDate
px1 connection:Connecting
....
root@px2:~# drbdadm status vm-112-disk-1
'vm-112-disk-1' not defined in your config (for this host).
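As far as I understand, drbdmanage generates the per-VM resource definitions itself rather than putting them in /etc/drbd.d, supposedly under /var/lib/drbd.d (that path is an assumption on my part), so this is what I would check to see whether those files still exist:
Code:
# auto-generated per-resource files from drbdmanage (path is an assumption)
ls -l /var/lib/drbd.d/
# the hand-written control volume config lives here
ls -l /etc/drbd.d/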
So I'm not sure what to do next. Obviously the nodes are connected, but for whatever reason the drbdpool LVM is not showing up, or at least its contents are not being synced. I'm a bit confused about that, and looking at drbdmanage list-nodes we see this:
Code:
root@px2:~# drbdmanage list-nodes
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| px1 | unknown | unknown | | ok |
| px2 | 409600 | 187023 | | ok |
+------------------------------------------------------------------------------------------------------------+
Code:
root@px1:~# drbdmanage list-nodes
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
+------------------------------------------------------------------------------------------------------------+
| Name | Pool Size | Pool Free | | State |
|------------------------------------------------------------------------------------------------------------|
| px1 | unknown | unknown | | ok |
| px2 | 409600 | 187023 | | ok |
+------------------------------------------------------------------------------------------------------------+
This has been flip-flopping a bit between "offline" and just "unknown". I'm fine with it showing as online, but I don't like that the drbdpool LVM is not showing up. My guess here is that .drbdctrl needs to be updated with whatever info PX2 has about the pool? No idea how to force that, though.
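One thing I am considering for the "unknown" pool figures is asking drbdmanage to re-scan the LVM pool on each node. update-pool is a subcommand I have only seen mentioned in other threads, so this is untested on my side:
Code:
# ask drbdmanage to re-read the pool size/free for this node
drbdmanage update-pool
# ...then see whether the numbers turn up
drbdmanage list-nodes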
Moving on to the effects this has on the Proxmox virtualization side: all my VMs are happily working and writing data as normal. However, I cannot perform any new backups, install new VMs, or stop/start VMs. If I reboot a VM it just reboots, but if I shut it down and start it again, Proxmox seems to get confused and moves the machine over to PX1, and I can't do much more with it.
Here's the output from a backup:
Code:
INFO: starting new backup job: vzdump 111 --compress lzo --storage samba --node px2 --remove 0 --mode snapshot
INFO: Starting Backup of VM 111 (qemu)
INFO: status = running
INFO: update VM 111: -lock backup
INFO: VM Name: mail
INFO: include disk 'scsi0' 'drbd1:vm-111-disk-1' 100G
ERROR: Backup of VM 111 failed - drbd error: Object not found
INFO: Backup job finished with errors
TASK ERROR: job errors
It also appears as if the volumes are no longer visible to DRBD. This is from PX2, which is where all my VMs are running right now.
Code:
root@px2:~# drbdmanage list-volumes
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
No resources defined
root@px2:~# drbdmanage list-resources
WARNING:root:Could not read configuration file '/etc/drbdmanaged.cfg'
No resources defined
Proxmox shows drbd1 on both nodes because /etc/pve/storage.cfg is still the same, but it reports the wrong info, saying 400 GB used out of 400 GB.
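For reference, the storage definition is just the stock entry from the old wiki howto (quoted from memory below, so take the exact lines with a grain of salt):
Code:
# the DRBD entry in /etc/pve/storage.cfg as the wiki howto sets it up (from memory)
drbd: drbd1
        content images,rootdir
        redundancy 2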
So where do I start in this mess? I'm hesitant to reboot PX2, because if something goes wrong there my only resort is 2-3 day old backups from before I did the initial upgrade. One of the servers is a personal mail server, so I could of course imapsync everything out of there, but I'd rather be safe than sorry.
Finally, here's some LVM output from both servers.
PX1
Code:
root@px1:~# pvscan ; lvscan
PV /dev/sdb1 VG drbdpool lvm2 [410.10 GiB / 9.80 GiB free]
PV /dev/sda3 VG pve lvm2 [68.08 GiB / 8.43 GiB free]
Total: 2 [478.18 GiB] / in use: 2 [478.18 GiB] / in no VG: 0 [0 ]
ACTIVE '/dev/drbdpool/lvol0' [104.00 MiB] inherit
ACTIVE '/dev/drbdpool/drbdthinpool' [400.00 GiB] inherit
ACTIVE '/dev/drbdpool/vm-100-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-101-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-102-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-103-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-104-disk-1_00' [64.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-105-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-108-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-110-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-111-disk-1_00' [100.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-112-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-109-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-113-disk-2_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-106-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-107-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-115-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_0' [4.00 MiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_1' [4.00 MiB] inherit
ACTIVE '/dev/pve/swap' [8.00 GiB] inherit
ACTIVE '/dev/pve/root' [17.00 GiB] inherit
ACTIVE '/dev/pve/data' [34.58 GiB] inherit
PX2
Code:
root@px2:~# pvscan ; lvscan
PV /dev/sdb1 VG drbdpool lvm2 [410.10 GiB / 9.90 GiB free]
PV /dev/sda3 VG pve lvm2 [68.21 GiB / 8.43 GiB free]
Total: 2 [478.31 GiB] / in use: 2 [478.31 GiB] / in no VG: 0 [0 ]
ACTIVE '/dev/drbdpool/drbdthinpool' [400.00 GiB] inherit
ACTIVE '/dev/drbdpool/vm-100-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-101-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-102-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-103-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-104-disk-1_00' [64.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-105-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-108-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-110-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-111-disk-1_00' [100.02 GiB] inherit
ACTIVE '/dev/drbdpool/vm-112-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-109-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-113-disk-2_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-106-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-107-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/vm-115-disk-1_00' [32.01 GiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_0' [4.00 MiB] inherit
ACTIVE '/dev/drbdpool/.drbdctrl_1' [4.00 MiB] inherit
ACTIVE '/dev/pve/swap' [8.00 GiB] inherit
ACTIVE '/dev/pve/root' [17.00 GiB] inherit
ACTIVE '/dev/pve/data' [34.71 GiB] inherit
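(To compare the two LV lists without eyeballing them, something like this should work from PX2; the only difference I can actually spot is the extra lvol0 volume on PX1 and the slightly different free space.)
Code:
# read-only: diff the LV names in drbdpool between px1 and the local node
diff <(ssh root@192.168.2.8 lvs --noheadings -o lv_name drbdpool) \
     <(lvs --noheadings -o lv_name drbdpool)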
--- edit ---
I forgot to include the configuration file, drbdctrl.res:
Code:
root@px2:~# cat /etc/drbd.d/drbdctrl.res
resource .drbdctrl {
    net {
        cram-hmac-alg   sha256;
        shared-secret   "UahfZH2v8ZdpLZzEWWqU";
        allow-two-primaries no;
    }
    volume 0 {
        device      minor 0;
        disk        /dev/drbdpool/.drbdctrl_0;
        meta-disk   internal;
    }
    volume 1 {
        device      minor 1;
        disk        /dev/drbdpool/.drbdctrl_1;
        meta-disk   internal;
    }
    on px2 {
        node-id     0;
        address     ipv4 192.168.2.9:6999;
    }
    on px1 {
        node-id     1;
        address     ipv4 192.168.2.8:6999;
    }
    connection-mesh {
        hosts px2 px1;
        net {
            protocol C;
        }
    }
}
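One sanity check I can run on both nodes is letting drbdadm parse the file and dump it back, to confirm both sides agree on the addresses, node-ids and shared secret:
Code:
# print the resource config as drbdadm actually parses it; run on px1 and px2 and compare
drbdadm dump .drbdctrl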