I've been trying to build a reliable HA cluster based on Proxmox 4.1. I have two identical servers. The first one should serve all the VMs and containers. If the first server goes offline for any reason, the second server should keep the VMs running until the first server is back.
Proxmox 4.1 is installed on both nodes:
- Both use a dedicated Ethernet port for networking
- Both use a separate 10GbE point-to-point Ethernet connection between the two servers for DRBD
- DRBD 8.4.x (kernel module is 9.0.0 as per /proc/drbd) to provide a real-time replicated volume available on both nodes for the VM disks
- LVM Volume Group is created on top of the DRBD device
DRBD synchronisation works so far. Both DRBD nodes are primary, configured as per the Proxmox Wiki (DRBD article).
I created a VM on server1 and started it. To test whether HA works, I unplugged the Ethernet cable. Shortly afterwards server1 rebooted, but nothing else happened. The DRBD volume then showed as inconsistent on server2 and was resynced automatically.
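For context, here is a minimal sketch of the majority-quorum arithmetic that usually motivates adding a third node. It assumes one vote per node and a strict-majority rule. That is an assumption about how this cluster counts votes, not something verified on this setup, and the function names are purely illustrative.

```python
def votes_needed(total_nodes: int) -> int:
    """Strict majority of the cluster's votes, assuming one vote per node."""
    return total_nodes // 2 + 1


def has_quorum(total_nodes: int, nodes_online: int) -> bool:
    """True if the surviving partition still holds a strict majority."""
    return nodes_online >= votes_needed(total_nodes)


# Compare a two-node and a three-node cluster after one node drops out.
for total in (2, 3):
    online = total - 1  # one node (e.g. server1) is unreachable
    print(f"{total} nodes, {online} online: quorum={has_quorum(total, online)}")
```

Under these assumptions, a lone survivor in a two-node cluster does not hold a majority, while two survivors out of three do.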
Following the wiki article, I then installed a third node (a simple PC) with Proxmox 4.1 and added it to the cluster as well. Now, when I unplug the Ethernet cable on server1, it reboots after a while. server2 seems to notice that server1 is down: it displays the VM and apparently tries to start it, but without success. The status is "error". On server2, lvdisplay reports the LV status as "NOT available". What does that mean, and what can I do about it? Is there another setup I could use for the desired HA cluster?
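To make the symptom easier to check, here is a small sketch that scans lvdisplay output and lists the LVs reported as "NOT available". It assumes the usual "LV Path" / "LV Status" layout of lvdisplay. The volume group and LV names in the sample text are hypothetical placeholders, not taken from my servers.

```python
from typing import List

# Hypothetical sample in the layout lvdisplay normally prints.
SAMPLE = """\
  --- Logical volume ---
  LV Path                /dev/drbdvg/vm-100-disk-1
  LV Name                vm-100-disk-1
  VG Name                drbdvg
  LV Status              NOT available
"""


def inactive_lvs(lvdisplay_output: str) -> List[str]:
    """Return the LV paths whose 'LV Status' line reads 'NOT available'."""
    inactive = []
    current_path = None
    for line in lvdisplay_output.splitlines():
        fields = line.split()
        if fields[:2] == ["LV", "Path"] and len(fields) >= 3:
            # Remember the path; the status line follows later in the same block.
            current_path = fields[2]
        elif fields[:2] == ["LV", "Status"]:
            status = " ".join(fields[2:])
            if status == "NOT available" and current_path is not None:
                inactive.append(current_path)
            current_path = None  # the next LV block starts fresh
    return inactive


print(inactive_lvs(SAMPLE))
```

The same function could be fed the real output of lvdisplay on server2 instead of the sample text.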
Any recommendations are highly welcome. I'd like to update the wiki articles as soon as I have a running setup, as most of them still cover Proxmox 3.x and no longer apply to the current versions.