I'm working at an MSP which supports a number of 2-node direct-attached storage VMware clusters. I'm a Proxmox enthusiast and I'm trying to get our business to turn away from Broadcom.
One of my older workmates stated that, according to the docs, our usual architecture (2 servers and an MSA) is not feasible with Proxmox. SOOO, challenge accepted: I rolled out a little PoC and I'd like to share my playbook with you. I know that a 2-node cluster for HA can cause split brain, but if storage and VM loads flow through the same bond, there are already quite big problems that are out of scope here. - I hope this helps someone - enjoy!
Code:
# Playbook: PoC 2-node HA PVE cluster with shared iSCSI, GFS2 and Corosync "two_node"
# By tscret, 06.01.2025
### Information Sources
=> https://forum.proxmox.com/threads/pve-7-x-cluster-setup-of-shared-lvm-lv-with-msa2040-sas-partial-howto.57536/
=> https://manpages.debian.org/unstable/corosync/votequorum.5.en.html
# Architecture:
# 2 nodes as nested virtualisation
# Syno DS1515+ as iSCSI portal with two LUNs - DSM 6
# To prove => two-node cluster with a DAC or iSCSI storage, with load balancing and HA - capable of thin provisioning and snapshots (QCOW2)
# Define LUN on Syno
> iSCSI Manager / Target
>> <Create> Name: poc | IQN: iqn.2000-01.com.synology:LAB-NAS01.Target-1.ae07f0977a - <NEXT> 0 Map later - <NEXT> - <Apply>
> iSCSI Manager / LUN
>> <Create> Name: lun-gfs2 | Location: Volume 1 | Total capacity: 500 GB | Space Allocation: Thin Provisioning - <NEXT> Map Later - <NEXT> - <Apply>
>> Select lun-gfs2 - <Action> <Edit> / Mapping - Select poc <ok>
>> !!!! Enable "Allow multiple sessions" on the target
# Set up two VM nodes (nested virtualisation) - see the qm sketch below
# 6 vCPU (HOST) - 16 GB RAM - 64 GB Disk - 1 NIC on vlan120 - ISO Installer PVE 8.3 - TAG plb_ignore_vm
> Asterix VMID: 991 on LAB-PVE02 10.144.21.238/23
> Obelix VMID: 992 on LAB-PVE01 10.144.21.239/23
# Set up both hosts with search domain test.lan
# Root Password: <CHANGEME>
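# Sketch: how the two nested PVE VMs could be created via CLI on the outer hosts.
# Assumptions: bridge vmbr0, storage local-lvm and the ISO file name are lab guesses - adjust to your environment.
# Repeat with VMID 992 / name Obelix on the other host.
$ qm create 991 --name Asterix --cores 6 --cpu host --memory 16384 --scsihw virtio-scsi-pci \
    --scsi0 local-lvm:64 --net0 virtio,bridge=vmbr0,tag=120 \
    --ide2 local:iso/proxmox-ve_8.3-1.iso,media=cdrom --boot order='scsi0;ide2' --tags plb_ignore_vm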
# Post Install Proxmox Helper Script
> Change Repos and Update
$ apt update && apt upgrade -y
$ reboot now
# Install openvswitch-switch (I just prefer OVS over Linux Bridge)
$ apt install openvswitch-switch -y
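# Minimal OVS sketch for /etc/network/interfaces on Asterix; the NIC name ens18 and the
# gateway 10.144.20.1 are assumptions - adjust to your lab.
auto lo
iface lo inet loopback

auto ens18
iface ens18 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0

auto vmbr0
iface vmbr0 inet static
    address 10.144.21.238/23
    gateway 10.144.20.1
    ovs_type OVSBridge
    ovs_ports ens18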
> Create Cluster (Test)
> Join Obelix to Cluster
> Create iSCSI Storage for GFS2
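# CLI equivalents of the three GUI steps above (sketch; storage ID "syno-iscsi" and <SYNO-IP> are placeholders):
$ pvecm create Test                   # on Asterix
$ pvecm add 10.144.21.238             # on Obelix, joins the existing cluster
$ pvesm add iscsi syno-iscsi --portal <SYNO-IP> --target iqn.2000-01.com.synology:LAB-NAS01.Target-1.ae07f0977a --content none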
# Change iSCSI node.startup to automatic (on both nodes)
$ nano /etc/iscsi/iscsid.conf # change node.startup to automatic
$ service iscsid restart
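# Non-interactive alternative, run once from either node against both (sketch, assumes key-based SSH between the nodes):
$ for host in 10.144.21.238 10.144.21.239; do ssh $host "sed -i 's/^node.startup = manual/node.startup = automatic/' /etc/iscsi/iscsid.conf && systemctl restart iscsid"; done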
# Determine Disks
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 64G 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 63.5G 0 part
├─pve-swap 252:0 0 7.9G 0 lvm [SWAP]
├─pve-root 252:1 0 25.9G 0 lvm /
├─pve-data_tmeta 252:2 0 1G 0 lvm
│ └─pve-data 252:4 0 19.8G 0 lvm
└─pve-data_tdata 252:3 0 19.8G 0 lvm
└─pve-data 252:4 0 19.8G 0 lvm
sdb 8:16 0 1G 0 disk
sdc 8:32 0 500G 0 disk
sr0 11:0 1 1.3G 0 rom
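# Sketch: if it is unclear which /dev/sdX belongs to which LUN, the iSCSI session details show the mapping:
$ iscsiadm -m session -P 3 | grep -E 'Target:|Attached scsi disk'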
# Edit /etc/pve/corosync.conf
$ cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
$ nano /etc/pve/corosync.conf.new
Edit the quorum section (and increment config_version by one so the change gets applied):
quorum {
provider: corosync_votequorum
two_node: 1
}
>>>>>>>>>>>>
$ cp /etc/pve/corosync.conf /etc/pve/corosync.bak
$ mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
$ systemctl status corosync
# Check Quorum
$ pvecm status
Cluster information
-------------------
Name: Test
Config Version: 2
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Fri Jan 3 22:45:32 2025
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 1.1b
Quorate: Yes
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 1
Flags: 2Node Quorate WaitForAll
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.144.21.238 (local)
0x00000002 1 10.144.21.239
# Make GFS2 Filesystem and Mount
>>>>>>>>>>> Bashscript
hosts="10.144.21.238 10.144.21.239"
for host in $hosts; do ssh $host 'apt install dlm-controld gfs2-utils -y'; done
for host in $hosts; do ssh $host 'mkdir /etc/dlm; echo protocol=tcp >> /etc/dlm/dlm.conf; echo enable_fencing=0 >> /etc/dlm/dlm.conf; systemctl restart dlm'; done
# Copy this snippet and run it on one node
# NOTE: with lock_dlm the GFS2 cluster name must match the corosync cluster name (here: Test)
read -p "Please enter the cluster name (default: Datacenter): " clustername
clustername="${clustername:-Datacenter}"
read -p "Please enter the mount path (default: /mnt/pve/iscsi-gfs2): " mnt
mnt="${mnt:-/mnt/pve/iscsi-gfs2}"
# NOTE: pick the device that belongs to the GFS2 LUN in lsblk (in this lab the 500 GB LUN shows up as /dev/sdc)
read -p "Please enter the LUN device (default: /dev/sdb): " lun
lun="${lun:-/dev/sdb}"
num_hosts=$(echo $hosts | wc -w)
mkfs.gfs2 -t $clustername:iscsi-gfs2 -j $num_hosts -J 128 $lun
uuid=$(blkid -s UUID -o value $lun)
cat > "/etc/systemd/system/gfs2mount.service" <<EOT
[Unit]
Description=Mount GFS2 Service
# Order after the iSCSI and DLM services so the LUN and the lock manager are available first
After=iscsid.service dlm.service network.target iscsi.service
Requires=iscsid.service dlm.service iscsi.service

[Service]
Type=oneshot
# Wait until the iSCSI LUN with this UUID has appeared before mounting it
ExecStartPre=/usr/bin/bash -c 'while ! lsblk -o NAME,UUID | grep -q "$uuid"; do sleep 5; done'
ExecStart=/usr/bin/mount -t gfs2 /dev/disk/by-uuid/$uuid $mnt
ExecStop=/usr/bin/umount $mnt
RemainAfterExit=true

[Install]
WantedBy=multi-user.target
EOT
for host in $hosts; do scp "/etc/systemd/system/gfs2mount.service" $host:/etc/systemd/system/; ssh $host "mkdir -p $mnt; systemctl daemon-reload; systemctl enable gfs2mount.service; systemctl start gfs2mount.service"; done
cat >> /etc/pve/storage.cfg << EOT
dir: GFS2
	path $mnt
	content rootdir,images
	prune-backups keep-all=1
	shared 1
EOT
>>>>>>>>>>>>>>>>>>> End of Bashscript
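# Quick sanity check (sketch) that DLM sees its lockspace and GFS2 is mounted and writable on both nodes:
$ for host in 10.144.21.238 10.144.21.239; do ssh $host 'hostname; dlm_tool ls; mount -t gfs2; touch /mnt/pve/iscsi-gfs2/hello-$(hostname)'; done
$ ls -l /mnt/pve/iscsi-gfs2/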
# Tests carried out (HA setup sketch below)
(x) Mount at boot before HA start
(x) Read / write into GFS2
(x) HA failover on power outage of a node
(x) HA recovery after both nodes are back online
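# Sketch: one way to put a test VM under HA for the failover tests (VMID 100 is just an example):
$ ha-manager add vm:100 --state started --max_restart 1 --max_relocate 1
$ ha-manager status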