[TUTORIAL] PVE 7.x Cluster Setup of shared LVM/LV with MSA2040 SAS [partial howto]

Hello,

With " Before=pve-guests.service" the node startup work fine.

But when I shutdown/restart the node, the HA VMs do not stop and the gfs2 filesystem cannot umount... (fuser say filesystem in use by kvm).

I tested the command "pvesh create /nodes/localhost/stopall" used by the "pve-guests.service" to stop, HA VM do not stop.

What is the service who start/stop HA VMs ?

I need to run "lvmshare.service" before the VMs startup (pve-guests.service) and the HA VMs startup.

Best regards.

Francis
 
Hello,

I changing "Before=pve-guests.service" by " Before=pve-ha-lrm.service pve-guests.service" all work fine I can start/stop the nodes.

[Unit]
Description=LVM locking LVs and mount LVs start and stop
Documentation=man:lvmlockd(8)
After=lvmlocks.service lvmlockd.service sanlock.service dlm.service
Before=pve-ha-lrm.service pve-guests.service

[Service]
Type=oneshot
RemainAfterExit=yes

# start: activate the shared LVs (shared lock) and mount them
ExecStart=/usr/bin/bash -c "/usr/sbin/vgs --noheadings -o name -S vg_shared=yes | xargs /usr/sbin/lvchange -asy"
ExecStart=/usr/bin/bash -c "/usr/sbin/lvs --noheadings -o lv_path -S vg_shared=yes | xargs mount"

# stop: unmount the LVs, then deactivate them (releasing the locks)
ExecStop=/usr/bin/bash -c "/usr/sbin/lvs --noheadings -o lv_path -S vg_shared=yes | xargs umount"
ExecStop=/usr/bin/bash -c "/usr/sbin/vgs --noheadings -o name -S vg_shared=yes | xargs /usr/sbin/lvchange -an"

[Install]
WantedBy=multi-user.target
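
For completeness, this is roughly how the unit gets put in place and enabled (assuming it is saved as /etc/systemd/system/lvmshare.service):

Code:
# assuming the unit above is saved as /etc/systemd/system/lvmshare.service
systemctl daemon-reload
systemctl enable --now lvmshare.service
systemctl status lvmshare.service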

Best regards.

Francis
 
@Glowsome questions for you

Per your experiment, can we have two links enabled in Proxmox without causing issues for the DLM / GFS2 services?
I think I read something in the past saying that corosync didn't support 2 networks simultaneously.

Other question:
If it can't, do you think the QDevice option could do the job, as it would keep the pvecm service/quorum alive?
 
@Glowsome questions for you

Per your experiment, can we have two links enabled in Proxmox without causing issues for the DLM / GFS2 services?
I think I read something in the past saying that corosync didn't support 2 networks simultaneously.

Other question:
If it can't, do you think the QDevice option could do the job, as it would keep the pvecm service/quorum alive?
My setup is running with a single (well, bonded) interface in an active-backup network configuration.
I did not see the need for a separate network just for cluster traffic.
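
For reference (not something I run myself): with the kronosnet transport corosync can use up to 8 links, so a second ring is possible. A node entry in corosync.conf would then look roughly like this (the addresses are just examples):

Code:
nodelist {
        node {
                name: node1
                nodeid: 1
                # example: primary cluster network
                ring0_addr: 10.0.10.1
                # example: second link, used as fallback
                ring1_addr: 192.168.1.1
        }
        # ... the other nodes get matching ring0_addr/ring1_addr entries
}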
 
A little thing I encountered in my setup:
After a reboot (needed after upgrading PVE 7 -> PVE 8) I got errors from DLM, which basically meant it was being blocked (sk_err 110/0 was seen on the consoles).

Then I remembered that about 2 months earlier I had tightened security by turning on the firewall on the individual nodes and only allowing a strict set of services.

Apparently, enabling the firewall does not kill already established connections, so everything kept functioning up until I rebooted.

So, lesson learned: IF you enable the firewall on a cluster node, make sure you allow the port DLM uses (tcp/21064)!
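
As a sketch only (adapt it to your own networks): on PVE the datacenter firewall rules live in /etc/pve/firewall/cluster.fw, and a rule to allow DLM between the nodes could look something like:

Code:
[RULES]
# allow DLM traffic between the cluster nodes (10.0.10.0/24 is just an example cluster network)
IN ACCEPT -source 10.0.10.0/24 -p tcp -dport 21064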

- Glowsome
 
Dear @Glowsome,
Why did you implement it in such a complicated way? Wouldn't using OCFS2 on the block storage in combination with a directory storage do the trick? Or do you need the speed gain from LVM?
 
Dear @Glowsome,
Why did you implement it in such a complicated way? Wouldn't using OCFS2 on the block storage in combination with a directory storage do the trick? Or do you need the speed gain from LVM?
This whole thing started as a project to get around limitations I encountered.
As a whole, OCFS2 is Oracle's, and after they took over MySQL I dread being locked in by using it.

By its nature it grew... and grew... and maybe, yes, it has turned into a beast, but all in all I'm not after something like a speed gain.
Consider it a proof of concept; it has kept on surviving and fulfilling my needs ever since I stepped in at PVE 5.x.
The latest is that I migrated all 4 nodes to PVE 8.x.

- Glowsome
 
Hello,

About the "sleep 10" added int the unitfile "lvmlockd.service" you do not have to modify the file.

You can create a directory "/etc/systemd/lvmlockd.service.d/" and create a file "sleep.conf" (exemple) with the content

[Service]
ExecStartPre=/usr/bin/sleep 10

or you can create a directory "/etc/systemd/dlm.service.d/" and a file "sleep.conf" with:

[Service]
ExecStartPost=/usr/bin/sleep 10

An update can then overwrite the files "lvmlockd.service" or "dlm.service" without any problem.

Best regards.

Francis
Hi, this might be a late reply, but if I read correctly, dropping an addition in:
/etc/systemd/lvmlockd.service.d/
This means you are not adding, but actually overriding.
(for reference) see :
https://access.redhat.com/documenta...tion_assembly_working-with-systemd-unit-files

So the correct place to drop these additions (IMHO) would be in:
Code:
/etc/systemd/system/servicename.service.d/*

In my case, the additions like the sleep timings, as well as the additional dependencies introduced (like After=),
should be dropped there, contrary to the location you suggested.

We want to add settings, not override existing ones.
 
Hello Glowsome

Yes, sorry, the correct path for the file is "/etc/systemd/system/lvmlockd.service.d/".
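
For what it's worth, systemd can also create the drop-in in that location for you; a minimal sketch:

Code:
# writes /etc/systemd/system/lvmlockd.service.d/override.conf and reloads systemd afterwards
systemctl edit lvmlockd.service

# then add in the editor, as in your example:
#   [Service]
#   ExecStartPre=/usr/bin/sleep 10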

In my case, the additions like the sleep timings, as well as the additional dependencies introduced (like After=),
should be dropped there, contrary to the location you suggested.

I do not understand.

Best regards.

Francis
 
This whole thing started as a project to get around limitations I encountered.
As a whole, OCFS2 is Oracle's, and after they took over MySQL I dread being locked in by using it.

By its nature it grew... and grew... and maybe, yes, it has turned into a beast, but all in all I'm not after something like a speed gain.
Consider it a proof of concept; it has kept on surviving and fulfilling my needs ever since I stepped in at PVE 5.x.
The latest is that I migrated all 4 nodes to PVE 8.x.

- Glowsome
Hi Glowsome. Did you have OCFS2 experience previously? I remember giving it a try in 2016, and it was a nightmare with locking and kernel bugs.

How does GFS2 compare? Is it stable? Have you encountered locking bugs or FS corruption?

I'm using Ceph in production, but I have customers with big SAN arrays coming from VMware, and thin provisioning + snapshots are really missing with plain LVM volumes.
 
Hi Glowsome. Did you have OCFS2 experience previously? I remember giving it a try in 2016, and it was a nightmare with locking and kernel bugs.

How does GFS2 compare? Is it stable? Have you encountered locking bugs or FS corruption?

I'm using Ceph in production, but I have customers with big SAN arrays coming from VMware, and thin provisioning + snapshots are really missing with plain LVM volumes.
Hi there,

To go top-down in answering your questions:
Did you have OCFS2 experience previously?
No
How does GFS2 compare? Is it stable? Have you encountered locking bugs or FS corruption?
It's stable as far as I can tell; I have not had any issues with it going down on me, nor locks, nor FS corruption.
(I mean, if I had been seeing any of the above I would have searched for solutions and reported/updated the tutorial I wrote.)

As you can see the tutorial was written years ago (2019), and the setup has stood the test of time on my end (with notable additional tooling like Ansible to keep stuff in order after updates).
If I remember correctly I started out on PVE 6.4; current is now PVE 8.1.3, and I'm still running it... even after having rotated in new servers (going from HP DL360 Gen7 -> HP DL360 Gen9) and having extended the MSA2040 I am using from 1 shelf to 2 shelves (48x 600 GB disks).

- Glowsome
 
Hi there,

To go top-down in answering your questions:

No

It's stable as far as I can tell; I have not had any issues with it going down on me, nor locks, nor FS corruption.
(I mean, if I had been seeing any of the above I would have searched for solutions and reported/updated the tutorial I wrote.)

As you can see the tutorial was written years ago (2019), and the setup has stood the test of time on my end (with notable additional tooling like Ansible to keep stuff in order after updates).
If I remember correctly I started out on PVE 6.4; current is now PVE 8.1.3, and I'm still running it... even after having rotated in new servers (going from HP DL360 Gen7 -> HP DL360 Gen9) and having extended the MSA2040 I am using from 1 shelf to 2 shelves (48x 600 GB disks).

- Glowsome
Thanks Glowsome !
Last question: if a node crashes, does it impact the other nodes (in terms of locking/lag)? I have seen this kind of report on the Red Hat mailing list in the past.


I have found another way (that Red Hat oVirt is using) to get thin provisioning && snapshots:
They use qcow2 on top of clustered LVM, without any filesystem. They create small LVM volumes, monitor the used size, and grow them dynamically with a service (like pvestatd on Proxmox) when the size reaches a threshold. (And snapshots work with qcow2.)
(But discard does not work, and shrinking is impossible too.)
https://github.com/oVirt/vdsm/blob/master/doc/thin-provisioning.md
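
Just to illustrate the idea (a rough sketch, not how oVirt actually implements it; the domain, disk target and LV path are made-up examples, and I'm assuming the "Allocation" value from virsh domblkinfo reflects the qcow2 write high-water mark for a block-backed image):

Code:
#!/usr/bin/env bash
# Rough sketch of the watcher idea: grow the backing LV of a qcow2-on-LV disk
# when the guest has written past a threshold. All names below are examples.
VM=myvm                          # libvirt domain name (example)
DISK=vda                         # disk target inside the guest (example)
LV=/dev/vg_shared/vm-myvm-disk0  # shared LV backing the qcow2 image (example)

while sleep 30; do
    # assumption: "Allocation" reflects how far into the image the guest has written
    alloc=$(virsh domblkinfo "$VM" "$DISK" | awk '/Allocation:/ {print $2}')
    phys=$(virsh domblkinfo "$VM" "$DISK" | awk '/Physical:/ {print $2}')
    if [ -n "$alloc" ] && [ -n "$phys" ] && [ "$alloc" -gt $((phys * 80 / 100)) ]; then
        # grow the LV by 1 GiB so the qcow2 image can keep allocating
        lvextend -L +1G "$LV"
    fi
done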

It's a little bit more complex than GFS2... so if GFS2 is really stable, I'll give it a try :)



I would like to add an official storage plugin to Proxmox to handle SAN / thin provisioning / snapshots, as I have a lot of customers migrating from VMware, and they mostly use SAN block storage.
(It should be easy to add the vgchange, lvchange, mount, ... steps in the storage plugin directly.)
 
Thanks Glowsome !
Last question: if a node crashes, does it impact the other nodes (in terms of locking/lag)? I have seen this kind of report on the Red Hat mailing list in the past.


I have found another way (that Red Hat oVirt is using) to get thin provisioning && snapshots:
They use qcow2 on top of clustered LVM, without any filesystem. They create small LVM volumes, monitor the used size, and grow them dynamically with a service (like pvestatd on Proxmox) when the size reaches a threshold. (And snapshots work with qcow2.)
(But discard does not work, and shrinking is impossible too.)
https://github.com/oVirt/vdsm/blob/master/doc/thin-provisioning.md

It's a little bit more complex than GFS2... so if GFS2 is really stable, I'll give it a try :)



I would like to add an official storage plugin to Proxmox to handle SAN / thin provisioning / snapshots, as I have a lot of customers migrating from VMware, and they mostly use SAN block storage.
(It should be easy to add the vgchange, lvchange, mount, ... steps in the storage plugin directly.)
Hi there,

Again going top-down as to your questions:
Last question: if a node crashes, does it impact the other nodes (in terms of locking/lag)? I have seen this kind of report on the Red Hat mailing list in the past.
If a node crashes or is poison-pilled/STONITH'ed, the rest keeps functioning without issues afterwards.
The crashed node gets removed from the lockspace and thus is no longer a part of it.

I have tested it by just hard-resetting a node, and everything keeps working on the other nodes, so again... from my perspective it is solid.
I even had a full power outage due to some $%%^&-head digging through a power cable in the street; again, after power was restored everything came up fine, without issues.

One thing I noticed is that it takes a bit longer for the VMs/LXCs to come up afterwards, as a check is run, but still they came up without issue every single time.
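
If you want to see this for yourself, the lockspace state can be checked on a surviving node with the dlm tooling, e.g.:

Code:
dlm_tool status   # daemon state, including fencing information
dlm_tool ls       # lockspaces and their current members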

I have found another way (that Red Hat oVirt is using) to get thin provisioning && snapshots:
I have never explored this, so I cannot compare or comment on it.

- Glowsome
 
Just an update from my end: since a while back (now running PVE 7.0.14) the location of the unit file has changed, which introduced some issues, so I had to adapt my Ansible playbook.
Code:
---
# ./roles/proxmox/tasks/main.yml

- name: remove mcp repository to reset content
  ansible.builtin.file:
    path: /etc/apt/sources.list.d/mcp.list
    state: absent

- name: Add HP repository into sources list using specified filename (Debian 10)
  ansible.builtin.apt_repository:
    repo: deb http://downloads.linux.hpe.com/SDR/repo/mcp buster/current non-free
    state: present
    filename: mcp
  when:
    - ansible_facts['distribution'] == "Debian"
    - ansible_facts['distribution_major_version'] == "10"

- name: Add HP repository into sources list using specified filename (Debian 11)
  ansible.builtin.apt_repository:
    repo: deb http://downloads.linux.hpe.com/SDR/repo/mcp bullseye/current non-free
    state: present
    filename: mcp
  when:
    - ansible_facts['distribution'] == "Debian"
    - ansible_facts['distribution_major_version'] == "11"

- name: Add ProxMox free repository into sources list using specified filename (Debian 10)
  ansible.builtin.apt_repository:
    repo: deb http://download.proxmox.com/debian/pve buster pve-no-subscription
    state: present
    filename: pve-install-repo
  when:
    - ansible_facts['distribution'] == "Debian"
    - ansible_facts['distribution_major_version'] == "10"

- name: Remove ProxMox Enterprise repository from sources list using specified filename (Debian 10)
  ansible.builtin.apt_repository:
    repo: deb https://enterprise.proxmox.com/debian/pve buster pve-enterprise
    state: absent
    filename: pve-enterprise
  when:
    - ansible_facts['distribution'] == "Debian"
    - ansible_facts['distribution_major_version'] == "10"

- name: Remove ProxMox Enterprise repository from sources list using specified filename (Debian 11)
  ansible.builtin.apt_repository:
    repo: deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise
    state: absent
    filename: pve-enterprise
  when:
    - ansible_facts['distribution'] == "Debian"
    - ansible_facts['distribution_major_version'] == "11"

- name: Add ProxMox free repository into sources list using specified filename (Debian 11)
  ansible.builtin.apt_repository:
    repo: deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription
    state: present
    filename: pve-install-repo
  when:
    - ansible_facts['distribution'] == "Debian"
    - ansible_facts['distribution_major_version'] == "11"

- name: Register hostname to determine if its part of a cluster
  ansible.builtin.command: 'hostname --fqdn'
  register: nodename

#- name: Print information about nodename
#  ansible.builtin.debug:
#    var: nodename.stdout

- name: Install additional packages needed for ProxMox Cluster environment
  ansible.builtin.apt:
    name:
      - lvm2-lockd
      - dlm-controld
      - gfs2-utils
    state: present
  when: nodename.stdout is regex("^node0?\.*.")

- name: Update apt-get repo and cache
  ansible.builtin.apt:
    update_cache: yes
    force_apt_get: yes
    cache_valid_time: 3600

- name: Upgrade all apt packages
  ansible.builtin.apt:
    upgrade: dist
    force_apt_get: yes

- name: Check if a reboot is needed for ProxMox boxes
  ansible.builtin.stat:
    path: /var/run/reboot-required
  register: check_reboot

- name: Print information about reboot
  ansible.builtin.debug:
    var: check_reboot

- name: Ensure customised dlm.conf is present
  ansible.builtin.template:
    src: 'dlm.conf.j2'
    dest: '/etc/dlm/dlm.conf'
    mode: 0600
  when: nodename.stdout is regex("^node0?\.*.")

- name: Ensure lvm.conf contains lvmlockd = 1
  ansible.builtin.template:
    src: 'lvm.conf.j2'
    dest: '/etc/lvm/lvm.conf'
    mode: 0600
  when: nodename.stdout is regex("^node0?\.*.")

- name: Ensure shared volumes and mountpoint definition file is present
  ansible.builtin.template:
    src: 'lvmshared.conf.j2'
    dest: '/etc/lvm/lvmshared.conf'
    mode: 0600
  when: nodename.stdout is regex("^node0?\.*.")

- name: Ensure the mountscript for shared volume is available
  ansible.builtin.template:
    src: lvmmount.sh.j2
    dest: '/usr/local/share/lvmmount.sh'
    mode: 0700
  when: nodename.stdout is regex("^node0?\.*.")

- name: Ensure Systemd service for shared volumes is present
  ansible.builtin.template:
    src: 'lvshared.service.j2'
    dest: '/usr/lib/systemd/system/lvshared.service'
    mode: 0644
  when: nodename.stdout is regex("^node0?\.*.")

- name: Remove possible wrong location of After=lvshared.service
  ansible.builtin.lineinfile:
    path: /lib/systemd/system/pve-guests.service
    regexp: '^After=lvshared.service'
    state: absent
  when: nodename.stdout is regex("^node0?\.*.")

- name: Ensure Systemd service pve-guests has an After=lvshared.service entry
  ansible.builtin.lineinfile:
    path: /lib/systemd/system/pve-guests.service
    regexp: '^After=lvshared.service'
    insertafter: '^After=pve-ha-crm.service.*'
    line: After=lvshared.service
    mode: 0644
  when: nodename.stdout is regex("^node0?\.*.")

- name: Force systemd to reread configs (2.4 and above)
  ansible.builtin.systemd:
    daemon_reload: yes

- name: Check /root/.ssh/authorized_keys
  ansible.builtin.stat:
    path: /root/.ssh/authorized_keys
    get_checksum: no
  register: ssh_authorized_keys_stat

- name: Delete /root/.ssh/authorized_keys when not a symlink or not linked correctly
  ansible.builtin.file:
    path: /root/.ssh/authorized_keys
    state: absent
    state: absent
  when:
    - ssh_authorized_keys_stat.stat.islnk is not defined or ssh_authorized_keys_stat.stat.lnk_target != "/etc/pve/priv/authorized_keys"

- name: Symlink /root/.ssh/authorized_keys to /etc/pve/priv/authorized_keys
  ansible.builtin.file:
    src: /etc/pve/priv/authorized_keys
    dest: /root/.ssh/authorized_keys
    owner: root
    state: link
  when:
    - ssh_authorized_keys_stat.stat.islnk is not defined or ssh_authorized_keys_stat.stat.lnk_target != "/etc/pve/priv/authorized_keys"

- name: Check /etc/ssh/ssh_known_hosts
  ansible.builtin.stat:
    path: /etc/ssh/ssh_known_hosts
    get_checksum: no
  register: ssh_known_hosts_stat

- name: Delete /etc/ssh/ssh_known_hosts when not a symlink or not linked correctly
  ansible.builtin.file:
    path: /etc/ssh/ssh_known_hosts
    state: absent
  when:
    - ssh_known_hosts_stat.stat.islnk is not defined or ssh_known_hosts_stat.stat.lnk_target != "/etc/pve/priv/known_hosts"

- name: Symlink /etc/ssh/ssh_known_hosts to /etc/pve/priv/known_hosts
  ansible.builtin.file:
    src: /etc/pve/priv/known_hosts
    dest: /etc/ssh/ssh_known_hosts
    owner: root
    state: link
  when:
    - ssh_known_hosts_stat.stat.islnk is not defined or ssh_known_hosts_stat.stat.lnk_target != "/etc/pve/priv/known_hosts"

- name: Add nodes to known_hosts
  ansible.builtin.known_hosts:
    path: /etc/pve/priv/known_hosts
    name: '{{ item.name }}'
    key: '{{ item.name }} {{ item.key }}'
  loop: '{{ my_node_keys }}'
  no_log: true
  when: nodename.stdout is regex("^node0?\.*.")

- name: Check /root/.ssh/known_hosts
  ansible.builtin.stat:
    path: /root/.ssh/known_hosts
    get_checksum: no
  register: root_known_hosts_stat

- name: Delete /root/.ssh/known_hosts when not a symlink or not linked correctly
  ansible.builtin.file:
    path: /root/.ssh/known_hosts
    state: absent
  when:
    - root_known_hosts_stat.stat.islnk is not defined or root_known_hosts_stat.stat.lnk_target != "/etc/pve/priv/known_hosts"

- name: Symlink /root/.ssh/known_hosts to /etc/pve/priv/known_hosts
  ansible.builtin.file:
    src: /etc/pve/priv/known_hosts
    dest: /root/.ssh/known_hosts
    owner: root
    state: link
  when:
    - root_known_hosts_stat.stat.islnk is not defined or root_known_hosts_stat.stat.lnk_target != "/etc/pve/priv/known_hosts"

- name: Set up Node authorized keys
  ansible.posix.authorized_key:
    manage_dir: no
    path: /etc/pve/priv/authorized_keys
    user: root
    state: present
    key: '{{ item.key }}'
  loop: '{{ my_node_keys }}'
  no_log: true
  when: nodename.stdout is regex("^node0?\.*.")

- name: Add keys to ssh_known_hosts
  ansible.builtin.known_hosts:
    path: /etc/pve/priv/known_hosts
    name: '{{ item.name }}'
    key: '{{ item.name }} {{ item.key }}'
  loop: '{{ my_host_keys }}'
  no_log: true
  when: nodename.stdout is regex("^node0?\.*.")

The playbook will also correct any incorrect placement of the systemd unit-file dependency in the pve-guests unit file.
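
In case it helps anyone reusing this: the tasks above live in a role, so a minimal play to apply it could look like this (the inventory group and file names are just examples):

Code:
---
# site.yml (example): applies the proxmox role from above to the cluster nodes
- hosts: pve_nodes        # example inventory group holding the PVE nodes
  become: true
  roles:
    - proxmox

# run with, for example:
#   ansible-playbook -i inventory.ini site.yml
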
Hello,

Does this setup work fine on the latest Proxmox 8? Did you have any problems with GFS2? Did you make any changes? I am planning to migrate from Hyper-V and IBM SAN storage to Proxmox.
 
@Glowsome

Hello there,

Are you able to help me a bit? I'm trying to figure out what my problem is... I don't use Proxmox, but the goal is similar to your configuration.

So, I attached an iSCSI device to my nodes (3 nodes),

installed the dlm, corosync, sanlock and gfs2 packages (userspace and kernel modules),

and configured corosync:

Code:
# Please read the corosync.conf.5 manual page
totem {
        version: 2

        # Set name of the cluster
        cluster_name: gitlab_storage

        # crypto_cipher and crypto_hash: Used for mutual node authentication.
        # If you choose to enable this, then do remember to create a shared
        # secret with "corosync-keygen".
        # enabling crypto_cipher, requires also enabling of crypto_hash.
        # crypto works only with knet transport
        crypto_cipher: none
        crypto_hash: none
}

logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to yes. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: yes
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/cluster/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        # Log messages with time stamps. When in doubt, set to hires (or on)
        #timestamp: hires
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum

}

nodelist {
        # Change/uncomment/add node sections to match cluster configuration

        node {
                # Hostname of the node
                name: node1
                # Cluster membership node identifier
                nodeid: 1
                # Address of first link
                ring0_addr: 10.51.38.66
                # When knet transport is used it's possible to define up to 8 links
                #ring1_addr: 192.168.1.1
        }
        node {
#               # Hostname of the node
                name: node2
#               # Cluster membership node identifier
                nodeid: 2
#               # Address of first link
                ring0_addr: 10.51.38.69
#               # When knet transport is used it's possible to define up to 8 links
#               #ring1_addr: 192.168.1.2
        }
        node {
#               # Hostname of the node
                name: node3
#               # Cluster membership node identifier
                nodeid: 3
#               # Address of first link
                ring0_addr: 10.51.38.92
#               # When knet transport is used it's possible to define up to 8 links
#               #ring1_addr: 192.168.1.2
        }
        # ...
}


I also set dlm.conf:

Code:
log_debug=1
daemon_debug=1
protocol=tcp

device wd /usr/sbin/fence_sanlock path=/dev/fence/leases
connect wd node=node1 host_id=1
connect wd node=node2 host_id=2
connect wd node=node3 host_id=3
unfence wd

My problem is that dlm.service cannot start... It says "starting", but after the systemd default timeout (1 min 30 sec) it is stopped. Now comes the interesting part! The original dlm.service is:

Code:
[Unit]
Description=dlm control daemon
Requires=corosync.service sys-kernel-config.mount
After=corosync.service sys-kernel-config.mount

[Service]
OOMScoreAdjust=-1000
Type=notify
NotifyAccess=main
EnvironmentFile=/etc/sysconfig/dlm
ExecStartPre=/sbin/modprobe dlm
ExecStart=/usr/sbin/dlm_controld --foreground $DLM_CONTROLD_OPTS
#ExecStopPost=/sbin/modprobe -r dlm

# If dlm_controld doesn't stop, there are active lockspaces.
# Killing it will just get the node fenced.
SendSIGKILL=no

[Install]
WantedBy=multi-user.target

If I change
Code:
Type=notify
to
Code:
Type=simple
it works well and I am able to mount the filesystem etc., so it means the service waits for a readiness notification to become "active" that never arrives... I have searched the whole net but found no hint as to what the problem is...
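
If you want to keep that workaround without editing the packaged unit file, a drop-in should also do it (a sketch only; it works around the missing readiness notification rather than fixing it):

Code:
# /etc/systemd/system/dlm.service.d/override.conf
[Service]
Type=simple

# afterwards: systemctl daemon-reload && systemctl restart dlm.service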


Here is the dlm.service output when it times out.

Code:
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 node_config 3
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 found /dev/misc/dlm-control minor 122
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 found /dev/misc/dlm-monitor minor 121
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 found /dev/misc/dlm_plock minor 120
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 /sys/kernel/config/dlm/cluster/comms: opendir failed: 2
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 /sys/kernel/config/dlm/cluster/spaces: opendir failed: 2
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set log_debug 1
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set mark 0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set protocol 1
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set /proc/sys/net/core/rmem_default 4194304
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set /proc/sys/net/core/rmem_max 4194304
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set recover_callbacks 1
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 cmap totem.cluster_name = 'gitlab_storage'
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set cluster_name gitlab_storage
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 /dev/misc/dlm-monitor fd 13
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 cluster quorum 1 seq 508 nodes 3
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 cluster node 1 added seq 508
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set_configfs_node 1 10.51.38.66 local 0 mark 0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 cluster node 2 added seq 508
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set_configfs_node 2 10.51.38.69 local 1 mark 0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 cluster node 3 added seq 508
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 set_configfs_node 3 10.51.38.92 local 0 mark 0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 cpg_join dlm:controld ...
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 setup_cpg_daemon 15
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 dlm:controld conf 3 1 0 memb 1 2 3 join 2 left 0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon joined 1
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon joined 2
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon joined 3
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 dlm:controld ring 1:508 3 memb 1 2 3
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 receive_protocol 1 max 3.1.1.0 run 3.1.1.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 1 prot max 0.0.0.0 run 0.0.0.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 1 save max 3.1.1.0 run 3.1.1.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 run protocol from nodeid 1
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon run 3.1.1 max 3.1.1 kernel run 1.1.1 max 1.1.1
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 plocks 16
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 receive_fence_clear from 1 for 2 result 0 flags 6
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 clear_startup_nodes 3
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 fence_in_progress_unknown 0 recv
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 receive_protocol 3 max 3.1.1.0 run 3.1.1.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 3 prot max 0.0.0.0 run 0.0.0.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 3 save max 3.1.1.0 run 3.1.1.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 receive_protocol 2 max 3.1.1.0 run 0.0.0.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 2 prot max 0.0.0.0 run 0.0.0.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 2 save max 3.1.1.0 run 0.0.0.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 receive_protocol 2 max 3.1.1.0 run 3.1.1.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 2 prot max 3.1.1.0 run 0.0.0.0
Feb 12 14:29:39 hun25-10v dlm_controld[20194]: 2284 daemon node 2 save max 3.1.1.0 run 3.1.1.0
Feb 12 14:31:09 hun25-10v systemd[1]: dlm.service: start operation timed out. Terminating.
Feb 12 14:31:09 hun25-10v dlm_controld[20194]: 2374 helper pid 20195 term signal 15
Feb 12 14:31:09 hun25-10v dlm_controld[20194]: 2374 helper pid 20195 term signal 15
Feb 12 14:31:09 hun25-10v dlm_controld[20194]: 2374 shutdown
Feb 12 14:31:09 hun25-10v dlm_controld[20194]: 2374 cpg_leave dlm:controld ...
Feb 12 14:31:09 hun25-10v dlm_controld[20194]: 2374 clear_configfs_nodes rmdir "/sys/kernel/config/dlm/cluster/comms/3"
Feb 12 14:31:09 hun25-10v dlm_controld[20194]: 2374 clear_configfs_nodes rmdir "/sys/kernel/config/dlm/cluster/comms/2"
Feb 12 14:31:09 hun25-10v dlm_controld[20194]: 2374 clear_configfs_nodes rmdir "/sys/kernel/config/dlm/cluster/comms/1"
Feb 12 14:31:09 hun25-10v systemd[1]: dlm.service: Failed with result 'timeout'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit dlm.service has entered the 'failed' state with result 'timeout'.
 
