[TUTORIAL] MD3200SAS / Proxmox + OCFS2 (Not supported by the Proxmox Team)

gesora

New Member
Jun 18, 2023
I received some equipment that was decommissioned a few days ago and wanted to test Proxmox.

Equipment received:
  • (4) Dell R620 servers with (4) Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 HBAs.
  • (1) MD3200 SAS (2 Controllers).
The first thing I did was make Proxmox detect the MD3200. First, install the multipath-tools package (apt-get install multipath-tools); after that we can create the /etc/multipath.conf file. If you create it without the blacklist_exceptions and multipath sections and run multipath -v3, the presented LUN will show itself as blacklisted; it can easily be whitelisted with the blacklist_exceptions and multipath sections. Be sure to create the same mount directory on all nodes; mine was /MD3200/test_lun.
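
In case it helps, the preparation looked roughly like this. It is only a sketch: the /dev/sdc device name is an assumption, so use whatever device node your LUN actually shows up as.
Bash:
apt-get install multipath-tools
/lib/udev/scsi_id -g -u -d /dev/sdc    # get the WWID of the presented LUN to whitelist in multipath.conf
mkdir -p /MD3200/test_lun              # same mount directory on every node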

multipath.conf

Bash:
defaults {
         find_multipaths no
}

blacklist {
        wwid .*
}

blacklist_exceptions {
        wwid "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
}

devices {
        device {
                vendor                  "DELL"
                product                 "MD32xx"
                path_grouping_policy    group_by_prio
                prio                    rdac
                path_checker            rdac
                path_selector           "round-robin 0"
                hardware_handler        "1 rdac"
                failback                immediate
                features                "2 pg_init_retries 50"
                no_path_retry           30
                rr_min_io               100
        }
}

multipaths {
        multipath {
        wwid "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
        alias test_lun
        }
}
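
After editing multipath.conf, reload multipath so the new whitelist and alias take effect. These are the standard multipath-tools commands; adjust if your setup differs.
Bash:
systemctl restart multipathd
multipath -r     # rebuild the multipath maps
multipath -ll    # the LUN should now show up under its alias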

multipath -ll result

Bash:
test_lun (XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX) dm-5 DELL,MD32xx
size=5.0T features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 rdac' wp=rw
|-+- policy='round-robin 0' prio=14 status=active
| `- 1:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=9 status=enabled
  `- 1:0:1:1 sdd 8:48 active ready running

After we get multipath working, we can install the OCFS2 package ("apt-get install ocfs2-tools") and then configure it with the following file at /etc/ocfs2/cluster.conf. Be sure to respect the formatting (it's a PITA if you don't), and the node name has to be the hostname, without any kind of special characters. After creating this file, run dpkg-reconfigure ocfs2-tools; the cluster name should be the same name that was declared in the file, and the other values can stay at their defaults. The cluster.conf has to be copied to the other nodes as well (see the commands after the file).

cluster.conf
Bash:
cluster:
    node_count = 4
    name = ocfstestclu


node:
    ip_port = 7777
    ip_address = X.X.X.X
    number = 0
    name = pvenode01
    cluster = ocfstestclu


node:
    ip_port = 7777
    ip_address = X.X.X.X
    number = 1
    name = pvenode02
    cluster = ocfstestclu


node:
    ip_port = 7777
    ip_address = X.X.X.X
    number = 2
    name = pvenode03
    cluster = ocfstestclu


node:
    ip_port = 7777
    ip_address = X.X.X.X
    number = 3
    name = pvenode04
    cluster = ocfstestclu
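
To propagate the configuration and register the cluster, I ran roughly the following. This is a sketch: pvenode02 is just one of the hosts from the example above, so repeat the copy for every node.
Bash:
dpkg-reconfigure ocfs2-tools                                         # enable O2CB at boot and set the cluster name (ocfstestclu)
scp /etc/ocfs2/cluster.conf root@pvenode02:/etc/ocfs2/cluster.conf   # repeat for the remaining nodes
service o2cb restart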

We can check if the conf file is working with service o2cb status

Bash:
Jun 18 23:09:35 pvenode04 systemd[1]: Starting LSB: Load O2CB cluster services at system boot....
Jun 18 23:09:35 pvenode04 o2cb[1294]: checking debugfs...
Jun 18 23:09:35 pvenode04 o2cb[1294]: Loading stack plugin "o2cb": OK
Jun 18 23:09:35 pvenode04 o2cb[1294]: Loading filesystem "ocfs2_dlmfs": OK
Jun 18 23:09:35 pvenode04 o2cb[1294]: Mounting ocfs2_dlmfs filesystem at /dlm: OK
Jun 18 23:09:35 pvenode04 o2cb[1294]: Setting cluster stack "o2cb": OK
Jun 18 23:09:36 pvenode04 o2cb[1294]: Registering O2CB cluster "ocfstestclu": OK
Jun 18 23:09:36 pvenode04 o2cb[1294]: Setting O2CB cluster timeouts : OK
Jun 18 23:09:36 pvenode04 o2hbmonitor[1419]: Starting
Jun 18 23:09:36 pvenode04 systemd[1]: Started LSB: Load O2CB cluster services at system boot..

We can now create the PV and VG and format the LUN with OCFS2

Bash:
pvcreate /dev/mapper/test_lun (only  run it on the first node)
vgcreate test_lun /dev/mapper/test_lun (only run it on the first node)
mkfs.ocfs2 -L "vmstore" -N 4 /dev/mapper/test_lun -F (only run it on the first node)
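
To double-check the result before touching fstab, ocfs2-tools also ships mounted.ocfs2; this is just a verification sketch using the device from above:
Bash:
mounted.ocfs2 -d /dev/mapper/test_lun    # should report the ocfs2 filesystem, its UUID and the "vmstore" label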

Add the partition to /etc/fstab on the other nodes

/etc/fstab
Bash:
/dev/mapper/test_lun /MD3200/test_lun ocfs2 _netdev,nointr 0 0

Run mount -a on all the nodes, then add the mountpoint as a Directory storage with the Shared flag enabled.
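
A minimal sketch of the mount step on each node; the directory path is the one from my setup:
Bash:
mkdir -p /MD3200/test_lun    # same path on every node
mount -a
mount | grep ocfs2           # verify the OCFS2 volume is actually mounted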

I did this for learning and testing. I advise you not to use this in production, and if you do, make sure you have good backups. Please do not bother the devs with this, because it is not supported.

Edit: as suggested by @bbgeek17, it is necessary to add the flag "is_mountpoint 1" to /etc/pve/storage.cfg. There are also some risks associated with LVM/LVM2 on top of OCFS2, which is not supported by Oracle: https://support.oracle.com/knowledge/Oracle Cloud/423207_1.html. Thanks again @bbgeek17 and @abdul.

example /etc/pve/storage.cfg
Bash:
dir:test_lun
        path /MD3200/test_lun
        content images,iso
        is_mountpoint 1
        prune-backups keep-all=1
        shared 1

Or we can run it with the pvesm switches instead (taken from https://pve.proxmox.com/pve-docs/pvesm.1.html):
Bash:
pvesm add dir test_lun --path /MD3200/test_lun --is_mountpoint 1 --shared 1

Thank you.
 
Great job.
One thing to add: make sure this boolean is set to "yes" for the storage object:
Code:
 --is_mountpoint <string> (default = no)
           Assume the given path is an externally managed mountpoint and consider the storage offline if it is not mounted. Using a boolean (yes/no) value serves as a shortcut to using the target path in this field.

Otherwise, if there is a failure in the OCFS2 service/mount, PVE won't notice and will continue using the empty local directory for some operations.
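
For a storage entry that already exists, this can also be set from the command line (a sketch; test_lun is the storage ID from the post above):
Bash:
pvesm set test_lun --is_mountpoint yes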


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thank you, I will do that.
 
hi, this is interesting!

Please, can you tell me why you are using these 2 commands:
Code:
pvcreate /dev/mapper/test_lun (only  run it on the first node)
vgcreate test_lun /dev/mapper/test_lun (only run it on the first node)

wouldn't it be enough to use
Code:
mkfs.ocfs2 -L "vmstore" -N 4 /dev/mapper/test_lun -F
?

thx!
 
pvcreate and vgcreate create the physical volume and a volume group so that Proxmox can use the disk, but the LUN itself still doesn't have a filesystem, for example EXT3, EXT4, XFS, or in this particular case a cluster-aware filesystem like OCFS2; that's why, after creating the PV and VG, we format the LUN with mkfs.

Edit: I am correcting this. I thought the use of LVM/LVM2 was supported; it actually works, but as @bbgeek17 mentions, it could add more problems and is not currently supported by Oracle.
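
If you skip the LVM layer entirely, which per the Oracle note above looks like the safer option, the only step needed on the first node is formatting the multipath device directly:
Bash:
mkfs.ocfs2 -L "vmstore" -N 4 /dev/mapper/test_lun -F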
 
On one hand, having LVM will allow you to potentially expand the underlying disk with other disks, seamlessly expanding the VG/LV/partition/FS later.
On the other hand, adding an extra layer is always a liability and an additional failure point, especially given that LVM is not properly cluster-aware in this case.

I am not familiar with OCFS2 functionality, and perhaps it has a facility to achieve LVM-like functionality, in which case I would not use LVM.
If one can provision the maximum disk size that would be sufficient for the workload and does not expect expansion, I would not use LVM either; plan to expand with a new datastore/filesystem/disk.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Interesting, this is pretty good. I was doing this only for testing, but here is the actual response from Oracle regarding LVM/LVM2 alongside OCFS2: https://support.oracle.com/knowledge/Oracle Cloud/423207_1.html. So probably not using that in prod is a good start.
 
