Ceph over multipath device or something else

Hi,
I have 4 Proxmox servers, all connected to a Fibre Channel storage array using multipath, which serves two volumes: STORAGE-DATA and STORAGE-DUMP.

Every node shows 4 sd* devices (2 paths per volume) and one multipath mapper device per volume:

Code:
root@node2:~# multipath -ll
STORAGE-DATA (36b4432610018cd305a2f574700000013) dm-5 HUAWEI,XSG1
size=8.0T features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 1:0:1:1 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 8:0:0:1 sdd 8:48 active ghost running
STORAGE-DUMP (36b4432610018cd305a2f64b900000014) dm-6 HUAWEI,XSG1
size=4.0T features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 8:0:0:2 sde 8:64 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 1:0:1:2 sdc 8:32 active ghost running

root@node2:~# mount|grep STORAGE
/dev/mapper/STORAGE-DATA-part1 on /STORAGE-DATA type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,coherency=full,user_xattr,acl,_netdev)
/dev/mapper/STORAGE-DUMP-part1 on /STORAGE-DUMP type ocfs2 (rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,coherency=full,user_xattr,acl,_netdev)

The volumes are currently formatted with OCFS2 (Oracle Cluster File System) because every node sees the same storage, so a cluster file system is needed to let them share it without conflicts.

The problem is that I don't know OCFS2 (this configuration was set up by another IT team), and the volumes are mounted in Proxmox as directory storage, so I don't have snapshots, migration, high availability, nothing.

I'm wondering if I could switch to Ceph to have a block device which supports snapshots, or if you have any other ideas about how to make things better here.

Thank you very much for your help!
Bye
 
Hi, I think the only way to achieve your goals is to mount these storages and use them as directory storage, ticking the Shared checkbox.
That way you can migrate and have HA between the nodes.
As for snapshots, you can get them by storing the VM disks as qcow2 files.

I'm not really sure whether migration, HA, thin provisioning and snapshots would work when implementing LVM on top of the storage.

I'm working on the same kind of setup these days.
 
Thanks! I will check.
In your opinion, could I create an LVM physical volume on top of the OCFS2 volume?

Thanks!
 
Hi, I think the only way to achieve your goals is to mount these storages and use them as directory storage, ticking the Shared checkbox.
That way you can migrate and have HA between the nodes.
As for snapshots, you can get them by storing the VM disks as qcow2 files.

I'm not really sure whether migration, HA, thin provisioning and snapshots would work when implementing LVM on top of the storage.

I'm working on the same kind of setup these days.
This won't work because mounting the same FS from multiple machines will definitely lead to a corrupted FS. That's why a cluster FS was used in the first place.

I'm wondering if I could switch to Ceph to have a block device which supports snapshots, or if you have any other ideas about how to make things better here.
Ceph is not intended for such a setup!

What works with Proxmox is setting up a thick LVM on the shared storage box. The LVs for the individual virtual machines will only ever be used by a single Proxmox VE instance, thus avoiding corruption from two machines writing to the same LV at the same time.
 
What works with Proxmox is setting up a thick LVM on the shared storage box. The LVs for the individual virtual machines will only ever be used by a single Proxmox VE instance, thus avoiding corruption from two machines writing to the same LV at the same time.

So could I create an LVM physical volume on top of the OCFS2 volume, directly on the /dev/mapper/STORAGE-DATA device?
So basically pvcreate /dev/mapper/STORAGE-DATA and so on?

What about GFS2? Do you consider it better than OCFS2?

Thanks!
 
No, create the LVM directly on the block device. No OCFS or other cluster file system underneath.

This works (I hope I don't butcher the technical details here) because when using thick LVM each VM disk gets a dedicated area on the block device (logical volume). Since a VM can run only once in a cluster, you don't run into the problem of parallel writes to the same blocks.
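
Roughly, the setup would look something like this (just a sketch; the volume group name and storage ID below are placeholders, and wiping the device destroys the existing OCFS2 data, so move everything off it first):

Code:
# Remove the old OCFS2/partition signatures from the multipath device (destroys existing data!)
wipefs -a /dev/mapper/STORAGE-DATA
# Create the physical volume and a volume group on top of it
pvcreate /dev/mapper/STORAGE-DATA
vgcreate vg_storage_data /dev/mapper/STORAGE-DATA
# Register it as a shared thick-LVM storage in PVE (run once; the storage definition is cluster-wide)
pvesm add lvm storage-data --vgname vg_storage_data --shared 1 --content images,rootdir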
 
No, create the LVM directly on the block device. No OCFS or other cluster file system underneath.

This works (I hope I don't butcher the technical details here) because when using thick LVM each VM disk gets a dedicated area on the block device (logical volume). Since a VM can run only once in a cluster, you don't run into the problem of parallel writes to the same blocks.

Ok, thanks!
Yes,
Code:
/dev/mapper/STORAGE-DATA
is the multipath block device, so I would be able to create an LVM physical volume on it.

Just one question: should I create an LVM or an LVM-Thin volume?
Because LVM will not give me snapshots, but LVM-Thin will.

So I don't need OCFS2, fine!
 
Just one question: should I create an LVM or an LVM-Thin volume?
Because LVM will not give me snapshots, but LVM-Thin will.
This will not work with LVM-Thin because it stores data all over the place.

If you want snapshots from a shared central storage, you will need to resort to a Samba/CIFS or NFS share on which you then store the images in qcow2 format.

Ceph spreads the storage over the cluster nodes and has quite a different set of requirements, which can make it hard to introduce into an existing cluster.
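
For reference, once the disks are stored as qcow2 files on such a share, snapshots go through the normal tooling, for example (VMID 100 is just a placeholder):

Code:
# Take, list and roll back a snapshot of a VM whose disks are qcow2 files
qm snapshot 100 before-upgrade --description "state before the upgrade"
qm listsnapshot 100
qm rollback 100 before-upgrade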
 
This will not work with LVM-Thin because it stores data all over the place.

OK, but right now, using OCFS2 as a local directory storage, I can have snapshots if I use qcow2 images.
But I cannot have HA.

So I have to choose between snapshots and HA, right?

If you want snapshots from a shared central storage, you will need to resort to a Samba/CIFS or NFS share on which you then store the images in qcow2 format.

I cannot use CIFS or NFS because that FC storage is directly attached to the Proxmox nodes.

Ceph spreads the storage over the cluster nodes and has quite a different set of requirements, which can make it hard to introduce into an existing cluster.

OK, Ceph is not a solution, I understand. Unless I create a separate volume for each node and configure Ceph on those, but that way I would have only 1/4 of the available space and no additional high-availability advantage...
 
I don't have any experience with OCFS2 so I cannot tell you if it can be set up in a HA way.

Regarding Ceph: please read a bit more about it and how it works. It does not take iSCSI or FC block devices but wants raw disks that are in the server. Redundancy is achieved by storing the data on at least 3 servers. A fast dedicated network between the Ceph servers is needed as well.

Ceph does break with the classical approach of a central storage that each node can access.
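
Just to illustrate what a supported setup would look like (only a sketch, assuming local disks in each node; the network and device names are placeholders):

Code:
# Install the Ceph packages on every node that will be part of the Ceph cluster
pveceph install
# Initialise the Ceph configuration once, on one node, pointing it at a fast dedicated network
pveceph init --network 10.10.10.0/24
# Create monitors on at least three nodes
pveceph mon create
# Create an OSD from a local raw disk on each node
pveceph osd create /dev/sdb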
 
I don't have any experience with OCFS2 so I cannot tell you if it can be set up in a HA way.

OCFS2 is a shared file system. So basically every node sees the same storage in the same way, and they can share the same files without conflicting with each other.

Regarding Ceph: please read a bit more about it and how it works. It does not take iSCSI or FC block devices but wants raw disks that are in the server. Redundancy is achieved by storing the data on at least 3 servers. A fast dedicated network between the Ceph servers is needed as well.

Ceph does break with the classical approach of a central storage that each node can access.

Yes, I know how Ceph works, but I was thinking that it would work on multipath block devices.
It would work if I had local storage on every node, but that is not my case, so it won't suit my situation.
 
OCFS2 is a shared file system. So basically every node sees the same storage in the same way, and they can share the same files without conflicting with each other.
So why should this not work with HA? If the storage is available on all nodes and keeps working when a node is down, it should work AFAICT.

Yes, I know how Ceph works, but I was thinking that it would work on multipath block devices.
Technically it could work, but it is not a great idea from a performance and failsafe perspective, and it is definitely not a setup that is supported on our side.
 
So, summing up: your advice is to use LVM (not Thin!) on the multipath mapper volume on every node, configuring it as "shared".
This way I would have HA but not snapshots.

Right?
 
The supported setup for a shared storage over iSCSI/FC is a thick LVM, yes.

If you have OCFS2 set up and it is working for you, why do you say you cannot have HA with it?

The HA stack will make sure that a node that has lost contact with the quorate part of the cluster will fence (hard reset) itself after ~2 minutes if an HA-enabled guest is running on it. One of the remaining nodes will then start the HA guest after about ~3 minutes. If the shared storage is available to all nodes, I don't see why it shouldn't work.
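
Putting a guest under HA management is a one-liner (VMID 100 is just a placeholder):

Code:
# Add the VM to the HA manager and check the cluster-wide HA status
ha-manager add vm:100 --state started
ha-manager status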
 
If you have OCFS2 set up and it is working for you, why do you say you cannot have HA with it?

Because it's not a shared storage on the Proxmox side, it's a local directory storage mounted on OCFS2.
Proxmox sees that storage as a directory storage.
 
Because it's not a shared storage on the Proxmox side, it's a local directory storage mounted on OCFS2.
Proxmox sees that storage as a directory storage.
Have you tried enabling the Shared option for that directory storage?
This option exists to tell PVE that whatever is actually mounted at that location can be accessed from all nodes.
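
Assuming the directory storage is called, for example, ocfs2-data (adjust to your actual storage ID), that would be:

Code:
# Mark the existing directory storage as shared
pvesm set ocfs2-data --shared 1
# Optionally tell PVE the path must actually be a mountpoint before it uses the storage
pvesm set ocfs2-data --is_mountpoint yes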
 
Have you tried enabling the Shared option for that directory storage?
This option exists to tell PVE that whatever is actually mounted at that location can be accessed from all nodes.

I was not aware of this. I will try it and check, thank you!
 
