Help with LVM over iSCSI on a 3 node Proxmox cluster

logui
Feb 22, 2024
I have a cluster with 3 nodes; each node has a second disk that I want to use. The idea is to enable iSCSI on each disk, put LVM on top, and use the result as shared storage for the cluster, similar to what Ceph would provide, but hopefully lighter on resources and logging.

I haven't been able to find a good guide on how to install and configure Open-iSCSI on each node, enable LVM on top, and use the three disks as shared storage for the cluster.
 
Hi @logui.

It seems that you have a misunderstanding of how shared iSCSI storage works.
Unlike Ceph, which is distributed storage meant to use local disks, iSCSI is a centralized type of storage:
a single node (or a specialized cluster) presents one or more LUNs to one or more remote hosts.
Those hosts/clients can then access the LUNs presented over iSCSI simultaneously.
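
For reference, the usual pattern with an external target looks roughly like this on the PVE side. This is only a sketch; the portal IP, IQN, and storage names are made-up placeholders:

Code:
    # Storage definitions live in the cluster-wide /etc/pve/storage.cfg,
    # so run the pvesm commands once on any node. The portal IP and IQN
    # below are placeholders for an external box exporting the LUN.
    pvesm add iscsi san0 --portal 192.168.1.200 \
        --target iqn.2024-01.com.example:storage.lun0 --content none

    # One time, on a single node: create a volume group on the LUN
    # (the by-path device name will differ on your system)
    vgcreate vg_san0 /dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2024-01.com.example:storage.lun0-lun-0

    # Register the VG as shared LVM so every node can allocate VM disks on it
    pvesm add lvm san0-lvm --vgname vg_san0 --shared 1 --content images

This only makes sense when the target lives outside the cluster (or if you accept the loopback oddity described below).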

While you can arrive at a semblance of this with local disks, it's not going to be well documented anywhere because it's just not something that people do...

An example would be exporting a local disk from host1 via iSCSI and then accessing that same disk, via iSCSI, on host1 itself. A mouthful.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I understand, thank you for the clarification. Any other suggestions on how to accomplish the shared-storage goal with local disks without using Ceph? NFS is also out, because I don't have extra hardware to host it on.
 
I was thinking of ZFS + replication. Thoughts?
This depends on your use case. ZFS replication isn't fully synchronous: you need to enable and configure the replication schedule in the VM settings; the default is 15 minutes. It can be extended up to several hours (I'm not sure about the upper limit) and reduced to one minute. So in the worst case your VM will lose the data written since the last sync.

Depending on the actual application, this doesn't need to be a big deal (for my DNS cachers I really don't care; with file hosting or a database it might be a different story), since you might be able to afford this minimal data loss or design around it (e.g. by setting up a database cluster with dedicated VMs on your PVE nodes without syncing them).
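
As a rough sketch of what that looks like on the command line (the VM ID 100 and target node name pve2 are just examples; you can do the same in the GUI under the VM's Replication tab):

Code:
    # Replicate all disks of VM 100 to node pve2 every minute
    # (the default schedule is */15; job IDs take the form <vmid>-<number>)
    pvesr create-local-job 100-0 pve2 --schedule "*/1"

    # Show the state and last sync time of the replication jobs on this node
    pvesr status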

On the other hand, you don't have a single point of failure like with a single NAS or SAN. (1)

For my homelab, ZFS + replication fits my needs; Ceph would be complete overkill. I also read in the German forum here that for many small businesses a two-node + qdevice cluster with ZFS replication on a one-minute schedule is more than enough for their needs.

But if I ever had to implement PVE in a professional environment, I would prefer to use Ceph if possible.

Best regards, Johannes.

(1) Of course you could also put two NAS or SAN boxes together and have them replicate their data between each other. In such a case the single-point-of-failure argument is obviously not valid anymore.
 
Thank you. For my use case, which is mostly DR oriented, ZFS + replication seems to be the solution.
 
If you're using a 1GbE network, it will work fine most of the time. Just note that it's possible for the VMs to use up enough network bandwidth while a sync is running that the sync fails. That's few and far between, but a possibility. If you can dedicate a separate NIC to the sync, you will not have any issues. Or just let it do its thing, as it will more than likely work on the next sync, no problem.
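
If you ever add that dedicated NIC: as far as I know, replication traffic follows the cluster's migration network setting, so something along these lines in /etc/pve/datacenter.cfg should steer it onto the separate link (the subnet is a placeholder for your sync network):

Code:
    # /etc/pve/datacenter.cfg
    # Route migration (and, as far as I know, storage replication) traffic
    # over a dedicated subnet; 10.10.10.0/24 is a placeholder.
    migration: secure,network=10.10.10.0/24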

I've had a cluster of 3 mini-PCs on a 1GbE network run ZFS replication every minute, and it would maybe fail once or twice a day, but like I said, a minute later it goes through fine. So no biggie.
 
Thanks for the information. My network is 2.5GbE, and I am using one network for everything, mostly because I don't have many NICs on the appliances and not many ports left on the 2.5G switch. My traffic at home is very, very low, so I have never seen any congestion-related issues.

I am also planning to set the replication to a short interval. Since each replication after the first sends only the deltas from the previous one, the amount of data sent will not be very big: higher frequency means less data lost and less data sent per run, at the cost of higher CPU usage, but I am OK with that.
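
Something like this is what I have in mind, with a bandwidth cap as a safety net; the job ID and numbers are just examples:

Code:
    # Tighten an existing replication job to a one-minute schedule and cap
    # its bandwidth so a full resync can't saturate the shared 2.5GbE link
    pvesr update 100-0 --schedule "*/1" --rate 100   # rate is in MB/s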
 
