Help with LVM over iSCSI on a 3 node Proxmox cluster

logui
Feb 22, 2024
I have a cluster with 3 nodes; each node has a second disk that I want to use. The idea is to enable iSCSI on each disk, put LVM on top, and use the result as shared storage for the cluster, similar to what Ceph would provide, but hopefully lighter on resources and logging.

I haven't been able to find a good guide on how to install and configure Open-iSCSI on each node, enable LVM on top, and use the three disks as shared storage for the cluster.
 
Hi @logui.

It seems that you have a misunderstanding of how shared iSCSI storage works.
Unlike Ceph, which is distributed storage meant to use local disks, iSCSI is a centralized type of storage:
a single node (or a specialized cluster) presents one or more LUNs to one or more remote hosts.
Those hosts/clients can then access the LUNs presented over iSCSI simultaneously.
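
For reference, the usual pattern with an external target looks roughly like this on the PVE side. This is only a sketch; the portal IP, IQN, and storage names are made-up placeholders:

Code:
    # Storage definitions live in the cluster-wide /etc/pve/storage.cfg,
    # so run the pvesm commands once on any node. The portal IP and IQN
    # below are placeholders for an external box exporting the LUN.
    pvesm add iscsi san0 --portal 192.168.1.200 \
        --target iqn.2024-01.com.example:storage.lun0 --content none

    # One time, on a single node: create a volume group on the LUN
    # (the by-path device name will differ on your system)
    vgcreate vg_san0 /dev/disk/by-path/ip-192.168.1.200:3260-iscsi-iqn.2024-01.com.example:storage.lun0-lun-0

    # Register the VG as shared LVM so every node can allocate VM disks on it
    pvesm add lvm san0-lvm --vgname vg_san0 --shared 1 --content images

This only makes sense when the target lives outside the cluster (or if you accept the loopback oddity described below).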

While you can arrive at a semblance of this with local disks, it's not going to be well documented anywhere because it's just not something that people do...

An example would be exporting a local disk from host1 via iSCSI and then accessing that same disk, via iSCSI, on host1 itself. A mouthful.

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I understand, thank you for the clarification. Any other suggestions on how to accomplish the shared-storage goal with local disks without using Ceph? NFS is also out, because I don't have extra hardware to host it on.
 
I was thinking of ZFS + replication. Thoughts?
This depends on your use case. ZFS replication isn't fully synchronous: you need to enable and configure the replication schedule in the VM settings; the default is 15 minutes. It can be extended up to several hours (I'm not sure about the upper limit) and reduced to one minute. So in the worst case your VM will lose the data written since the last sync.

Depending on the actual application, this doesn't need to be a big deal (for my DNS cachers I really don't care; with file hosting or a database it might be a different story), since you might be able to afford this minimal data loss or design around it (e.g. by setting up a database cluster with dedicated VMs on your PVE nodes without syncing them).
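
As a rough sketch of what that looks like on the command line (the VM ID 100 and target node name pve2 are just examples; you can do the same in the GUI under the VM's Replication tab):

Code:
    # Replicate all disks of VM 100 to node pve2 every minute
    # (the default schedule is */15; job IDs take the form <vmid>-<number>)
    pvesr create-local-job 100-0 pve2 --schedule "*/1"

    # Show the state and last sync time of the replication jobs on this node
    pvesr status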

On the other hand, you don't have a single point of failure like with a single NAS or SAN. (1)

For my homelab, ZFS + replication fits my needs; Ceph would be complete overkill. I also read in the German forum here that for many small businesses a two-node + qdevice cluster with ZFS replication on a one-minute schedule is more than enough for their needs.

But if I ever had to implement PVE in a professional environment, I would prefer to use Ceph if possible.

Best regards, Johannes.

(1) Of course you could also put two NAS or SAN boxes together and have them replicate their data between each other. In such a case the single-point-of-failure argument is obviously not valid anymore.
 
Thank you. For my use case, which is mostly DR oriented, ZFS + replication seems to be the solution.
 
If you're using a 1GbE network, it will work fine most of the time. Just note that it's possible for the VMs to use up enough network bandwidth while a sync is running that the sync fails. That's few and far between, but a possibility. If you can dedicate a separate NIC to the sync, you will not have any issues. Or just let it do its thing, as it will more than likely work on the next sync, no problem.
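
If you ever add that dedicated NIC: as far as I know, replication traffic follows the cluster's migration network setting, so something along these lines in /etc/pve/datacenter.cfg should steer it onto the separate link (the subnet is a placeholder for your sync network):

Code:
    # /etc/pve/datacenter.cfg
    # Route migration (and, as far as I know, storage replication) traffic
    # over a dedicated subnet; 10.10.10.0/24 is a placeholder.
    migration: secure,network=10.10.10.0/24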

I've had a cluster of 3 mini-PCs on a 1GbE network run ZFS replication every minute, and it would maybe fail once or twice a day, but like I said, a minute later it goes through fine. So no biggie.
 
Thanks for the information. My network is 2.5GbE, and I am using one network for everything, mostly because I don't have many NICs on the appliances and not many ports left on the 2.5G switch. My traffic at home is very, very low, so I have never seen any congestion-related issues.

I am also planning to set the replication to a short interval. Since each replication after the first sends only the deltas from the previous one, the amount of data sent will not be very big: higher frequency means less data lost and less data sent per run, at the cost of higher CPU usage, but I am OK with that.
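
Something like this is what I have in mind, with a bandwidth cap as a safety net; the job ID and numbers are just examples:

Code:
    # Tighten an existing replication job to a one-minute schedule and cap
    # its bandwidth so a full resync can't saturate the shared 2.5GbE link
    pvesr update 100-0 --schedule "*/1" --rate 100   # rate is in MB/s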
 
