Hi,
I'm looking at how best to use shared storage to keep things consistent. Preferably NFS, though iSCSI could also be used if that turns out to be safer.
We would run two NAS nodes that replicate to each other for redundancy, relying on the NAS's built-in replication mechanism to keep its internal filesystem consistent for the NFS shares.
On the NAS there would then be a qcow2 file for each VM, accessed over NFS. That file is the VM's storage, which means the qcow2 file contains the VM's own filesystem (for example LVM+XFS). The NAS knows nothing about that inner filesystem (it just sees a qcow2 file), so replication to the secondary NAS node will not take it into account; it will simply replicate according to its scheduled replication jobs.
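For concreteness, the setup I have in mind would look roughly like this in /etc/pve/storage.cfg (server address, export path and storage name are made up):

    nfs: nas-vmstore
        server 10.0.0.10
        export /export/vmstore
        path /mnt/pve/nas-vmstore
        content images
        options vers=4.2

    # each VM then gets a qcow2 image under that mount, e.g.
    # /mnt/pve/nas-vmstore/images/100/vm-100-disk-0.qcow2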
Am I understanding correctly that if such a replication snapshot is taken mid-write, the second NAS could end up with a corrupt filesystem inside the qcow2 file? I would guess that XFS journaling should, in theory, be able to repair that if you ever need to use the replicated copy?
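As a sanity check on the replicated copy, I was thinking of something like the following, strictly read-only and only ever on the secondary (paths, device and VG name are made up):

    qemu-img check /mnt/secondary/images/100/vm-100-disk-0.qcow2      # qcow2-level consistency
    modprobe nbd max_part=8
    qemu-nbd --read-only --connect=/dev/nbd0 /mnt/secondary/images/100/vm-100-disk-0.qcow2
    vgchange -ay guestvg                                               # activate the guest's LVM VG
    xfs_repair -n /dev/guestvg/root                                    # dry run, only reports problems
    mount -o ro,norecovery /dev/guestvg/root /mnt/inspect              # norecovery: inspect without replaying the XFS journal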
Then there is the matter of sync versus async NFS. I've found that sync mode hurts performance considerably. I'm leaning towards sync 'to be sure', but what kind of risk are we actually talking about here? If the NAS (or, more likely, the network) fails and we end up with an inconsistent XFS filesystem, should I expect journaling to bring us back to a consistent state? Or are there other risks I'm missing?
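For reference, this is the distinction I mean on the export side (paths and subnet are made up); the client-side -o sync mount option would be a separate knob again:

    # /etc/exports on the NAS
    /export/vmstore  10.0.0.0/24(rw,sync,no_subtree_check)     # ack writes only once they hit stable storage
    /export/scratch  10.0.0.0/24(rw,async,no_subtree_check)    # faster, but acked writes can be lost if the NAS crashes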
At the application level, the riskiest components are probably the databases, which have their own journal/WAL to (theoretically) recover from an abnormal shutdown. I'm less sure about the OS itself, though I would assume any modern OS (like RHEL) can cope with its own files being in an inconsistent state after a crash. Or would that be incorrect?
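Taking PostgreSQL purely as an example of what I mean by relying on the WAL, I would at least confirm durability hasn't been switched off:

    psql -c "SHOW fsync;"                  # must be 'on' for WAL-based crash recovery to be trustworthy
    psql -c "SHOW synchronous_commit;"     # 'off' risks losing the last few commits, but not corruption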
I've played around with iSCSI as well. One thing I noticed is that writes do not seem to be synchronous; performance is actually better than asynchronous NFS. If I wanted to make those writes synchronous, would that have to be configured on the SAN side, or is that something Proxmox could control?
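On the Proxmox side, the closest knob I've found is the per-disk cache mode (VMID, storage and volume names below are made up); I assume any write cache on the SAN target itself is a separate setting:

    qm set 100 --scsi0 san-lun:vm-100-disk-0,cache=none        # O_DIRECT: bypass the host page cache, pass guest flushes through
    qm set 100 --scsi0 san-lun:vm-100-disk-0,cache=directsync  # additionally treat every write as synchronous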
Regarding data consistency, there is no filesystem on the SAN side. Attaching the LUN directly to the VM means the VM's filesystem (for example XFS) is the only thing on it. Does that mean that a network or SAN interruption will, at worst, require a repair through the journal (and potentially at the application level as well)? Or are there other risks I'm missing?
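The direct-LUN setup I have in mind would be something like this (portal and target IQN are made up), handing the LUN straight to the VM rather than putting LVM or another filesystem on top:

    # /etc/pve/storage.cfg
    iscsi: san-lun
        portal 10.0.0.20
        target iqn.2024-01.com.example:vmstore
        content images        # 'use LUNs directly'; 'none' if it were only a base for LVM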
The one thing I would fear with the iSCSI approach is two initiators writing to the same filesystem. In practice that means two hosts running the same VM through misconfiguration or an HA split brain, which could cause substantial, unrecoverable corruption as older data gets overwritten... and journaling is not going to protect us from that. Thinking about it, this could also happen with the qcow2 file via NFS in a split-brain situation.
To prevent that scenario, would it make sense to implement some kind of locking mechanism on the storage, e.g. verifying (through a lockfile on the shared storage) that nobody else is running the same VM?
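Something like the sketch below is what I mean, just to illustrate the idea with flock on the shared mount (paths and VMID are made up). I realise Proxmox's cluster stack and QEMU's own image locking may already cover part of this, so consider it a rough extra safety net rather than a replacement:

    (
      flock --nonblock 9 || { echo "VM 100 appears to be running on another host"; exit 1; }
      qm start 100
      # the lock is only held while this subshell lives; a real version would keep fd 9 open for the VM's lifetime
    ) 9>/mnt/pve/nas-vmstore/locks/vm-100.lock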
Any advice would be appreciated