HA / Failover configuration

dadep · Feb 12, 2025

Hello,
I'm new to Proxmox, and I'm testing it as a possible future VMware replacement.
I'm trying a two-node cluster (PVE 8.3) with an external quorum server (QDevice, installed on a Debian VM) and FC shared storage (see attached schema).
My approach to testing is trial and error.
I got stuck on the HA and failover tests. I shut down one of the two cluster nodes and expected the VMs to fail over to the other node without interrupting service.
This does not happen. The VMs migrate to the surviving node, but they are in a powered-off state and do not restart. The error that occurs is the following:
*************************************************
task started by HA resource agent
TASK ERROR: no such logical volume pve/vm-103-disk-0
*************************************************
The only solution I have found is to destroy the VMs and recreate them from scratch.
I'm definitely doing something wrong, but I can't figure out what.

Thanks in advance to anyone willing to help me.
Best regards
Davide

bbgeek17 · Feb 12, 2025

Hi @dadep, welcome to the forum.

dadep said:
pve/vm-103-disk-0

The naming of this volume implies that it is located on a local storage (i.e. not on FC).
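To make that reading of the error message concrete, here is a minimal Python sketch that splits the reported path into its volume group and logical volume. It assumes the usual PVE naming pattern `vm-<vmid>-disk-<n>`; the helper name `parse_lv_path` is made up for this illustration and is not part of any Proxmox tool.

```python
import re

def parse_lv_path(path):
    """Split 'vg/lv' into VG name, LV name, and the VM ID (if the LV is named vm-<vmid>-disk-<n>)."""
    vg, _, lv = path.partition("/")
    match = re.fullmatch(r"vm-(\d+)-disk-(\d+)", lv)
    vmid = int(match.group(1)) if match else None
    return {"vg": vg, "lv": lv, "vmid": vmid}

# Path taken verbatim from the TASK ERROR line in the first post.
info = parse_lv_path("pve/vm-103-disk-0")
print(info)
```

The volume group part is `pve`, which is the VG the default installer creates on the boot disk, so the missing disk was being looked up on node-local storage rather than on the FC LUN.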

I would recommend that you review the article mentioned here: https://forum.proxmox.com/threads/understanding-lvm-shared-storage-in-proxmox.160693/

It may help with visualizing the layers involved and what, if anything, you need to correct.

Cheers.

Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox

dadep · Feb 14, 2025

Hi bbgeek17,
Thank you for the explanation. So, from what I could understand by reading some documentation, it seems that the most reliable solution is an HCI cluster with Ceph. Configurations with shared storage are not suitable in production environments. This greatly limits the reuse of infrastructure already installed at many customers. I work in Italy, and many companies are in the SMB market segment with limited IT budgets.
Can you confirm?

Regards
Davide

bbgeek17 · Feb 14, 2025

dadep said:
So, from what I could understand by reading some documentation, it seems that the most reliable solution is an HCI cluster with Ceph. Configurations with shared storage are not suitable in production environments.

dadep said:
Do you confirm ?

I vehemently disagree.

dadep said:
I work in Italy, and many companies are in the SMB market segment with limited IT budgets.


If you are budget constrained, then that severely limits your options, but that's not the fault of the other solutions that exist.


dadep · Feb 17, 2025

Hello bbgeek17,
OK, I'm still pretty unclear on this, but given my infrastructure, what configuration can I use to get storage that supports HA and VM failover (just for testing)?

thank you
best regards

bbgeek17 · Feb 17, 2025

dadep said:
but given my infrastructure, what configuration can I use to get storage that supports HA and VM failover (just for testing)?

You should use LVM-thick, as it is a supported and approved solution that works out of the box with PVE.
The details are described here: https://kb.blockbridge.com/technote/proxmox-lvm-shared-storage/


alexskysilk · Feb 17, 2025

Listen to @bbgeek17, for he knows what he speaks of.

I'm only chiming in to let you know I have a customer with a very similar configuration and essentially the same solution (quorum node served on an existing vSphere cluster). Now on to specifics:

dadep said:
TASK ERROR: no such logical volume pve/vm-103-disk-0

Unless you named your shared volume's store "pve", the vdisk is likely not on it. Post the contents of /etc/pve/storage.cfg for a more detailed discussion.

dadep · Feb 18, 2025

Hi all,

Here is the storage.cfg content and a screenshot of my storage configuration. Note that I have a limited and very old Linux background.

thank you
Davide

root@proxmox01:/etc/pve# cat storage.cfg
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvm: shared
        vgname shared
        content rootdir,images
        nodes proxmox02,proxmox01
        saferemove 0
        shared 1

bbgeek17 · Feb 18, 2025

You have 3 storage pools in your configuration:

dir: local - this is a directory on your root device, shared with your hypervisor OS.
lvmthin: local-lvm - this is, most likely, a slice of your boot disk, sharing the capacity of this disk with your hypervisor OS.
lvm: shared - this appears to be your LUN from the external SAN. Note that I am basing this conclusion on the properties of the storage pool.
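The same reading of storage.cfg can be sketched in a few lines of Python. This is only an illustration: the parser below is a simplified stand-in written for this thread (it treats any first token ending in `:` as a section header), not the parser PVE itself uses, and the function names are invented here. The configuration text is the one posted above.

```python
STORAGE_CFG = """\
dir: local
    path /var/lib/vz
    content iso,backup,vztmpl

lvmthin: local-lvm
    thinpool data
    vgname pve
    content images,rootdir

lvm: shared
    vgname shared
    content rootdir,images
    nodes proxmox02,proxmox01
    saferemove 0
    shared 1
"""

def parse_storage_cfg(text):
    """Return {storage_id: {"type": ..., property: value, ...}} from storage.cfg-style text."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        head, _, rest = line.partition(" ")
        if head.endswith(":"):
            # Section header such as "lvm: shared"
            current = {"type": head[:-1]}
            storages[rest.strip()] = current
        elif current is not None:
            # Property line such as "vgname pve"
            current[head] = rest.strip()
    return storages

def storages_backed_by_vg(storages, vg):
    """Pick the storage pools whose 'vgname' property equals the given volume group."""
    return {name: s for name, s in storages.items() if s.get("vgname") == vg}

cfg = parse_storage_cfg(STORAGE_CFG)
for name, s in storages_backed_by_vg(cfg, "pve").items():
    kind = "shared" if s.get("shared") == "1" else "node-local"
    print(name, s["type"], kind)
# prints: local-lvm lvmthin node-local
```

Only the `shared` pool carries `shared 1`, so it is the only one of the three that another node can be expected to reach after a failover.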

dadep said:
TASK ERROR: no such logical volume pve/vm-103-disk-0

As mentioned earlier, the disk image in question is stored in the "local-lvm" storage pool, which is backed by the Volume Group (VG) named "pve." As the name suggests, this pool is local to a specific node, meaning that any data stored there will not be accessible from other nodes.

To ensure availability across nodes, you need to migrate (move) the data from "local-lvm" to the "shared" pool, which is backed by the Volume Group "shared."
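As a rough way to spot which disks still need to be moved, the sketch below scans a VM configuration for disk entries and flags any that sit on a storage pool not marked as shared. The sample configuration is hypothetical (the real VM configs in this thread were posted only as screenshots), the list of disk key prefixes is an assumption covering the common bus types, and `disks_on_unshared_storage` is a name invented for this example.

```python
import re

# Assumed set of disk-like keys in a VM config (e.g. scsi0, virtio1, efidisk0).
DISK_KEY = re.compile(r"^(?:scsi|virtio|sata|ide|efidisk|tpmstate)\d+$")

def disks_on_unshared_storage(vm_conf, shared_storages):
    """Return (key, storage, volume) for every disk whose storage is not in shared_storages."""
    problems = []
    for line in vm_conf.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not DISK_KEY.match(key):
            continue
        # The volume ID is the first comma-separated field, e.g. "local-lvm:vm-103-disk-0".
        volume = value.strip().split(",", 1)[0]
        storage, has_storage, volname = volume.partition(":")
        if not has_storage:
            continue  # e.g. "none" for an empty CD-ROM drive
        if storage not in shared_storages:
            problems.append((key, storage, volname))
    return problems

# Hypothetical VM config for illustration only.
SAMPLE_VM_CONF = """\
boot: order=scsi0
cores: 2
memory: 2048
scsi0: local-lvm:vm-103-disk-0,size=32G
scsi1: shared:vm-103-disk-1,size=10G
ide2: none,media=cdrom
"""

# "shared" is the only pool with "shared 1" in the storage.cfg posted above.
problems = disks_on_unshared_storage(SAMPLE_VM_CONF, {"shared"})
print(problems)
```

Any disk reported this way would have to be moved to the shared pool before HA can restart the VM on another node, for example with the Move Storage disk action in the GUI or `qm disk move` on the command line.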

Cheers


alexskysilk · Feb 18, 2025

I'm a mindreader

dadep · Feb 19, 2025

Hi,
About disk partitioning: I installed PVE using the GUI installer with default parameters.
Here is the configuration of my two VMs, where it seems the disks are on the shared volume.

Also, if I look for vm-103-disk-0, I find it in

EDIT: I just tried:
live migration from one node to the other, and it is working.
powering off one node, and VM failover is now working.

I didn't change any PVE configuration; I just did some node reboots and deleted and recreated the VMs.

fabian · Feb 19, 2025

dadep said:
recreated the VMs

Well, that probably means that the old (now deleted) VMs had the wrong storage, and the new ones have the correct one.
