HA / Failover configuration

dadep

New Member
Feb 10, 2025
Hello,
I'm new to Proxmox, and I'm testing it as a possible future VMware replacement.
I'm trying a two-node cluster (PVE 8.3) with an external quorum server (QDevice, installed on a Debian VM) and FC shared storage (see attached schema).
My approach to testing is trial and error.
I got stuck on the HA and failover tests. I shut down one of the two cluster nodes and expected the VMs to fail over to the other node without interrupting service.
This does not happen. The VMs migrate to the surviving node, but they are in a powered-off state and do not restart. The error that occurs is the following:
*************************************************
task started by HA resource agent
TASK ERROR: no such logical volume pve/vm-103-disk-0
*************************************************
The only solution I have found is to destroy the VMs and recreate them from scratch.
I'm definitely doing something wrong, but I can't figure out what.

Thanks in advance to those who want to help me.
Best regards
Davide


schema.jpg
 
Hi @dadep, welcome to the forum.

pve/vm-103-disk-0
The naming of this volume implies that it is located on a local storage (i.e. not on FC).

I would recommend that you review the article mentioned here: https://forum.proxmox.com/threads/understanding-lvm-shared-storage-in-proxmox.160693/

It may help with visualizing the layers involved and what, if anything, you need to correct.

Cheers.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi bbgeek17,
thank you for the explanation. From what I could understand by reading some documentation, it seems that the most reliable solution is an HCI cluster with Ceph, and that configurations with shared storage are not suitable for production environments. This greatly limits the reuse of infrastructure already installed at many customers. I work in Italy, and many companies are in the SMB market segment with a limited IT budget.
Do you confirm?

Regards
Davide
 
So, from what I could understand by reading some documentation it seems that the most reliable solution is an HCI cluster with ceph. Configurations with shared storage are not suitable in production environments.
Do you confirm ?
I vehemently disagree.

I work in Italy, and many companies are in the SMB market segment with a limited IT budget.
Do you confirm?
If you are budget constrained, then that severely limits your options, but that's not the fault of the other solutions that exist.



Hello bbgeek17,
OK, things are still pretty unclear to me. Given my infrastructure, what configuration can I use to get storage that supports HA and VM failover (just for testing)?

Thank you
Best regards
 
Listen to @bbgeek17, for he knows what he speaks of.

I'm only chiming in to let you know I have a customer with a very similar configuration and essentially the same solution (quorum node served on an existing vSphere cluster). Now onto specifics:

TASK ERROR: no such logical volume pve/vm-103-disk-0
Unless you named your shared volume's store "pve", the vdisk is likely not on it. Post the contents of /etc/pve/storage.cfg for a more detailed discussion.
 
Hi all,

Here is the storage.cfg content and a screenshot of my storage configuration. Note that I have a limited and very old Linux background.

Thank you
Davide

root@proxmox01:/etc/pve# cat storage.cfg
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvm: shared
        vgname shared
        content rootdir,images
        nodes proxmox02,proxmox01
        saferemove 0
        shared 1

1739892904081.png
 
You have 3 storage pools in your configuration:
  • dir: local - this is a directory on your root device, shared with your hypervisor OS.
  • lvmthin: local-lvm - this is, most likely, a slice of your boot disk, sharing the capacity of that disk with your hypervisor OS.
  • lvm: shared - this appears to be your LUN from the external SAN. Note that I am basing this conclusion on the properties of the storage pool.

TASK ERROR: no such logical volume pve/vm-103-disk-0
As mentioned earlier, the disk image in question is stored in the "local-lvm" storage pool, which is backed by the Volume Group (VG) named "pve." As the name suggests, this pool is local to a specific node, meaning that any data stored there will not be accessible from other nodes.
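To make that mapping concrete, here is a minimal Python sketch (not a Proxmox tool) that parses the storage.cfg text posted above and looks up which storage entry owns the volume group named in the error. The helper names parse_storage_cfg and storage_for_lv are made up for illustration, and the parser only assumes the simple "type: id" plus "key value" layout seen in this thread.

```python
import re

# storage.cfg content as posted in this thread.
STORAGE_CFG = """\
dir: local
        path /var/lib/vz
        content iso,backup,vztmpl

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvm: shared
        vgname shared
        content rootdir,images
        nodes proxmox02,proxmox01
        saferemove 0
        shared 1
"""

# A section header looks like "lvm: shared"; property lines are "key value".
HEADER = re.compile(r"^([a-z]+):\s*(\S+)\s*$")


def parse_storage_cfg(text):
    """Return {storage_id: {"type": ..., property: value, ...}} (hypothetical helper)."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            current = {"type": header.group(1)}
            storages[header.group(2)] = current
        elif current is not None:
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return storages


def storage_for_lv(storages, lv_path):
    """Map an LVM path such as 'pve/vm-103-disk-0' to (storage_id, is_shared)."""
    vg = lv_path.split("/", 1)[0]
    for storage_id, props in storages.items():
        if props.get("vgname") == vg:
            return storage_id, props.get("shared") == "1"
    return None, False


storages = parse_storage_cfg(STORAGE_CFG)
print(storage_for_lv(storages, "pve/vm-103-disk-0"))
```

With the configuration from this thread, the volume group "pve" resolves to the "local-lvm" entry, which has no "shared 1" line, so a disk there exists on one node only.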

To ensure availability across nodes, you need to migrate (move) the data from "local-lvm" to the "shared" pool, which is backed by the Volume Group "shared."
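As a rough sketch of what that move could look like from the command line, the Python snippet below only builds and prints a `qm disk move` command; it does not run anything. The disk key "scsi0" is an assumption, since the VM configuration in this thread is only visible in screenshots; check the real key with `qm config 103` first. The function name build_move_disk_cmd is hypothetical. On older PVE releases the same subcommand is, as far as I know, spelled `qm move_disk`.

```python
import shlex


def build_move_disk_cmd(vmid, disk, target_storage, delete_source=True):
    """Build the argv for moving one VM disk to another storage (illustrative helper)."""
    cmd = ["qm", "disk", "move", str(vmid), disk, target_storage]
    if delete_source:
        # Remove the original volume after a successful copy.
        cmd += ["--delete", "1"]
    return cmd


# VM 103 comes from the error message; "scsi0" is an assumed disk key.
cmd = build_move_disk_cmd(103, "scsi0", "shared")
print(shlex.join(cmd))
```

Printing the command instead of executing it leaves room to review it before running it on the node that currently owns the VM. The same operation is also available in the GUI from the VM's Hardware tab.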

Cheers


 
Hi,
About disk partitioning: I installed PVE using the GUI setup with default parameters.
Here is the configuration of my two VMs, where it seems the disks are on the shared volume.

Also, if I look for vm-103-disk-0, I find it here:
1739953431061.png



1739950132362.png

1739950150444.png


EDIT: I just tried the following:
  • Live migration from one node to the other: it works.
  • Powering off one node: the VM failover now works.

I didn't change any PVE configuration. I just rebooted the nodes a few times and deleted and recreated the VMs.
 
