Cluster config with local directory storage

Rich.H

New Member
Oct 11, 2010
Hi folks. I have two PVE servers, vm-host-1 and vm-host-2, which I'm trying to cluster. Here's the storage config for vm-host-1 and vm-host-2, respectively:
[screenshots: vm-host-1_b4.png, vm-host-2_b4.png]
I set up the cluster with vm-host-1 as master, per the installation docs, and can see both nodes from both servers. Storage for vm-host-1 is unchanged; however, vm-host-2's storage is now:
[screenshot: vm-host-2_aft.png]
Note the missing VM2, and the oddball sizes on the vm-host-1 storage pools. I can re-add VM2, but any attempt to view or manipulate anything but the local pool on vm-host-2 thereafter returns "does not exist" or "no write permission" errors. Adding VM2 on the master (vm-host-1) shows it as 26.63G; obviously it isn't seeing VM2 on vm-host-2. Permissions are 755 on both VMn directories on both hosts.

What am I doing wrong?
 
The /etc/pve/storage.cfg file from the master is synced to the node(s); therefore, you can only edit it on the master. Since you are getting a different list on the node, the cluster sync does not seem to be working - try to find out why.

You are using "Directory" storage and marking it as "shared" - what kind of file system do you mount there to prevent data corruption?
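
A quick way to find out whether the sync is running at all is to compare the file on both machines and ask the cluster tool for its view (a sketch, assuming the PVE 1.x command set; replace 10.42.10.20 with the slave node's actual address):

md5sum /etc/pve/storage.cfg
ssh 10.42.10.20 md5sum /etc/pve/storage.cfg
pveca -l

And regarding the "shared" flag: a Directory pool is only safe to mark as shared if every node mounts the same export (NFS or a cluster file system). A minimal sketch of such an entry, with a hypothetical NFS server and export path:

nfs: vmstore
path /mnt/pve/vmstore
server 10.42.10.5
export /export/vmstore
content images,iso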
 
Is there information on the cluster sync process? All I found on the wiki was initial configuration and reversion.
After clustering, cluster.cfg and storage.cfg are identical on both nodes, and they reflect the configuration of vm-host-1 (the master), so it looks like the sync is ignoring vm-host-2's (the slave's) storage pools. One thing I noticed in storage.cfg is that nothing tells vm-host-2 that the VM1 and backup1 pools are on the master. That explains the odd 34.8G sizes displayed, since pve-root on host 2 is 35G. I get similar results if I add VM2 on the host 1 console. Clearly, both nodes think every defined storage pool is local to them.

Do I have to make the devices known via NFS or some such? Does defining them as "directory" somehow glitch the sync process? I did try rebuilding VM2 under LVM and adding the volume, but it appears to have made no difference.

Re data integrity, these are test machines. The production servers will be ext3 on RAID 10.
 
Okay. So I split up the cluster, rebuilt the VM1 pool under LVM (as recommended in the docs) identically on both nodes:

- storage.cfg -
dir: local
path /var/lib/vz
content images,iso,vztmpl,rootdir

dir: Backup1
path /var/backup1
content backup

lvm: VM1
vgname lvm-raid
content images

Both servers worked perfectly stand-alone. I then re-created the cluster. A guest created on the "local" storage pool migrates between servers correctly. However, attempting to migrate a guest created on VM1 gives the following:
/usr/bin/ssh -t -t -n -o BatchMode=yes 10.42.10.20 /usr/sbin/qmigrate 10.42.10.10 211
Oct 21 20:53:36 starting migration of VM 211 to host '10.42.10.10'
Oct 21 20:53:36 copying disk images
Oct 21 20:53:36 Failed to sync data - can't migrate 'VM1:vm-211-disk-1' - storagy type 'lvm' not supported
Oct 21 20:53:36 migration aborted
Connection to 10.42.10.20 closed.
VM 211 migration failed -


The manual, wiki, etc. all strongly, repeatedly recommend creating storage on LVM devices. Am I now to understand that guests cannot migrate between LVM pools? These are all KVM guests, btw.
 
The manual, wiki, etc. all strongly, repeatedly recommend creating storage on LVM devices. Am I now to understand that guests cannot migrate between LVM pools? These are all KVM guests, btw.

You need shared storage to get that working (LVM on shared storage).
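
In storage.cfg terms that means the volume group has to sit on a LUN every node can see (iSCSI/SAN), and the pool carries the shared flag. A sketch with a hypothetical portal address, target name and volume group:

iscsi: san1
portal 10.42.10.50
target iqn.2010-10.local.san:vmstore
content none

lvm: VM-shared
vgname vg-san1
content images
shared 1

The volume group vg-san1 is created once on the iSCSI LUN; because the pool is flagged shared, migration no longer has to copy the disk images.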
 
Okay: I tore down the cluster, reworked the storage pools to regular filesystem mounts (not LVM), made sure they were unshared, put everything back together, rebuilt the cluster, and... everything works. Both nodes are hosting running guests on the VM1 pool(s), and I can freely migrate guests between nodes. I could swear I had it configured this way early on, but perhaps not. Now we'll exercise it for a few weeks to make sure it's stable. Many thanks for your help; I learned a lot about PVE and KVM from this experience.
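
For the record, a sketch of the working layout on each node - three unshared Directory pools. The VM1 mount point /var/lib/vm1 is a guess; substitute whatever the real mount point is:

dir: local
path /var/lib/vz
content images,iso,vztmpl,rootdir

dir: Backup1
path /var/backup1
content backup

dir: VM1
path /var/lib/vm1
content images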

The big lesson learned (for me) is that all nodes must be configured identically. This is fine as far as it goes, but it disturbs me that the clustering software casually assumes that the nodes are identical. As I've accidentally demonstrated, this isn't necessarily so. Given the short life cycle of storage hardware (Seagate no longer makes the 640GB drives in my test servers), and the bizarre behavior PVE exhibits with mismatched storage pools, I'm going to be VERY reluctant to add nodes to a production cluster, or add or change out drives in existing nodes.

This isn't criticism, btw - PVE still works a LOT better than several other FOSS solutions I've tried - but it is leading up to a request: that the next PVE release allow configuration of node-unique storage pools (for example, node "A" being configured with storage pools "S1" and "S2"; node "B" with pools "S3", "S4", "S5"), and allow migration of guests between storage pools, even within a node. These closely-related enhancements would, imho, eliminate a lot of the current brittleness in the clustering software, and would add a LOT of flexibility to PVE's architecture; among other things, upgrading a cluster to add a SAN pool, for example, would become almost effortless because you could simply migrate guests from the existing node-local storage.

Thanks again,

RH
 
but it is leading up to a request: that the next PVE release allow configuration of node-unique storage pools (for example, node "A" being configured with storage pools "S1" and "S2"; node "B" with pools "S3", "S4", "S5"),

Yes, that is the plan.

and allow migration of guests between storage pools, even within a node.

Not for 2.0, but maybe later. Storage migration is very slow, so it is not something you want to do normally.