Proxmox 3.3 Cluster HA and NFS Shared Storage

is-max

New Member
Jan 29, 2015
Hello,

For a testbed I've set up two servers with Proxmox clustering and HA enabled; I'll call them IS-VmProx and IS-proxMox2.

I've configured one OpenVZ container on IS-proxMox2 and, using cluster.conf, set it up so that it fails over on a node failure and is relocated back to IS-proxMox2 afterwards.

I've noticed that in case of a failure of IS-proxMox2 the container is restarted on IS-VmProx, which is correct.
But when IS-proxMox2 comes back online, the system tries to migrate the container back to IS-proxMox2 and fails. Here is the log:
Code:
Jan 29 10:51:25 IS-VmProx corosync[2577]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jan 29 10:51:32 IS-VmProx rgmanager[2935]: State change: IS-proxMox2 UP
Jan 29 10:51:32 IS-VmProx rgmanager[2935]: Migrating pvevm:103 to better node IS-proxMox2
Jan 29 10:51:32 IS-VmProx rgmanager[2935]: Migrating pvevm:103 to IS-proxMox2
Jan 29 10:51:32 IS-VmProx pvevm: <root@pam> starting task UPID:IS-VmProx:0002C5E2:00D8192C:54CA02A4:vzmigrate:103:root@pam:
Jan 29 10:51:33 IS-VmProx kernel: CT: 103: checkpointed
Jan 29 10:51:34 IS-VmProx kernel: vmbr0: port 3(veth103.0) entering disabled state
Jan 29 10:51:34 IS-VmProx kernel: device veth103.0 left promiscuous mode
Jan 29 10:51:34 IS-VmProx kernel: vmbr0: port 3(veth103.0) entering disabled state
Jan 29 10:51:35 IS-VmProx kernel: CT: 103: stopped
Jan 29 10:51:36 IS-VmProx rgmanager[2935]: Migration of pvevm:103 to IS-proxMox2 completed
Jan 29 10:51:36 IS-VmProx rgmanager[2935]: status on pvevm "103" returned 7 (unspecified)
Jan 29 10:51:43 IS-VmProx rgmanager[2935]: status on pvevm "103" returned 7 (unspecified)
Jan 29 10:51:43 IS-VmProx rgmanager[2935]: Recovering failed service pvevm:103
Jan 29 10:51:44 IS-VmProx rgmanager[181930]: [pvevm] Move config for CT 103 to local node
Jan 29 10:51:44 IS-VmProx pvevm: <root@pam> starting task UPID:IS-VmProx:0002C6BE:00D81DA5:54CA02B0:vzstart:103:root@pam:
Jan 29 10:51:44 IS-VmProx task UPID:IS-VmProx:0002C6BE:00D81DA5:54CA02B0:vzstart:103:root@pam:: starting CT 103: UPID:IS-VmProx:0002C6BE:00D81DA5:54CA02B0:vzstart:103:root@pam:
Code:
task started by HA resource agent
Jan 29 10:51:33 starting migration of CT 103 to node 'IS-proxMox2' (10.10.10.29)
Jan 29 10:51:33 container is running - using online migration
Jan 29 10:51:33 container data is on shared storage 'NAS-001_NFS'
Jan 29 10:51:33 start live migration - suspending container
Jan 29 10:51:33 dump container state
Jan 29 10:51:34 dump 2nd level quota
Jan 29 10:51:36 initialize container on remote node 'IS-proxMox2'
Jan 29 10:51:36 initializing remote quota
Jan 29 10:51:36 # /usr/bin/ssh -o 'BatchMode=yes' root@10.10.10.29 vzctl quotainit 103
Jan 29 10:51:36 vzquota : (warning) Quota file exists, it will be overwritten
Jan 29 10:51:36 vzquota : (error) quota check : stat /mnt/pve/NAS-001_NFS/private/103: No such file or directory
Jan 29 10:51:36 ERROR: online migrate failure - Failed to initialize quota: vzquota init failed [1]
Jan 29 10:51:36 start final cleanup
Jan 29 10:51:36 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems

The most important part of the log is this line:
Jan 29 10:51:36 vzquota : (error) quota check : stat /mnt/pve/NAS-001_NFS/private/103: No such file or directory

So I checked what happened on IS-proxMox2, and it seems that when the node boots up it declares itself ready for service too early, before it has actually mounted NAS-001_NFS.
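
A quick way to confirm this (just a manual check, assuming the standard /mnt/pve mount path) is to look at the node right after it boots, e.g. from IS-VmProx:
Code:
# ask Proxmox which storages are active on the rebooted node
ssh root@10.10.10.29 pvesm status
# check whether the NFS share is actually mounted there yet
ssh root@10.10.10.29 mountpoint /mnt/pve/NAS-001_NFS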

I thought I could fix it by putting the NFS mount in /etc/fstab, but I'd like to know whether this is a known issue or whether I'm doing something wrong.

P.S. If I wait about a minute after boot, I can migrate the container back manually without any problem.
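
(If I remember the CLI syntax correctly, the manual migration I do is more or less this; please double-check against your pvectl help:)
Code:
# manual online migration of the container back to IS-proxMox2
pvectl migrate 103 IS-proxMox2 -online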

Here is the cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="11" name="node-1">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1" expected_votes="1"/>
  <fencedevices>
    <fencedevice agent="fence_null" name="null_fence"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="IS-VmProx" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="null_fence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="IS-proxMox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="null_fence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
        <failoverdomain name="first_node" nofailback="0" ordered="1" restricted="1">
            <failoverdomainnode name="IS-VmProx" priority="1"/>
            <failoverdomainnode name="IS-proxMox2" priority="2"/>
        </failoverdomain>
        <failoverdomain name="second_node" nofailback="0" ordered="1" restricted="1">
            <failoverdomainnode name="IS-VmProx" priority="2"/>
            <failoverdomainnode name="IS-proxMox2" priority="1"/>
        </failoverdomain>
    </failoverdomains>
    <pvevm autostart="1" vmid="103" domain="second_node" recovery="relocate"/>
  </rm>
</cluster>

Thank you
Regards
 
Hello

Today I tried putting the NFS mount in /etc/fstab, and everything works as expected: on recovery the container live-migrates back to IS-proxMox2 correctly.
So something is making Proxmox take too long to mount the storage at boot.
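
For reference, the fstab entry is of this form (the NAS address and export path below are just placeholders for my setup; the mount point is the one Proxmox uses for the storage):
Code:
# /etc/fstab - mount the NFS share at boot so it is ready before the cluster needs it
10.10.10.50:/export/proxmox  /mnt/pve/NAS-001_NFS  nfs  defaults,_netdev,vers=3  0  0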

Has no one else had the same problem?

Regards
 
Hi,

Just out of curiosity: am I the first one to run into this problem, or have those who hit it applied a workaround like mine?

Regards
 
Hi,
did you use an IP address or a hostname for the NFS storage?
If you used a hostname, that could be a problem at boot.
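
For example, in /etc/pve/storage.cfg an NFS storage is defined roughly like this (server and export below are just placeholders); if "server" is a hostname and name resolution is not available yet at boot, mounting can be delayed or fail:
Code:
nfs: NAS-001_NFS
        path /mnt/pve/NAS-001_NFS
        server 10.10.10.50
        export /export/proxmox
        options vers=3
        content images,rootdir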