Proxmox 3.3 Cluster HA and NFS Shared Storage

is-max

New Member
Jan 29, 2015
Hello,

For a testbed I've set up two servers with Proxmox clustering and HA enabled; I'll call them IS-VmProx and IS-proxMox2.

I've configured one OpenVZ container on IS-proxMox2 and, using cluster.conf, set it up so that it fails over on a node failure and is relocated back to IS-proxMox2 afterwards.

I've noticed that in case of a failure of IS-proxMox2 the container is restarted on IS-VmProx, which is correct.
But when IS-proxMox2 comes back online, the system tries to migrate the container back to IS-proxMox2 and fails. Here is the log:
Code:
Jan 29 10:51:25 IS-VmProx corosync[2577]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jan 29 10:51:32 IS-VmProx rgmanager[2935]: State change: IS-proxMox2 UP
Jan 29 10:51:32 IS-VmProx rgmanager[2935]: Migrating pvevm:103 to better node IS-proxMox2
Jan 29 10:51:32 IS-VmProx rgmanager[2935]: Migrating pvevm:103 to IS-proxMox2
Jan 29 10:51:32 IS-VmProx pvevm: <root@pam> starting task UPID:IS-VmProx:0002C5E2:00D8192C:54CA02A4:vzmigrate:103:root@pam:
Jan 29 10:51:33 IS-VmProx kernel: CT: 103: checkpointed
Jan 29 10:51:34 IS-VmProx kernel: vmbr0: port 3(veth103.0) entering disabled state
Jan 29 10:51:34 IS-VmProx kernel: device veth103.0 left promiscuous mode
Jan 29 10:51:34 IS-VmProx kernel: vmbr0: port 3(veth103.0) entering disabled state
Jan 29 10:51:35 IS-VmProx kernel: CT: 103: stopped
Jan 29 10:51:36 IS-VmProx rgmanager[2935]: Migration of pvevm:103 to IS-proxMox2 completed
Jan 29 10:51:36 IS-VmProx rgmanager[2935]: status on pvevm "103" returned 7 (unspecified)
Jan 29 10:51:43 IS-VmProx rgmanager[2935]: status on pvevm "103" returned 7 (unspecified)
Jan 29 10:51:43 IS-VmProx rgmanager[2935]: Recovering failed service pvevm:103
Jan 29 10:51:44 IS-VmProx rgmanager[181930]: [pvevm] Move config for CT 103 to local node
Jan 29 10:51:44 IS-VmProx pvevm: <root@pam> starting task UPID:IS-VmProx:0002C6BE:00D81DA5:54CA02B0:vzstart:103:root@pam:
Jan 29 10:51:44 IS-VmProx task UPID:IS-VmProx:0002C6BE:00D81DA5:54CA02B0:vzstart:103:root@pam:: starting CT 103: UPID:IS-VmProx:0002C6BE:00D81DA5:54CA02B0:vzstart:103:root@pam:
Code:
task started by HA resource agent
Jan 29 10:51:33 starting migration of CT 103 to node 'IS-proxMox2' (10.10.10.29)
Jan 29 10:51:33 container is running - using online migration
Jan 29 10:51:33 container data is on shared storage 'NAS-001_NFS'
Jan 29 10:51:33 start live migration - suspending container
Jan 29 10:51:33 dump container state
Jan 29 10:51:34 dump 2nd level quota
Jan 29 10:51:36 initialize container on remote node 'IS-proxMox2'
Jan 29 10:51:36 initializing remote quota
Jan 29 10:51:36 # /usr/bin/ssh -o 'BatchMode=yes' root@10.10.10.29 vzctl quotainit 103
Jan 29 10:51:36 vzquota : (warning) Quota file exists, it will be overwritten
Jan 29 10:51:36 vzquota : (error) quota check : stat /mnt/pve/NAS-001_NFS/private/103: No such file or directory
Jan 29 10:51:36 ERROR: online migrate failure - Failed to initialize quota: vzquota init failed [1]
Jan 29 10:51:36 start final cleanup
Jan 29 10:51:36 ERROR: migration finished with problems (duration 00:00:04)
TASK ERROR: migration problems

The most important part of the log is this line:
Jan 29 10:51:36 vzquota : (error) quota check : stat /mnt/pve/NAS-001_NFS/private/103: No such file or directory

So I checked what happened on IS-proxMox2, and it seems that when the node boots up it declares itself ready for service too early, before it has actually mounted NAS-001_NFS.
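
A quick way to confirm this (just a manual check, assuming the standard /mnt/pve mount path) is to look at the node right after it boots, e.g. from IS-VmProx:
Code:
# ask Proxmox which storages are active on the rebooted node
ssh root@10.10.10.29 pvesm status
# check whether the NFS share is actually mounted there yet
ssh root@10.10.10.29 mountpoint /mnt/pve/NAS-001_NFS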

I thought I could fix it by putting the NFS mount in /etc/fstab, but I'd like to know whether this is a known issue or whether I'm doing something wrong.

P.S. If I wait about a minute after boot, I can migrate the container back manually without any problem.
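
(If I remember the CLI syntax correctly, the manual migration I do is more or less this; please double-check against your pvectl help:)
Code:
# manual online migration of the container back to IS-proxMox2
pvectl migrate 103 IS-proxMox2 -online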

Here is the cluster.conf:
Code:
<?xml version="1.0"?>
<cluster config_version="11" name="node-1">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey" two_node="1" expected_votes="1"/>
  <fencedevices>
    <fencedevice agent="fence_null" name="null_fence"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="IS-VmProx" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="null_fence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="IS-proxMox2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="null_fence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <failoverdomains>
        <failoverdomain name="first_node" nofailback="0" ordered="1" restricted="1">
            <failoverdomainnode name="IS-VmProx" priority="1"/>
            <failoverdomainnode name="IS-proxMox2" priority="2"/>
        </failoverdomain>
        <failoverdomain name="second_node" nofailback="0" ordered="1" restricted="1">
            <failoverdomainnode name="IS-VmProx" priority="2"/>
            <failoverdomainnode name="IS-proxMox2" priority="1"/>
        </failoverdomain>
    </failoverdomains>
    <pvevm autostart="1" vmid="103" domain="second_node" recovery="relocate"/>
  </rm>
</cluster>

Thank you
Regards
 
Hello

Today I tried putting the NFS mount in /etc/fstab, and everything works as expected: on recovery the container live-migrates back to IS-proxMox2 correctly.
So something is making Proxmox take too long to mount the storage at boot.
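
For reference, the fstab entry is of this form (the NAS address and export path below are just placeholders for my setup; the mount point is the one Proxmox uses for the storage):
Code:
# /etc/fstab - mount the NFS share at boot so it is ready before the cluster needs it
10.10.10.50:/export/proxmox  /mnt/pve/NAS-001_NFS  nfs  defaults,_netdev,vers=3  0  0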

Has no one else had the same problem?

Regards
 
Hi,

Just out of curiosity: am I the first one to run into this problem, or have those who hit it applied a workaround like mine?

Regards
 
Hi,
did you use an IP address or a hostname for the NFS storage?
If you used a hostname, that could be a problem at boot.
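
For example, in /etc/pve/storage.cfg an NFS storage is defined roughly like this (server and export below are just placeholders); if "server" is a hostname and name resolution is not available yet at boot, mounting can be delayed or fail:
Code:
nfs: NAS-001_NFS
        path /mnt/pve/NAS-001_NFS
        server 10.10.10.50
        export /export/proxmox
        options vers=3
        content images,rootdir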