Hello all,
We are running a three-node cluster (Proxmox version 3.1-3). The VM images are stored on two NFS shares provided by a failover storage server (Ubuntu 12.04, DRBD, Corosync/Pacemaker).
Last night, while the backup was running, the cluster lost its connection to the storage for a very short moment. (As expected, the backup stopped with an error; so far no problem.)
All cluster members were able to reconnect to share1, but unfortunately not to share2.
In the morning, when I discovered this unhappy state, I tried mounting share2 from the storage servers on another Linux box as a test. This worked immediately, and I could access everything on share2 from this box. Afterwards I unmounted the share on that box again.
I checked the logs on the storage servers but couldn’t find any messages that indicate unexpected behavior.
But all three Proxmox cluster members were reporting:
pvestatd[4397]: WARNING: unable to activate storage 'san2-VMs-nfs' - directory '/mnt/pve/san2-VMs-nfs' does not exist
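For anyone who wants to check the same condition on their own nodes: below is a minimal Python sketch that reports which expected NFS mount points are absent from the kernel's mount table. It is only a diagnostic helper and says nothing about how pvestatd itself decides to activate a storage. The function names are made up for illustration, and the input is assumed to be text in the format of /proc/mounts (device, mount point, filesystem type, options, ...).

```python
def mounted_nfs_targets(mounts_text):
    """Return the set of mount points whose filesystem type starts with 'nfs'.

    mounts_text is expected to look like the contents of /proc/mounts:
    one mount per line, whitespace-separated fields.
    """
    targets = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("nfs"):
            targets.add(fields[1])
    return targets


def missing_mounts(expected, mounts_text):
    """Return the expected mount points that are not currently mounted via NFS."""
    present = mounted_nfs_targets(mounts_text)
    return [path for path in expected if path not in present]


# Mount points taken from this post.
EXPECTED = ["/mnt/pve/san1-VMs-nfs", "/mnt/pve/san2-VMs-nfs"]

# Usage on a node (Linux only):
#     with open("/proc/mounts") as fh:
#         print(missing_mounts(EXPECTED, fh.read()))
```

In the state described here, such a check would be expected to list only /mnt/pve/san2-VMs-nfs on each node.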
Afterwards I tried to unmount share2 manually from the command line on all cluster members, hoping that the hosts would reconnect automatically after a short while. Unmounting worked without problems on two of the hosts. The third node refused with a ‘share busy’ message; the share was locked by two VMs. After stopping these VMs I was able to unmount the share there as well.
Unfortunately the hosts didn’t reconnect automatically. Only after I changed the share's settings in the Proxmox GUI (additionally allowing ISOs to be stored on it) did the hosts re-establish the connection. Later I changed this setting back so that only images are stored on this share.
Afterwards everything was running as before.
All shares are connected the same way:
192.168.33.50:/nfs-share-lun1 on /mnt/pve/san1-VMs-nfs type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.33.50,mountvers=3,mountport=54612,mountproto=udp,local_lock=none,addr=192.168.33.50)
192.168.33.51:/nfs-share-lun2 on /mnt/pve/san2-VMs-nfs type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.33.51,mountvers=3,mountport=54612,mountproto=udp,local_lock=none,addr=192.168.33.51)
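To back up the claim that both shares use the same options, here is a small Python sketch that parses the two mount lines above and lists every option that differs. The helper names are made up for illustration; the two input strings are copied from the mount output above.

```python
import re

SHARE1 = ("192.168.33.50:/nfs-share-lun1 on /mnt/pve/san1-VMs-nfs type nfs "
          "(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,"
          "proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.33.50,"
          "mountvers=3,mountport=54612,mountproto=udp,local_lock=none,"
          "addr=192.168.33.50)")
SHARE2 = ("192.168.33.51:/nfs-share-lun2 on /mnt/pve/san2-VMs-nfs type nfs "
          "(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,"
          "proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.33.51,"
          "mountvers=3,mountport=54612,mountproto=udp,local_lock=none,"
          "addr=192.168.33.51)")


def parse_mount_line(line):
    """Split one line of `mount` output into source, mount point, type and options."""
    match = re.match(r"^(\S+) on (\S+) type (\S+) \((.*)\)$", line.strip())
    if match is None:
        raise ValueError("unrecognised mount line: " + line)
    source, mountpoint, fstype, raw = match.groups()
    options = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        # Flags such as 'rw' or 'hard' carry no value; store them as True.
        options[key] = value if sep else True
    return source, mountpoint, fstype, options


def diff_options(a, b):
    """Return {option: (value_in_a, value_in_b)} for every option that differs."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


opts1 = parse_mount_line(SHARE1)[3]
opts2 = parse_mount_line(SHARE2)[3]
print(diff_options(opts1, opts2))
```

For these two lines the only differing options are addr and mountaddr, i.e. the two shares are served from different addresses (192.168.33.50 and 192.168.33.51) but are otherwise mounted identically.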
Does anyone have an idea why Proxmox was able to reconnect automatically to share1 but not to share2?
Are there any additional options for NFS shares that make reconnecting a share more reliable?
Any hint is appreciated.