Connection Problem between Host & Datastore

Ahmad Dhamiri

Hi all,

Recently we have had a weird problem between one of our hosts and a datastore.

Our cluster consists of three hosts connected to datastores from an HPE StoreEasy and a NetApp (both on the same storage network).
What confuses me is that the first and third hosts have a stable connection to the datastores on the NetApp, but the second host doesn't. It shows the unknown status symbol for the NetApp datastores, yet it can still ping the StoreEasy datastore.

[Screenshots: datastore status on the three hosts; the NetApp datastore shows the unknown status on host 2]

As you can see, the NetApp datastore on host 2 isn't available to that host. I wanted to try removing it from the datacenter, but I'm afraid the risk is too high. Is there a way my team and I can overcome this problem without removing the datastore?
 
Hi!

Could you please clarify which storage belongs to the NetApp and which to the StoreEasy? Especially:
  • Is the "datastore on the NetApp" the one called "nfs-proxmox-netapp"?
  • What about the other storage with netapp in the name ("proxmox-datastore-netapp") - was this only a test?
Can you ping the NetApp from proxmox02?

Could you please post from proxmox02
Code:
cat /etc/pve/storage.cfg
and from proxmox02 and another node
Code:
mount | grep nfs

For the future: Instead of creating screenshots you can also post the output of pvesm status ;)
 
Hi Dominic.

Could you please clarify which storage belongs to the NetApp and which to the StoreEasy? Especially:
  • Is the "datastore on the NetApp" the one called "nfs-proxmox-netapp"?
  • What about the other storage with netapp in the name ("proxmox-datastore-netapp") - was this only a test?

Yes, "nfs-proxmox-netapp is the datastore from the NetApp storage (connection via NFS)

Can you ping the NetApp from proxmox02?
Now this is odd: as of right now I can ping the NetApp from proxmox02.
root@proxmox02:~# ping 192.168.153.162
PING 192.168.153.162 (192.168.153.162) 56(84) bytes of data.
64 bytes from 192.168.153.162: icmp_seq=1 ttl=255 time=0.239 ms
64 bytes from 192.168.153.162: icmp_seq=2 ttl=255 time=0.147 ms

But yesterday and last week, after I rebooted the node I could ping the NetApp for about 30 minutes before they lost the connection to each other. Too bad I did not capture yesterday's result.

Could you please post from proxmox02 cat /etc/pve/storage.cfg
and from proxmox02 and another node mount | grep nfs

*I'm only showing the troubled datastores, if that's okay with you.
cat /etc/pve/storage.cfg

nfs: nfs-proxmox-netapp
export /vol/vol_proxmox_nfs
path /mnt/pve/nfs-proxmox-netapp
server 192.168.153.162
content images,backup,snippets,rootdir,vztmpl,iso
maxfiles 1
nodes proxmox01,proxmox02,proxmox03

nfs: nfs-his-walet-datastore
export /vol/vol_proxmox_his
path /mnt/pve/nfs-his-walet-datastore
server 192.168.153.162
content iso,backup,snippets,images
maxfiles 1
nodes proxmox01,proxmox03,proxmox02

Host 1 said:
root@proxmox01:~# mount | grep nfs
192.168.153.102:/proxmox-nfs on /mnt/pve/nfs-datastore-proxmox01 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.153.191,local_lock=none,addr=192.168.153.102)
192.168.153.162:/vol/vol_proxmox_his on /mnt/pve/nfs-his-walet-datastore type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.153.162,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=192.168.153.162)
192.168.153.162:/vol/vol_proxmox_nfs on /mnt/pve/nfs-proxmox-netapp type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.153.162,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=192.168.153.162)

Host 2 - The troubled one said:
root@proxmox02:~# mount | grep nfs
192.168.153.162:/vol/vol_proxmox_nfs on /mnt/pve/nfs-proxmox-netapp type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.153.162,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=192.168.153.162)
192.168.153.102:/proxmox-nfs on /mnt/pve/nfs-datastore-proxmox01 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.153.192,local_lock=none,addr=192.168.153.102)
192.168.153.162:/vol/vol_proxmox_his on /mnt/pve/nfs-his-walet-datastore type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.153.162,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=192.168.153.162)

Host 3 said:
root@proxmox03:~# mount | grep nfs
192.168.153.162:/vol/vol_proxmox_nfs on /mnt/pve/nfs-proxmox-netapp type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.153.162,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=192.168.153.162)
192.168.153.102:/proxmox-nfs on /mnt/pve/nfs-datastore-proxmox01 type nfs4 (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.153.193,local_lock=none,addr=192.168.153.102)
192.168.153.162:/vol/vol_proxmox_his on /mnt/pve/nfs-his-walet-datastore type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.153.162,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=192.168.153.162)

For the future: Instead of creating screenshots you can also post the output of pvesm status ;)

Noted. The output of the command you mentioned is below.


Host 3 said:
pvesm status
WARNING: Not using device /dev/sdc for PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV.
WARNING: Not using device /dev/sdd for PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV.
WARNING: PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV prefers device /dev/sdb because device was seen first.
WARNING: PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV prefers device /dev/sdb because device was seen first.
WARNING: Not using device /dev/sdc for PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV.
WARNING: Not using device /dev/sdd for PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV.
WARNING: PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV prefers device /dev/sdb because device was seen first.
WARNING: PV mDcaE6-h1pY-YkhM-cUt5-iBg9-R0gM-Wzp1VV prefers device /dev/sdb because device was seen first.
Name                       Type     Status        Total         Used    Available        %
iscsi-proxmox-datastore01  lvm      active   1181114368    672137216    508977152   56.91%
iscsi-proxmox-datastore02  lvm      active   1610608640    236978176   1373630464   14.71%
local                      dir      active     70950056     28806848     38496132   40.60%
local-lvm                  lvmthin  active    189812736      3131910    186680825    1.65%
nfs-datastore-proxmox01    nfs      active  10752000000   7758620672   2993379328   72.16%
nfs-his-walet-datastore    nfs      active   8670465280   1316617856   7353847424   15.19%
nfs-proxmox-netapp         nfs      active   2040109504    427058816   1613050688   20.93%
proxmox-datastore-netapp   iscsi    active            0            0            0    0.00%
proxmox-datastore01        iscsi    active            0            0            0    0.00%
proxmox-datastore02        iscsi    active            0            0            0    0.00%

Is there anything wrong with my pvesm status and mount | grep nfs?
 
Thank you for providing the detailed output! This makes it a lot easier to get an overview.

If I understand this correctly, then the time when the storage is unavailable in Proxmox VE is exactly the time when you cannot even ping the NetApp storage from proxmox02. That would be a hint that it is not Proxmox VE itself that causes the trouble, but rather some underlying network problem.

Your output suggests that currently you can not only ping the NetApp storage but actually use it. Please check your network with tools like ping if this changes again.
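
If it is hard to catch the exact moment when the connection drops, you could for example keep a timestamped ping log running on proxmox02 until the problem shows up again. This is only a rough sketch; the interval and log path are just examples:
Code:
# Rough sketch: timestamp every line of ping output for the NetApp (192.168.153.162).
# A gap in the timestamps shows when replies stopped. Run it in a screen/tmux session.
ping -i 5 192.168.153.162 | while read -r line; do
    echo "$(date '+%F %T') $line"
done >> /root/netapp-ping.log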
 
Thank you for providing the detailed output! This makes it a lot easier to get an overview.

If I understand this correctly, then the time when the storage is unavailable in Proxmox VE is exactly the time when you cannot even ping the NetApp storage from proxmox02. That would be a hint that it is not Proxmox VE itself that causes the trouble, but rather some underlying network problem.

Your output suggests that currently you can not only ping the NetApp storage but actually use it. Please check your network with tools like ping if this changes again.

Thanks for the response.
I think it is related to a network problem. I'm afraid it could be an adapter problem, which would be the worst-case scenario.

So my team just tried changing the IP address and rebooting node 2, and so far that has resolved the problem with the NetApp.

cat /etc/pve/storage.cfg

nfs: nfs-proxmox-netapp
export /vol/vol_proxmox_nfs
path /mnt/pve/nfs-proxmox-netapp
server 192.168.153.162
content images,backup,snippets,rootdir,vztmpl,iso
maxfiles 1
nodes proxmox01,proxmox02,proxmox03

nfs: nfs-his-walet-datastore
export /vol/vol_proxmox_his
path /mnt/pve/nfs-his-walet-datastore
server 192.168.153.162
content iso,backup,snippets,images
maxfiles 1
nodes proxmox01,proxmox03,proxmox02


root@proxmox02:~# ping 192.168.153.162
PING 192.168.153.162 (192.168.153.162) 56(84) bytes of data.
64 bytes from 192.168.153.162: icmp_seq=1 ttl=255 time=0.088 ms
64 bytes from 192.168.153.162: icmp_seq=2 ttl=255 time=0.121 ms
64 bytes from 192.168.153.162: icmp_seq=3 ttl=255 time=0.120 ms
64 bytes from 192.168.153.162: icmp_seq=4 ttl=255 time=0.134 ms
^C
--- 192.168.153.162 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 54ms
rtt min/avg/max/mdev = 0.088/0.115/0.134/0.021 ms

[Screenshot: the NetApp datastores are available again on proxmox02]

However, now it seems like node 2 has a connection problem with the HPE StoreEasy. The connection seems okay, but the Proxmox portal shows the unknown status. As usual, nodes 1 and 3 show no problem.

HPE StoreEasy NFS & iSCSI LVM configuration said:
Here is the configuration we use for the HPE StoreEasy storage.
cat /etc/pve/storage.cfg

nfs: nfs-datastore-proxmox01
export /proxmox-nfs
path /mnt/pve/nfs-datastore-proxmox01
server 192.168.153.102
content backup,images,vztmpl,iso,snippets,rootdir
maxfiles 5
nodes proxmox01,proxmox03,proxmox02

iscsi: proxmox-datastore01
portal 192.168.153.102
target iqn.1991-05.com.microsoft:win-m6ndhfubpa0-proxmox-datastore01-target
content images
nodes proxmox02,proxmox03,proxmox01

lvm: iscsi-proxmox-datastore01
vgname iscsi-proxmox-datastore01
base proxmox-datastore01:0.0.0.scsi-360003ff44dc75adcb186205b3c9ca3cc
content images,rootdir
shared 1

iscsi: proxmox-datastore02
portal 192.168.153.102:3260
target iqn.1991-05.com.microsoft:win-m6ndhfubpa0-proxmox-datastore02-target
content images
nodes proxmox02,proxmox03,proxmox01

lvm: iscsi-proxmox-datastore02
vgname iscsi-proxmox-datastore02
base proxmox-datastore02:0.0.0.scsi-360003ff44dc75adc93e366fe8fb88ea7
content rootdir,images
nodes proxmox02,proxmox03,proxmox01
shared 1

[Screenshot: the StoreEasy datastores showing the unknown status on proxmox02]

But somehow node 2 can still ping the StoreEasy storage.
root@proxmox02:~# ping 192.168.153.102
PING 192.168.153.102 (192.168.153.102) 56(84) bytes of data.
64 bytes from 192.168.153.102: icmp_seq=1 ttl=128 time=0.387 ms
64 bytes from 192.168.153.102: icmp_seq=2 ttl=128 time=0.289 ms
64 bytes from 192.168.153.102: icmp_seq=3 ttl=128 time=0.190 ms
64 bytes from 192.168.153.102: icmp_seq=4 ttl=128 time=0.291 ms
^C
--- 192.168.153.102 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 58ms
rtt min/avg/max/mdev = 0.190/0.289/0.387/0.070 ms

I forgot to mention that we just upgraded Proxmox from version 6.0 to 6.1-8.

So I think the network isn't the problem here. I have to assume there might be a small bug in the latest version of Proxmox, which I find unlikely, or it is probably something to do with our NFS configuration on our side or the hardware we use.

Side note: iSCSI LVM for the StoreEasy has been in good condition so far and we might plan to use it in the future, but it does not seem all that suitable for us, since we prefer qcow2 images for our VMs and iSCSI LVM storage only provides the raw disk image format. If there's a workaround that lets iSCSI LVM storage hold qcow2 images, that would be great.
 
Can you do something other than pinging with your StoreEasy? For example showmount -e 192.168.153.102 from proxmox02?

Has anything in the output of mount | grep nfs on proxmox02 changed unexpectedly?

Can you use the NFS share on proxmox02 outside the Proxmox VE GUI, for example with ls /mnt/pve/nfs-datastore-proxmox01?
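
The next time the status flips to unknown, running something along these lines on proxmox02 would help separate "network reachable" from "NFS actually usable". This is just a sketch; rpcinfo is an extra check on top of the commands above:
Code:
ping -c 3 192.168.153.102              # basic ICMP reachability
showmount -e 192.168.153.102           # are the NFS exports still visible?
rpcinfo -p 192.168.153.102             # do rpcbind/mountd/nfsd still answer?
mount | grep nfs-datastore-proxmox01   # is the share still mounted?
ls /mnt/pve/nfs-datastore-proxmox01    # can the mount actually be read?
                                       # (this may hang if the server stops responding)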
 
