NFS shares not coming online

BenPi

Member
Mar 15, 2019
Hello!

A few weeks ago (at the beginning of March, around March 7th), my NFS shares stopped coming online on three of my Proxmox servers. They're all running the latest version of Proxmox 6.3 (see the attached txt file for the output of "pveversion -v"). My NFS server is a fourth Proxmox 6.3 server with a ZFS pool whose NFS exports are configured through the ZFS "sharenfs" property. If I'm not mistaken, the problem started after I installed the updates on February 27th.

On top of the unavailable NFS shares, the Proxmox web interface on all hosts started misbehaving a few minutes after the server booted. All the elements in the host "tree view" showed a question mark (see attached screenshot) and the graphs were no longer updating. When this happened, I was able to "revive" the web interface for a while by killing the "pvestatd" and "pvedaemon" services and restarting the "pvestatd" service. Restarting the service alone wasn't enough: the mount commands stayed stuck trying to mount the share indefinitely, so I also had to "kill -9" the related processes. Either way, the web interface always broke again after a few minutes (see attached file "ps.txt").

I realized that it was related to the "pvestatd" or "pvedaemon" services because, when I ran "ps auxf", there were a few subprocesses of "pvestatd" and "pvedaemon" trying to mount the NFS shares indefinitely (what's strange is that the commands never seemed to time out). If I run "systemctl status pvestatd", I can also see the stuck mount commands in its log output.
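The "ps auxf" check above can also be scripted. Below is a minimal Python sketch, assuming a Linux host with /proc mounted; the function name `find_mount_processes` is hypothetical, and matching on the command names `mount`, `mount.nfs` and `mount.nfs4` is an assumption about how the NFS mount helpers show up. It lists each matching process with its parent PID (so you can see whether it hangs under pvestatd or pvedaemon) and its state; a state of `D` (uninterruptible sleep) is what a mount blocked on an unresponsive NFS server typically looks like.

```python
import os

# Command names assumed to belong to an NFS mount attempt (hypothetical filter).
MOUNT_COMMS = {"mount", "mount.nfs", "mount.nfs4"}


def find_mount_processes(proc_root="/proc"):
    """Return sorted (pid, ppid, state, cmdline) tuples for mount-related processes."""
    results = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return results  # no procfs (not Linux, or wrong path)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            with open(os.path.join(base, "cmdline"), "rb") as f:
                raw = f.read()
            # The command name sits in parentheses and may contain spaces,
            # so everything after the last ')' is "state ppid ...".
            comm = stat[stat.index("(") + 1 : stat.rindex(")")]
            fields = stat[stat.rindex(")") + 2 :].split()
            state, ppid = fields[0], int(fields[1])
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unexpected format
        # cmdline is NUL-separated argv; join it back for display.
        argv = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if comm in MOUNT_COMMS:
            results.append((int(entry), ppid, state, " ".join(argv)))
    return sorted(results)


if __name__ == "__main__":
    for pid, ppid, state, cmd in find_mount_processes():
        print(f"pid={pid} ppid={ppid} state={state} {cmd}")
```

Run it as root on an affected host; cross-check the reported ppid values against the PIDs of pvestatd and pvedaemon.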

The only workaround was to disable all the NFS storage entries on my Proxmox hosts (and kill the "pvestatd" and "pvedaemon" services and restart "pvestatd" to fix the web interface), but obviously that left me unable to run backups to my NFS shares.

A few days later (2021-03-11), some Proxmox updates became available. I installed them and the issue was gone.

But today I realized that the exact same issue is back since I rebooted the NFS server earlier today.

I first noticed it because I received a vzdump error e-mail from the Proxmox/NFS server stating that a backup job had failed because of a hostname lookup error:
Code:
hostname lookup 'localhost' failed - got local IP address ''
I don't think it's related, but because of that, Proxmox was unable to mount the local NFS shares (there are two NFS storage entries that I configured with the "localhost" address, since they are on this host itself). When I tried to browse the content of one of these shares, it showed the same error as above while trying to mount it (see attached screenshot). I fixed that by editing the storage configuration file and replacing "localhost" with the IP "127.0.0.1".
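A minimal Python sketch for the two checks involved here, assuming the storage entries live in /etc/pve/storage.cfg with one indented "server <host>" line per NFS entry. `resolve_ipv4` shows which IPv4 addresses a name actually resolves to on this host (an empty list means the lookup failed), which is a quick way to see what "localhost" returns. `replace_nfs_server` performs the same "localhost" to "127.0.0.1" substitution described above on the file's text. Both function names and the sample entry used to exercise them are hypothetical; back up the file before writing anything back to it.

```python
import re
import socket


def resolve_ipv4(host):
    """Return the sorted IPv4 addresses `host` resolves to ([] if the lookup fails)."""
    try:
        infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def replace_nfs_server(cfg_text, old="localhost", new="127.0.0.1"):
    """Replace indented `server <old>` lines in storage.cfg-style text with `server <new>`.

    Only spaces/tabs are matched around the value, so blank lines between
    storage entries are never swallowed by the pattern.
    """
    pattern = re.compile(
        r"^([ \t]+server[ \t]+)" + re.escape(old) + r"[ \t]*$", re.MULTILINE
    )
    return pattern.sub(lambda m: m.group(1) + new, cfg_text)


if __name__ == "__main__":
    # Read-only check: print what "localhost" resolves to on this host.
    print("localhost ->", resolve_ipv4("localhost"))
```

On a typical Debian-based host, `resolve_ipv4("localhost")` should include 127.0.0.1; if it comes back empty, /etc/hosts is the first place to look.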

But after fixing that, I checked the status of the NFS shares/storage on my other three Proxmox servers and realized that they no longer come online, with exactly the same behavior as a few weeks ago. I haven't changed anything in my configuration since then. These servers are all on the same subnet and there's no firewall involved. Last time, it fixed itself after an update, but I don't know which one...

I checked the syslog on all servers, and there's not much in it.
  • On the NFS server, I only get errors about "pvestatd" trying in a loop to mount the NFS shares and failing (syslog_nfs-and-proxmox_server.txt).
  • On one of the Proxmox servers (an NFS client of the NFS server), which I have rebooted since, I don't see any errors in the syslog. I can only see that it keeps trying, indefinitely and without timing out, to mount the NFS shares.
  • On another Proxmox server, before rebooting it, I was getting errors in a loop about pvestatd being unable to activate any of the NFS storage (syslog_proxmox_host.txt). After rebooting it, it behaves the same as the previous host.
By the way, I obviously tried to reboot the servers, but I'm still having the same issues.
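Since the mount attempts never seem to time out, a probe with an explicit timeout can at least tell whether the NFS server is accepting connections once it has rebooted. This is a minimal Python sketch, assuming the server listens on the standard NFS port 2049 over TCP; `nfs_port_reachable` is a hypothetical helper, not part of Proxmox. It only checks port 2049: NFSv3 mounts also go through rpcbind (port 111) and mountd, which this probe does not test.

```python
import socket
import sys

NFS_PORT = 2049  # standard NFS port (NFSv4, and NFSv3 over TCP)


def nfs_port_reachable(host, port=NFS_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection applies `timeout` to the connect attempt itself.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, no route to host, and socket.timeout.
        return False


if __name__ == "__main__":
    # Usage: python3 probe.py <nfs-server-address> [...]
    for server in sys.argv[1:]:
        print(server, "reachable" if nfs_port_reachable(server) else "NOT reachable")
```

Running it from each Proxmox host right after the NFS server finishes booting separates "the server isn't answering yet" from "the server answers but the mount still hangs".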

Can you help me debug the issue, or is it a known bug?

Thanks!
 

Attachments

  • 2021-04-11 16_33_37-Window.png (11.4 KB)
  • 2021-04-11 16_51_54-Window.png (2.9 KB)
  • pveversion.txt (1.4 KB)
  • syslog_nfs-and-proxmox_server.txt (1.5 KB)
  • syslog_proxmox_host.txt (1.8 KB)
  • ps.txt (1.1 KB)
Since my original post, I was able to work around the issue, and I can now reproduce it consistently.

So the issue wasn't fixed by one update and broken again by a later one... A month ago, I just happened to reboot the servers, disable the NFS storage and restart the services in the right order, which fixed it. Still, I think the issue probably started after an update, since I hadn't modified the NFS server configuration for months before it began; I had only installed updates periodically on these servers. The last change I made was adding a volume with its NFS share and allowing an additional IP to access that share by modifying the "sharenfs" property of that ZFS volume; I didn't touch any NFS parameters at the service level.

So, when all my NFS storage is working fine on my hosts and I reboot the NFS server, I run into all the issues I described in my previous post.

In the past, when I rebooted the NFS server, the NFS shares remounted automatically on all my hosts once the NFS server had finished booting.

Here's how I can reproduce the issue. If I reboot the NFS server, I get these issues on the Proxmox hosts:
  • After the NFS server reboots, the Proxmox hosts log errors like this for the NFS shares, in a loop, in their syslog:
    • "pvestatd[4072]: unable to activate storage [...]"
  • If I reboot the Proxmox hosts, the "pvestatd" and "pvedaemon" services get stuck trying to mount the NFS shares, as I explained in my previous post.
  • If I reboot the NFS server again without doing anything on the Proxmox hosts, the issue persists.
If I disable the NFS storage entries on the Proxmox hosts, restart the services and reboot the NFS server again, then once it has finished rebooting I can re-enable the NFS storage entries on the hosts, and the NFS shares are accessible again on all the Proxmox servers.

But if I then reboot the NFS server once more, the same issues come back.
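The disable, restart and re-enable sequence described above can be written down as a dry-run plan so the order is not lost. This is a minimal Python sketch that only builds and prints the commands; it does not execute anything. It assumes that `pvesm set <storage> --disable 1` (and `0`) toggles a storage entry on your version (check `man pvesm`), that "restart the services" means restarting both pvestatd and pvedaemon, and the storage IDs are hypothetical placeholders.

```python
import shlex


def workaround_commands(storage_ids):
    """Build, without running, the commands for the workaround sequence."""
    # Step 1: disable every NFS storage entry so pvestatd stops trying to mount it.
    disable = [["pvesm", "set", sid, "--disable", "1"] for sid in storage_ids]
    # Step 2: restart the status and API daemons (assumed to be "the services").
    restart = [
        ["systemctl", "restart", "pvestatd"],
        ["systemctl", "restart", "pvedaemon"],
    ]
    # Step 3, after the NFS server has finished rebooting: re-enable the entries.
    enable = [["pvesm", "set", sid, "--disable", "0"] for sid in storage_ids]
    return {"before_nfs_reboot": disable + restart, "after_nfs_reboot": enable}


if __name__ == "__main__":
    # Hypothetical storage IDs; replace with the NFS entries from storage.cfg.
    plan = workaround_commands(["nfs-backup", "nfs-iso"])
    for phase, commands in plan.items():
        print(f"# {phase}")
        for cmd in commands:
            print(shlex.join(cmd))
```

Keeping the plan as data rather than running it directly makes it easy to review the order before pasting the commands into a shell on each host.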
 
So I guess it's too specific a problem. If nobody can help me here, what's the best way to report a bug?
 
