Hello!
A few weeks ago (at the beginning of March, around March 7th), my NFS shares stopped coming online on three of my Proxmox servers. They're all running the latest version of Proxmox 6.3 (see the attached txt file for the output of "pveversion -v"). My NFS server is a fourth Proxmox 6.3 server with a ZFS pool whose NFS exports are configured through the "sharenfs" ZFS property. If I'm not mistaken, the problem started after I installed the updates on February 27th.
On top of the unavailable NFS shares, the Proxmox web interface on all hosts stopped working properly a few minutes after the server booted. All the elements in the host "tree view" had a "question mark" (see attached screenshot) and the graphs were no longer updating. When this happened, I was able to "revive" the web interface for a moment by killing the "pvestatd" and "pvedaemon" services and restarting the "pvestatd" service, but it always broke again after a few minutes. If I only restarted the service, the mount commands stayed stuck trying to mount the share indefinitely, so I also needed to "kill -9" the related processes (see attached file "ps.txt").
I realized that it was related to the "pvestatd" or "pvedaemon" services because when I ran the "ps auxf" command, there were a few sub-processes of "pvestatd" and "pvedaemon" trying to mount the NFS shares indefinitely (what's strange is that the commands never seemed to time out). When I ran "systemctl status pvestatd", I could also see the stuck mount commands in the output.
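For anyone who wants to spot such hanging mounts without reading the whole process tree, here is a minimal sketch. It assumes Python 3 on the host, that the process list is taken from `ps -eo pid,etimes,args` (procps), and that the stuck commands contain both "mount" and "nfs" in their command line. The function name and the 60-second threshold are just illustrative choices.

```python
def find_long_running_nfs_mounts(ps_text, min_seconds=60):
    """Return (pid, elapsed_seconds, command) for NFS mount commands
    that have been running for at least min_seconds.

    ps_text is expected to be the output of: ps -eo pid,etimes,args
    """
    hits = []
    for line in ps_text.splitlines():
        fields = line.split(None, 2)  # pid, elapsed seconds, full command
        if len(fields) < 3:
            continue
        pid, elapsed, command = fields
        if not (pid.isdigit() and elapsed.isdigit()):
            continue  # skips the header line
        command = command.strip()
        lowered = command.lower()
        # Assumption: a stuck mount shows up with "mount" and "nfs" in its args.
        if "mount" in lowered and "nfs" in lowered and int(elapsed) >= min_seconds:
            hits.append((int(pid), int(elapsed), command))
    return hits
```

The text can be captured with `subprocess.run(["ps", "-eo", "pid,etimes,args"], capture_output=True, text=True).stdout` and passed to the function. Anything it returns is a mount that has been running longer than a normal mount should take.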
The only "workaround" was to disable all the NFS storage entries on my Proxmox hosts (and then kill the "pvestatd" and "pvedaemon" services and restart the "pvestatd" service to fix the web interface), but obviously I was then unable to do backups to my NFS shares.
A few days later (2021-03-11), some Proxmox updates became available. I installed them and the issue was gone.
But today I realized that I have exactly the same issue again, ever since I rebooted the NFS server earlier today.
I first noticed it because I received a vzdump error e-mail from the Proxmox / NFS server stating that a backup job had failed because of a hostname lookup error:
Code:
hostname lookup 'localhost' failed - got local IP address ''
I don't think it's related to the problem above, but because of that error, Proxmox was unable to mount the local NFS shares (there are two NFS storage entries that I configured with the "localhost" address since they are on this host itself). When I tried to look at the content of a share, it showed the same error as above while trying to mount it (see attached screenshot). I fixed this by editing the storage configuration file and replacing "localhost" with the IP "127.0.0.1".
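The "localhost" lookup can also be checked outside of Proxmox. This is a minimal sketch, assuming Python 3 is available on the host. It only asks the system resolver and does not reproduce how Proxmox itself performs the lookup, so an empty result here would only hint at a problem with /etc/hosts or the resolver configuration.

```python
import socket


def resolve(name):
    """Return the sorted, de-duplicated IP addresses the system resolver gives for name."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []  # the name could not be resolved at all
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print("localhost ->", resolve("localhost"))
```

On a healthy host, "localhost" would normally be expected to resolve to 127.0.0.1 and/or ::1.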
But after I fixed that issue, I checked the status of the NFS storage on my other three Proxmox servers and realized that the shares are no longer coming online, with exactly the same behavior as a few weeks ago. I haven't changed anything in my configuration since then. These servers are all on the same network subnet and there's no firewall involved. Last time, it fixed itself after an update, but I don't know which one...
I checked the syslog on all servers, and there's not much:
- On the NFS server, I only get errors about "pvestatd" trying in a loop to mount the NFS shares and failing (syslog_nfs-and-proxmox_server.txt).
- On one of the Proxmox servers (an NFS client of the NFS server), which I have rebooted since, I don't see any errors in the syslog. I can only see that it's trying indefinitely, without timing out, to mount the NFS shares.
- On another Proxmox server, before rebooting it, I was getting errors in a loop about pvestatd being unable to activate any of the NFS storage (syslog_proxmox_host.txt). After rebooting the server, I see the same behavior as on the previous Proxmox host.
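Since the clients hang without logging an error, basic TCP reachability of the NFS server is one thing that could be ruled out from a client. This is a minimal sketch, assuming Python 3 on the client and a server listening on the standard NFS port 2049 (NFSv3 setups also rely on rpcbind on port 111). The address in the usage comment is a placeholder, not taken from the setup described here.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network and timeouts.
        return False


# Example usage (placeholder address, replace with the NFS server's IP):
#   for port in (111, 2049):
#       print(port, tcp_port_open("192.0.2.10", port))
```

A successful connection only shows that the port answers. It says nothing about whether the export itself can be mounted.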
Can you help me debug the issue, or is it a known bug?
Thanks!