[SOLVED] 2 nodes stopped responding on port 8006

Dec 28, 2019
Hi,

Of our 6 nodes, 2 suddenly stopped responding on port 8006 (the web interface is not working). We recently applied updates on every node. Has anyone experienced the same thing? Any ideas on how to troubleshoot it? Thanks in advance.
 
SSH into the box and start by checking whether the port is up via netstat.
After that, see whether the PVE services are running.
Probably a service is stopped or not coming up. You should check the log files to see what's wrong.
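
The three checks above could be sketched roughly like this. It is only a sketch: the function names port_is_listening and check_node are made up for illustration, and it assumes a systemd-based node where ss, systemctl and journalctl are available.

```shell
#!/bin/sh
# Hypothetical helper: succeed if something is listening on the given TCP port.
port_is_listening() {
    ss -ntl 2>/dev/null | grep -q ":$1[[:space:]]"
}

# Hypothetical helper: run the three checks in order (port, services, logs).
check_node() {
    if port_is_listening 8006; then
        echo "port 8006: listening"
    else
        echo "port 8006: NOT listening"
    fi
    # State of the web proxy and the API daemon
    systemctl --no-pager status pveproxy pvedaemon
    # Most recent log entries of the web proxy
    journalctl --no-pager -u pveproxy -n 50
}

# On the affected node, call: check_node
```

Note that a listening port alone does not prove the GUI works, so the service status and the log output are worth reading even if the first check passes.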
 
Hi @tburger, thanks for your reply.

I examined two nodes; the results are below.

Node1
-----------
:~# netstat -ntlp |grep 8006
tcp 129 0 0.0.0.0:8006 0.0.0.0:* LISTEN 85935/pveproxy work

:~# telnet localhost 8006
Trying 127.0.0.1...


Node3
-----------
:~# netstat -ntlp |grep 8006
tcp 16 0 0.0.0.0:8006 0.0.0.0:* LISTEN 1858/pveproxy


:~# telnet localhost 8006
Trying 127.0.0.1...
Connected to localhost.localdomain.
Escape character is '^]'.

Although node3 accepts connections on port 8006, the web GUI is not accessible.
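
A note on reading the netstat output above: on Linux, for a socket in LISTEN state, the Recv-Q column generally shows the number of connections waiting to be accepted, not bytes. If that holds here, the 129 on node1 would suggest pveproxy is not accepting new connections, which fits the telnet attempt hanging at "Trying". A small sketch for pulling that value out of netstat-style output (listen_recvq is a made-up helper name, not a PVE tool):

```shell
#!/bin/sh
# Hypothetical helper: print the Recv-Q value of every LISTEN socket on a port.
# Reads `netstat -ntl` or `netstat -ntlp` style lines on stdin.
listen_recvq() {
    awk -v p=":$1" '$6 == "LISTEN" && $4 ~ (p "$") { print $2 }'
}

# Sample lines copied from the output in this thread
printf '%s\n' \
    'tcp 129 0 0.0.0.0:8006 0.0.0.0:* LISTEN 85935/pveproxy work' \
    'tcp 16 0 0.0.0.0:8006 0.0.0.0:* LISTEN 1858/pveproxy' |
    listen_recvq 8006
# prints 129, then 16
```

On a live node the same helper could be fed directly: netstat -ntl | listen_recvq 8006. A value that stays high across repeated runs would point at the proxy workers rather than the network.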
 
Maybe it's a client-related issue?
You could try clearing the browser cache or using another browser.
 
No, I already cleared the cache and it is still not working. Also, neither node1 nor node3 displays any options like 'Summary'.

Attached is a screenshot from node1
 

Attachment: node1.png
 
Hi @tburger, I think I found the issue. I ran 'systemctl status pveproxy' on both nodes, and both show an error like this:

Mar 16 01:25:59 sg1-n1 pveproxy[97079]: /etc/pve/local/pve-ssl.pem: failed to use local certificate chain (cert_file or cert) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1713.

Mar 16 01:28:43 sg1-n3 pveproxy[3450122]: /etc/pve/local/pve-ssl.pem: failed to use local certificate chain (cert_file or cert) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1713.
Mar 16 01:28:43 satapx-sg1-n3 pveproxy[3450119]: worker exit

However, the 'pveproxy' service is running on both nodes despite this error. Do you think these errors could really make the web GUI inaccessible?
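
One way to sanity-check the certificate named in that error is to let openssl parse it and compare its public key with the private key next to it. This is only a sketch: cert_matches_key is a made-up helper, the key path /etc/pve/local/pve-ssl.key is assumed to be the usual location, and a mismatch is just one possible cause of the "failed to use local certificate chain" message (a missing or truncated file would be another).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the certificate's public key matches the private key.
# Returns 2 if either file cannot be parsed by openssl.
cert_matches_key() {
    cert_pub=$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null) || return 2
    key_pub=$(openssl pkey -in "$2" -pubout 2>/dev/null) || return 2
    [ "$cert_pub" = "$key_pub" ]
}

# Certificate path from the error message; key path is an assumption
CERT=/etc/pve/local/pve-ssl.pem
KEY=/etc/pve/local/pve-ssl.key

if [ -r "$CERT" ] && [ -r "$KEY" ]; then
    # Show the subject and validity window of the node certificate
    openssl x509 -in "$CERT" -noout -subject -dates
    if cert_matches_key "$CERT" "$KEY"; then
        echo "certificate and key match"
    else
        echo "certificate and key do NOT match (or could not be parsed)"
    fi
fi
```

If openssl cannot even print the subject, the file itself is likely damaged or empty, which would be worth knowing before regenerating anything.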
 
I moved the existing SSL files back and regenerated the node files and node certificate using 'pvecm updatecerts --force'. After that I ran 'service pvedaemon restart', which resolved the 'communication failure' error. The web GUI is accessible now.
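
For anyone landing here later, the sequence from this post could be wrapped roughly as follows. It is a sketch only: the run and regenerate_node_certs helpers and the DRY_RUN switch are made up for illustration, and by default the script just prints the commands instead of running them.

```shell
#!/bin/sh
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 on a real node.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: the two commands named in the post, in the same order.
regenerate_node_certs() {
    # Regenerate the node files and node certificate
    run pvecm updatecerts --force
    # Restart the API daemon
    run service pvedaemon restart
}

regenerate_node_certs
```

Backing up the existing files under /etc/pve/local before a real run seems prudent, since the post mentions moving the SSL files around first.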
 
