pveproxy not starting and qm list won't work

danielb1

Sep 11, 2020
Hi,

I'm having problems with our PVE cluster. We are currently running Proxmox VE 5.4 (we will upgrade to 6 once the current cluster is healthy). pveproxy fails to start on every node, and commands like "qm list" hang on every node as well. Browsing some folders inside /etc/pve also makes the SSH session freeze completely.

From syslog:

Sep 11 06:42:45 pve1 systemd[1]: Stopped PVE API Proxy Server.
Sep 11 06:42:45 pve1 systemd[1]: pveproxy.service: Unit entered failed state.
Sep 11 06:42:45 pve1 systemd[1]: pveproxy.service: Failed with result 'timeout'.
Sep 11 06:42:45 pve1 systemd[1]: Starting PVE API Proxy Server...
Sep 11 06:44:15 pve1 systemd[1]: pveproxy.service: Start operation timed out. Terminating.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: State 'stop-final-sigterm' timed out. Killing.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 18378 (pveproxy) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 9774 (pveproxy worker) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 22110 (pveproxy worker) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 23064 (pveproxy worker) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 887 (pveproxy) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 7596 (pveproxy) with signal SIGKILL.
Sep 11 06:47:16 pve1 systemd[1]: pveproxy.service: Processes still around after final SIGKILL. Entering failed mode.
Sep 11 06:47:16 pve1 systemd[1]: Failed to start PVE API Proxy Server.
Sep 11 06:47:16 pve1 systemd[1]: pveproxy.service: Unit entered failed state.
Sep 11 06:47:16 pve1 systemd[1]: pveproxy.service: Failed with result 'timeout'
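
The timestamps show how long each phase of the failed start lasted. A minimal sketch for reading them off, assuming the syslog format shown above; the function names are hypothetical, and the year 2020 is taken from the post date because syslog timestamps do not carry one:

```python
from datetime import datetime

# One line per state change, copied from the syslog excerpt above.
LOG = """\
Sep 11 06:42:45 pve1 systemd[1]: Starting PVE API Proxy Server...
Sep 11 06:44:15 pve1 systemd[1]: pveproxy.service: Start operation timed out. Terminating.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: State 'stop-final-sigterm' timed out. Killing.
Sep 11 06:47:16 pve1 systemd[1]: pveproxy.service: Processes still around after final SIGKILL. Entering failed mode.
"""


def parse_stamp(line, year=2020):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def phase_durations(log):
    """Return the number of seconds between consecutive log lines."""
    stamps = [parse_stamp(line) for line in log.splitlines() if line.strip()]
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


print(phase_durations(LOG))  # [90, 90, 91]
```

Each phase runs for about 90 seconds, which matches systemd's default start/stop timeout. Whether the pveproxy unit overrides that default has not been checked here.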
 
I noticed this in the syslog from yesterday:

pve1 pvestatd[4554]: storage 'NFS1' is not online

However, I can see that it is mounted and accessible from the PVE node.
 
Hi,

I restarted and tweaked some NFS settings in FreeNAS. I don't see any NFS-related errors at the moment.

However, on our first node, I can see a lot of these:

Sep 11 09:47:51 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retried 100 times
Sep 11 09:47:51 pve1 pmxcfs[3870]: [status] crit: cpg_send_message failed: 6
Sep 11 09:47:52 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 10
Sep 11 09:47:53 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 20
Sep 11 09:47:54 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 30

What is causing these messages, and how do I get rid of them?
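
To get a feel for how often this happens, the messages can be tallied per severity and kind. A rough sketch, assuming the pmxcfs log format shown above; the regular expression and the function name are hypothetical and only cover the three message variants quoted here:

```python
import re
from collections import Counter

# Sample lines copied from the syslog excerpt above.
SAMPLE = """\
Sep 11 09:47:51 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retried 100 times
Sep 11 09:47:51 pve1 pmxcfs[3870]: [status] crit: cpg_send_message failed: 6
Sep 11 09:47:52 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 10
Sep 11 09:47:53 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 20
Sep 11 09:47:54 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 30
"""

# Captures the severity (notice, crit) and the kind of cpg_send_message event.
PATTERN = re.compile(
    r"pmxcfs\[\d+\]: \[\w+\] (\w+): cpg_send_message (retry|retried|failed)"
)


def tally_cpg_events(text):
    """Count cpg_send_message lines, keyed by (severity, kind)."""
    counts = Counter()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


for key, count in sorted(tally_cpg_events(SAMPLE).items()):
    print(key, count)
```

To run it against a whole log instead of the sample, pass the file contents to `tally_cpg_events`, for example `open("/var/log/syslog").read()`; that path is an assumption about where this node keeps its syslog.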
 
I also noticed that "pvesm status" is very slow on our first/main node. It's a lot faster on the other nodes.
 
