pveproxy not starting and qm list won't work

danielb1 · Sep 11, 2020

Hi,

I'm having problems with our PVE cluster. We are currently running Proxmox VE 5.4 (we will upgrade to 6 once the current cluster is healthy). pveproxy won't start on any node, and commands like "qm list" hang on every node. Browsing some folders inside /etc/pve also freezes the SSH session completely.

From syslog:

Sep 11 06:42:45 pve1 systemd[1]: Stopped PVE API Proxy Server.
Sep 11 06:42:45 pve1 systemd[1]: pveproxy.service: Unit entered failed state.
Sep 11 06:42:45 pve1 systemd[1]: pveproxy.service: Failed with result 'timeout'.
Sep 11 06:42:45 pve1 systemd[1]: Starting PVE API Proxy Server...
Sep 11 06:44:15 pve1 systemd[1]: pveproxy.service: Start operation timed out. Terminating.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: State 'stop-final-sigterm' timed out. Killing.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 18378 (pveproxy) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 9774 (pveproxy worker) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 22110 (pveproxy worker) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 23064 (pveproxy worker) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 887 (pveproxy) with signal SIGKILL.
Sep 11 06:45:45 pve1 systemd[1]: pveproxy.service: Killing process 7596 (pveproxy) with signal SIGKILL.
Sep 11 06:47:16 pve1 systemd[1]: pveproxy.service: Processes still around after final SIGKILL. Entering failed mode.
Sep 11 06:47:16 pve1 systemd[1]: Failed to start PVE API Proxy Server.
Sep 11 06:47:16 pve1 systemd[1]: pveproxy.service: Unit entered failed state.
Sep 11 06:47:16 pve1 systemd[1]: pveproxy.service: Failed with result 'timeout'

dietmar · Sep 11, 2020

Maybe a hanging storage? Does the following command work without problems?

# pvesm status

danielb1 · Sep 11, 2020

Hi,

yes, I'm able to run "pvesm status" without problems.

danielb1 · Sep 11, 2020

I noticed this in the syslog from yesterday:

pve1 pvestatd[4554]: storage 'NFS1' is not online

I can, however, see that it is mounted and accessible from the PVE node.

danielb1 · Sep 11, 2020

Hi,

I restarted and tweaked some NFS settings in FreeNAS. I don't see any NFS-related errors at the moment.

However, on our first node, I can see a lot of these:

Sep 11 09:47:51 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retried 100 times
Sep 11 09:47:51 pve1 pmxcfs[3870]: [status] crit: cpg_send_message failed: 6
Sep 11 09:47:52 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 10
Sep 11 09:47:53 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 20
Sep 11 09:47:54 pve1 pmxcfs[3870]: [status] notice: cpg_send_message retry 30

What is causing these, and how do I get rid of them?
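
To get a rough idea of how often pmxcfs gives up on sending a cluster message, the failures can be counted per host. This is only a sketch: the function name count_cpg_failures is made up for illustration, and it assumes the traditional syslog layout shown above, where the hostname is the fourth field.

```shell
# Count pmxcfs "cpg_send_message failed" lines per host.
# Reads the files given as arguments, or stdin if none are given.
# Assumes the hostname is field 4 (traditional syslog layout).
count_cpg_failures() {
    awk '/pmxcfs\[[0-9]+\]:/ && /cpg_send_message failed/ { n[$4]++ }
         END { for (h in n) print h, n[h] }' "$@"
}
```

It could be run as "count_cpg_failures /var/log/syslog" on each node. Retry notices are deliberately not counted, only the lines where the send actually failed.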

danielb1 · Sep 11, 2020

I also noticed that "pvesm status" is very slow on our first/main node. It's a lot faster on the other nodes.

danielb1 · Sep 12, 2020

Well, running "service pve-cluster restart" on all nodes resolved this. Great!
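
For reference, the restart can be looped over the nodes from one machine. This is a minimal sketch, not a tested procedure: it assumes root SSH access to every node, the function name restart_pve_cluster is made up, and any node name other than pve1 is a placeholder. The -n flag only prints the commands so they can be checked first.

```shell
# Run "service pve-cluster restart" on each node given as an argument.
# Usage: restart_pve_cluster [-n] node...
#   -n  dry run: print the commands instead of executing them
restart_pve_cluster() {
    dry_run=0
    if [ "$1" = "-n" ]; then
        dry_run=1
        shift
    fi
    for node in "$@"; do
        if [ "$dry_run" -eq 1 ]; then
            echo "ssh root@$node service pve-cluster restart"
        else
            ssh "root@$node" service pve-cluster restart
        fi
    done
}
```

For example, "restart_pve_cluster -n pve1 pve2" prints the two ssh commands without touching the nodes; dropping -n runs them one node at a time.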
