node in cluster "red", 100% pmxcfs process

encore

Member
May 4, 2018
Hi,

We have one node in our cluster that turns "red" every 1-3 days. When that happens, we can't open "summary" or "system" for it. We see a pmxcfs process at 100% CPU and have to restart the whole node to fix the issue temporarily.

Linux bondsir003-74050-bl10 4.15.18-14-pve #1 SMP PVE 4.15.18-38 (Tue, 30 Apr 2019 10:51:33 +0200) x86_64 GNU/Linux
Syslog:
May 21 16:20:45 bondsir003-74050-bl10 systemd[1]: Started Session 4309 of user root.
May 21 16:25:41 bondsir003-74050-bl10 systemd[1]: Started Session 4315 of user root.
May 21 16:25:57 bondsir003-74050-bl10 systemd[1]: Stopping The Proxmox VE cluster filesystem...
May 21 16:26:01 bondsir003-74050-bl10 cron[1840]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pve-cluster.service: State 'stop-sigterm' timed out. Killing.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pve-cluster.service: Killing process 1788 (pmxcfs) with signal SIGKILL.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
May 21 16:26:08 bondsir003-74050-bl10 pvesr[33277]: error with cfs lock 'file-replication_cfg': no quorum!
May 21 16:26:08 bondsir003-74050-bl10 pve-ha-lrm[1988]: unable to write lrm status file - closing file '/etc/pve/nodes/bondsir003-74050-bl10/lrm_status.tmp.1988' failed - Software caused connection abort
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: Stopped The Proxmox VE cluster filesystem.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pve-cluster.service: Unit entered failed state.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pve-cluster.service: Failed with result 'timeout'.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: Stopping Corosync Cluster Engine...
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pvesr.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: Failed to start Proxmox VE replication runner.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pvesr.service: Unit entered failed state.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pvesr.service: Failed with result 'exit-code'.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: Starting Proxmox VE replication runner...
May 21 16:26:08 bondsir003-74050-bl10 pve-firewall[1896]: status update error: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pve-firewall[1896]: firewall update time (972.065 seconds)
May 21 16:26:08 bondsir003-74050-bl10 pve-firewall[1896]: status update error: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pveproxy[63193]: ipcc_send_rec[1] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pveproxy[63193]: ipcc_send_rec[2] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pveproxy[63193]: ipcc_send_rec[3] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[4] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvesr[35287]: ipcc_send_rec[1] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvesr[35287]: ipcc_send_rec[2] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvesr[35287]: ipcc_send_rec[3] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvesr[35287]: Unable to load access control list: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pvesr.service: Main process exited, code=exited, status=111/n/a
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: Failed to start Proxmox VE replication runner.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pvesr.service: Unit entered failed state.
May 21 16:26:08 bondsir003-74050-bl10 systemd[1]: pvesr.service: Failed with result 'exit-code'.
May 21 16:26:08 bondsir003-74050-bl10 pveproxy[63194]: ipcc_send_rec[1] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pveproxy[63194]: ipcc_send_rec[2] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pveproxy[63194]: ipcc_send_rec[3] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[4] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[4] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[4] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[4] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: status update time (977.179 seconds)
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[1] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[2] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[3] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: ipcc_send_rec[4] failed: Connection refused
May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: status update error: Connection refused
May 21 16:26:09 bondsir003-74050-bl10 pveproxy[63192]: ipcc_send_rec[1] failed: Connection refused
May 21 16:26:09 bondsir003-74050-bl10 pveproxy[63192]: ipcc_send_rec[2] failed: Connection refused
May 21 16:26:09 bondsir003-74050-bl10 pveproxy[63192]: ipcc_send_rec[3] failed: Connection refused
May 21 16:26:13 bondsir003-74050-bl10 pve-ha-lrm[1988]: loop take too long (1228 seconds)
May 21 16:26:13 bondsir003-74050-bl10 pve-ha-lrm[1988]: updating service status from manager failed: Connection refused
May 21 16:26:14 bondsir003-74050-bl10 pve-ha-crm[1941]: loop take too long (976 seconds)
May 21 16:26:17 bondsir003-74050-bl10 pveproxy[63194]: ipcc_send_rec[1] failed: Connection refused
May 21 16:26:17 bondsir003-74050-bl10 pveproxy[63194]: ipcc_send_rec[2] failed: Connection refused
May 21 16:26:17 bondsir003-74050-bl10 pveproxy[63194]: ipcc_send_rec[3] failed: Connection refused
May 21 16:26:18 bondsir003-74050-bl10 pve-ha-lrm[1988]: updating service status from manager failed: Connection refused
Binary file (standard input) matches
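In case it helps with reading the log: several daemons report how long they were blocked ("status update time", "loop take too long"), so the point where pmxcfs stopped responding can be estimated by subtracting that duration from the line's timestamp. This is only a rough sketch. The function name `estimate_stall_starts` is made up for illustration, the year 2019 is assumed from the kernel build date (syslog lines carry no year), and it assumes an English locale for the month names.

```python
import re
from datetime import datetime, timedelta

# Matches syslog lines that report a duration in parentheses, e.g.
# "status update time (977.179 seconds)" or "loop take too long (1228 seconds)".
PATTERN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ (?P<proc>[\w-]+)\[\d+\]: "
    r".*\((?P<secs>\d+(?:\.\d+)?) seconds\)"
)


def estimate_stall_starts(lines, year=2019):
    """Return (process, estimated start of the blocked period) per matching line.

    The year is an assumption; classic syslog timestamps do not include one.
    """
    results = []
    for line in lines:
        m = PATTERN.match(line)
        if not m:
            continue  # line carries no duration report
        end = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        start = end - timedelta(seconds=float(m["secs"]))
        results.append((m["proc"], start))
    return results


# Lines copied from the syslog above.
sample = [
    "May 21 16:26:08 bondsir003-74050-bl10 pve-firewall[1896]: firewall update time (972.065 seconds)",
    "May 21 16:26:08 bondsir003-74050-bl10 pvestatd[1892]: status update time (977.179 seconds)",
    "May 21 16:26:13 bondsir003-74050-bl10 pve-ha-lrm[1988]: loop take too long (1228 seconds)",
    "May 21 16:26:14 bondsir003-74050-bl10 pve-ha-crm[1941]: loop take too long (976 seconds)",
]

for proc, start in estimate_stall_starts(sample):
    print(f"{proc}: blocked since about {start:%H:%M:%S}")
```

For the lines above this puts pve-ha-lrm at about 16:05:45 and the other three daemons at roughly 16:09:50 to 16:09:58, so the hang seems to have started around 16 to 20 minutes before we tried to restart pve-cluster, assuming each duration ends at the moment the line was logged.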
Any ideas?
 
