- Proxmox 4.4
- no-subscription repo
I dist-upgraded two nodes on 11 Dec. Both of those nodes now have multiple unkillable pveproxy processes. dmesg has many entries like:
[50996.416909] INFO: task pveproxy:6798 blocked for more than 120 seconds.
[50996.416914] Tainted: P O 4.4.95-1-pve #1
[50996.416918] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[50996.416922] pveproxy D ffff8809194e3df8 0 6798 1 0x00000004
[50996.416925] ffff8809194e3df8 ffff880ff6f5ed80 ffff880ff84fe200 ffff880fded5e200
[50996.416927] ffff8809194e4000 ffff880fc7fb43ac ffff880fded5e200 00000000ffffffff
[50996.416929] ffff880fc7fb43b0 ffff8809194e3e10 ffffffff818643b5 ffff880fc7fb43a8
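The "D" after pveproxy in the trace above is the uninterruptible-sleep state, which is why the processes can't be killed. For anyone who wants to check their own nodes, here is a rough sketch of how the stuck processes could be listed. The helper name filter_blocked is just something made up for this post, and the ps column selection assumes the procps version of ps that Debian ships.

```shell
# filter_blocked: hypothetical helper, not part of any Proxmox tooling.
# Reads lines of "PID STAT WCHAN COMMAND" on stdin and keeps only the
# ones whose STAT field starts with D (uninterruptible sleep).
filter_blocked() {
    awk '$2 ~ /^D/'
}

# Intended use (assumes procps ps):
#   ps -eo pid,stat,wchan:32,comm | filter_blocked
```

The WCHAN column shows the kernel function each process is sleeping in, which may help narrow down what the blocked pveproxy workers are waiting on.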
- cluster file system is fine
- pvesm returns all storage ok.
- pvecm status is normal
- qm list and qm migrate just hang.
- can't connect to the web GUI on the two nodes in question.
- The 3rd node that I didn't upgrade is fine, no problems.
It took a remote hard reset to bring the nodes back. Unfortunately, this has happened multiple times since.
systemctl status pveproxy
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled)
Active: failed (Result: timeout) since Wed 2017-12-20 06:49:06 AEST; 3h 44min ago
Main PID: 4325 (code=exited, status=0/SUCCESS)
Dec 20 06:46:06 vng systemd[1]: pveproxy.service start operation timed out. Terminating.
Dec 20 06:47:36 vng systemd[1]: pveproxy.service stop-final-sigterm timed out. Killing.
Dec 20 06:49:06 vng systemd[1]: pveproxy.service still around after final SIGKILL. Entering failed mode.
Dec 20 06:49:06 vng systemd[1]: Failed to start PVE API Proxy Server.
Dec 20 06:49:06 vng systemd[1]: Unit pveproxy.service entered failed state.
Anyone else seeing this?