Web interface stops working after upgrade

Frakir

Jan 20, 2016
I had a two-node Proxmox 4.1.5 cluster: LXC containers only, no HA, local disks only.
I added a new node running 4.1.22, then upgraded and rebooted the other two nodes.
Everything worked well for several hours, but now the web interface doesn't work.
What can be done? Rebooting seems to be the only solution, and it is only temporary.

There are three nodes: p1, p2 and p3.
I can't see any details of any LXC container.
When I log in on p1, it shows all nodes as OK.
On p2, I see p2 as up and p1 and p3 as down.
On p3, I see p3 as up and p1 and p2 as down.

pvecm status is OK on all nodes.
service pveproxy restart fails with:
Failed to start PVE API Proxy Server.
pct list hangs on all nodes.

# pvecm status
Quorum information
------------------
Date: Mon Apr 18 15:15:29 2016
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1940
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 10.1.1.1
0x00000003 1 10.1.1.2 (local)
0x00000001 1 10.1.1.2

# pveversion
pve-manager/4.1-22/aca130cf (running kernel: 4.2.8-1-pve)
 
# systemctl status pveproxy.service
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled)
Active: failed (Result: timeout) since Mon 2016-04-18 13:07:15 MSK; 4h 16min ago
Main PID: 6750 (code=exited, status=0/SUCCESS)

Apr 18 13:04:14 p5 systemd[1]: pveproxy.service start operation timed out. Terminating.
Apr 18 13:05:45 p5 systemd[1]: pveproxy.service stop-final-sigterm timed out. Killing.
Apr 18 13:07:15 p5 systemd[1]: pveproxy.service still around after final SIGKILL. Entering failed mode.
Apr 18 13:07:15 p5 systemd[1]: Failed to start PVE API Proxy Server.
Apr 18 13:07:15 p5 systemd[1]: Unit pveproxy.service entered failed state.
 
# systemctl start pveproxy.service
Job for pveproxy.service failed. See 'systemctl status pveproxy.service' and 'journalctl -xn' for details.

# systemctl status pveproxy.service
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled)
Active: failed (Result: timeout) since Mon 2016-04-18 18:04:22 MSK; 37s ago
Main PID: 6750 (code=exited, status=0/SUCCESS)

Apr 18 18:01:22 p5 systemd[1]: pveproxy.service start operation timed out. Terminating.
Apr 18 18:02:52 p5 systemd[1]: pveproxy.service stop-final-sigterm timed out. Killing.
Apr 18 18:04:22 p5 systemd[1]: pveproxy.service still around after final SIGKILL. Entering failed mode.
Apr 18 18:04:22 p5 systemd[1]: Failed to start PVE API Proxy Server.
Apr 18 18:04:22 p5 systemd[1]: Unit pveproxy.service entered failed state.


pct list hangs forever, and pveproxy just says "Failed to start...".
 
From /var/log/syslog:

Apr 18 18:01:22 p2 systemd[1]: pveproxy.service start operation timed out. Terminating.
Apr 18 18:02:52 p2 systemd[1]: pveproxy.service stop-final-sigterm timed out. Killing.
Apr 18 18:04:22 p2 systemd[1]: pveproxy.service still around after final SIGKILL. Entering failed mode.
Apr 18 18:04:22 p2 systemd[1]: Failed to start PVE API Proxy Server.
Apr 18 18:04:22 p2 systemd[1]: Unit pveproxy.service entered failed state.
 
Even something as simple as ls -l /etc/pve/ hangs forever.
It looks like some kind of lock in pmxcfs.
How can I release the lock, and how can I prevent this from happening?
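For anyone hitting the same symptom: /etc/pve is the pmxcfs FUSE mount, so a plain ls on it can leave yet another shell stuck. One way to check it safely is to run the ls in the background and only wait up to a deadline. This is a minimal sketch, assuming a POSIX sh plus mktemp; probe_path is a hypothetical helper name, not a Proxmox tool, and it only reports the symptom, it does not release any lock.

```shell
#!/bin/sh
# probe_path PATH [LIMIT]: hypothetical helper, not part of Proxmox.
# Prints "ok", "error" or "hung" for PATH and returns 0, 1 or 2.
probe_path() {
    path="$1"
    limit="${2:-5}"                 # seconds to wait before giving up
    marker=$(mktemp) || return 3
    # Run ls in a background subshell with all output discarded, so neither
    # the caller nor a command substitution waits on it. The subshell writes
    # ls's exit code into $marker only once ls actually returns.
    ( ls "$path"; echo "$?" > "$marker" ) >/dev/null 2>&1 &
    waited=0
    while [ ! -s "$marker" ]; do
        if [ "$waited" -ge "$limit" ]; then
            # ls is still blocked: leave it (and $marker) behind instead of waiting.
            echo "hung: $path (no answer after ${limit}s)"
            return 2
        fi
        sleep 1
        waited=$((waited + 1))
    done
    rc=$(cat "$marker")
    rm -f "$marker"
    if [ "$rc" -eq 0 ]; then
        echo "ok: $path"
        return 0
    fi
    echo "error: $path (ls exit code $rc)"
    return 1
}
```

For example, probe_path /etc/pve 5 prints a "hung: /etc/pve ..." line and returns 2 if the mount does not answer within five seconds. The blocked ls is deliberately left running rather than killed and waited for, since, as the systemd log above shows, processes stuck on this node may survive even SIGKILL.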
 
Yes, after a reboot the cluster works.
So, as I understand it, rebooting is the only solution? Is there any way to prevent this?
 
