Rebooting Proxmox Node

Kaanha

New Member
Jan 4, 2013
Okay, I'll start by saying I've had this problem before. The web interface has completely failed (it was working fine two days ago when I did my routine maintenance): it says "Communication failed" when I attempt to do anything, and it won't show the status of any VM, even though they are all running and I can access them via SSH. Now it's also telling me my login information is incorrect. I've fixed this problem before with a simple restart, but my server is now in a new location and I'd like to avoid having to drive to the data center.

I have tried the restart option in the web interface (when it would let me log in) and all it did was give me "Communication failed", even though the SSH session would pop up and say "The server is going down NOW!" Nothing would happen. I have also manually shut down every VM and tried the restart option via the web interface; it still does the exact same thing.

I have tried the reboot command; it says "The server is going down NOW!" in the SSH client... but nothing actually happens.

I have tried shutdown -r now; it says "The server is going down NOW!" in the SSH client... but nothing actually happens.

Is there something else I need to do to restart this server via the command line? Or perhaps just restart the web interface service (which service needs to be restarted for that?).

I'd like to avoid physically powering the machine off and back on.
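
As a hedged aside (not something suggested in the thread itself): if reboot and shutdown print their message but the machine never actually goes down, the usual suspect is a process stuck on hung storage, and the kernel's magic SysRq trigger can force a reboot from an SSH session anyway. This bypasses a clean shutdown, so any VM still writing to disk can lose data; stop the guests first if at all possible.

Code:
# enable the SysRq interface, sync what can still be synced, then force an immediate reboot
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger    # emergency sync of mounted filesystems
echo b > /proc/sysrq-trigger    # reboot immediately, bypassing init and unmounts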

My pveversion -v output is:

Code:
proxmox-ve-2.6.32: 3.1-109 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-3 (running version: 3.1-3/dc0e9b0e)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-95
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-8
libpve-access-control: 3.0-7
libpve-storage-perl: 3.0-17
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-4
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

Thanks in advance for any help.
 
I just logged into the Proxmox Forum to describe exactly the same problem I have had since last night! Thanks for posting it; saves me from typing it all over. :)

But yes, I have the same problem. All VMs are running and I can access them remotely or via SSH without a problem. Among the running VMs I have 3 email servers; emails are flowing as usual, but I cannot log in due to "Login failed". I have 4 nodes and all have the same issue. This is not the first time I have come across this error; for me it has happened every time I did something with shared storage. Last night I created a FreeNAS VM to set up an NFS share and see if I could create OpenVZ containers on it by adding the FreeNAS VM as NFS shared storage. I was able to connect successfully, but when I started creating a container on that NFS share, it all fell apart. I can no longer log in. I have not rebooted any machines yet and I would rather not go that route.
I still sometimes notice this issue when I do a massive amount of data transfer within the CEPH cluster. In the past I was able to run "service pvestatd restart" and all was OK, but this time it is not working. All nodes are up to date and have a Community Subscription.

@Kaanha, try this command and see if it helps in your situation.
# service pvestatd restart
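
As a small hedged addition (the log path assumes a stock Debian/PVE 3.x node): after restarting pvestatd it is worth confirming the daemon actually came back and glancing at syslog for any storage-related errors it logged.

Code:
# restart the status daemon, verify it is running, and check its recent log lines
service pvestatd restart
pgrep -l pvestatd                           # should list the running pvestatd process
grep pvestatd /var/log/syslog | tail -n 20  # recent messages, e.g. storage timeouts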
 
Well, I'm not sure if it fixed the status issue; it still won't let me log in. :mad: I've tried changing my password as well. I can get into SSH with no issue, but the web interface keeps telling me that the login failed, and nothing I do will restart the server remotely.

If all else fails I'll have to drive over and physically touch the bare metal, which is not something I really want to do.

 
Don't lose hope yet. I am sure there is a solution for this. Changing the password will not help, since it seems to me the node is unable to talk to the cluster database. I will keep trying and will update if anything changes. I have several dozen VMs running across all nodes; I would not want to log in to them one by one and do a manual shutdown just to reboot all the Proxmox nodes.
 
Have you tried:
service pvedaemon restart
service pvestatd restart
service pveproxy restart

As a last resort:
service pve-cluster restart
service pvedaemon restart
service pvestatd restart
service pveproxy restart

I am pretty sure you need to restart pveproxy after restarting pvedaemon.
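
As a hedged sketch of the full "last resort" sequence with a quick sanity check afterwards (this assumes pveproxy is on its default port 8006 and that curl is installed; neither detail comes from the thread):

Code:
# restart the PVE services in order, pve-cluster first only as a last resort
service pve-cluster restart
service pvedaemon restart
service pvestatd restart
service pveproxy restart
# the web interface should answer again on its default port
curl -k -I https://localhost:8006/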
 
And I am in!

I did not have to go to the last resort; just restarting pvedaemon, pvestatd, and pveproxy worked.

So now the curious question is: why does changing something in Storage bring these services down so that they need a restart?
 
That absolutely worked! I had to do the "last resort" option, but I'm back in and all of my VMs are showing their status again.

That was a great fix and this thread definitely needs to be pinned or something, because two of us were having the problem and I'm sure others are, or will be. This could save a lot of time spent running to the data center.

Thanks a lot man!


 
So now the curious question is: why does changing something in Storage bring these services down so that they need a restart?

Some storage types (NFS) can block syscalls forever. We try to avoid that, but obviously not everywhere. Please report a bug if you ever find a way to reproduce that behavior.
 
A typical cause for this behavior with NFS is a file "locked" by a client being removed on the storage. This causes a hanging process on the client which can only be resolved by killing all NFS processes on the client as well as on the storage and then restarting all NFS daemon processes. In the worst case, the only option is to reboot both the storage and the client.

Why? Because NFS processes and locks live in kernel space!
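
A hedged side note on spotting this situation (generic Linux tooling, not part of the original reply): processes blocked on a dead NFS mount normally sit in uninterruptible sleep ("D" state), which is exactly why kill -9 cannot remove them.

Code:
# list processes in uninterruptible sleep, typically blocked in the kernel on I/O or NFS
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
# show which NFS shares are currently mounted
mount | grep nfs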
 

In my case, the container I was trying to create may have caused the "locked" issue. After pvestatd on all nodes started going offline, I figured the new storage I had added had caused it. So I forcefully removed the shared storage (from storage.cfg), and then none of the Proxmox nodes wanted to show as online in the GUI. While digging I came across this solution, which kills the NFS processes, unmounts the share, and brings everything back to normal without a restart of the Proxmox node.

Create a fake IP for the dead NFS server on the Proxmox node (it has to be the address the share was mounted from, 192.168.1.1 in this example):

# ifconfig eth0:fakenfs 192.168.1.1 netmask 255.255.255.0

Unmount the NFS share:
# umount -f -l /mnt/<nfs-share>

Then remove the fake IP again:
# ifconfig eth0:fakenfs down

Find and kill any remaining NFS processes:
# ps aux | grep nfs
# kill <nfs process id>

I recreated this issue by trying to re-add the shared storage using the FreeNAS VM. The combination of the fakenfs trick and restarting the three Proxmox services brought everything back to normal without a restart.
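
A hedged follow-up check (not in the original post): after the lazy unmount, confirm the dead share is really gone from the mount table before restarting the status daemon so the GUI picks the nodes up again.

Code:
# the dead share should no longer be listed here
grep nfs /proc/mounts
# then restart the status daemon; the nodes should show as online again in the GUI
service pvestatd restart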
 
Alright, mine is still messing up, and I've tracked the issue down, somewhat, to the pvestatd service. It just stops working, or something, even though the process appears to still be running.

It will work and report the status for about 30 seconds, then it starts showing "Status Unknown", then the "Communication Error" message again, and eventually all VM status, names, etc. disappear. I restart the pvestatd service and it all comes back for, again, about 30 seconds... then it repeats the process.

Here is a screenshot of the issue...

[screenshot attached: proxmox_error.png]

Again, if I restart the pvestatd service, it displays information correctly for about 30 seconds before doing exactly the same thing again.
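
A hedged way to narrow down which storage is stalling pvestatd (standard PVE 3.x tooling, not something suggested in the thread): pvesm status queries every configured storage, so if one of them hangs, the command itself stalls at that storage, and syslog usually carries the matching pvestatd timeout messages.

Code:
# query all configured storages; a hanging storage makes this command stall
pvesm status
# check pvestatd's recent log output for storage timeouts
grep -i pvestatd /var/log/syslog | tail -n 30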


 
Hi,
we have a real problem...
The connection gets lost intermittently.
Then the hardware devices and everything under "Display" in the overview are no longer listed... the connection error is shown.

But on a running Windows instance, downloads that are in progress stay active and are not interrupted.
Then after a while the devices are shown again and it's good... and after another while it breaks again...

Very curious behaviour...
[screenshot attached: proxodelete2.png]

Maybe this happened because of the creation of a new VM.

And we observe that as soon as the console window is closed, everything is okay again... no connection error. Curious, isn't it?
 
Old post, but it saved me too - huge thanks MIR for the instructions.
Running Proxmox 5.0.32.
For the "Have you tried" commands I had to change the syntax to:
service pvedaemon --full-restart
service pvestatd --full-restart
service pveproxy --full-restart
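
Since Proxmox VE 5.x is systemd-based, the plain systemctl equivalents should also work; this is a hedged note using the standard PVE unit names, not something from Terry's post.

Code:
# systemd equivalents on Proxmox VE 5.x and later
systemctl restart pvedaemon pvestatd pveproxy
# last resort, mirroring the sequence above: restart the cluster filesystem first
systemctl restart pve-cluster
systemctl restart pvedaemon pvestatd pveproxy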

I didn't have to run the "Last Resort" section.

And when I got into Proxmox, I had to restart the Prox server as it had a red X through its icon.
The restart cleared this to a green tick. The reboot took a long time to complete (8 minutes), but it came back.
Thank You
Terry
 
