Proxmox 1.5: can't log in to the web GUI

Petrus4

I have Proxmox VE 1.5, fully updated, in a cluster with one other machine, also running proxmox-ve-1.5. Both machines were recently upgraded from 1.4.
I shut the slave cluster machine down since there are no systems currently on it.

I do have nfs storage configured on the slave machine and had added it to the Master.

When I try to log in to the web GUI I get the window you normally get when a bad username or password is entered. I have confirmed the username and password are correct.

I get this error in the logs:

login failure: 500 read timeout

I tried restarting the pvedaemon: /etc/init.d/pvedaemon restart

After restarting it I can log in, but I can only see the main page. When I click on anything else, after a long delay I get the following error:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator, root and inform them of the time the error occurred, and anything you might have done that may have caused the error.

[24880]ERR: 24: Error in Perl code: 500 read timeout

Apache Embperl 2.2.0 [Sun Jan 24 13:51:25 2010]

After this I cannot access anything, and I have to restart the pvedaemon again to be able to log in.

Has anyone else encountered this?
 
You have an NFS or iSCSI server which is no longer reachable? Try

# pvesm list

Does that give some hint?
 
You have an NFS or iSCSI server which is no longer reachable? Try

# pvesm list

Does that give some hint?

I tried this command:

# pvesm list

and after 10 minutes I still get no results. Nothing happens, and I do not get my prompt back.

What kind of output should I be getting?
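
Since pvesm list has to query every configured storage, it blocks as soon as it touches the dead NFS mount, which is why it never returns. A non-Proxmox way to see which NFS mounts the node currently holds, without touching them (reading /proc/mounts itself does not block on a dead server), would be something like:

Code:
# list the node's active NFS mounts without accessing them
grep nfs /proc/mounts

Each line shows the server:export, the local mount point (usually something under /mnt/pve/), the filesystem type and the mount options.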
 
Just to be clear, the NFS storage was being used for backups only, not to store virtual machines.
 
You have an NFS or iSCSI server which is no longer reachable? Try

# pvesm list

Does that give some hint?

If the hint is that the system gets stuck because it is trying to access the NFS server and can't, then the next step would be to remove this storage from the master so the master is not looking for it.

I think I can use the pvesm command to remove storage but I need to know what the storage id is. Where can I find this? Am I on the right track here?
 
I think I can use the pvesm command to remove storage but I need to know what the storage id is. Where can I find this? Am I on the right track here?

The correct fix is to get the NFS storage online again!

If that is not possible, just edit /etc/pve/storage.cfg directly.
 
The correct fix is to get the NFS storage online again!

If that is not possible, just edit /etc/pve/storage.cfg directly.

OK, editing storage.cfg and removing the NFS share fixed the problem. Thanks Dietmar! :)
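
For anyone else following along, what gets removed (or commented out) is the NFS stanza for the unreachable storage. Roughly, an entry in /etc/pve/storage.cfg looks like the sketch below; the storage ID, server and export here are made up, and the exact set of keys may differ slightly between PVE versions:

Code:
nfs: backup-store
        path /mnt/pve/backup-store
        server 10.10.108.20
        export /export/backup
        content backup

Deleting or commenting out the whole stanza stops the daemons from trying to reach that server.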


Will there be a fix for this kind of behaviour when an NFS server is down? I would expect that you would still be able to manage the system via the GUI when an NFS backup location is offline.
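
As a side note, part of the reason everything hangs is that NFS is mounted hard by default, so every I/O request is retried forever while the server is away. A soft mount eventually returns an I/O error instead, at the price of possibly losing writes if the server merely stalls. The sketch below only illustrates the option names on a manual mount; it is not how PVE itself mounts its storages, and the server, export and mount point are hypothetical:

Code:
# illustrative manual soft mount; timeo is in tenths of a second
mount -t nfs -o soft,timeo=100,retrans=3 10.10.108.20:/export/backup /mnt/test-nfs

With soft,timeo=100,retrans=3 a stuck request fails after a bounded number of retries rather than blocking the calling process indefinitely.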
 
I just ran into this problem in v1.4.

Jan 26 12:29:11 host10 kernel: nfs: server 10.10.108.20 not responding, still trying
Jan 26 12:29:34 host10 ntpd[3675]: Deleting interface #15 vmtab101i0, fe80::2ff:2fff:fe5e:8ea8#123, interface stats: received=0, sent=0, dropped=0, active_time=54000 secs
Jan 26 12:30:01 host10 /USR/SBIN/CRON[11568]: (root) CMD (/usr/share/vzctl/scripts/vpsreboot)
Jan 26 12:30:01 host10 /USR/SBIN/CRON[11570]: (root) CMD (/usr/share/vzctl/scripts/vpsnetclean)
Jan 26 12:30:01 host10 /USR/SBIN/CRON[11572]: (root) CMD (test -x /usr/lib/atsar/atsa1 && /usr/lib/atsar/atsa1)
Jan 26 12:30:34 host10 proxwww[11554]: 500 read timeout
Jan 26 12:31:30 host10 proxwww[11554]: update ticket
Jan 26 12:31:31 host10 proxwww[11594]: Starting new child 11594
Jan 26 12:31:41 host10 proxwww[11594]: update ticket
Jan 26 12:31:42 host10 proxwww[11595]: Starting new child 11595
Jan 26 12:31:47 host10 proxwww[11595]: update ticket
Jan 26 12:31:47 host10 proxwww[11596]: Starting new child 11596
Jan 26 12:32:40 host10 proxwww[11557]: 500 read timeout
Jan 26 12:33:30 host10 proxwww[11554]: update ticket failed: 500 read timeout
Jan 26 12:33:41 host10 proxwww[11594]: update ticket failed: 500 read timeout
Jan 26 12:33:47 host10 proxwww[11595]: update ticket failed: 500 read timeout
Jan 26 12:33:49 host10 proxwww[11599]: Starting new child 11599
Jan 26 12:34:52 host10 proxwww[11596]: login failure: 500 read timeout

Restarting these services brought the web console back...
(except the Storage section!)

/etc/init.d/pvedaemon restart
/etc/init.d/pvenetcommit restart
/etc/init.d/pvetunnel restart
 
The correct fix is to get the NFS storage online again!

If that is not possible, just edit /etc/pve/storage.cfg directly.

Hi Dietmar,

I want to post another issue I encountered in case other users run into the same problem.

I ran into another issue after editing the NFS storage out of storage.cfg: updatedb/mlocate would hang indefinitely when run manually or via cron. I was getting this error from cron:

/etc/cron.daily/mlocate:
/usr/bin/updatedb.mlocate: `/var/lib/mlocate/mlocate.db' is locked (probably by an earlier updatedb)
run-parts: /etc/cron.daily/mlocate exited with return code 1

The reason was that the system still had the NFS share mounted.

Code:
umount -at nfs

would list the NFS mounts and tell me they were busy.

Code:
# fuser -km /mnt/pve/DMZ-S1
would also hang indefinitely.

I could not find any NFS info in fstab.

I looked in /etc/rc0.d and found S31umountnfs.sh.

I ran this script and, voilà, the NFS mount was removed. I did a kill -9 on any remaining hung updatedb instances, ran updatedb again, and it worked. :)
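
For reference, a lazy (and optionally forced) unmount usually achieves the same thing when the server is unreachable and a normal umount reports the mount as busy; the mount point below is the one from the fuser example above:

Code:
# detach the mount point immediately, clean up once it is no longer busy
umount -l /mnt/pve/DMZ-S1
# or additionally force it, which is mainly intended for unreachable NFS servers
umount -f -l /mnt/pve/DMZ-S1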

So in a nutshell:

If your NFS storage server is down, bring it back up, or else your web GUI might hang.

If your NFS server can't be brought back, remove it from /etc/pve/storage.cfg and run /etc/rc0.d/S31umountnfs.sh

~or~

if you are removing your NFS server from storage, do so via the web GUI first and then take the NFS server offline.
 
It seems as though the NFS server is indeed up in my case. Even when I reboot the NFS server, the condition remains.

Since I'm unable to reach the storage page, unable to SSH in, and unable to umount from a local console...

I've been rebooting PVE via the front power switch of the box.
Then, being careful not to visit the storage page of the web interface first, I comment out the share in /etc/pve/storage.cfg.

Then as soon as I re-add the share via the web interface all's well again.

Until, after about an hour or roughly 20 GB transferred, it drops off again.

I posted in NexentaStor's forum(s) about this too, as I'm not sure at all how it's happening.
 
Very odd, this is still happening in the latest 1.9 version:

pve:~# umount -at nfs
umount.nfs: Server failed to unmount '169.254.1.200:/home/vm-iso'
umount.nfs: Server failed to unmount '169.254.1.200:/home/vm-disks'
umount.nfs: Server failed to unmount '169.254.1.200:/home/vm-backup'

pvesm list just hangs indefinitely.

pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6

Is there any fix for this?

The NFS mounts have been up for an eternity without problems, and the NFS server is working from other servers, just not from the cluster master. Zero changes have been made to the NFS server, and only Proxmox updates have been applied to the connecting PVE machines.

Update: I rebooted the NFS server, and the PVE host still cannot unmount one of the shares. It seems it is still hanging, but NFS is working and responsive from other servers. Unfortunately the one hanging is in production, so I cannot reboot it at this time.

Also, this is the only thing I see in the logs:
Nov 8 05:39:33 kernel ct99 nfs: server 169.254.1.200 OK
Nov 8 05:41:08 kernel ct0 nfs: server 169.254.1.200 OK
Nov 8 05:42:42 kernel ct0 statd: server rpc.statd not responding, timed out
Nov 8 05:42:42 kernel lockd: cannot unmonitor 169.254.1.200

Obviously something is using the NFS mount, but there are no other signs of activity.

Any suggestions?
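
Two generic checks that can be run from the PVE host without touching the stuck mount are to ask the server's portmapper which RPC services it has registered and which exports it advertises; given the "rpc.statd not responding" and "cannot unmonitor" lines above, the status and lock services are the ones worth looking for. The address is the one from the log:

Code:
# list the RPC services registered on the NFS server (look for status and nlockmgr)
rpcinfo -p 169.254.1.200
# list the exports the server advertises
showmount -e 169.254.1.200

If status (rpc.statd) or nlockmgr is missing from the rpcinfo output, lock monitoring is down on the server even though file serving still works, which could explain the lockd errors.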
 
Any news on this issue? I know my NFS servers shouldn't go down, but when they do PVE shouldn't become unusable either!
 
