Problem in web GUI

shotgun · New Member · Apr 12, 2012
Hi

I have a strange issue with the web GUI.

I have a single node running some OpenVZ containers. The containers are all running OK.
The web GUI shows the containers, but shows them greyed out without a hostname.
None of the graphs are displaying any data (e.g. storage usage), and none of the containers are showing any status data.

[Screenshot attached: Screenshot 2014-09-08 13.18.43.png]

I have cleared my browser cache. I have tried other browsers: same result.
I have restarted pveproxy, pvedaemon and pve-cluster. No difference.
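For reference, those restarts were along these lines (the standard sysvinit services on PVE 3.x):

service pveproxy restart
service pvedaemon restart
service pve-cluster restart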

It appears there is a halted snapshot backup, but the task list is empty so I can't see what happened. There is a snapshot left behind, but there is no way to remove it (device busy).
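For what it's worth, this is roughly what the removal attempt looks like (the snapshot name below is illustrative; vzdump names its LVM snapshots along the lines of vzsnap-<hostname>-0):

lvs                                # the leftover vzdump snapshot shows up in the volume list
lvremove /dev/pve/vzsnap-node1-0   # fails with a device busy error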

As I say, all the VMs are running OK; I'm not sure what to try next.

/etc/pve# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1


Any ideas?
 
Reboot? Also check that the cluster filesystem (pmxcfs) under /etc/pve is properly mounted, though you should see an error when you restart pve-cluster. You could also try restarting pvestatd, since it's responsible for keeping all status info up to date, AFAIK.
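To check the mount, something like this should do; pmxcfs shows up as a fuse mount:

mount | grep /etc/pve    # should show /dev/fuse mounted on /etc/pve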
 

Restarting pvestatd has got the web GUI back.
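For the record, that was just:

service pvestatd restart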

Looks like other bad things happened, though. The cause appears to have been a drama while a backup process was backing up to an NFS mount.
It was a snapshot backup, and it looks like NFS barfed first; there was a stack trace around the same time that pvestatd died. This left an orphaned snapshot that I cannot unmount.
I can't remove it with lvremove or dmsetup, as both give a device busy I/O error. There are filesystem errors buried in the logs that relate to the filesystem on the snapshot.
It looks like filesystem errors were being reported on the snapshot device (dm-1) prior to NFS crashing. I can't tell whether the snapshot fs problem (orphaned inodes etc.) is a red herring or not.
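For anyone curious, this is roughly how I dug those errors out (standard Debian/PVE 3.x log locations):

grep -i 'dm-1' /var/log/syslog       # filesystem errors on the snapshot device
grep -i 'orphan' /var/log/kern.log   # the orphaned-inode messages
dmesg | grep -iE 'nfs|dm-1'          # the NFS stack trace and device-mapper errors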

Oddly, I've done this plenty of times before, but on a slightly older version of PVE; maybe there are some known LVM snapshot / NFS bugs in this latest release.

Although everything is running, I have the snapshot left behind and an elevated load average due to I/O wait.
I cannot shift the snapshot: I can't find a PID that owns it, and running fuser or lsof to track it down just sits there for hours, ending up as a zombie process.
I'm out of ideas, and it looks like a reboot is the only way I'm getting this mess cleared up. I guess I'll have to rethink the backup strategy.
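In case it helps anyone else hitting this, these are less hang-prone ways to poke at the stuck device than fuser/lsof (dm-1 being the snapshot device in my case):

dmsetup info -c               # shows the open count on each device-mapper device
dmsetup ls --tree             # shows which dm devices depend on which
ls /sys/block/dm-1/holders/   # any kernel-level holders of the snapshot device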

It's a new box: hardware RAID with an LSI controller with a decent cache and three disks in RAID5. Everything looks OK there; no bad sectors or other alerts/problems.

Any light that can be shed on this would be extremely welcome.
 
I've seen all kinds of funky backup problems with kernels newer than 2.6.32-27-pve on a couple of servers I manage; the workaround was to go back to that version. The problems were similar to yours, complete with high load and zombies. And that's on real server hardware with hardware RAID, too.
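If it helps, the downgrade is just a matter of installing the older kernel package and booting into it, roughly:

apt-get install pve-kernel-2.6.32-27-pve
# then select the 2.6.32-27-pve entry in GRUB at boot
# (or set GRUB_DEFAULT in /etc/default/grub and run update-grub)
reboot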

Where I couldn't downgrade, I changed the backup storage to locally mounted CIFS (besides being a feasible workaround, it appears to be faster, too). Something is seriously broken in these kernels, either in the NFS kernel code or in the PVE backup system. These problems seem to be sporadic, but there is more than one user on the forums reporting similar things. I'd suggest using CIFS as a workaround, or, if you can (as in, you don't need OpenVZ), upgrading to the 3.10 kernel; it might help.
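A rough sketch of the CIFS setup I mean (server, share and credentials below are placeholders):

mount -t cifs //nas.example/backup /mnt/backup -o username=backup,password=secret
# make it permanent via /etc/fstab:
# //nas.example/backup  /mnt/backup  cifs  username=backup,password=secret  0  0
# then add /mnt/backup as a "Directory" storage in the PVE GUI and point vzdump at it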

I think your snapshot alerts are just the usual LVM snapshot side effects; that's normal.
 
