Problem in web GUI

shotgun · New Member · Apr 12, 2012
Hi

I have a strange issue with the web GUI.

I have a single node running some OpenVZ containers. The containers are all running OK.
The web GUI shows the containers, but shows them greyed out without a hostname.
None of the graphs are displaying any data (e.g. storage usage), and none of the containers are showing any status data.

[Screenshot attached: Screenshot 2014-09-08 13.18.43.png]

I have cleared my browser cache. I have tried other browsers: same result.
I have restarted pveproxy, pvedaemon and pve-cluster. No difference.
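For reference, those restarts were along these lines (the standard sysvinit services on PVE 3.x):

service pveproxy restart
service pvedaemon restart
service pve-cluster restart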

It appears there is a halted snapshot backup, but the task list is empty so I can't see what happened. There is a snapshot left behind, but there is no way to remove it (device busy).
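For what it's worth, this is roughly what the removal attempt looks like (the snapshot name below is illustrative; vzdump names its LVM snapshots along the lines of vzsnap-<hostname>-0):

lvs                                # the leftover vzdump snapshot shows up in the volume list
lvremove /dev/pve/vzsnap-node1-0   # fails with a device busy error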

As I say, all the VMs are running OK; I'm not sure what to try next.

/etc/pve# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1


Any ideas?
 
Reboot? Also check that the cluster filesystem (pmxcfs) under /etc/pve is properly mounted, though you should see an error when you restart pve-cluster. You could also try restarting pvestatd, since it's responsible for keeping all status info up to date, AFAIK.
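To check the mount, something like this should do; pmxcfs shows up as a fuse mount:

mount | grep /etc/pve    # should show /dev/fuse mounted on /etc/pve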
 

Restarting pvestatd has got the web GUI back.
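For the record, that was just:

service pvestatd restart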

Looks like other bad things happened, though. The cause appears to have been a drama while a backup process was backing up to an NFS mount.
It was a snapshot backup, and it looks like NFS barfed first; there was a stack trace around the same time that pvestatd died. This left an orphaned snapshot that I cannot unmount.
I can't remove it with lvremove or dmsetup, as both give a device busy I/O error. There are filesystem errors buried in the logs that relate to the filesystem on the snapshot.
It looks like filesystem errors were being reported on the snapshot device (dm-1) prior to NFS crashing. I can't tell whether the snapshot fs problem (orphaned inodes etc.) is a red herring or not.
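For anyone curious, this is roughly how I dug those errors out (standard Debian/PVE 3.x log locations):

grep -i 'dm-1' /var/log/syslog       # filesystem errors on the snapshot device
grep -i 'orphan' /var/log/kern.log   # the orphaned-inode messages
dmesg | grep -iE 'nfs|dm-1'          # the NFS stack trace and device-mapper errors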

Oddly, I've done this plenty of times before, but on a slightly older version of PVE; maybe there are some known LVM snapshot / NFS bugs in this latest release.

Although everything is running, I have the snapshot left behind and an elevated load average due to I/O wait.
I cannot shift the snapshot: I can't find a PID that owns it, and running fuser or lsof to track it down just sits there for hours, ending up as a zombie process.
I'm out of ideas, and it looks like a reboot is the only way I'm getting this mess cleared up. I guess I'll have to rethink the backup strategy.
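In case it helps anyone else hitting this, these are less hang-prone ways to poke at the stuck device than fuser/lsof (dm-1 being the snapshot device in my case):

dmsetup info -c               # shows the open count on each device-mapper device
dmsetup ls --tree             # shows which dm devices depend on which
ls /sys/block/dm-1/holders/   # any kernel-level holders of the snapshot device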

It's a new box: hardware RAID with an LSI controller with a decent cache and three disks in RAID5. Everything looks OK there; no bad sectors or other alerts/problems.

Any light that can be shed on this would be extremely welcome.
 
I've seen all kinds of funky backup problems with kernels newer than 2.6.32-27-pve on a couple of servers I manage; the workaround was to go back to that version. The problems were similar to yours, complete with high load and zombies. And that's on real server hardware with hardware RAID, too.
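If it helps, the downgrade is just a matter of installing the older kernel package and booting into it, roughly:

apt-get install pve-kernel-2.6.32-27-pve
# then select the 2.6.32-27-pve entry in GRUB at boot
# (or set GRUB_DEFAULT in /etc/default/grub and run update-grub)
reboot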

Where I couldn't downgrade, I changed the backup storage to locally mounted CIFS (besides being a feasible workaround, it appears to be faster, too). Something is seriously broken in these kernels, either in the NFS kernel code or in the PVE backup system. These problems seem to be sporadic, but there is more than one user on the forums reporting similar things. I'd suggest using CIFS as a workaround, or, if you can (as in, you don't need OpenVZ), upgrading to the 3.10 kernel; it might help.
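A rough sketch of the CIFS setup I mean (server, share and credentials below are placeholders):

mount -t cifs //nas.example/backup /mnt/backup -o username=backup,password=secret
# make it permanent via /etc/fstab:
# //nas.example/backup  /mnt/backup  cifs  username=backup,password=secret  0  0
# then add /mnt/backup as a "Directory" storage in the PVE GUI and point vzdump at it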

I think your snapshot alerts are just the usual LVM snapshot side effects; that's normal.
 
