PVE 1.6 Cluster master behaving strange


TiagoRF

Guest
Afternoon people!

I've been noticing strange behavior from the cluster master:

- Can't access it through the web interface; it says wrong login or password

When I run pveca -l on it:

spr:~# pveca -l
CID----IPADDRESS----ROLE-STATE--------UPTIME---LOAD----MEM---DISK
1 : 10.0.2.101 M ERROR: 500 read timeout

2 : 10.0.2.102 N S 10 days 18:20 0.47 57% 3%


And the load on the master:

14:44:31 up 10 days, 18:21, 1 user, load average: 6.55, 6.04, 5.88


Compared to the node, that's huge! Uptime on the node:

14:46:14 up 10 days 18:23, load average: 0.25, 0.30, 0.33

In the web interface, accessed through the node, we can see this:

Hostname  IP Address  Role    State                    Uptime         Load  CPU  IODelay  Memory  Disk
spr       10.0.2.101  Master  ERROR: 500 read timeout
sse       10.0.2.102  Node    nosync                   10 days 18:24  0.26  0%   0%       57%     3%

Pretty awkward. Would rebooting be a good idea, by any means?
 
I probably just found the cause:

spr:~# ps x | grep vzdump
5723 ? Ds 0:00 /usr/bin/perl -w /usr/sbin/vzdump --quiet --node 1 --snapshot --compress --storage Backups --mailto xpto@xpto.org 101
18655 ? Ds 0:00 /usr/bin/perl -w /usr/sbin/vzdump --quiet --node 1 --suspend --compress --storage Backups --mailto xpto@xpto.org 103

The processes are hung!
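
For anyone hitting the same symptom: the D in the STAT column means uninterruptible sleep, and on Linux such processes count toward the load average, which would be consistent with the high load seen on the master. Below is a minimal sketch for listing them. It assumes a procps-style ps; the helper name filter_d_state and the sshd sample line are made up for illustration, and the 5723 line is only modeled on the vzdump entry above.

```shell
# Hypothetical helper: keep the header line plus every process whose
# STAT column (2nd field here) starts with D (uninterruptible sleep).
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Live usage (assumes procps ps; wchan is the kernel function being waited on):
#   ps -eo pid,stat,wchan:32,args | filter_d_state

# Demo on sample input: one line modeled on the hung vzdump job,
# one invented ordinary sleeper that should be filtered out.
printf '%s\n' \
    'PID STAT COMMAND' \
    '5723 Ds /usr/sbin/vzdump' \
    '4242 Ss /usr/sbin/sshd' | filter_d_state
```

Note that plain ps x prints STAT as the third column, so the filter above expects the pid,stat,... layout given in the usage comment.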
 
Any logs in /var/log/vzdump regarding these jobs?
 
15:05:54 up 10 days, 18:42, 1 user, load average: 3.50, 3.90, 4.66

The load is returning to normal over time, yet the cluster sync seems to be gone at the moment.

DRBD is fine nonetheless.
 
Not a single log, tom.

I had that a few times a while ago, when we were between the 1.6.5121 and 1.6.5261 releases, while waiting for the 2.6.35 kernel with KSM.
Then twice again after reinstalling from the newer ISO, using 2.6.35.

I couldn't make it happen on demand; the hangs seemed to occur on their own, at random.
I had to use the power button on the host each time.

Then it just stopped happening.
Around that same time, for unrelated reasons, I started using these two packages from testing, and I still run the 2.6.35 kernel.
They've been fine for over a week now, probably more like 2 or 3.

Code:
Package: pve-qemu-kvm
Pin: release c=pvetest
Pin-Priority: 900

Package: qemu-server
Pin: release c=pvetest
Pin-Priority: 900
 
