Problems with a Master, urgent

  • Thread starter: TiagoRF
Hello,

Apparently our master has gone crazy. Before, the VMs used to hang, and we had to reboot the whole node/master.

Now our node seems fine, yet our master keeps acting up.
The VMs hang and Proxmox itself died: you could type anything in the shell, for instance "top" or "init 6".

Result = 0, nothing would work.

Has this ever happened to any of you?

PVE 1.4 with 2.6.24-8.
 
Nope Sir, just the usual in syslog

(the last thing was the sync result)

On the console, nothing at all.

Our node, as I said, works perfectly atm. It had some difficulties, but now seems very stable.
 
Hi,
perhaps a problem with the disks? Do you have a RAID? Or is there a mounted filesystem that is not reachable?

As a first step, it may help to make your node the master. Then you can access the VMs on the node:
Code:
pveca -m
If you can, delete the cluster.conf (/etc/pve/cluster.cfg) on the old master and reboot.
Once you have found the error, you can join the old master to the new master.
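
The sequence above could be sketched roughly as below. This is only a dry-run outline, not a tested procedure: `pveca -m` comes from this post, while `pveca -a -h <ip>` is what I assume the PVE 1.x join syntax to be (check the `pveca` usage output first), and the `run` helper, the function names and the backup path are made up for illustration. It also moves the config file aside instead of deleting it, so it can be restored.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command instead of running it,
# unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1, on the surviving node: force it to become the master.
promote_node() {
    run pveca -m
}

# Step 2, on the old master: move the cluster config aside, then reboot.
# Moving rather than deleting is my own choice; the backup path is arbitrary.
reset_old_master() {
    run mv /etc/pve/cluster.cfg /root/cluster.cfg.bak
    run reboot
}

# Step 3, on the old master once the fault is found: join the new master.
# ASSUMPTION: "pveca -a -h <master-ip>" is the PVE 1.x join syntax.
rejoin_cluster() {
    run pveca -a -h "$1"
}
```

Calling `promote_node` with the default settings only prints `+ pveca -m`; set `DRY_RUN=0` when you actually want the commands to run.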

Udo
 
Yes, we have a RAID (10)

We've run an integrity check on the HW RAID and all seems OK.

The installation is the default one, with no changes made.

We've rebooted the machine and are waiting for it to hang again, which it eventually will.

The thing that worries me is that the hangs now come from the core, I mean from Proxmox itself, not just the VMs.

You can't reach it via Apache, nor start/restart Apache.
 
Hello

Dietmar, memtest indicates memory with no errors, but thanks!

Sniffer had an idea yesterday, related to the bonding and the vmbr's, so we've stripped some of that out, and the response times from the NICs are much faster.

So far today we've had no problems; over the weekend we'll see what's what.

Thanks for all the support people!
 
Hi,

maybe this helps, maybe not, but here is what happened to me (at least twice).

The Proxmox server crashed during a nightly snapshot of the OpenVZ containers. After the reboot, the Proxmox server hung somewhere before the Apache server started or the login prompt appeared on the console.

Here is what I did:
1. reboot in single-user mode
2. remove the snapshot from the LVM volume group
3. change the config of each OpenVZ container and KVM guest to NOT autostart at boot
4. reboot
5. wait until the Proxmox server responds via the browser, start each OpenVZ container and KVM guest manually (one after the other, checking that each runs OK) and reconfigure "start at boot" as before.

That helped me to recover the whole server and data.
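
Steps 2 and 3 above might look something like this on the command line. Treat it as a rough dry-run sketch with several assumptions: the snapshot volume name `/dev/pve/vzsnap` is a guess (check with `lvs` first), `qm set <vmid> --onboot no` is what I assume the PVE 1.x option to be, and the `run` helper and function names are made up for illustration. `vzctl set <ctid> --onboot no --save` is the standard OpenVZ way to disable autostart.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): prints each command instead of running it,
# unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 2: remove the leftover vzdump snapshot volume.
# ASSUMPTION: the default name /dev/pve/vzsnap; pass the real path from "lvs".
remove_snapshot() {
    run lvremove -f "${1:-/dev/pve/vzsnap}"
}

# Step 3a: disable "start at boot" for each OpenVZ container ID given.
disable_ct_autostart() {
    for ctid in "$@"; do
        run vzctl set "$ctid" --onboot no --save
    done
}

# Step 3b: disable "start at boot" for each KVM guest ID given.
# ASSUMPTION: "qm set <vmid> --onboot no" is accepted by this PVE version.
disable_vm_autostart() {
    for vmid in "$@"; do
        run qm set "$vmid" --onboot no
    done
}
```

For example, `disable_ct_autostart 101 102` prints the two `vzctl` commands without running them; set `DRY_RUN=0` only after checking that the IDs and the volume name match your system.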

Just my two cents.

PS:

I think the problem is the antivirus running in the OpenVZ container at the same time as vzdump is running. The antivirus checks a lot of files (>60GB of data) and maybe the 4GB free in the LVM are not enough (but I'm not a vzdump/LVM/snapshot expert)
 

The virus check only reads files (unless it cleans some viruses), so that should not increase the size of the snapshot.
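
To put rough numbers on that: an LVM snapshot is copy-on-write, so it only fills up with blocks that change on the original volume while the snapshot exists, not with data that is merely read. A tiny sketch of the arithmetic, where the function name and the sample figures are invented for illustration:

```shell
#!/bin/sh
# Rough estimate of how full a copy-on-write snapshot gets.
# $1 = MB written to the original volume while the snapshot exists
# $2 = snapshot size in MB
# Reads do not count; only changed blocks are copied into the snapshot.
snapshot_fill_percent() {
    if [ "$2" -le 0 ]; then
        echo "snapshot size must be positive" >&2
        return 1
    fi
    echo $(( $1 * 100 / $2 ))
}
```

With a 4096 MB snapshot, a scan that reads 60GB but writes nothing gives `snapshot_fill_percent 0 4096`, which is 0, while 2048 MB of writes from anything else in the container during the backup would give 50.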