Proxmox server unreachable!

emanuelebruno

Hi all,

today I panicked because the server became unreachable while I was reading e-mail.
I couldn't ping it, so I restarted it using the remote reboot function (I don't know why it takes 40 minutes to reboot every time; the last reboot was more than 6 months ago...).

I am very worried that there is a hardware problem (such as a disk that is about to fail), so I am asking the community for advice on diagnosing the server: which logs might help me figure out whether the problem is with Proxmox (software) or with the hardware?

pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-66
pve-kernel-2.6.32-11-pve: 2.6.32-66
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-15
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1

Below are the hard drive test results: do you think the disk is failing?

View attachment smartctl.zip

Thanks for your help.
Sincerely,
E.Bruno
 
Hi,
I am trying to understand why my dedicated server halted 2 days ago for no apparent reason. I have searched /var/log but I haven't found any evidence...

Please, can you help me?

E.Bruno
 
Well, my guess is as good as your guess.

How old is the hardware? What kind of hardware? Single disk or Raid?

To be honest, I didn't even look at your SMART test. Just last week I had a drive on the way out (the RAID was slowing down at odd times and doing funny things), but SMART showed everything was fine, and a bad block scan showed all fine as well. So I watched the drive lights during my lunch break while making it access all sorts of places on the disks ... AND THERE it was ... the light on drive two suddenly stuck.

So yes, it is great to have a raid and a hot spare plugged in, but it is not a fail safe.

If you have a bad feeling about the machine, make sure your backups are up to date and OK.
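
For anyone who still wants a quick first look at the attached SMART output, here is a minimal Python sketch. It assumes the usual column layout of the `smartctl -A` attribute table, and the list of watched attributes is my own choice, not an official or complete one. The sample text is invented for illustration and is not taken from the attachment. As described above, a clean result does not prove the disk is healthy.

```python
# Attributes often watched for early signs of disk trouble.
# This selection is an assumption for the sketch, not a complete list.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def suspicious_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with a nonzero raw value."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and any other text are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Invented sample in the layout printed by `smartctl -A` (not real data).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
```

Calling `suspicious_attributes(SAMPLE)` flags only the reallocated sector count here, since the pending sector count is zero and power-on hours are not in the watched list.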
 
How old is the hardware? What kind of hardware? Single disk or Raid?

Yes, a bit more detail would help.
Disks are the most at risk; if they are in RAID, use a spare unit. Memory can also fail, and both can make a system go crazy. Power supplies can fail too (better if redundant and hot-swappable), and a UPS will help stabilize power.
The best thing would be to have an identical "mirror" node, to be able to find out exactly what is going wrong.
However, if you have a cluster of even 2 nodes with shared storage and good (tested) backups, you should be fine. When disaster strikes, you will have only a short downtime for the VMs on the failed node.

Marco
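
As a rough way to sort through the logs with the suspects above in mind (disk, memory, kernel crash), something like the following Python sketch could help. The patterns are only examples of messages the Linux kernel commonly logs; they are my assumption, far from exhaustive, and an empty result does not rule out a hardware fault.

```python
import re

# Example patterns per suspect. These are illustrative assumptions,
# not a complete catalogue of kernel error messages.
PATTERNS = {
    "disk": re.compile(r"I/O error|ata\d+(\.\d+)?: (exception|failed command)", re.I),
    "memory": re.compile(r"Machine check|\bmce\b|EDAC", re.I),
    "kernel": re.compile(r"Kernel panic|Oops|BUG:", re.I),
}


def classify(lines):
    """Group log lines by the first-level suspect whose pattern they match."""
    hits = {key: [] for key in PATTERNS}
    for line in lines:
        for key, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[key].append(line.rstrip("\n"))
    return hits
```

It takes any iterable of lines, so it can be fed an open log file, for example `classify(open("/var/log/kern.log", errors="replace"))`, assuming that file exists on the host. Rotated logs from before the crash would need to be checked the same way.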
 
I discovered that the UPS software on my PVE host sometimes causes kernel panics:
1. if I disconnect the USB keyboard, or
2. if for some reason the USB voltage drops.
 
