problems after power cut

svennd

Aug 4, 2014
Hey,

We had a power cut, and our Proxmox machine has had problems since. It boots, but the boot ends with:
INIT: no more processes left in this runlevel
The web interface on *:8006 is not online, and the VMs and containers are not online either (or at least not all of them).

I found some relevant links, but those were about a container running into this problem; here it is the host system itself. I tried:
- recovery mode fsck everything (no problems)
- check inittab (seems normal)
- ran update & upgrade
- ran service pve-cluster restart

I got the web interface back online, but the password seems to be wrong... Where should I start looking for the problem?


For OpenVZ I found this to work (in recovery mode, after manually starting network and ssh):
/etc/init.d/vz start
/etc/init.d/pve-cluster restart
vzctl start ID_of_container


SvennD
 
Thanks for your help. In "normal mode" it doesn't drop to a shell, so I can't give those... In recovery mode the results are as follows (note that I have started some services myself):

runlevel
Code:
Unknown

service --status-all
Code:
 [ - ]  atd
 [ - ]  bootlogd
 [ - ]  bootlogs
 [ ? ]  bootmisc.sh
 [ ? ]  checkfs.sh
 [ ? ]  checkroot-bootclean.sh
 [ - ]  checkroot.sh
 [ - ]  clvm
 [ - ]  cman
 [ ? ]  console-screen.sh
 [ - ]  cpglockd
 [ - ]  cron
 [ ? ]  hdparm
 [ - ]  hostname.sh
 [ ? ]  hwclock.sh
 [ - ]  keymap.sh
 [ ? ]  killprocs
 [ ? ]  kmod
 [ - ]  ksmtuned
 [ - ]  lvm2
 [ - ]  motd
 [ ? ]  mountall-bootclean.sh
 [ ? ]  mountall.sh
 [ ? ]  mountdevsubfs.sh
 [ ? ]  mountkernfs.sh
 [ ? ]  mountnfs-bootclean.sh
 [ ? ]  mountnfs.sh
 [ ? ]  mtab.sh
 [ - ]  munin-node
 [ ? ]  networking
 [ + ]  nfs-common
 [ - ]  ntp
 [ + ]  open-iscsi
 [ - ]  postfix
 [ - ]  procps
 [ + ]  pve-cluster
 [ ? ]  pve-manager
 [ ? ]  pvebanner
 [ ? ]  pvedaemon
 [ ? ]  pvenetcommit
 [ ? ]  pveproxy
 [ ? ]  pvestatd
 [ ? ]  qemu-server
 [ ? ]  rc.local
 [ - ]  rgmanager
 [ - ]  rmnologin
 [ + ]  rpcbind
 [ - ]  rrdcached
 [ - ]  rsync
 [ - ]  rsyslog
 [ ? ]  sendsigs
 [ ? ]  spiceproxy
 [ + ]  ssh
 [ - ]  stop-bootlogd
 [ - ]  stop-bootlogd-single
 [ + ]  udev
 [ ? ]  udev-mtab
 [ ? ]  umountfs
 [ ? ]  umountiscsi.sh
 [ ? ]  umountnfs.sh
 [ ? ]  umountroot
 [ - ]  urandom
 [ + ]  vz
 [ + ]  vzeventd
 [ - ]  x11-common

ps aux | grep pve
Code:
root       13713 10.0  3.1 2625836 2105884 ?     Sl   17:02  24:32 /usr/bin/kvm -id 104 -chardev socket,id=qmp,path=/var/run/qemu-server/104.qmp,server,nowait -mon chardev=qmp,mode=control -vnc unix:/var/run/qemu-server/104.vnc,x509,password -pidfile /var/run/qemu-server/104.pid -daemonize -name windc -smp sockets=1,cores=2 -nodefaults -boot menu=on -vga std -no-hpet -cpu kvm64,hv_spinlocks=0xffff,hv_relaxed,+lahf_lm,+x2apic,+sep -k en-us -m 2048 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -drive file=/var/lib/vz/template/iso/virtio-win-0.1-74.iso,if=none,id=drive-ide2,media=cdrom,aio=native -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive file=/var/lib/vz/images/104/vm-104-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none -device virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 -drive file=/var/lib/vz/template/iso/winserver2008r2.iso,if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=201 -netdev type=tap,id=net0,ifname=tap104i0,script=/var/lib/qemu-server/pve-bridge -device e1000,mac=FE:DA:7D:F9:E5:8F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300 -rtc driftfix=slew,base=localtime -global kvm-pit.lost_tick_policy=discard
root       32742  0.0  0.0   7788   928 pts/0    S+   21:05   0:00 grep pve

I started these services: ssh, pve-cluster, vz, and loaded these modules: kvm, kvm-intel, kvm-amd

This is pveversion -v
Code:
proxmox-ve-2.6.32: 3.2-132 (running kernel: 2.6.32-31-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-30-pve: 2.6.32-130
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-31-pve: 2.6.32-132
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

uname -r
Code:
2.6.32-31-pve
 
Hey,

In normal mode I get no shell, just "INIT: no more processes left in this runlevel". Nothing more; it ends there.

Happy holidays! Hopefully someone can point me in the right direction... :(

Svenn
 
/etc/inittab
Code:
# /etc/inittab: init(8) configuration.
# $Id: inittab,v 1.91 2002/01/25 13:35:21 miquels Exp $


# The default runlevel.
id:2:initdefault:


# Boot-time system configuration/initialization script.
# This is run first except when booting in emergency (-b) mode.
si::sysinit:/etc/init.d/rcS


# What to do in single-user mode.
~~:S:wait:/sbin/sulogin


# /etc/init.d executes the S and K scripts upon change
# of runlevel.
#
# Runlevel 0 is halt.
# Runlevel 1 is single-user.
# Runlevels 2-5 are multi-user.
# Runlevel 6 is reboot.


l0:0:wait:/etc/init.d/rc 0
l1:1:wait:/etc/init.d/rc 1
l2:2:wait:/etc/init.d/rc 2
l3:3:wait:/etc/init.d/rc 3
l4:4:wait:/etc/init.d/rc 4
l5:5:wait:/etc/init.d/rc 5
l6:6:wait:/etc/init.d/rc 6
# Normally not reached, but fallthrough in case of emergency.
z6:6:respawn:/sbin/sulogin


# What to do when CTRL-ALT-DEL is pressed.
ca:12345:ctrlaltdel:/sbin/shutdown -t1 -a -r now


# Action on special keypress (ALT-UpArrow).
#kb::kbrequest:/bin/echo "Keyboard Request--edit /etc/inittab to let this work."


# What to do when the power fails/returns.
pf::powerwait:/etc/init.d/powerfail start
pn::powerfailnow:/etc/init.d/powerfail now
po::powerokwait:/etc/init.d/powerfail stop


# /sbin/getty invocations for the runlevels.
#
# The "id" field MUST be the same as the last
# characters of the device (after "tty").
#
# Format:
#  <id>:<runlevels>:<action>:<process>
#
# Note that on most Debian systems tty7 is used by the X Window System,
# so if you want to add more getty's go ahead but skip tty7 if you run X.
#
#1:2345:respawn:/sbin/getty 38400 tty1
#2:23:respawn:/sbin/getty 38400 tty2
#3:23:respawn:/sbin/getty 38400 tty3
#4:23:respawn:/sbin/getty 38400 tty4
#5:23:respawn:/sbin/getty 38400 tty5
#6:23:respawn:/sbin/getty 38400 tty6


# Example how to put a getty on a serial line (for a terminal)
#
#T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100
#T1:23:respawn:/sbin/getty -L ttyS1 9600 vt100


# Example how to put a getty on a modem line.
#
#T3:23:respawn:/sbin/mgetty -x0 -s 57600 ttyS3
 
Sorry it took so long to reply, we just moved our office and I've been super busy.

The lines that set up the console are commented out. I don't have access to a machine to compare against, but try uncommenting the lines around this area:
#1:2345:respawn:/sbin/getty 38400 tty1

If you have a working Proxmox server just compare and make the necessary changes to this file.
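
A quick way to spot the commented-out console lines is to grep for them. This is only a minimal sketch: the function name `find_commented_gettys` and the sample file are made up for illustration, and the pattern assumes the stock Debian-style getty entries shown in the inittab above. On a real host you would point it at /etc/inittab, or diff that file against the copy from a working server.

```shell
#!/bin/sh
# Hypothetical helper: list numeric getty entries (tty1, tty2, ...) that are
# commented out in an inittab-style file. Prints "lineno:line" for each hit
# and nothing if no such entry is commented.
find_commented_gettys() {
    grep -nE '^[[:space:]]*#[[:space:]]*[0-9]+:[0-9]*:respawn:/sbin/getty' "$1"
}

# Demo on a throwaway sample instead of the live /etc/inittab.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
# /sbin/getty invocations for the runlevels.
#1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
#T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100
EOF

find_commented_gettys "$sample"
```

The serial-line example (T0) is deliberately not matched, since it is commented out in a default install as well.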

A crash would not have caused this file to suddenly get some # added to it. Any idea how this file got modified?
 
Hey,

-holiday break-

That file was different from the default install. We have a test server, and none of the ttys there were commented out. I uncommented them, and I'm planning to reboot to see if it works. I have no idea what could have commented them out. Could it have been done by an update? (That seems the most plausible idea so far.)

(no reboot so far)
 
This fix worked. I have since upgraded the server multiple times and was finally able to reboot several times to check whether the problem was still there. It wasn't; only the network configuration (/etc/network/interfaces) was a bit messed up. So if someone has a similar issue,

1:2345:respawn:/sbin/getty 38400 tty1
2:23:respawn:/sbin/getty 38400 tty2
3:23:respawn:/sbin/getty 38400 tty3
4:23:respawn:/sbin/getty 38400 tty4
5:23:respawn:/sbin/getty 38400 tty5
6:23:respawn:/sbin/getty 38400 tty6

should be uncommented. (I have no idea who commented them out in the first place, or why.)
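
For anyone who prefers not to edit by hand, the change can be sketched with sed. This is a rough example, not a tested procedure for a production host: the function name `uncomment_gettys` and the sample file are hypothetical, and it writes the result to stdout so the original file stays untouched until you have reviewed the output.

```shell
#!/bin/sh
# Hypothetical helper: strip the leading '#' from the tty1..tty6 getty
# entries of an inittab-style file and print the result to stdout.
# Other commented lines (e.g. the serial-line examples) are left alone.
uncomment_gettys() {
    sed -E 's|^#([1-6]:[0-9]+:respawn:/sbin/getty )|\1|' "$1"
}

# Demo on a throwaway sample instead of the live /etc/inittab.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
#1:2345:respawn:/sbin/getty 38400 tty1
#2:23:respawn:/sbin/getty 38400 tty2
#T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100
EOF

uncomment_gettys "$sample"
```

On a sysvinit system like this one, after backing up /etc/inittab and putting the reviewed output in place, `telinit q` should make init re-read the file without a reboot.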

Thx all for your help!

Svenn