all KVM VMs suddenly shut down

RomanV

Guest
Hi everyone.

Yesterday I found all 7 of my KVM VMs stopped.
Now I can't start any of them. In the web interface the task status says OK but nothing happens, and running qm start 100 from the terminal doesn't work either.
I found nothing in the logs that explains why.

This is my home server, so it's OK if the VMs are down for a while, but I plan to use Proxmox in production, and the prospect of hitting this situation there scares me a little.

I haven't rebooted the Proxmox host, so it's still in the same state.

Can anyone help me investigate and understand why it happened?
Any help is appreciated.

Code:
# pveversion -v 
pve-manager: 2.2-32 (pve-manager/2.2/3089a616)
running kernel: 2.6.32-17-pve
proxmox-ve-2.6.32: 2.2-83
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-17-pve: 2.6.32-83
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-34
qemu-server: 2.0-72
pve-firmware: 1.0-21
libpve-common-perl: 1.0-41
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.3-10
ksm-control-daemon: 1.1-1
 
What output do you get? Any error message?

Nothing at all.
Code:
# qm start 100
#

Here is what appears in syslog:
Code:
Jan 31 13:45:31 proxmox qm[560515]: <root@pam> starting task UPID:proxmox:00088D84:07955126:510A3D3B:qmstart:100:root@pam:
Jan 31 13:45:31 proxmox qm[560516]: start VM 100: UPID:proxmox:00088D84:07955126:510A3D3B:qmstart:100:root@pam:
Jan 31 13:45:31 proxmox kernel: device tap100i0 entered promiscuous mode
Jan 31 13:45:31 proxmox kernel: vmbr0: port 2(tap100i0) entering forwarding state
Jan 31 13:45:32 proxmox qm[560515]: <root@pam> end task UPID:proxmox:00088D84:07955126:510A3D3B:qmstart:100:root@pam: OK
Jan 31 13:45:32 proxmox kernel: vmbr0: port 2(tap100i0) entering disabled state
Jan 31 13:45:32 proxmox kernel: vmbr0: port 2(tap100i0) entering disabled state
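
The syslog excerpt above shows tap100i0 entering forwarding state and then disabled state a second later, even though the task ends with OK. One plausible reading (an assumption, not something confirmed in this thread) is that the kvm process exits right after qm launches it. A minimal sketch for spotting that pattern in a larger log; the function name find_short_lived_taps is made up for illustration:

```shell
# Sketch: list tap interfaces that entered forwarding state and were
# later disabled again. Reads syslog-style lines on stdin.
# find_short_lived_taps is a hypothetical helper, not a Proxmox tool.
find_short_lived_taps() {
    awk '
        # Extract a name like tap100i0 from the line, or "" if none is present
        function tap(line) {
            return match(line, /tap[0-9]+i[0-9]+/) ? substr(line, RSTART, RLENGTH) : ""
        }
        /entering forwarding state/ { t = tap($0); if (t != "") up[t] = 1 }
        /entering disabled state/ {
            t = tap($0)
            # Print each interface once, and only if it was seen coming up first
            if (t != "" && (t in up) && !seen[t]++) print t
        }
    '
}

# Possible usage (log path is an assumption):
#   find_short_lived_taps < /var/log/syslog
```

This only matches the bridge messages by name and ignores timestamps, so a VM that was stopped normally would show up as well; it narrows down where to look rather than proving anything.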
 
Did you change something recently? If not, I can imagine that there is something wrong with the hardware. Does 'dmesg' give any hints?

I didn't change anything significant enough to cause this.

I don't think it's a hardware problem, because the host system works fine; only the VMs are down. And I think they will start if I reboot the host.

What hardware fault would affect only the VMs?
It could be the HDD, but SMART says everything is fine and I can read all the VMs' raw disks.

I didn't find anything helpful in dmesg.
There are messages like this:
Code:
EXT4-fs (dm-1): Unaligned AIO/DIO on inode 17956875 by kvm; performance will be poor.
In this thread no one knows what it means...
I suppose it's nothing serious.

There are also a number of messages like this:
Code:
kvm: 69443: cpu0 unhandled rdmsr: 0xc0010001
kvm: 69443: cpu0 unhandled rdmsr: 0xc0010001
kvm: 69619: cpu0 unhandled rdmsr: 0xc0010001
kvm: 69443: cpu0 unhandled rdmsr: 0xc0010001
Maybe that's it?
 
No, that is harmless. Maybe you can strace the kvm process ("qm showcmd <vmid>" gives you the command line to start with).
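
A rough sketch of how that could look in practice, assuming strace is installed on the host. The function name trace_vm_start and the default output path are made up for illustration; only qm showcmd and the strace options -f and -o are real:

```shell
# Sketch: run the VM's kvm command line under strace to see why it exits.
# trace_vm_start and the /tmp output path are illustrative assumptions.
trace_vm_start() {
    vmid="$1"
    out="${2:-/tmp/kvm-$vmid.strace}"
    # qm showcmd prints the command line that qm start would execute
    cmd="$(qm showcmd "$vmid")" || return 1
    # -f follows child processes (needed if kvm forks into the background),
    # -o writes the trace to a file instead of the terminal
    strace -f -o "$out" sh -c "$cmd"
}

# Possible usage, with VMID 100 as in this thread:
#   trace_vm_start 100
#   tail -n 50 /tmp/kvm-100.strace
```

The last lines of the trace file are usually the interesting part, since they show the final system calls before the process went away.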

Unfortunately I had already rebooted the server when I saw that.

After the reboot everything works fine.

I tested the RAM with memtest and it looks good.

Next time I will try strace, thanks for your help.
 
