Running VMs show as stopped

Hi everyone!

About 6 months ago we purchased brand new servers, installed Proxmox and started migrating our existing KVM virtual servers to the Proxmox platform.
We have 3 frontends running Proxmox in a cluster, all attached to a SAN running Nexenta.
Everything has been working smoothly, but a few days ago, when I started migrating an old server to the new system, something weird happened.
In the Proxmox GUI, two VMs (running on the same server) show their status as stopped, even though they are running just fine.
They show as stopped even if I log in to either of the two other Proxmox frontends.

I am afraid to touch anything right now, in case stopping, starting, or migrating them to one of the other frontends leads to data corruption.

Would a simple reboot from inside the VMs give them the status "running" again?
I wanted to try the forums before I open a ticket, in case anyone else runs into the same issue later on.

Note that the VMs running on the two other servers still show as running in the Proxmox GUI.
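
For what it's worth, I assume something like this, run directly on the node, would be safe and would at least show what the node itself thinks the state is (101 is just an example VMID, not one of ours):

root@vmfe01:~# qm list
root@vmfe01:~# qm status 101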

I hope someone can shed some light on this, as it is quite confusing.

Thank you in advance boys and girls!

-Tore
 
I tried that command yesterday.

Just now I noticed that under "Services" on the node everything except SMTP has status stopped.
Tried restarting pve-cluster, but got the following error:
root@vmfe01:/# /etc/init.d/pve-cluster restart
Restarting pve cluster filesystem: pve-cluster[main] notice: unable to aquire pmxcfs lock - trying again
[main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
[main] notice: exit proxmox configuration filesystem (-1)
(warning).
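
I haven't dared to touch anything else yet, but I assume these read-only checks are safe, just to see whether pmxcfs is actually alive and /etc/pve is still mounted:

root@vmfe01:/# pidof pmxcfs
root@vmfe01:/# mount | grep /etc/pve
root@vmfe01:/# /etc/init.d/pve-cluster status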


Also, when I ssh into the box, it shows the hostname and kernel of the virtual machine I recently tried to migrate over. However, when I run 'hostname' or 'uname' it displays correctly. I tried restarting SSH, as well as unmounting the filesystems of the VM I was going to migrate.
A reboot might do the trick, but as I mentioned in my first post, I'm scared shitless of data corruption if something goes wrong.
Of course I have backup, but still.
 
Also, when I ssh into the box, it shows the hostname and kernel of the virtual machine I recently tried to migrate over.

A wrong hostname is likely the reason for that problem. You need to fix that first. Then I would restart the node.
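
Something along these lines, assuming a standard Debian setup (and make sure the node name still resolves to its own IP in /etc/hosts, since the cluster filesystem depends on that):

root@vmfe01:~# echo vmfe01 > /etc/hostname
root@vmfe01:~# hostname vmfe01
root@vmfe01:~# grep vmfe01 /etc/hosts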
 
As I mentioned, the hostname is correct after I log in.
But if I just reboot the node, is there a chance of corrupting the running VMs?

Or could I live-migrate them to the other nodes (though that might not work, since their state shows as stopped)?
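
By live migration I mean the normal online migration, i.e. something like this (101 and vmfe02 are just placeholders, not our real VMID and node name):

root@vmfe01:~# qm migrate 101 vmfe02 --online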
 
I'm sorry, I could have been clearer.
When I ssh in to the box:
tore@tore-pc:~$ ssh vmfe01
Linux intern01 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64

But when I'm logged in:
root@vmfe01:~# uname -a
Linux vmfe01 2.6.32-32-pve #1 SMP Thu Aug 21 08:50:19 CEST 2014 x86_64 GNU/Linux
root@vmfe01:~# hostname
vmfe01

There it is correct.
vmfe01 is the correct hostname of the node.
intern01, which comes up right when I log in, is the virtual machine I tried to migrate over.
It may be that the way we are migrating is a bit difficult, but it has worked in the past.

Just to describe the process we use:
We make a new iSCSI target on the SAN and create a new VM on the nodes from the Proxmox GUI.
We mount the iSCSI target and partition the drive, then rsync everything over from the running VM on the old platform.
Then we shut the old one down, mount /proc, /dev and /sys, chroot into the newly created VM, reinstall GRUB, edit fstab to reflect the changes, and then boot the new VM.
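
Roughly, the commands look like this; the device name, mount point and remote host are examples, not our exact ones:

root@vmfe01:~# fdisk /dev/sdb        # partition the new iSCSI disk
root@vmfe01:~# mkfs.ext4 /dev/sdb1
root@vmfe01:~# mkdir -p /mnt/newvm && mount /dev/sdb1 /mnt/newvm
root@vmfe01:~# rsync -aAXH --numeric-ids --exclude='/proc/*' --exclude='/sys/*' --exclude='/dev/*' --exclude='/run/*' root@oldvm:/ /mnt/newvm/
root@vmfe01:~# mount --bind /proc /mnt/newvm/proc
root@vmfe01:~# mount --bind /sys /mnt/newvm/sys
root@vmfe01:~# mount --bind /dev /mnt/newvm/dev
root@vmfe01:~# chroot /mnt/newvm /bin/bash
root@vmfe01:/# grub-install /dev/sdb   # now inside the chroot
root@vmfe01:/# update-grub
root@vmfe01:/# vi /etc/fstab           # point the root filesystem at the new disk
root@vmfe01:/# exit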

Is there an easier way to migrate from the old setup?
The old one is just running qemu/libvirt on 3 nodes connected to a different, older SAN.
 
Perhaps.
I guess we'll just have to reboot the node to see if that makes things right again.
Thanks for the help so far!
 
Has HA been configured and enabled anywhere in this cluster? The reason I ask is that I got stung once by enabling HA on running VMs, and it was kind of like what you describe: it showed the machines as down, but they were still running, somewhat invisible to the GUI...
 
HA is configured and enabled for all running VMs, so nothing new there. I still have not rebooted the node; I'm waiting until the next maintenance window.
Thanks again!
 

Just something to check... at the time (not sure whether it has changed) it would let you enable HA while the VM was running, which I later found out is not the right way... In my case I would start the VMs thinking they were down and then start getting warnings about the disks within them being corrupted. It kind of wiped out half a dozen VMs before I narrowed it down... Good luck...
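
If you want to see what the HA stack thinks is going on before you reboot, on PVE 3.x something like this should be read-only and safe (I'm going from memory here, so double-check):

root@vmfe01:~# pvecm status
root@vmfe01:~# clustat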
 
