Running VMs shows as stopped

tores · Nov 12, 2014

Hi everyone!

About 6 months ago we purchased brand new servers, installed Proxmox and started migrating our existing KVM virtual servers to the Proxmox platform.
We have 3 frontends running Proxmox in a cluster, all attached to a SAN running Nexenta.
Everything has been working smoothly, but a few days ago when I started to migrate an old server to the new system something weird happened.
In the Proxmox GUI, 2 vm's (running on the same server) show status as stopped, even though they are running just fine.
They show as stopped even if I log in to either of the two other Proxmox frontends.

I am afraid to do something right now in case of data corruption if I attempt to stop, start or migrate over to one of the other frontends.

Would a simple reboot from inside the vm's give them status running again?
I wanted to try the forums before I open a ticket, in case anyone else runs in to the same issue later on.

Note that the vm's running on the two other servers still show as running in the Proxmox GUI.

Hope anyone can shed some light on this as it is quite confusing.

Thank you in advance boys and girls!

-Tore

dietmar · Nov 12, 2014

Does it help if you restart pvestatd on that server?

# service pvestatd restart

tores · Nov 12, 2014

I tried that command yesterday.

Just now I noticed that under "Services" on the node everything except SMTP has status stopped.
Tried restart pve-cluster, but got the following error:

root@vmfe01:/# /etc/init.d/pve-cluster restartRestarting pve cluster filesystem: pve-cluster[main] notice: unable to aquire pmxcfs lock - trying again
[main] crit: unable to aquire pmxcfs lock: Resource temporarily unavailable
[main] notice: exit proxmox configuration filesystem (-1)
(warning).

Also, when I ssh into the box, it show the hostname and kernel of the virtual machine I recently tried to migrate over. However, when I do 'hostname' or 'uname' it displays correctly. Tried restarting SSH as well as unmounting the filesystems of the vm I was going to migrate.
A reboot would maybe do the trick, but as I mentioned in my first post I'm scared shitless of data corruption if something goes wrong.
Of course I have backup, but still.

dietmar · Nov 12, 2014

tores said:
Also, when I ssh into the box, it show the hostname and kernel of the virtual machine I recently tried to migrate over.

A wrong hostname is likely the reason for that problem. You need to fix that first. Then I would restart the node.

tores · Nov 13, 2014

As I mentioned the hostname is correct after I login.
But if I just reboot the node is there a chance of corruption of the running vms?

Or could I live-migrate them to the other nodes (that would maybe not work since there state shows as stopped)?

dietmar · Nov 13, 2014

tores said:
As I mentioned the hostname is correct after I login.

You wrote "it show the hostname and kernel of the virtual machine" - I am confused now.

tores · Nov 13, 2014

I'm sorry, could have been clearer.
When I ssh in to the box:

tore@tore-pc:~$ vmfe01
Linux intern01 3.2.0-4-amd64 #1 SMP Debian 3.2.63-2 x86_64

But when I'm logged in:

root@vmfe01:~# uname -a
Linux vmfe01 2.6.32-32-pve #1 SMP Thu Aug 21 08:50:19 CEST 2014 x86_64 GNU/Linux
root@vmfe01:~# hostname
vmfe01

There it is correct.
vmfe01 is the correct hostname of the node.
intern01 that comes up right when I log in is the virual machine which I tried to migrate over.
It may be that the way we are migrating is a bit difficult, but it has worked in the past.

Just to include the process we are doing it:
Make a new iscsi target on the SAN, and create a new vm on the nodes from the Proxmox gui.
We mount the iscsi target and partition the drives, then rsync everything over from the running vm on the old platform.
Then we shut the old one down, mount /proc /dev /sys and chroot into the newly created vm, reinstall grub, edit fstab to reflect the changes and then boot the new vm.

Is there an easier way to migrate from the old setup?
The old one is just running qemu/libvirt on 3 nodes connected to a different and old SAN.

dietmar · Nov 13, 2014

tores said:
I'm sorry, could have been clearer.
When I ssh in to the box:

Maybe you have an IP address conflict?

tores · Nov 13, 2014

Perhaps.
I guess we'll just have to reboot the node to see if that makes things right again.
Thanks for the help so far!

davlaw · Nov 17, 2014

Has HA been configured and enabled anywhere in this cluster? Reason I ask is I got stung once by enabling HA on running VMs, and it was kinda the way you describe. I showed the machines down but they were still running somewhat invisible to the GUI...

tores · Nov 17, 2014

HA is configured and enabled for all vms running, so nothing new there. I still have not rebooted the node, I'm waiting until the next maintenance window.
Thanks again!

davlaw · Nov 17, 2014

tores said:
HA is configured and enabled for all vms running, so nothing new there. I still have not rebooted the node, I'm waiting until the next maintenance window.
Thanks again!

Just something to check... at the time (not sure it has changed) it would let you enable HA while the VM was running.. Which I later found out it not the right way...In my case I would start the VM thinking it was down and start getting warnings about the disk within them being corrupted. Kind of wiped out a half dozen VMs before I narrowed it down... Good luck...

Search

Search

Running VMs shows as stopped

tores

Member

dietmar

Proxmox Staff Member

tores

Member

dietmar

Proxmox Staff Member

tores

Member

dietmar

Proxmox Staff Member

tores

Member

dietmar

Proxmox Staff Member

tores

Member

davlaw

Renowned Member

tores

Member

davlaw

Renowned Member