vm state frozen in webui

offerlam

Ok guys..

This is weird...

I have a VM, 116, that is shown as turned off in the web GUI.

The issue started when the backup failed with this error:

116 Vtiger01 FAILED 00:00:02 Device 'drive-virtio0' has no medium

The server seems to work: I can access the web interface it's serving, but it is slow and seems buggy.

So I went to the Proxmox node where it was located and ran

qm unlock 116

This seemed to go OK; it returned no error.

I then went to the web GUI and told it to shut down the VM. This errored, but now the web interface shows the VM as offline.

I couldn't start it from the web GUI, so I figured I would do it from the CLI using qm:

root@proxmox01:~# qm start 116
Executing HA start for VM 116
Member proxmox01 trying to enable pvevm:116...Success
Warning: pvevm:116 is now running on proxmox00

As you can see, it says it went OK, but the web GUI still shows the VM as offline. It also says the VM is now running on proxmox00, but using qm list I can see it is still located on proxmox01.

I don't know how to fix this. It looks like something is out of sync, but I don't know how to troubleshoot it further. The syslog says nothing.

Help me obi wan kenobi,

you are my only hope!
 
Hello offerlam,

What I would do:

1. Check if cluster communication works well (by "clustat", "pvecm status" and "pvecm nodes")

2. Figure out where the VMs are really located currently, e.g. by

Code:
ls /etc/pve/nodes/*/qemu-server/

and

Code:
qm list

at all nodes.

3. Check if the assigned storage is available on the node where the VM is located

4. Remove (temporarily) HA for that VM in order to reduce complexity

If all the above is ok the VM should work - if not there is a problem inside the VM
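Step 2 can be scripted. The following is only a minimal sketch: the helper names find_vm_node and vm_status are made up for illustration, and the optional second argument of find_vm_node (an alternative root instead of /etc/pve) is an assumption added so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the node whose qemu-server directory holds <vmid>.conf.
# $2 is a hypothetical override of the /etc/pve root, for dry runs only.
find_vm_node() {
    vmid=$1
    pve_root=${2:-/etc/pve}
    for conf in "$pve_root"/nodes/*/qemu-server/"$vmid".conf; do
        # An unmatched glob stays literal and is not a file, so skip it.
        [ -f "$conf" ] || continue
        # .../nodes/<node>/qemu-server/<vmid>.conf -> <node>
        node_dir=${conf%/qemu-server/*}
        echo "${node_dir##*/}"
    done
}

# Read `qm list` output on stdin and print the STATUS column for <vmid>.
vm_status() {
    awk -v id="$1" '$1 == id { print $3 }'
}

# Usage on a cluster node:
#   find_vm_node 116
#   qm list | vm_status 116
```

If find_vm_node prints one node and qm list on that same node reports a different state than the web GUI, the problem is in the status reporting rather than in where the config lives.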

Kind regards

Mr.Holmes
 
Hi Mr Holmes

and thanks for answering!

1.

Cluster looks ok

root@proxmox01:~# clustat
Cluster Status for DingITCluster @ Wed Jan 7 08:14:08 2015
Member Status: Quorate

Member Name ID Status
------ ---- ---- ------
proxmox00 1 Online, rgmanager
proxmox01 2 Online, Local, rgmanager
proxmox02 3 Online, rgmanager

Service Name Owner (Last) State
------- ---- ----- ------ -----
pvevm:101 proxmox01 started
pvevm:102 proxmox01 started
pvevm:103 proxmox01 started
pvevm:105 proxmox01 started
pvevm:107 proxmox01 started
pvevm:109 proxmox01 started
pvevm:110 proxmox01 started
pvevm:112 proxmox01 started
pvevm:113 proxmox01 started
pvevm:114 proxmox00 started
pvevm:116 proxmox00 started
pvevm:117 proxmox01 started
root@proxmox01:~# pvecm status
Version: 6.2.0
Config Version: 52
Cluster Name: DingITCluster
Cluster Id: 44340
Cluster Member: Yes
Cluster Generation: 175552
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 6
Flags:
Ports Bound: 0 177
Node name: proxmox01
Node ID: 2
Multicast addresses: 239.192.173.225
Node addresses: 10.10.99.21
root@proxmox01:~# pvecm nodes
Node Sts Inc Joined Name
1 M 175552 2014-10-27 00:11:50 proxmox00
2 M 175520 2014-10-27 00:05:22 proxmox01
3 M 175548 2014-10-27 00:09:45 proxmox02

2.

The VM ID is 116 and it's really located on proxmox01, which is also what the GUI says. It's the qm start command on proxmox01 that says the VM is not there but on proxmox00.

root@proxmox01:~# ls /etc/pve/nodes/proxmox01/qemu-server/
101.conf 103.conf 107.conf 110.conf 112.conf 113.conf.tmp.19564 116.conf
102.conf 105.conf 109.conf 111.conf 113.conf 115.conf 117.conf
root@proxmox01:~# ls /etc/pve/nodes/proxmox00/qemu-server/
100.conf 104.conf 108.conf 114.conf 114.conf.tmp.70575 118.conf
root@proxmox01:~# ls /etc/pve/nodes/proxmox03/qemu-server/
ls: cannot access /etc/pve/nodes/proxmox03/qemu-server/: No such file or directory
root@proxmox01:~# qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
101 Zentyal4test running 2048 200.00 6578
102 Centreon02 running 1024 50.00 6585
103 Owncloud01 running 4096 100.00 6554
105 INSWeb01 running 512 32.00 6600
107 INSWeb02 running 512 32.00 6700
109 Munkegaarden running 1024 200.00 6756
110 INSDataOpsamler01 running 8192 500.00 6809
111 Atriumhus running 1024 200.00 943490
112 Centreon01 running 2046 10.00 6580
113 Openchange01 running 512 200.00 49416
115 Observium01 running 1024 32.00 1016423
116 Vtiger01 stopped 1024 32.00 0
117 Zimbra01 running 6144 1024.00 6825
root@proxmox01:~#

3.
Storage is available

4.
Done
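The mismatch from point 2 (clustat lists pvevm:116 under proxmox00 while 116.conf sits under proxmox01) can be checked mechanically. This is a rough sketch, not an official tool: placement_check is a made-up name, it expects clustat output on stdin, and the optional second argument replacing /etc/pve is an assumption for testing.

```shell
#!/bin/sh
# Compare the rgmanager owner of pvevm:<vmid> (clustat output on stdin)
# with the node whose directory holds <vmid>.conf.
# $2 is a hypothetical override of the /etc/pve root, for dry runs only.
placement_check() {
    vmid=$1
    pve_root=${2:-/etc/pve}

    # Owner column of the matching service line in clustat output.
    owner=$(awk -v svc="pvevm:$vmid" '$1 == svc { print $2 }')

    conf_node=
    for conf in "$pve_root"/nodes/*/qemu-server/"$vmid".conf; do
        [ -f "$conf" ] || continue
        node_dir=${conf%/qemu-server/*}
        conf_node=${node_dir##*/}
    done

    if [ -n "$owner" ] && [ "$owner" = "$conf_node" ]; then
        echo "ok: VM $vmid owner and config both on $owner"
    else
        echo "mismatch: rgmanager owner=${owner:-none} config=${conf_node:-none}"
        return 1
    fi
}

# Usage on a cluster node:
#   clustat | placement_check 116
```

With the output posted above, this would report a mismatch for 116, which matches the warning printed by qm start.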

After all of this, VM 116 was still listed as stopped by qm list, but I could access the VM's web interface, so clearly it wasn't.

I did a qm stop 116, which did nothing.

I then did a qm migrate 116 proxmox00, which was successful, and NOW the VM was listed as running in both the web GUI and qm list on proxmox00.

So I ran qm stop 116 and then qm migrate 116 proxmox01, wanting to start it there and see it shown as running in both qm list and the web GUI of proxmox01.

The migration was successful..

and the VM is now listed as offline in both the web GUI and qm list on proxmox01. GREAT!! Now it seems the web GUI and the Proxmox CLI are in sync again.

So I tried qm start 116

which errored with this..

kvm: -drive file=/mnt/pve/Storage01_Vms/images/116/vm-116-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none: could not open disk image /mnt/pve/Storage01_Vms/images/116/vm-116-disk-1.qcow2: qcow2: Image is corrupt; cannot be opened read/write
start failed: command '/usr/bin/kvm -id 116 -chardev 'socket,id=qmp,path=/var/run/qemu-server/116.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/116.vnc,x509,password -pidfile /var/run/qemu-server/116.pid -daemonize -name Vtiger01 -smp 'sockets=1,cores=1' -nodefaults -boot 'menu=on' -vga qxl -cpu host,+x2apic -k da -spice 'tls-port=61007,addr=127.0.0.1,tls-ciphers=DES-CBC3-SHA,seamless-migration=on' -device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' -chardev 'spicevmc,id=vdagent,name=vdagent' -device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' -m 1024 -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -drive 'file=/mnt/pve/Storage01_ISO/template/iso/ubuntu-12.04.5-server-amd64.iso,if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/mnt/pve/Storage01_Vms/images/116/vm-116-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap116i0,script=/var/lib/qemu-server/pve-bridge,vhost=on' -device 'virtio-net-pci,mac=BA:32:F8:65:88:80,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' -machine 'type=pc-i440fx-1.7'' failed: exit code 1

I have NO idea what to do now. Apparently we are now in a state where the VM won't run on proxmox01 due to the error quoted above, but it WILL run on proxmox00. I assume it will also work on proxmox02.

Any ideas?

THANKS!
 
I tried to move the VM back to proxmox00, since it was running there, so I could work with the service its web server provides...

When I did that, I now get the same error when trying to start it on proxmox00.

So the situation is now that it won't run on any node...
 
Hello offerlam

Have a look at the end of the following part of your error message

Code:
kvm: -drive file=/mnt/pve/Storage01_Vms/images/116/vm-116-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none: could not open disk image /mnt/pve/Storage01_Vms/images/116/vm-116-disk-1.qcow2: qcow2: Image is corrupt

You see qcow2: Image is corrupt

- the situation is quite clear, the question is

- how it happened?

- what to do now?

Difficult to say in both cases. If you put this text into Google you may find some hints. For repair, for example, there is something that sounds very simple, but I have no idea if it can help in your case. The link is: http://b.killerbeaver.net/post/15373927161/fixing-corrupted-qcow2-disk-images
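Independent of that link, qemu-img has a built-in consistency check that is worth running first. The wrapper below is only a sketch: check_qcow2 is a made-up helper name, and whether any repair works depends on how badly the image is damaged. Without the -r option, qemu-img check only reports problems and does not modify the image.

```shell
#!/bin/sh
# Report-only consistency check of a qcow2 image (VM must be stopped).
check_qcow2() {
    image=$1
    if [ ! -f "$image" ]; then
        echo "no such image: $image" >&2
        return 2
    fi
    # No -r option here, so nothing is written to the image.
    qemu-img check -f qcow2 "$image"
}

# Usage:
#   check_qcow2 /mnt/pve/Storage01_Vms/images/116/vm-116-disk-1.qcow2
#
# A repair attempt would be `qemu-img check -r all <image>`.
# Copy the image file somewhere safe before trying that.
```

Running the check alone is safe; the repair variant rewrites metadata, so it should only be tried on an image you have already copied.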

Or you have a recent backup!?

Good luck!

Mr.Holmes
 
Hi Mr. Holmes

Thanks for your help.. I had a snapshot.. but that didn't work.. so I'm gonna delete it and start over.. it was not a VM in production so it's OK..

Thanks for all your help mate!!!

Hugz and kisses
Casper
;)
 
