HELP - having a nightmare with client's production server!

gob

Hi

I've been up all night trying to fix a customer's Proxmox server, but lack of sleep is making it hard to focus now.

They have a single DELL server running PVE 2.3 with three local RAID arrays.
There are two Windows 2008 servers running on the host, each having a virtual disk on each of the three RAID volumes.
System has been running fine for months.
Last night during the backup, one of the servers completely locked up and I had to force a power-off of the VM. Restarting it would get to the loading Windows animation, then hang. The other server seemed to function and reboot fine.
After a bit of digging around I discovered that /var/lib/vz had 0% disk space left. I don't remember allocating all of the space to the two servers and the only things on there are the boot disks of the two windows VMs.
I planned to run another snapshot backup of one of the servers and restore it to the larger disk outside of /var/lib/vz but the backup kept failing and I was unable to swap media as I was working remotely.
So the next idea was to upgrade to PVE 3.1 and use the Storage Migration tool.
The upgrade went smoothly and it is now running at 3.1-3.
I performed a storage migration of the server that was working fine, but when I then tried to start the server on the new storage I got a BSOD with a Phase0_Exception error and Windows wouldn't load.
After a reboot of the host and a refresh of my browser, I now find that I cannot open a Java console to the server to view the VM desktop.
I have tried Chrome, IE and Firefox but each time I open a console the browser hangs.

I am at a loss as to what to try next and would appreciate any pointers.

thanks
Gordon
 
Think I have sorted it.
Resolved the hanging browser + console issue by upgrading Java.
Having the console allowed me to see that the server was trying to boot into system recovery which was why I couldn't access the VM over the network.
The Phase0_Exception seems to have just sorted itself out after a few reboots.
Clearing the old disk from /var/lib/vz/images freed up 56 GB, which then allowed the other VM to start without issue.
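
For anyone chasing a similar full-disk problem, here is a minimal Python sketch (not part of the original fix) that reports the free space on a filesystem and lists the largest files under a directory. The function names `free_percent` and `largest_files` are made up for illustration, and the path /var/lib/vz/images is simply the one mentioned above. Note that it reports apparent file sizes, so sparse disk images may occupy less space on disk than shown.

```python
import os
import shutil


def free_percent(path):
    """Return the percentage of free space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def largest_files(root, limit=10):
    """Return up to `limit` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                # lstat so symlinks are measured themselves, not their targets
                sizes.append((os.lstat(full).st_size, full))
            except OSError:
                # File vanished or is unreadable; skip it
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    images_dir = "/var/lib/vz/images"  # path taken from the post above
    if os.path.isdir(images_dir):
        print(f"{free_percent(images_dir):.1f}% free")
        for size, path in largest_files(images_dir):
            print(f"{size / 1024**3:8.2f} GiB  {path}")
```

Run it on the host as a user that can read the images directory; leftover disks from a migration should show up near the top of the list.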

A bit more testing - then my bed!
 
FYI, in the future if you have trouble with the Java console, you can enable VNC directly to the VM by editing the VM's config file:

/etc/pve/local/qemu-server/<VMID>.conf

Where <VMID> is your VM's ID, e.g. 101.
Then put this in the conf file
args: -vnc 0.0.0.0:<100+VMID>

Where <100+VMID> is your VMID + 100, e.g. 201.

Then make sure your VM is entirely shut down (e.g. use Stop, not just a reboot from inside the guest), start it again, and use your favorite VNC client to connect to
<YOUR_PROXMOX_SERVER_IP>:<5900+100+VMID>
e.g. 192.168.1.20:6101.
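
The arithmetic above is easy to get wrong at 4am, so here is a small Python sketch of it. It assumes the convention described in this post (display number = VMID + 100) plus the usual VNC rule that display N listens on TCP port 5900 + N. The helper names are hypothetical, not part of Proxmox.

```python
VNC_BASE_PORT = 5900  # VNC display N listens on TCP port 5900 + N


def vnc_display(vmid, offset=100):
    """Display number used in the args line: VMID + 100 by this post's convention."""
    return vmid + offset


def vnc_port(vmid, offset=100):
    """TCP port a VNC client should connect to for the given VMID."""
    return VNC_BASE_PORT + vnc_display(vmid, offset)


def args_line(vmid, bind="0.0.0.0", offset=100):
    """Build the line to put in /etc/pve/local/qemu-server/<VMID>.conf."""
    return f"args: -vnc {bind}:{vnc_display(vmid, offset)}"


if __name__ == "__main__":
    vmid = 101
    print(args_line(vmid))  # args: -vnc 0.0.0.0:201
    print(f"<YOUR_PROXMOX_SERVER_IP>:{vnc_port(vmid)}")  # port 6101
```

Keep in mind that binding to 0.0.0.0 exposes the VNC port on every interface of the host, so you may prefer to pass a specific management IP as `bind`.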

More details here:
http://pve.proxmox.com/wiki/Vnc_2.0...d_vnc_clients_.28Including_iOS_and_Android.29

Note that this will make your Java console stop working entirely for that VM, as it redirects the KVM VNC port. But at least it'll get you into the VM so you can do what you need to. I use this method exclusively since the Java console is kinda wonky.
 
Hi

I've been up all night trying to fix a customer's Proxmox server, but lack of sleep is making it hard to focus now.

Glad you got it worked out!
Long hours and lack of sleep are the worst part of the IT industry; been there, done that myself too many times. You have my sympathy.

Something that might help someone recover from similar problems in the future:
With Windows machines I usually put the OS on the C:\ and all non-OS applications and data on another disk. I will always do this from now on.

The other day I tried to update the virtIO driver on one VM and rebooted the VM, and all I could get it to do was BSOD.
We restored the latest Proxmox backup for that VM then we copied the data disk from the BSOD VM into the restored VM.
Restored VM booted up just fine.
No data loss other than whatever log files had been written by the OS since last backup.

If I had had the OS and data/apps on one disk, it would have been much harder to restore and not lose the data generated after the latest backup.
 
I usually put the OS on the C:\ and all non-OS applications and data on another disk. I will always do this from now on.

Yes indeed. Why waste time recovering the data, after all - it's the OS that blue screens, not the data. So it's the OS partition/drive that we want to recover as quickly as possible.
 
