Server Crisis, Advice welcomed!

cbad

New Member
Jun 25, 2017
Hi-

We run a server with an old version of Proxmox VE (kernel 2.6.32-7-pve) and have done so pretty seamlessly for a number of years. It has 6 VMs running. Unfortunately the guy who maintained/managed it left fairly recently, and can't be reached at the moment. We now have no one with Proxmox experience.

What makes it a crisis is we had a hard drive crash on Friday that has taken us down and we desperately need to get the site back up. The drive in question seems to be mainly used for backups and ISOs. We prepared it as a typical linux device and partition, mounted it with the same name as the old drive, but that doesn't seem to be enough.

If anyone can point us to where we can find out how to prepare the disk like the old one for use with Proxmox, it would be extremely helpful. I imagine that requires us to figure out how our environment is expecting to use the disk, so any tips on how to find that out would be helpful too. We aren't exactly sure of all the contexts in which the drive is being used. We're mostly working at the command line with an offline server at the moment.
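
One way to see where the system still refers to the old disk is to search the usual configuration files for its mount point or volume group name. The sketch below is only an illustration: find_refs is a hypothetical helper, not a Proxmox tool, and the file list in the usage comment is an assumption about where such references typically live.

```shell
# find_refs PATTERN FILE...
# Print every line in the given files that mentions PATTERN,
# prefixed with the file name and line number.
find_refs() {
    pattern=$1
    shift
    # /dev/null is added so grep always prints file names,
    # even when only one real file is given.
    grep -n -- "$pattern" "$@" /dev/null
}

# Example (paths are assumptions, adjust to your host):
# find_refs data2 /etc/fstab /etc/pve/storage.cfg /proc/mounts /proc/swaps
```

A hit in /etc/fstab or /proc/swaps would also help confirm or rule out the suspicion that the failed drive was used for swap.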

Here are some things we think might be helpful for anyone who might like to help:

- The old partition name was /dev/pve/data2 and was mounted as /mnt/data2. Our new one is /dev/sdb1 and is mounted as /mnt/data2.

- There seem to be 3 logical drives maintained on it by proxmox. They are ISOs, Backup and lvm_pve_data2. The first two are of type Directory, the third is LVM. The LVM is not active. We have one other LVM (VirtDisk) that is active (we believe this resides on a different hd). ISOs and Backup are active and directories were created on the new hd (/mnt/data2).

- We have 6 VMs, only one of which cannot be started.

- We have two services that are stopped and can't be started. They are ClusterSync and ClusterTunnel.

- The WWW service is running and the web site can be reached. However, the https portion of the web site resides on the VM that cannot be started and hence cannot be reached.

- The SSH service is running, but we can't SSH into the server.

- We also suspect this faulty drive may also be used as a swap disk.

- Here's what our storage.cfg looks like:

dir: Local
path /var/lib/vz
content images,iso,vztmpl,rootdir,

lvm: VirtDisk
vgname pve
content images

lvm: lvm_pve2_backup
vgname pve_2
content images

dir: ISOs
path /mnt/data2/iso
content iso

dir: Backup
path /mnt/data2/backup
content backup

It all seems related to the drive that crashed, so we are trying to rebuild a drive that will fill its role in the system.
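
To see at a glance which storage entries depend on the failed disk, the configuration above can be reduced to one line per storage. This is a rough sketch, assuming the simple "type: id" plus "path"/"vgname" layout shown above; list_storages is a hypothetical helper name and it ignores every other property.

```shell
# list_storages FILE
# Print "type id backing" for each entry in a storage.cfg-style file,
# where backing is the entry's path or vgname ("-" if it has neither).
list_storages() {
    awk '
        /^[a-z]+:/ && NF >= 2 {
            # A new "type: id" header starts; flush the previous entry.
            if (id != "") print type, id, backing
            type = $1; sub(/:$/, "", type)
            id = $2; backing = "-"
            next
        }
        $1 == "path" || $1 == "vgname" { backing = $2 }
        END { if (id != "") print type, id, backing }
    ' "$1"
}

# Example: show only the entries tied to the failed disk.
# list_storages /etc/pve/storage.cfg | grep -E 'data2|pve_2'
```

Run against the configuration above, the filtered example should list lvm_pve2_backup, ISOs and Backup, which are the three entries that need the replacement disk.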

Many thanks to anyone who has read this far, and even more to anyone who has the time and inclination to help. Let me know what more information I can provide.

Thanks!

-cbad
 
The most important question to ask is this: are you able to access the virtual disk images? Since you have logical volumes for both VirtDisk and the backup storage, I guess that would be a yes, unless they all resided on the same VG, which was housed on the bad disk...

I would STRONGLY suggest working without modifying anything on the original instance (so you can always get back to square one if recovery is not successful), which would require a second machine with sufficient storage to accept all your virtual disk images. You can use dd to duplicate your virtual disks to the new target. This may be your opportunity to upgrade to a more current version of Proxmox (it may also be worthwhile to switch your backing store to ZFS at this time; it's more complicated to move your virtual disks to zvols, but not impossible).
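
As a minimal sketch of the dd step, assuming the source volume is readable and the destination is a regular file on the new storage: copy_and_verify is a hypothetical helper, and the paths in the usage comment are placeholders, not names taken from this thread.

```shell
# copy_and_verify SRC DST
# Copy a disk image (or block device) with dd, then compare the copy
# byte for byte. Returns non-zero if the copy or the comparison fails.
# Assumes DST is a regular file; if DST is a larger block device,
# cmp will report a size mismatch even when the copy is good.
copy_and_verify() {
    src=$1
    dst=$2
    dd if="$src" of="$dst" bs=1048576 || return 1
    cmp "$src" "$dst"
}

# Example (placeholder paths):
# copy_and_verify /dev/SOME_VG/SOME_VM_DISK /mnt/newstore/SOME_VM_DISK.raw
```

The source is only read, never written, which fits the advice to leave the original instance untouched.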

Once you have a working Proxmox server with the virtual disks available, it's only a matter of recreating the virtual machines and attaching their disks. Once you're up and running, you can decide whether to decommission your old hardware completely or move the new installation onto it, as applicable.
 
Check the VM which doesn't start: under its hardware config, look at the CD-ROM drive. I'll lay bets that it's pointing to an ISO image which was present on the bad drive. If that's the case, just edit the CD-ROM and set it to unused.

Try to start the VM again. Should it fail, copy the error messages so we can see what's wrong.

Guy
 
Thank you! I have made good progress, to the point where we now have just one VM that won't start and all else is working. Unfortunately, it is a critical one! I'll take your advice about migrating to more modern platforms once we get through this crisis (assuming we can!).
 
Thank you! I have made good progress, to the point where we now have just one VM that won't start and all else is working. Your CD-ROM idea seemed promising, but it pointed to a "local" folder that had a matching ISO in it (/var/lib/vz/template/iso).

It's a Win 2003 iso server (long story why we still need this). When I try to start it, I get no error messages. It only says "executing start task" very quickly and returns to "stopped". Is there a way to start it from the command line where I might get a more verbose error message?

Thanks so much for both of your help, it really helps!
 
Is there a way to start it from the command line where I might get a more verbose error message?

So I did this:

# qm start 104

And got this:

command '/sbin/lvchange -aly /dev/pve_2/vm-104-disk-2' failed with exit code 5
volume 'lvm_pve2_backup:vm-104-disk-2' does not exist

lvm_pve2_backup is the one that failed and is now empty, so this volume obviously does not exist. We believe this drive was only used for backups, but that may not be the case. It kind of looks like vm-104-disk-2 is a logical volume. Do you think we could create one to satisfy this problem? Or does Proxmox handle those types of files/volumes?

thanks again!
 
Well, if you really don't have the drive anymore, you can modify the VM settings and remove the drive. This will let the Windows system boot. HOWEVER, if the drive contained useful information, then that's obviously going to be missing. You don't say what the Windows server is running, but if it's Exchange or AD, this second drive might hold the data associated with that, which is the recommended way to run them.
 
Did the hard drive crash physically or logically? What OS is on the VM that cannot start?

Type in a terminal on the Proxmox host:

vgdisplay

lvdisplay -m

cat /etc/lvm/backup/*

cat /etc/pve/qemu-server/104.conf

ls -l /mnt/data2/backup

and place output here.
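
If it is easier to gather everything in one go, something like the following could collect the output of each command into a single file for pasting. It is only a sketch: collect is a hypothetical helper and /tmp/diag.txt is an arbitrary file name.

```shell
# collect OUTFILE 'COMMAND'...
# Run each command through sh, appending a header line and the
# command's output (stdout and stderr) to OUTFILE. A failing command
# is noted with its exit code and does not stop the rest.
collect() {
    outfile=$1
    shift
    for cmd in "$@"; do
        printf '### %s\n' "$cmd" >> "$outfile"
        sh -c "$cmd" >> "$outfile" 2>&1 || printf '(exit code %s)\n' "$?" >> "$outfile"
    done
}

# Example, using the commands listed above:
# collect /tmp/diag.txt 'vgdisplay' 'lvdisplay -m' 'cat /etc/lvm/backup/*' \
#     'cat /etc/pve/qemu-server/104.conf' 'ls -l /mnt/data2/backup'
```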
 

I can't give it all to you now, but I can give a summary and the 104.conf. It was a hard drive crash. Windows 2003 Server. Two volume groups: pve and pve_2. pve_2 is the relevant group that crashed. It contains a logical volume named data2 which is mapped to lvm_pve2_backup in storage.cfg. /mnt/data2/backup now has backups from the other VMs in our config, so it is recognized and is working.

Here's the 104.conf:
name: MVLEGACY01
ide2: local:iso/EN_WIN2003_ENTWITHSP1_WIN2003_STANDARDWITHSP1.ISO,media=cdrom
bootdisk: virtio0
ostype: w2k3
memory: 4096
sockets: 2
onboot: 1
cores: 2
vlan0: virtio=22:79:6A:FE:08:5B
virtio0: local:104/vm-104-disk-2.raw
boot: ca
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
virtio2: lvm_pve2_backup:vm-104-disk-1 ,backup:no
virtio1: lvm_pve2_backup:vm-104-disk-2

The last two lines seem most relevant. Neither vm-104-disk-x volume exists on the new drive, but both apparently used to exist on the old one. I'm not sure if these are normal LVM volumes or ones that are created through Proxmox somehow. I'm hopeful that once I create these, we'll be back up and running.

Any tips for creating these two volumes?

As always, thanks!
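
To double-check which drives of a VM sit on a given storage, the config can be filtered by storage ID. This is a hedged sketch: disks_on_storage is a hypothetical helper, and it assumes the "key: storage:volume" layout seen in the 104.conf above.

```shell
# disks_on_storage CONF STORAGE_ID
# Print the drive keys (ide2, virtio1, ...) in a VM config file whose
# volume lives on the given storage ID.
disks_on_storage() {
    awk -v st="$2" '
        /^(ide|scsi|sata|virtio)[0-9]+:/ {
            key = $1; sub(/:$/, "", key)
            # The value starts with "<storage id>:" when it is on that storage.
            if (index($2, st ":") == 1) print key
        }
    ' "$1"
}

# Example:
# disks_on_storage /etc/pve/qemu-server/104.conf lvm_pve2_backup
```

For the 104.conf shown above, this should print virtio2 and virtio1, while the boot disk virtio0 and the CD-ROM ide2 are on "local" and are not listed.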
 
EDIT - My bad - those two volumes (virtio1 and virtio2) are the D: and E: drives in Windows - I missed virtio0. They may not be necessary, but they probably hold a bunch of data that could be needed for the proper operation of the VM. My other thought holds though - creating a new drive image will be like plugging a blank hard drive into the computer.
 
Many thanks to everyone who provided ideas and suggestions! I've got it up and running again and your contributions played an important role. It's really nice to find a supportive community willing to help, even a newbie to proxmox.

Next up: upgrade to a more modern version of proxmox!
 
