VMs Hard Drive Failing

Sherman Ravelo

New Member
Jan 3, 2019
Hello, I am writing to ask for some support with problems we have been seeing with Proxmox and several Linux VMs.

First of all, our company has 4 Proxmox servers running on IBM 3550 M3 hardware, connected to a LenovoEMC PX12-450R NAS with multiple disk arrays. The PVE versions differ from one server to another. I'm going to describe the two servers that have been failing:

Proxmox1: PVEVersion pve-manager/5.0-30/5ab26bc (running kernel: 4.10.17-2-pve)
Proxmox2: PVEVersion pve-manager/5.1-41/0b958203 (running kernel: 4.13.13-2-pve)

Two Linux VMs, one on each of these Proxmox servers, have crashed at random with I/O errors and superblock errors on their disk partitions.

I don't know exactly where to dig into these random crashes, but when I restarted one of the VMs, its hard drive was completely lost.

There was no way to recover it. Any suggestions? Where exactly should I look?

If you need more information about this case let me know.
 
First, I recommend upgrading all your nodes to the newest PVE version.

If you have a disk error and you use external storage (the LenovoEMC PX12-450R NAS), have you checked that system for faults?
 
Hello, thanks for the answer. I changed the array where the first VM got stuck, meaning I replaced its disks with new ones. Then yesterday another VM got stuck, and this one is stored on a different array... I double-checked my NAS and it is working perfectly.

The last VM has been moved to local storage on Proxmox2.
 
Hi,

One of the best tools for disk-related problems is Clonezilla (advanced mode, using dd, where there is a checkbox for rescue). In this mode Clonezilla tries to read each disk block, and if a read fails it replaces the data with zeros and moves on.
With some luck you can then restore from the cloned disk image, and maybe restore any broken files from a previous backup.
If your data is very valuable to you, you can also try dd-rescue, which has an option for how many times it tries to read a bad block. In one such situation I was able to recover all the bad data blocks using 100 as the number of reads.
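
To make the idea concrete, here is a minimal Python sketch of the "read each block, retry a few times, fill with zeros if it still fails" approach described above. It is only an illustration of the technique, not how Clonezilla or dd-rescue are actually implemented. The names `rescue_copy` and `read_block`, the 4096-byte block size, and the choice to treat a short read as a failed read are all assumptions made for this example.

```python
import os


def read_block(src, offset, length, max_retries):
    """Try to read `length` bytes at `offset`, retrying on I/O errors.

    Returns the data, or None if every attempt failed. A short read is
    treated as a failure (an assumption made for this sketch).
    """
    for _attempt in range(max_retries + 1):
        try:
            src.seek(offset)
            data = src.read(length)
        except OSError:
            continue  # unreadable this time, try again
        if data is not None and len(data) == length:
            return data
    return None


def rescue_copy(src_path, dst_path, block_size=4096, max_retries=3):
    """Copy src to dst block by block; unreadable blocks become zeros.

    Returns the list of byte offsets that could not be read.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        # Seeking to the end gives the size for regular files and block devices.
        size = src.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            data = read_block(src, offset, length, max_retries)
            if data is None:
                data = b"\x00" * length  # keep later blocks at the right offset
                bad_offsets.append(offset)
            dst.write(data)
            offset += length
    return bad_offsets
```

Calling something like `rescue_copy("vm-disk.raw", "vm-disk-rescued.raw", max_retries=100)` (both file names are hypothetical) would return the offsets that stayed unreadable, so you know which parts of the image to distrust. Always write the copy to a different, healthy disk.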

Good luck! You will need a lot ;)
 
Thanks for your advice, it will help me a lot.
 
- How many VMs do you run?
- Is any other VM broken?
- Do the broken VMs all run on the same node or on different ones?
- Did you check for updates for your NAS?
- How do you use the NAS (NFS, iSCSI, SMB)?
- What are the HW specs of the nodes?
- What about metrics from your nodes, your network, and the NAS?
- Did you check all log files (in the VM, on the nodes, and on your NAS)?
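
For the log check, a small script can help sift through large log files. The following is a minimal Python sketch that filters log lines for typical storage-related error messages. The function name `find_io_errors` and the pattern list are assumptions for illustration; the patterns are based on common Linux kernel message wording and are not exhaustive, so adjust them to what your systems actually log.

```python
import re

# Assumed examples of storage-related kernel messages; extend as needed.
IO_ERROR_PATTERNS = [
    r"I/O error",
    r"EXT4-fs error",
    r"nfs: server .* not responding",
    r"Remounting filesystem read-only",
]


def find_io_errors(lines, patterns=IO_ERROR_PATTERNS):
    """Return the log lines that match any of the given patterns."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [
        line.rstrip("\n")
        for line in lines
        if any(c.search(line) for c in compiled)
    ]
```

It could be run against a log file on the node or inside the VM, for example with `with open("/var/log/syslog", errors="replace") as f: print(find_io_errors(f))`. The path is an assumption and may differ on your systems.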
 
Hello,

- How many VMs do you run?
Proxmox1: 12
Proxmox2: 7

- Is any other VM broken?
Proxmox1: The one that could not be recovered.
Proxmox2: I took a backup of one VM when it registered the same I/O error, then rebooted it and it started fine.

- Do the broken VMs all run on the same node or on different ones?
Different Proxmox hosts (physical machines), one on each node.

- Did you check for Updates for your NAS?
NAS up to date.

- How do you use the NAS (NFS, iSCSI, SMB)?
NFS in use.

- What are the HW specs of the nodes?
Proxmox1: 16 CPUs, 2 sockets, 64 GB RAM
Proxmox2: 16 CPUs, 2 sockets, 64 GB RAM

- What about metrics from your nodes, your network, and the NAS?
Proxmox1: Directly connected to NAS
Proxmox2: Switch > NAS

- Did you check all log files (in the VM, on the nodes, and on your NAS)?
I checked every possible log on both Proxmox servers and on the NAS, and not a single error or sign of bad behavior was found.

It was just as if someone had deleted the VM's hard disk via the terminal in Proxmox.
 
