Unable to resume pve-data

mcit

Renowned Member
May 16, 2010
I have a problem that has just occurred after a reboot of my main Proxmox server.

During boot I get an error saying "Couldn't find device with UUID xxxxx...." [this UUID belongs to the physical volume backing the /dev/pve/data LV].
I can press Ctrl-D to continue and the system boots; however, because of the error above, I cannot start or otherwise use any VMs.

My configuration is as follows:
LSI 2108 PCIe card
320GB RAID1 [/root, /swap, /boot]
6TB RAID5 [/data]

I have booted into grml in an attempt to recover the data of the RAID5 and rebuild the server, however, I am seeing the following:

root@grml# vgchange -a y /dev/pve
Couldn't find device with UUID VSKDXQ.....
Refusing activation of partial LV data. Use --partial to override.
2 logical volume(s) in volume group "pve" now active

root@grml# vgchange -a y --partial /dev/pve
Partial mode. Incomplete logical volumes will be processed.
Couldn't find device with uuid VSKDXQ....
device-mapper: resume ioctl failed: Invalid argument
Unable to resume pve-data (254:3)
3 logical volume(s) in volume group "pve" now active

Even though it says 3 volumes are active, the data volume does not appear under /dev/pve, and if I attempt to mount it:

root@grml# mount /dev/pve/data /mnt/pve
mount: special device /dev/pve/data does not exist


I am unsure what I can do next. I have checked the RAID; there are no failures and the verify is successful. Is there anything I can do to restore the server to a working state, or, if it is a lost cause, are there any other avenues I can explore to try and recover the data from this LV? I have copies of all except one VM and I would really like to recover it if I can.

Matthew
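
A quick way to see exactly which physical volume the volume group considers missing is to ask LVM directly. A minimal sketch, assuming the lvm2 tools on the grml disc:

Code:
pvs -v                    # missing PVs are typically listed as "unknown device" along with their UUID
lvs -a -o +devices pve    # shows which device(s) each LV in the pve VG sits on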
 
Hi Matthew,
it looks like your 6TB volume isn't ready... If you go into the LSI BIOS, are the RAID volume and the disks shown as OK?
And in the server BIOS, do you see both RAID volumes?

On grml what is the output of "dmesg | grep sd" and "fdisk -l"?

Udo
 
Thanks for the reply. I thought the RAID could be the culprit at first; however, when I look at the BIOS setup, the array is listed with Status OK. [I made a mistake about the card: the LSI is in a different server, this one is a Highpoint 4320, and the mirror is 60GB, not 320GB.]
I see both RAID volumes as OK.

root@grml ~ # dmesg | grep sd
[ 1.664061] sd 0:0:0:0: [sda] 11718749952 512-byte logical blocks: (5.99 TB/5.45 TiB)
[ 1.664097] sd 0:0:1:0: [sdb] 8789062272 512-byte logical blocks: (4.49 TB/4.09 TiB)
[ 1.664185] sd 0:0:0:0: [sda] Write Protect is off
[ 1.664187] sd 0:0:0:0: [sda] Mode Sense: 2f 00 00 00
[ 1.664202] sd 0:0:1:0: [sdb] Write Protect is off
[ 1.664204] sd 0:0:1:0: [sdb] Mode Sense: 2f 00 00 00
[ 1.664259] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.664276] sd 0:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.664684] sda: unknown partition table
[ 1.665024] sd 0:0:0:0: [sda] Attached SCSI disk
[ 1.665141] sdb: unknown partition table
[ 1.665449] sd 0:0:1:0: [sdb] Attached SCSI disk
[ 2.707167] sd 1:0:0:0: [sdc] 117231408 512-byte logical blocks: (60.0 GB/55.8 GiB)
[ 2.707759] sd 1:0:0:0: [sdc] Write Protect is off
[ 2.707762] sd 1:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 2.707787] sd 1:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.708110] sdc: sdc1 sdc2
[ 2.708281] sd 1:0:0:0: [sdc] Attached SCSI disk
[ 4.101995] EXT3-fs (sdc1): mounted filesystem with ordered data mode
[ 4.995476] sd 5:0:0:0: [sdd] 7954432 512-byte logical blocks: (4.07 GB/3.79 GiB)
[ 4.996600] sd 5:0:0:0: [sdd] Write Protect is off
[ 4.996604] sd 5:0:0:0: [sdd] Mode Sense: 03 00 00 00
[ 4.997721] sd 5:0:0:0: [sdd] No Caching mode page present
[ 4.997723] sd 5:0:0:0: [sdd] Assuming drive cache: write through
[ 5.001460] sd 5:0:0:0: [sdd] No Caching mode page present
[ 5.001464] sd 5:0:0:0: [sdd] Assuming drive cache: write through
[ 5.002343] sdd: sdd1
[ 5.005823] sd 5:0:0:0: [sdd] No Caching mode page present
[ 5.005827] sd 5:0:0:0: [sdd] Assuming drive cache: write through
[ 5.005830] sd 5:0:0:0: [sdd] Attached SCSI removable disk
[ 5.568230] FAT-fs (sdd1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[ 58.439728] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).

-------

root@grml ~ # fdisk -l

Disk /dev/sda: 6000.0 GB, 5999999975424 bytes
255 heads, 63 sectors/track, 729458 cylinders, total 11718749952 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda doesn't contain a valid partition table

WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sdb: 4500.0 GB, 4499999883264 bytes
255 heads, 63 sectors/track, 547093 cylinders, total 8789062272 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/sdc: 60.0 GB, 60022480896 bytes
255 heads, 63 sectors/track, 7297 cylinders, total 117231408 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0004c82c

Device Boot Start End Blocks Id System
/dev/sdc1 * 2048 1048575 523264 83 Linux
/dev/sdc2 1048576 117229567 58090496 8e Linux LVM

Disk /dev/sdd: 4072 MB, 4072669184 bytes
255 heads, 63 sectors/track, 495 cylinders, total 7954432 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x19e80f24

Device Boot Start End Blocks Id System
/dev/sdd1 * 63 7954431 3977184+ c W95 FAT32 (LBA)

Disk /dev/mapper/pve-swap: 7381 MB, 7381975040 bytes
255 heads, 63 sectors/track, 897 cylinders, total 14417920 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/pve-swap doesn't contain a valid partition table

Disk /dev/mapper/pve-root: 14.8 GB, 14763950080 bytes
255 heads, 63 sectors/track, 1794 cylinders, total 28835840 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

Disk /dev/mapper/pve-root doesn't contain a valid partition table

The array appears visible; I don't know what to make of it.

Matthew
 

I swapped the Highpoint card for another of the same model [I had a spare here] and the problem is unchanged. The array is still listed as OK and the drive still shows in fdisk, but it fails to initialise during boot. Could this be some sort of corruption in the LVM itself? Is there a way to check it other than fsck, which I cannot run because the LV fails to initialise?

Matthew
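
For checking the LVM metadata itself rather than the filesystem, lvm2 ships its own consistency checks. A minimal sketch, assuming the 6TB RAID volume is /dev/sda as in the fdisk output above:

Code:
pvck /dev/sda    # checks the LVM label and metadata area on the physical volume
vgck pve         # checks the volume group metadata for consistency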
 
Hi Matthew,
do you use the whole 6TB RAID volume as the LVM physical volume, or did you create a partition table which is now gone?

If you run
Code:
pvscan
pvdisplay
what is displayed? On a working system you would see the PV with the missing UUID.

Udo
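
A small addition: pvs can also print the UUIDs directly, which makes it easy to compare against the UUID from the boot error. A sketch, assuming grml's lvm2:

Code:
pvs -o pv_name,pv_uuid,vg_name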
 

Yes, the whole 6TB is used as the /data LV.
I think I have made some progress. I have used the backup file in /etc/lvm/backup as follows:
pvcreate --restorefile /etc/lvm/backup/pve --uuid <UUID of the PV as referenced in this file> /dev/sda [the physical volume]

At this point, pvs -v shows the volume as available. Next I had to run fsck on it, as it still could not be mounted due to superblock errors. That is running at the moment; hopefully, once it completes, I will be able to mount the volume. We will see what happens. Can you see any problems with this approach?

Matthew
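
For reference, the lost-PV recovery sequence described in the lvm2 man pages looks roughly like this; the UUID and device are placeholders that have to match your own pvs and fdisk output, and the vgcfgrestore step is the man pages' suggestion rather than something taken from this thread:

Code:
# re-create the PV label with its old UUID, using the metadata backup
pvcreate --uuid "<missing PV UUID>" --restorefile /etc/lvm/backup/pve /dev/sda
# restore the volume group metadata from the same backup file
vgcfgrestore pve
# re-activate the volume group
vgchange -a y pve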
 
Just an update to finalize and close this thread. I successfully recovered all the data on this LV. However, as I write this, I am in the process of doing it again, which I am taking as a sign that the Highpoint 4320 controller has some kind of issue that causes this problem to recur. I will replace the card with an LSI 2108 in the coming days. For the sake of the repair [and possibly my own future reference], the steps are below.

Boot server with grml [grml.org]
vgchange -a y /dev/pve [2 of 3 volumes come online]
mount /dev/pve/root /mnt/root [so you can access the backup LV info]
[In my case only the data LV was offline, so I could access the backup config at /etc/lvm/backup/pve. From this file you can recover the UUID of the missing PV; alternatively, the missing UUID is displayed if you run 'pvs'.]

Once you have the missing UUID, run fdisk -l and find which physical drive your data should be on [not sure what you would do if your /data LV is spread across multiple physical drives]. Once you have this, the command is as follows:

pvcreate --restorefile /mnt/root/etc/lvm/backup/pve --uuid <UUID as above> /dev/<device from fdisk as above>

Now vgchange -a y /dev/pve [all three volumes come online]

Even after this I could not mount the data LV as it contained an error in the file system, so to fix this:

mke2fs -n /dev/pve/data [this will list all superblock backups]

fsck -y -b <pick a superblock backup> /dev/pve/data

Once complete, all should be good and the data LV should mount successfully. That is at least my experience. Feel free to add to this if I have overlooked something or could have done something differently.

Matthew
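
As an illustration of the superblock step, the block numbers below are only an example; mke2fs -n prints the real list for your filesystem without writing anything to the device:

Code:
mke2fs -n /dev/pve/data
#   ...
#   Superblock backups stored on blocks:
#           32768, 98304, 163840, 229376, 294912, ...
fsck -y -b 32768 /dev/pve/data    # point fsck at one of the listed backup superblocks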
 
I was successful in recovering from this problem, and just when I thought all was good in the world, the server began to suffer high IO delays. After shutting down all the running VMs, I thought a server reboot should do the trick. Well, I have found myself in the same situation again. I am taking this as a sign that the Highpoint 4320 card has some sort of issue and needs replacement; I will replace it with an LSI 2108 in the next few days. In the meantime, for the sake of finalising this thread [and possibly my own future reference], below are the steps to solve this issue.

Boot the server using grml [grml.org]
vgchange -a y /dev/pve [in my case, only the data LV is offline so 2 of 3 volumes come online here]
mount /dev/pve/root /mnt/root [mount the root LV in order to use the LVM config backups]
pvs [this will show the missing PV UUID]
fdisk -l [look for the physical volume that holds the data LV; I don't know what you would do if you have your LV across multiple PVs - anyone?]

With this information run:
pvcreate --restorefile /mnt/root/etc/lvm/backup/pve --uuid <UUID from above> /dev/<physical volume from fdisk above>

vgchange -a y /dev/pve [3 LVs should now come online]

If the file system is intact, you should now be able to mount the data LV

mount /dev/pve/data /mnt/data

In my case, the filesystem was not intact, meaning I needed to check it:

mke2fs -n /dev/pve/data [use this to find where the superblock backups are stored]

fsck -C -y -b <superblock backup> /dev/pve/data

Once complete, you should be able to successfully mount the data LV and either boot, or at least recover your data from it.
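
Pulled together into one sequence for anyone who hits this later. This is a sketch only; the device name, mount points and UUID are placeholders that need to match your own pvs and fdisk output:

Code:
vgchange -a y pve                         # only 2 of 3 LVs activate at this point
mkdir -p /mnt/root /mnt/data
mount /dev/pve/root /mnt/root             # gives access to /mnt/root/etc/lvm/backup/pve
pvs -v                                    # note the UUID reported as missing
fdisk -l                                  # identify the underlying device, e.g. /dev/sda
pvcreate --uuid "<missing PV UUID>" --restorefile /mnt/root/etc/lvm/backup/pve /dev/sda
vgchange -a y pve                         # all 3 LVs should now activate
mount /dev/pve/data /mnt/data             # if this fails with a superblock error:
mke2fs -n /dev/pve/data                   # list the backup superblock locations
fsck -C -y -b <backup superblock> /dev/pve/data
mount /dev/pve/data /mnt/data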
 
