Volume Group disappeared

jaytee129

Jun 16, 2022
Yesterday, after I rebooted my Proxmox host, it stopped in emergency mode because it could not mount a volume that a Linux VM running Plex server uses as its second (non-boot) disk.

I got past the stalled boot by adding "nofail" to the mount line in /etc/fstab from the console:
LABEL=PlexStorage /mnt/pve/PlexMediaHDD ext4 defaults,nofail 0 2
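For reference, that line has the six fields described in fstab(5): the device (here matched by filesystem label), mount point, filesystem type, comma-separated mount options, dump flag and fsck pass number. According to systemd.mount(5), nofail makes the mount merely "wanted" rather than "required" by local-fs.target, so boot carries on even if the device can't be mounted. A minimal Python sketch (parse_fstab_line is a hypothetical helper, not part of Proxmox or systemd) that splits the line into those fields and checks for the option:

```python
# Hypothetical /etc/fstab entry parser following the field layout in fstab(5).
def parse_fstab_line(line: str) -> dict:
    """Split one fstab entry into its fields; fields 5 and 6 default to 0 if omitted."""
    fields = line.split()
    if not 4 <= len(fields) <= 6:
        raise ValueError(f"expected 4 to 6 fields, got {len(fields)}: {line!r}")
    fields += ["0"] * (6 - len(fields))  # fill in omitted dump/passno
    spec, mountpoint, fstype, options, dump, passno = fields
    return {
        "spec": spec,              # device, here identified by filesystem label
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options.split(","),
        "dump": int(dump),
        "passno": int(passno),     # 2 = fsck after the root filesystem
    }


entry = parse_fstab_line(
    "LABEL=PlexStorage /mnt/pve/PlexMediaHDD ext4 defaults,nofail 0 2"
)
print(entry["spec"], entry["mountpoint"], entry["options"])
# With "nofail" present, a missing device no longer blocks boot.
print("nofail" in entry["options"])
```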

That let the host finish booting and my other VMs run, but the Plex VM won't start at all, and when I looked for the volume group, which had been working fine for several weeks, it was gone. The disk it was on is still there and in good condition. The VG just dropped off the face of the earth.

Here's info on the (spinning) disk after running both short and long SMART tests:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   149   148   021    Pre-fail  Always       -       3525
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       64
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7049
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       25
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       125
194 Temperature_Celsius     0x0022   111   107   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7039         -
# 2  Short offline       Completed without error       00%      7038         -
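To double-check the "good condition" reading of the SMART output: the attributes usually watched for media damage (reallocated, pending and offline-uncorrectable sectors) all have a raw value of 0, and no attribute's normalized VALUE is at or below a non-zero THRESH, which is roughly the test smartctl uses to flag an attribute as failing. The hypothetical Python helpers below apply those two checks to a few rows copied from the output. Keep in mind that SMART only reports on the drive hardware; a clean report says nothing about whether the partition table or LVM metadata on the disk is intact.

```python
# Hypothetical helpers for the attribute table printed by `smartctl -A` / `smartctl -a`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7049
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

# Raw counts that commonly indicate damaged or unreadable sectors when non-zero.
MEDIA_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text: str) -> dict:
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header line ("ID# ...") is skipped by the isdigit() check.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = {
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],  # first token of RAW_VALUE
            }
    return attrs


def below_threshold(attrs: dict) -> list:
    """Attributes whose normalized VALUE has dropped to or below a non-zero THRESH."""
    return [n for n, a in attrs.items() if a["thresh"] > 0 and a["value"] <= a["thresh"]]


def media_warnings(attrs: dict) -> list:
    """Sector-health attributes with a non-zero raw count."""
    return [n for n in MEDIA_ATTRS if n in attrs and int(attrs[n]["raw"]) != 0]


attrs = parse_smart_attributes(SAMPLE)
print(below_threshold(attrs), media_warnings(attrs))
```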

lsblk
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda    8:0    1  1.8T  0 disk

blkid
/dev/sda: PTUUID="cad175ee-13a4-411f-a8dc-5bd60dff40f0" PTTYPE="gpt"

What's missing is /dev/sda1, the partition on the 2TB disk that held the 1.5TB volume group.
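The lsblk and blkid output are consistent with that: blkid still finds a GPT partition table (PTTYPE="gpt") on /dev/sda, but neither tool lists any partition under it, whereas the LVM archive shows PlexVG was created with 'vgcreate PlexVG /dev/sda1'. A hypothetical Python sketch of that check, using lsblk's raw, headerless mode (-r -n -o NAME,TYPE) so the output is easy to split; read_lsblk() needs a real system and is not called here.

```python
import re
import subprocess


def read_lsblk() -> str:
    """Raw lsblk listing: one 'NAME TYPE' pair per line, no header, no tree glyphs."""
    return subprocess.run(
        ["lsblk", "-rno", "NAME,TYPE"], capture_output=True, text=True, check=True
    ).stdout


def partitions_of(disk: str, lsblk_raw: str) -> list:
    """Names of partitions (TYPE 'part') belonging to `disk`, e.g. sda -> sda1."""
    pattern = re.compile(re.escape(disk) + r"p?\d+")  # sda1, or nvme0n1p1 style
    parts = []
    for line in lsblk_raw.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "part" and pattern.fullmatch(fields[0]):
            parts.append(fields[0])
    return parts


# The state in the post: the disk is present, but no partition is listed on it.
print(partitions_of("sda", "sda disk\n"))
# The state implied by the LVM archive while PlexVG existed on /dev/sda1.
print(partitions_of("sda", "sda disk\nsda1 part\n"))
```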

The usual commands for displaying VGs (e.g. vgdisplay, vgscan) return nothing.

vgcfgrestore --list PlexVG returns the following. Note that this VG was created for a "test" VM (250) and then moved to the real one (220), in case that's an important detail. I can confirm, though, that the disk/VG ran fine on VM 220 long after VM 250 was deleted.

File: /etc/lvm/archive/PlexVG_00000-1224222384.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/vgcreate PlexVG /dev/sda1'
Backup Time: Sun Feb 12 14:24:03 2023


File: /etc/lvm/archive/PlexVG_00001-315613058.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvcreate -aly -Wy --yes --size 1879048192k --name vm-250-disk-0 --addtag pve-vm-250 PlexVG'
Backup Time: Sun Feb 12 14:26:44 2023


File: /etc/lvm/archive/PlexVG_00002-187948522.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvremove -f PlexVG/vm-250-disk-0'
Backup Time: Sun Feb 12 14:28:08 2023


File: /etc/lvm/archive/PlexVG_00003-716806561.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvcreate -aly -Wy --yes --size 1610612736k --name vm-250-disk-0 --addtag pve-vm-250 PlexVG'
Backup Time: Sun Feb 12 14:29:23 2023


File: /etc/lvm/archive/PlexVG_00004-168062075.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvrename PlexVG vm-250-disk-0 vm-220-disk-0'
Backup Time: Mon Feb 13 17:37:50 2023


File: /etc/lvm/backup/PlexVG
VG name: PlexVG
Description: Created *after* executing '/sbin/lvrename PlexVG vm-250-disk-0 vm-220-disk-0'
Backup Time: Mon Feb 13 17:37:50 2023
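The list above is effectively a change log for PlexVG: each archive file is a copy of the metadata that LVM saved *before* a VG-changing command, and /etc/lvm/backup/PlexVG is the copy saved *after* the last one (the lvrename from vm-250-disk-0 to vm-220-disk-0). A hypothetical Python helper (parse_vgcfgrestore_list and MetadataCopy are illustrative names, not LVM tools) that pulls the file, the before/after marker and the command out of each entry makes the sequence easier to read:

```python
import re
from dataclasses import dataclass


@dataclass
class MetadataCopy:
    file: str
    when: str          # "before" or "after" the command ran
    command: str
    backup_time: str


FIELD = re.compile(r"\s*(File|VG name|Description|Backup Time):\s*(.*)")
DESC = re.compile(r"Created \*(before|after)\* executing '(.*)'")


def parse_vgcfgrestore_list(text: str) -> list:
    """Group the File / VG name / Description / Backup Time blocks of `vgcfgrestore --list`."""
    blocks, current = [], {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2).strip()
        if key == "File" and current:  # a new "File:" line starts the next entry
            blocks.append(current)
            current = {}
        current[key] = value
    if current:
        blocks.append(current)

    copies = []
    for b in blocks:
        d = DESC.match(b.get("Description", ""))
        copies.append(MetadataCopy(
            file=b.get("File", ""),
            when=d.group(1) if d else "",
            command=d.group(2) if d else b.get("Description", ""),
            backup_time=b.get("Backup Time", ""),
        ))
    return copies


SAMPLE = """\
File: /etc/lvm/archive/PlexVG_00004-168062075.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvrename PlexVG vm-250-disk-0 vm-220-disk-0'
Backup Time: Mon Feb 13 17:37:50 2023

File: /etc/lvm/backup/PlexVG
VG name: PlexVG
Description: Created *after* executing '/sbin/lvrename PlexVG vm-250-disk-0 vm-220-disk-0'
Backup Time: Mon Feb 13 17:37:50 2023
"""

for c in parse_vgcfgrestore_list(SAMPLE):
    print(c.backup_time, c.when, c.command)
```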

vgcfgrestore --test PlexVG returns:

TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
WARNING: Couldn't find device with uuid s5jAky-3Nfc-i1kc-rYXw-rMbA-05qi-epEZTx.
Cannot restore Volume Group PlexVG with 1 PVs marked as missing.

The UUID here does not match the PTUUID that blkid reports for /dev/sda above. Is that because /dev/sda1 gets a different UUID than /dev/sda, or because this command only remembers the UUID assigned when VM 250 had this disk/VG?
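One way to see the difference is that the two strings have different formats. The PTUUID that blkid prints for /dev/sda is the GPT disk GUID (8-4-4-4-12 hex digits), while the UUID in the vgcfgrestore warning is an LVM physical-volume UUID (32 letters and digits in 6-4-4-4-4-4-6 groups), which LVM keeps in its own label on the PV rather than in the partition table. The hypothetical Python helpers below only extract the UUID from the warning text and classify both strings by format; they don't touch any disk.

```python
import re

# LVM UUIDs (PV, VG, LV): 32 characters from [A-Za-z0-9], printed in 6-4-4-4-4-4-6 groups.
LVM_UUID = re.compile(r"[A-Za-z0-9]{6}(?:-[A-Za-z0-9]{4}){5}-[A-Za-z0-9]{6}")
# RFC 4122-style GUIDs, the form blkid prints for a GPT disk's PTUUID: 8-4-4-4-12 hex digits.
GUID = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")


def classify_uuid(s: str) -> str:
    if LVM_UUID.fullmatch(s):
        return "lvm"
    if GUID.fullmatch(s):
        return "guid"
    return "unknown"


def missing_pv_uuid(output: str):
    """Pull the PV UUID out of LVM's "Couldn't find device with uuid ..." warning."""
    m = re.search(r"Couldn't find device with uuid ([A-Za-z0-9-]+)\.", output)
    return m.group(1) if m else None


warning = "WARNING: Couldn't find device with uuid s5jAky-3Nfc-i1kc-rYXw-rMbA-05qi-epEZTx."
pv_uuid = missing_pv_uuid(warning)
print(pv_uuid, classify_uuid(pv_uuid))
print(classify_uuid("cad175ee-13a4-411f-a8dc-5bd60dff40f0"))
```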

Attached are screenshots from the GUI: sda.PNG and plexVG.PNG.


1. What happened to my VG?

2. Is the data recoverable somehow? It's just a week's worth of DVR stuff and rebuildable metadata. It would be nice to get it back, but it's not critical.

3. Most importantly, how can I prevent this from happening again, or at least reduce the chance of it?

Any info would be appreciated.
 
OK, I have to get my VM going, so I decided to just give up on the old data and create a new volume group over the top.

While I can create a new VG on the disk (/dev/sda), I can't assign space from it to the VM. And while I can detach the lost volume/disk from my VM, I can't remove it; a ghost of the old one remains.

How do I get rid of the remnants of the lost VG?

Screenshots attached: cantremovedisk.png and VGnotenoughspace.png.
 
Hello? Is this thing on?

1. What happened to my VG?

2. Is the data recoverable somehow?

3. Most importantly, how can I prevent this from happening again, or at least reduce the chance of it?

4. How do I configure the VM so it is NOT dependent on this VG to run?
 
