Yesterday, after a reboot, my Proxmox host stopped in "emergency mode" because it could not mount a volume that was being used as a second, non-boot disk in a Linux VM running a Plex server.
I got past the stalled boot by adding "nofail" to the mount line in /etc/fstab at the console:
LABEL=PlexStorage /mnt/pve/PlexMediaHDD ext4 defaults,nofail 0 2
While that allowed the host boot up to finish and my other VMs to run, the Plex VM won't start at all, and when I looked up the volume group that's been working fine for several weeks, it was gone. The disk it was using is still there and in good condition. The VG just dropped off the face of the earth.
Here's info on the (spinning) disk after running both short and long SMART tests:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 149 148 021 Pre-fail Always - 3525
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 64
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 091 091 000 Old_age Always - 7049
10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 25
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 6
193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 125
194 Temperature_Celsius 0x0022 111 107 000 Old_age Always - 32
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 7039 -
# 2 Short offline Completed without error 00% 7038 -
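For anyone who wants to check the SMART table quickly, the three sector counters (IDs 5, 197 and 198) are the ones I looked at. Below is a minimal sketch, assuming the 10-column layout that smartctl -A printed above; the function name is made up for illustration.

```shell
# Print any of the sector-related attributes (IDs 5, 197, 198) whose raw value
# is non-zero. Reads `smartctl -A` style lines on stdin; prints nothing when
# all three raw values are 0.
nonzero_sector_counters() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && ($10 + 0) != 0 { print $2, $10 }'
}

# Usage on the host (device name taken from this post):
#   smartctl -A /dev/sda | nonzero_sector_counters
```

With the table above, this prints nothing, since all three raw values are 0.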
lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 1.8T 0 disk
blkid
/dev/sda: PTUUID="cad175ee-13a4-411f-a8dc-5bd60dff40f0" PTTYPE="gpt"
What's missing is /dev/sda1, the partition backing the volume group (with a 1.5TB logical volume) on the 2TB disk.
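To double-check that the partition is really absent and not just hidden by the default lsblk view, the partition rows can be counted. This is a rough sketch; the helper name is hypothetical, and the sgdisk line assumes the gdisk package is installed on the host.

```shell
# Count partition rows in `lsblk -ln -o NAME,TYPE` output read from stdin.
count_partitions() {
    awk '$2 == "part" { n++ } END { print n + 0 }'
}

# Usage on the host (device name taken from the output above):
#   lsblk -ln -o NAME,TYPE /dev/sda | count_partitions
# To print the partition table as stored on the disk itself
# (assumption: gdisk is installed):
#   sgdisk -p /dev/sda
```

A count of 0 would match the lsblk output above, where only the bare disk is listed.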
The various commands to display the VG, i.e. vgdisplay and vgscan, do not return anything.
vgcfgrestore --list PlexVG
returns the following. Note that this VG was created for a "test" VM (250) then moved to the real one (220), in case that's an important detail. I will confirm, though, that the disk/VG has been running fine on VM 220 long after VM 250 was deleted.
File: /etc/lvm/archive/PlexVG_00000-1224222384.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/vgcreate PlexVG /dev/sda1'
Backup Time: Sun Feb 12 14:24:03 2023
File: /etc/lvm/archive/PlexVG_00001-315613058.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvcreate -aly -Wy --yes --size 1879048192k --name vm-250-disk-0 --addtag pve-vm-250 PlexVG'
Backup Time: Sun Feb 12 14:26:44 2023
File: /etc/lvm/archive/PlexVG_00002-187948522.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvremove -f PlexVG/vm-250-disk-0'
Backup Time: Sun Feb 12 14:28:08 2023
File: /etc/lvm/archive/PlexVG_00003-716806561.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvcreate -aly -Wy --yes --size 1610612736k --name vm-250-disk-0 --addtag pve-vm-250 PlexVG'
Backup Time: Sun Feb 12 14:29:23 2023
File: /etc/lvm/archive/PlexVG_00004-168062075.vg
VG name: PlexVG
Description: Created *before* executing '/sbin/lvrename PlexVG vm-250-disk-0 vm-220-disk-0'
Backup Time: Mon Feb 13 17:37:50 2023
File: /etc/lvm/backup/PlexVG
VG name: PlexVG
Description: Created *after* executing '/sbin/lvrename PlexVG vm-250-disk-0 vm-220-disk-0'
Backup Time: Mon Feb 13 17:37:50 2023
vgcfgrestore --test PlexVG
returns:
TEST MODE: Metadata will NOT be updated and volumes will not be (de)activated.
WARNING: Couldn't find device with uuid s5jAky-3Nfc-i1kc-rYXw-rMbA-05qi-epEZTx.
Cannot restore Volume Group PlexVG with 1 PVs marked as missing.
The UUID here does not match the UUID above. Is that because /dev/sda1 gets a different UUID than /dev/sda, or is that because this command only remembers the UUID assigned when VM250 had this disk/VG?
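One way to see which device that UUID was recorded against is to read it out of the metadata backup. The sketch below assumes the usual text layout of files under /etc/lvm/backup (a physical_volumes block containing pvN blocks with id and device lines); the function name is only for illustration.

```shell
# Print "PV-UUID device-hint" for every physical volume recorded in an LVM
# metadata backup file. $1 is the path to the backup file.
pv_ids_from_backup() {
    awk '
        # Enter the physical_volumes section; VG and LV ids live outside it.
        /^[[:space:]]*physical_volumes[[:space:]]*\{/ { in_pvs = 1; depth = 1; next }
        in_pvs {
            if ($0 ~ /\{/) depth++
            if ($0 ~ /\}/) depth--
            if (depth == 0) { in_pvs = 0; next }
            if ($1 == "id" && $2 == "=")     { gsub(/"/, "", $3); id = $3 }
            if ($1 == "device" && $2 == "=") { gsub(/"/, "", $3); print id, $3 }
        }
    ' "$1"
}

# Usage on the host:
#   pv_ids_from_backup /etc/lvm/backup/PlexVG
```

The printed UUID can then be compared with the one in the vgcfgrestore warning and with whatever blkid reports for the disk.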
Attached are screenshots from the GUI:
1. What happened to my VG?
2. Is the data recoverable somehow? It's just a week's worth of DVR stuff and rebuildable metadata. It would be nice to get it back but not critical.
3. Most importantly, how can I avoid/prevent or reduce chance of this happening again?
Any info would be appreciated.