Hello. I'm new to the forum and have never had a PVE problem before.
I'm hoping this thread can be educational to some. Test your backups!
Last week, while I was away, one of my Proxmox servers "died". When I got back, I examined it.
The cables between my HPE 420i RAID controller and my drives had become disconnected while Proxmox was running.
I am running 4 drives in a fault-tolerant RAID 1/RAID 1+0 array. To be exact, 3 of the 4 drives disconnected while the server was running.
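As a rough sanity check of why the whole array went down: assuming the 4 drives form two mirrored pairs that are striped together (the usual RAID 1+0 layout with 4 drives; the pairing below is an assumption, the controller decides the real pairs), the logical drive stays readable only while every mirror pair keeps at least one member. With 3 of 4 drives missing, at least one pair has lost both members, whichever 3 drives they are.

```python
from itertools import combinations

# Assumed layout: drives 0+1 mirror each other, drives 2+3 mirror each other,
# and the two mirrors are striped (RAID 1+0). The real pairing is set by the controller.
MIRROR_PAIRS = [(0, 1), (2, 3)]
DRIVE_COUNT = 4


def raid10_survives(failed, pairs=MIRROR_PAIRS):
    """True if every mirror pair still has at least one present drive."""
    return all(not (a in failed and b in failed) for a, b in pairs)


def survivable_failure_sets(n_failed, n_drives=DRIVE_COUNT):
    """Every combination of n_failed missing drives that leaves the array readable."""
    return [
        combo
        for combo in combinations(range(n_drives), n_failed)
        if raid10_survives(set(combo))
    ]


if __name__ == "__main__":
    for k in range(1, DRIVE_COUNT + 1):
        total = len(list(combinations(range(DRIVE_COUNT), k)))
        ok = len(survivable_failure_sets(k))
        print(f"{k} drive(s) missing: {ok}/{total} combinations survivable")
```

Running it prints 4/4 for one missing drive, 4/6 for two, 0/4 for three and 0/1 for four. This only models which drives are present; it says nothing about writes that were in flight when the drives dropped out.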
I reseated the cables and all drives are back online. The rebuild was fast.
When I opened my iLO console, Proxmox was frozen, so I restarted it. During boot I saw a few errors:
(Don't mind the firmware bugs.) NB! After these errors, all the required PVE services also failed to start.
So the volume group "pve", which contains 3 volumes, has an issue with the logical volume "data". That's not good.
My SQLite DB had a duplicate entry. I backed it up and nuked it, then restarted. The services are back online.
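For anyone who lands here with the same symptom: on a standard install the PVE cluster filesystem (pmxcfs) keeps its configuration in the SQLite file /var/lib/pve-cluster/config.db. Below is a minimal sketch of spotting duplicate rows with a GROUP BY ... HAVING query. The table and column names are a simplified assumption, not the verified pmxcfs schema, and the demo runs against an in-memory database so no real file is touched. Always work on a copy of the file.

```python
import sqlite3

# Simplified, hypothetical table for illustration only. Inspect the real schema
# of a *copy* of config.db (e.g. with ".schema" in the sqlite3 shell) before
# adapting the query.
SCHEMA = """
CREATE TABLE tree (
    inode  INTEGER PRIMARY KEY NOT NULL,
    parent INTEGER NOT NULL,
    name   TEXT NOT NULL,
    data   BLOB
)
"""

# Rows that share the same parent directory and name are duplicates of one path.
FIND_DUPLICATES = """
SELECT parent, name, COUNT(*) AS copies, GROUP_CONCAT(inode) AS inodes
FROM tree
GROUP BY parent, name
HAVING COUNT(*) > 1
"""


def find_duplicates(conn):
    """Return (parent, name, copies, inodes) for every duplicated entry."""
    return conn.execute(FIND_DUPLICATES).fetchall()


def build_demo_db():
    """In-memory demo database with one deliberately duplicated, hypothetical entry."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO tree (inode, parent, name, data) VALUES (?, ?, ?, ?)",
        [
            (1, 0, "example.cfg", b"first copy"),
            (2, 0, "example.cfg", b"second copy"),  # the duplicate
            (3, 0, "other.cfg", b"unique"),
        ],
    )
    return conn


if __name__ == "__main__":
    print(find_duplicates(build_demo_db()))
```

The file names in the demo are made up; the point is the grouping key (same parent, same name), which is how a duplicated path would show up in a tree-shaped table.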
But "local-lvm" and my VMs are not showing up in the GUI.
I got lvdisplay to show "pve-data" as available by running "lvconvert --repair /dev/pve/data".
But this does not get me very far.
So, okay. I followed a few guides and realized I have to fix the pve-data superblock.
Bash:
root@pve:/dev/mapper# thin_check /dev/mapper/pve-data
examining superblock
superblock is corrupt
bad checksum in superblock, wanted 3059633164
Except for one thing:
Code:
root@pve:/dev/mapper# dumpe2fs -h /dev/mapper/pve-data
dumpe2fs 1.46.5 (30-Dec-2021)
dumpe2fs: Bad magic number in super-block while trying to open /dev/mapper/pve-data
Couldn't find valid filesystem superblock.
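The dumpe2fs message means it did not find the ext2/3/4 magic number (0xEF53) where the primary superblock should be: a little-endian 16-bit field 56 bytes (0x38) into the superblock, which itself starts 1024 bytes into the device. The sketch below performs the same read-only check. One hedged observation: on a default PVE install, pve/data is an LVM thin pool that holds the VM disks as thin volumes rather than an ext filesystem, so a missing ext magic on it may be expected rather than a sign of extra damage.

```python
import struct
import sys

EXT_SUPERBLOCK_OFFSET = 1024  # primary ext2/3/4 superblock starts 1024 bytes in
EXT_MAGIC_OFFSET = 0x38       # s_magic field, relative to the start of the superblock
EXT_MAGIC = 0xEF53
HEADER_BYTES = 2048           # enough to cover the whole primary superblock


def ext_magic_at(header: bytes) -> bool:
    """True if `header` (the first bytes of a device) carries the ext magic number."""
    start = EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET
    if len(header) < start + 2:
        return False  # too short to contain a superblock
    (magic,) = struct.unpack_from("<H", header, start)  # little-endian u16
    return magic == EXT_MAGIC


def has_ext_magic(path: str) -> bool:
    """Read-only check of a file or block device (block devices need root)."""
    with open(path, "rb") as dev:
        return ext_magic_at(dev.read(HEADER_BYTES))


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, "ext magic found" if has_ext_magic(p) else "no ext magic")
```

Usage (the script name is hypothetical): as root, `python3 ext_magic_check.py /dev/mapper/pve-data`. It only opens the device for reading.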
It seems I cannot get superblock backups from this LV.
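For reference, this is where ext2/3/4 would normally keep backup superblocks, assuming default geometry (blocks per group = 8 × block size) and the default sparse_super feature: backups sit in group 1 and in groups that are powers of 3, 5 and 7 (sparse_super2 and non-default layouts differ). These are the positions `e2fsck -b` expects. They only exist if the device actually holds an ext filesystem, so they would not apply to a thin pool itself or to an NTFS disk inside a Windows VM.

```python
def is_power_of(n: int, base: int) -> bool:
    """True if n == base**k for some k >= 1."""
    if n < base:
        return False
    while n % base == 0:
        n //= base
    return n == 1


def backup_superblock_groups(group_count: int) -> list:
    """Block groups holding a backup superblock under sparse_super."""
    return [
        g
        for g in range(1, group_count)
        if g == 1 or any(is_power_of(g, b) for b in (3, 5, 7))
    ]


def backup_superblock_blocks(total_blocks: int, block_size: int = 4096) -> list:
    """Block numbers of ext2/3/4 backup superblocks, assuming default geometry."""
    blocks_per_group = 8 * block_size  # one bitmap block tracks 8 * block_size blocks
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division: a partial last group still counts as a group.
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)
    return [
        first_data_block + g * blocks_per_group
        for g in backup_superblock_groups(group_count)
    ]


if __name__ == "__main__":
    print(backup_superblock_blocks(26214400))  # 100 GiB at 4 KiB blocks
```

For a 100 GiB filesystem with 4 KiB blocks (26214400 blocks) this yields 14 backups, starting at 32768, 98304, 163840 and ending at 23887872.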
I only need one critical Windows VM out of this instance.
I did have backups on an NFS share, but since I did not check or test them... you guys all know the story.
What are my options?