[SOLVED] Bad Magic Number / Superblock Invalid

ernieg92

New Member
Mar 15, 2024
Recently migrated from ESXi to PVE with very few issues. I installed a new HDD to back up my primary VMs (critical data is rsync'd to cloud storage) in case I ever need to restore them. A few days after installing the HDD, I started getting two daily notifications from the S.M.A.R.T. daemon.
Code:
Device: /dev/sdb [SAT], 2 Currently unreadable (pending) sectors

Device: /dev/sdb [SAT], 2 Offline uncorrectable sectors

Of course, I now get them every day because they have not been fixed. I tried using fsck and e2fsck to repair the sectors but got additional errors.
Code:
root@pve:~# fsck -t ext4 /dev/sdb
fsck from util-linux 2.38.1
e2fsck 1.47.0 (5-Feb-2023)
ext2fs_open2: Bad magic number in super-block
fsck.ext4: Superblock invalid, trying backup blocks...
fsck.ext4: Bad magic number in super-block while trying to open /dev/sdb

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Found a gpt partition table in /dev/sdb
and
Code:
root@pve:~# e2fsck /dev/sdb
e2fsck 1.47.0 (5-Feb-2023)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/sdb

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Found a gpt partition table in /dev/sdb

Also, fdisk reports that a partition doesn't start on a physical sector boundary (which may be a limitation of fdisk with a GPT partition table).
Code:
root@pve:~# fdisk -l | grep /sd
Partition 1 does not start on physical sector boundary.
Disk /dev/sda: 3.64 TiB, 4000787030016 bytes, 7814037168 sectors
/dev/sda1       34       2047       2014 1007K BIOS boot
/dev/sda2     2048    2099199    2097152    1G EFI System
/dev/sda3  2099200 7814037134 7811937935  3.6T Linux LVM
Disk /dev/sdb: 931.51 GiB, 1000200658432 bytes, 1953516911 sectors
/dev/sdb1   2048 1953515519 1953513472 931.5G Linux filesystem
Partition 2 does not start on physical sector boundary.
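The boundary warning comes down to simple arithmetic, which can be sketched as below. This assumes 512-byte logical and 4096-byte physical sectors, which the output above does not show; on a real system the sizes can be read with `blockdev --getss` and `blockdev --getpbsz`. The helper name `is_aligned` is made up for illustration.

```shell
# Hypothetical helper: succeeds when a partition's first byte falls on a
# physical sector boundary.
# Arguments: start sector, logical sector size, physical sector size.
is_aligned() {
    [ $(( $1 * $2 % $3 )) -eq 0 ]
}

# Start sectors taken from the fdisk listing; the sector sizes are assumptions.
if is_aligned 34 512 4096; then echo "sector 34: aligned"; else echo "sector 34: not aligned"; fi
if is_aligned 2048 512 4096; then echo "sector 2048: aligned"; else echo "sector 2048: not aligned"; fi
```

Under those assumed sizes, a partition starting at sector 34 (like the BIOS boot partition on /dev/sda) is off the boundary, while one starting at sector 2048 (like /dev/sdb1) is on it.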
As stated, critical data from the VMs is backed up elsewhere, but I like the idea of having VM backups local if needed. My questions are:
  1. Are these critical errors that need to be addressed immediately? (I believe pending unreadable sectors get remapped when a write to them is attempted.)
  2. Is there another utility to fix/mark the bad sectors?
  3. Should I wipe the partition table and reformat to get rid of the errors (and obviously run fresh backups)?
I have tried using badblocks and other commands I found while Googling for a solution, but none of them seem to work (I'm still getting the daily message). I'll eventually replace the HDD with an SSD, but it's a brand new drive and I'd like to use it for a while before doing so.

Thanks in advance.
-Ernie
 
The fsck command works on filesystems, which usually live on disk partitions rather than on whole disks. You would want to run fsck on /dev/sdb1, or whatever the partition number is.
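A minimal sketch of pointing the check at the partition instead of the disk. The helper names `part_path` and `check_partition` are made up for illustration. `e2fsck -n` opens the filesystem read-only and answers "no" to every prompt, so it only reports. The partition should be unmounted first.

```shell
# Hypothetical helper: build the partition device path from a disk and a
# partition number. Disk names ending in a digit (e.g. /dev/nvme0n1) take a
# "p" separator; names like /dev/sdb do not.
part_path() {
    case "$1" in
        *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
        *)      printf '%s%s\n' "$1" "$2" ;;
    esac
}

# Hypothetical helper: show the filesystem type, then run a read-only check.
# Needs root; nothing is modified because of -n.
check_partition() {
    part=$(part_path "$1" "$2")
    lsblk -f "$part" && e2fsck -n "$part"
}

# Example (not run here): check_partition /dev/sdb 1
```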

Using "badblocks" won't fix the SMART errors because it operates at the OS level while SMART is reading status from the drive hardware. And the hardware thinks the disk is faulty. You should replace the disk ASAP.
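To see what the drive itself reports, something like the following may help. It is a sketch assuming an ATA drive and the usual `smartctl -A` table layout, where the attribute ID is the first column and the raw value is the tenth. The helper name `pending_sectors` is made up for illustration.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the name
# and raw value of attributes 197 (Current_Pending_Sector) and
# 198 (Offline_Uncorrectable).
pending_sectors() {
    awk '$1 == 197 || $1 == 198 { print $2, $10 }'
}

# Examples (not run here; need root and smartmontools):
#   smartctl -A /dev/sdb | pending_sectors
#   smartctl -t long /dev/sdb       # start an extended self-test
#   smartctl -l selftest /dev/sdb   # read the self-test log afterwards
```

Non-zero raw values here are the same counts the daily smartd notifications are reporting.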
 
First of all, you're trying to run operations on sdb, whereas your ext4 partition is on sdb1.

Do not try to continue using an obviously bad drive. You risk not only data loss, but data corruption.

Quit playing around with it, RMA the drive under warranty and replace it. Don't fool about trying to mark the bad sectors, etc. and hoping for the best. It's not going to limp along like a car with a flat tire; it's more like you're trying to run the engine with sand in it instead of oil.


Before putting any drive into service, it's a good idea to run a burn-in test on it to weed out shipping damage.

https://github.com/kneutron/ansitest/blob/master/SMART/scandisk-bigdrive-2tb+.sh

Note that this script writes zeros to the entire drive, followed by a SMART long test, so don't use it on any drive holding data that you care about or that isn't backed up.
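The general idea (zero the drive, then run a SMART long test) can be sketched roughly as follows. This is not the linked script, just an illustration with made-up helper names `run` and `burn_in`. It defaults to a dry run that only prints the commands, because the real thing destroys everything on the target device.

```shell
# Hypothetical helper: print the command when DRY_RUN is 1 (the default),
# otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical burn-in sketch. DESTROYS ALL DATA on the given device when
# DRY_RUN=0. dd is expected to stop with "No space left on device" once it
# reaches the end of the drive.
burn_in() {
    dev=$1
    run dd if=/dev/zero of="$dev" bs=1M status=progress
    # The long test runs inside the drive; read the result later with
    # `smartctl -l selftest <device>`.
    run smartctl -t long "$dev"
}

# Example: `burn_in /dev/sdX` prints the two commands without running them.
```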
 
Update: I misspoke. The drive is not a new one. I did a bunch of drive reconfigurations when migrating, including buying a new drive for PVE VMs (no issues, thankfully), and forgot that this backup drive was an older one previously used for some ESXi VMs. Guess I'll be buying that new SSD sooner rather than later.

Thank you both for the details and recommendations. (As an aside, I went back to the sites I previously visited and none of them used the partition number for fsck, just the drive. Weird, but now I know for the future. Again, thanks.)
 