HDD I/O errors after shutdown

donallw

Jun 18, 2022
Hello, huge noob here, looking for some help with my HDD. When trying to shut down Proxmox I got a kernel error:

kernel panic - not syncing attempted to kill init

When I tried to reboot into the same hdd, I got the following error:

kernel: blk_update_request: I/O error, dev sde

This kept repeating until eventually it booted from the installation media. I restarted again, this time directly into the installation media, and tried to reinstall proxmox on the drive. A similar error appeared:

unable to partition harddisk /dev/sdb

At this point I knew something was wrong with the HDD, so I plugged in an SSD with Windows installed and opened Disk Management. It detected both of the new hard drives, but when I tried to make them accessible through Windows, I got another I/O error. This is where I am now.

Any thoughts as to why this might have happened / what could have caused this? Is the HDD dead? Any solutions?

Thanks in advance!

Some background:

I converted an old gaming rig to a home server, and last night I installed Proxmox on an old HDD I had lying around that had Windows on it. I wrote the Proxmox VE 7.2 installer onto a flash drive, booted into that, and followed the installation guide successfully. This morning I was able to navigate the Proxmox web interface just fine.

I wanted to install another HDD I received in the mail this morning, so I shut down the node I had set up (without doing anything else in it). Note that I clicked shutdown and confirmed once, and after no results for about 20 seconds I did so again (I would not imagine this is the problem, though). After a while it began shutting down, and eventually the web interface could not connect, so I assumed it was off. Since the computer itself was still on, I plugged in a monitor to see what was being displayed and found the first error shown above. I then did a hard shutdown and rebooted into the HDD with Proxmox installed. This is when I began getting the second error.
 
If you think your HDD is bad you should check its SMART stats (smartctl -a /dev/sde) and maybe run a long SMART self-test. You may see the self-test failing or an increased "uncorrectable sectors" counter. Using another cable sometimes helps too.
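
A minimal sketch of those SMART checks, assuming smartmontools is installed and the disk still shows up as /dev/sde (device letters can change between boots, so check first). The `run` helper, the `DRY_RUN` variable and the `smart_check` function are made up for this example; by default the commands are only printed, not executed. In the `smartctl -a` output, the attributes usually worth looking at are Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable.

```shell
# Hypothetical dry-run helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# smart_check DEVICE (hypothetical name): health verdict, full report,
# start a long self-test, then show the self-test log.
smart_check() {
  dev="$1"
  run smartctl -H "$dev"          # overall health verdict
  run smartctl -a "$dev"          # full SMART report, including attributes
  run smartctl -t long "$dev"     # start the extended self-test on the drive
  run smartctl -l selftest "$dev" # self-test log; read it after the test ends
}

smart_check /dev/sde
```

To actually run the commands, call it as root with `DRY_RUN=0`. The long test runs on the drive itself and can take hours on a large HDD, so the last command only shows a result once it has finished.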
 
Is the PVE failing to boot? Can you login or access the web gui at all?

As @Dunuin said I would run a smart stat on the drive on some other bootable OS but ultimately I’ve had drives act really weird before with smart reading fine.

In my experience proxmox is super picky about drives and works well only when they have been wiped. It sounds like you installed proxmox over HDD 1 which had windows on it and didn’t even get to use HDD 2 (right?). So that could be related.

What I would do if I were you is wipe the drives clean and try again. While you check smart status I would also run ‘fsck [disk-name]’ to check the file system. If you’re lucky the disk was just acting a bit funky. In general you just want to check the health of the disk and if it’s healthy wipe it and try again.
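
A rough sketch of that check-then-wipe sequence, under some assumptions: /dev/sdX and /dev/sdX1 are placeholders, not real device names from this thread, and the `run` helper and `check_then_wipe` function are invented for the example. Note that fsck normally targets a partition or volume that holds a filesystem rather than the whole disk, and that `-n` as a "check only, change nothing" switch is the e2fsck (ext2/3/4) behaviour; other filesystems may treat it differently. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Hypothetical dry-run helper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# check_then_wipe DISK PARTITION (hypothetical name)
check_then_wipe() {
  disk="$1"
  part="$2"
  run lsblk -f "$disk"     # list partitions and filesystem types on the disk
  run fsck -n "$part"      # read-only filesystem check (ext2/3/4 semantics)
  run wipefs --all "$disk" # DESTRUCTIVE: erases the signatures found on the disk
}

check_then_wipe /dev/sdX /dev/sdX1
```

Only do the wipe step on a disk whose contents you no longer need, and only after the health checks look clean.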

Also what boot mode are you using? UEFI or BIOS? Sometimes you can get issues there especially if you installed as BIOS but boot as UEFI or vice versa.
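
One common way to see how the running Linux system was booted is to check whether /sys/firmware/efi exists. A small sketch, where the `boot_mode` function name and its optional directory argument are assumptions made for the example (the argument is only there so the check can be tried against another path):

```shell
# boot_mode [EFI_DIR] (hypothetical name): prints UEFI if the EFI firmware
# directory exists, otherwise BIOS. EFI_DIR defaults to /sys/firmware/efi.
boot_mode() {
  efi_dir="${1:-/sys/firmware/efi}"
  if [ -d "$efi_dir" ]; then
    echo "UEFI"
  else
    echo "BIOS"
  fi
}

boot_mode
```

This only tells you how the current session booted, for example the installer or a live USB. The firmware setup screen is still the place to check which mode the machine will use for the disk.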
 
In my experience proxmox is super picky about drives and works well only when they have been wiped
Hi,

This is not true. If an HDD is OK, it will work just fine, even if another OS was on it before the Proxmox install.

You could run an extensive check on your HDD using badblocks (attention: you must run the read/write badblocks test, so ANY data on this disk will be deleted).

You can use the Clonezilla live CD (which includes badblocks and smartctl). Make sure ONLY the HDD you need to test is connected to the system.
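
A sketch of what the read/write test could look like, assuming /dev/sdX stands in for the disk under test. The `badblocks_rw` wrapper and its `--yes-erase-everything` argument are hypothetical, added so the destructive command is only printed unless you confirm explicitly. The badblocks options are `-w` (write-mode test, destroys all data), `-s` (show progress) and `-v` (verbose).

```shell
# badblocks_rw DEVICE [--yes-erase-everything] (hypothetical wrapper)
# Without the confirmation argument it only prints what it would run.
badblocks_rw() {
  dev="$1"
  if [ "$2" = "--yes-erase-everything" ]; then
    badblocks -wsv "$dev"   # DESTRUCTIVE write-mode test of the whole device
  else
    echo "would run: badblocks -wsv $dev"
  fi
}

badblocks_rw /dev/sdX
```

Having only the disk under test connected, as suggested above, lowers the risk of pointing this at the wrong device.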

See here some examples!


Good luck / Bafta !
 
Really good answer! It is true that you don't need the disk to be wiped, but I once had an issue on two installs and it only worked after wiping the disk, so I figured I'd mention it.
 
PVE is fully failing to boot, no access to web GUI.

As suggested by @Dunuin I have run both WD Data Lifeguard (it is a WD HDD) on Windows and a SMART test on Ubuntu. Both are showing a failing drive. I have included the SMART test output below, in case I am diagnosing this incorrectly.

Code:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      6251         5480
# 2  Short offline       Completed: read failure       90%      6250         5480

I ran both a short and an extended test on the drive, and both reported a read failure at LBA 5480. As I understand it, each test stopped at that first error with 90% of the scan still remaining, so there may well be more errors in the part that was never scanned. As a result I will be discarding the drive.
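
For anyone reading such a log later, here is a small sketch that pulls the failed entries out of `smartctl -l selftest` style output. It assumes the column layout shown above (remaining percentage, lifetime hours and first-error LBA as the last three fields), and `failed_selftests` is a name invented for this example.

```shell
# failed_selftests (hypothetical name): reads a SMART self-test log on stdin
# and prints one summary line per entry whose status mentions a failure.
failed_selftests() {
  awk '/^# *[0-9]+/ && /failure/ {
    printf "remaining=%s hours=%s first_error_lba=%s\n", $(NF-2), $(NF-1), $NF
  }'
}

failed_selftests <<'EOF'
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      6251         5480
# 2  Short offline       Completed: read failure       90%      6250         5480
EOF
```

On a live system the input would come from something like `smartctl -l selftest /dev/sdX | failed_selftests`, with /dev/sdX again a placeholder.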

Does anyone have an idea why this might have happened? The drive was functioning just fine when I had Windows 10 installed on it, but after installing Proxmox and running it once the drive failed. Is it possible that the drive was already faulty when Windows 10 was installed and was simply not showing symptoms yet? I had not run any drive health checks at that point.
 

Interesting, would this perhaps fix my drive? I suppose a more generalized question would be if a drive with faulty sectors can be used again.

When you say:
"You can use the Clonezilla live CD (which includes badblocks and smartctl). Make sure ONLY the HDD you need to test is connected to the system."
Does this mean the only HDD I can have connected is the faulty one? Moreover, I have Ubuntu installed on an external SSD, which is what I used to run the SMART tests. Would that be alright as an additional connected drive? Or must I use some form of Clonezilla live CD on a USB key?
 
If your drive is starting to fail I would back up the data and replace it. Once the first errors start to appear, it can go quickly until the whole disk is dead.

It could be that the disk was already damaged, but Windows used other parts of the disk that were still healthy and you are now trying to access the damaged sectors. Or Windows just didn't inform you about the problems (no one actually reads the Windows log files if they never use a console there). Another possibility is that the HDD was already in poor condition and couldn't handle the heavier server workload.

Or it's simply bad timing and not related at all.
 