[SOLVED] Oh no... woke up to a "SMART error (FailedOpenDevice) detected on host..." message!

After the disk was added, I ran lsblk and saw that partitions 1 and 9 already existed on the new disk.
Oh, that's interesting. Did something (currently unknown) in your setup automatically create the partition structure on the new drive without you needing to do it manually using the sgdisk commands?

If that's the case, that kind of worries me, as whatever did that might have its own plans for the data in your partition. It might already be trying to live-copy data from a different drive at the same time as ZFS is resilvering things.

Or am I misunderstanding stuff? :eek:
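To see exactly what is on the new disk, lsblk can list its partitions together with their GPT partition names and filesystem types. For what it's worth, OpenZFS on Linux writes a GPT label itself when `zpool` is given a whole disk (rather than a partition): partition 1 holds the pool data and a small partition 9 is reserved at the end. That is one possible explanation for the p1/p9 layout, but the thread does not confirm it. The helper below is only a sketch: `/dev/sdX` is a placeholder device and `partition_numbers` is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Sketch: list the partition numbers present on a disk.
# Input (stdin): output of `lsblk -nlo NAME,TYPE /dev/sdX` (/dev/sdX is a placeholder).
# Output: one partition number per line, e.g. "1" and "9".
partition_numbers() {
  # Keep only rows whose TYPE column is "part", then print just the
  # trailing digits of the NAME column (sdb9 -> 9, nvme0n1p1 -> 1).
  awk '$2 == "part" { if (match($1, /[0-9]+$/)) print substr($1, RSTART) }'
}

# Example (run as root on the host; /dev/sdX is a placeholder):
#   lsblk -o NAME,SIZE,TYPE,FSTYPE,PARTLABEL /dev/sdX
#   lsblk -nlo NAME,TYPE /dev/sdX | partition_numbers
```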
Just to be suuuuuuuppppper safe, it's probably best to run the command which tells ZFS to read and verify the complete drive again, just in case:

Bash:
# zpool scrub VMSTORE

ZFS normally does it as part of the resilvering process. However (unless I misunderstood above), it sounds like something else on your server is also trying to copy data around. So, I'd run the above scrub command to start the verify process (then go to the Moto GP :)) so that ZFS will double check everything is ok on disk after all.

Just in case something is modifying stuff on disk in the background somehow.
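If you would rather have the shell wait for the scrub to finish and then check the result, newer OpenZFS releases accept `-w` on `zpool scrub` (wait until the scrub completes). The sketch below assumes your version supports that flag. `scrub_clean` is a hypothetical helper name; it only looks for the "with 0 errors" wording that `zpool status` prints on the `scan:` line after a completed scrub, and it does not look at how many bytes were repaired.

```bash
#!/usr/bin/env bash
# Sketch: decide from `zpool status` output whether the last scrub finished cleanly.
# Input (stdin): output of `zpool status <pool>`.
# Returns 0 if the scan line reports a completed scrub ending "with 0 errors",
# 1 otherwise (including while a scrub is still in progress or was canceled).
scrub_clean() {
  grep -Eq '^[[:space:]]*scan: scrub repaired .* with 0 errors'
}

# Example on the pool from this thread (assumes `zpool scrub -w` is supported):
#   zpool scrub -w VMSTORE
#   if zpool status VMSTORE | scrub_clean; then
#     echo "VMSTORE: scrub finished with 0 errors"
#   else
#     zpool status -v VMSTORE   # verbose output, lists files with data errors
#   fi
```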
Yes, I have started it now. It will run for approx. 3 hours (first estimate: 4 and a half). Will see tomorrow :)
Scrub finished without errors. It seems fine.

Thank you all again! Cheers!

Now I can freely go to Tuscany :)

P.S. - about the p1/p9 partitions: since the new device was put into the same 'slot', could that be the reason the partitions were there? I don't know of any other reason; I definitely didn't copy the partition structure from another drive in the pool. There was no need to do that.
 

[Attachment: Screenshot_20240601-064654.png]
UPDATE - Just to let you know: I returned both failed disks (NVMe EVO 970 and SSD EVO 860) to my HW provider.

The first disk (970, the boot disk) is definitely bad, but it is still under its 5-year warranty, so it will be replaced for free.

The second one (860, this thread's main actor) was examined, and after some partition treatment (deleting all partitions and creating new ones) and Samsung Magician (flashing the latest FW), the disk now works. Apparently the problem was with bad partitions; something happened to it that night when it became unrecognizable to the system. I don't have a clue what happened. The system and settings have been the same for the last 3 years, and all that time it just worked without any problem until this happened. Maybe there was some cosmic effect, who knows. The disk will now be used as a cold spare for the current USB backup drive.

I'm happy of course, but with mixed feelings...
and Samsung Magician (flashing the latest FW), the disk now works.
Interesting. That really sounds more like there was a problem with the firmware rather than the partitions, and once the firmware was updated things started working again.

Samsung have had firmware problems a few times over the years (including last year), but I didn't think the EVO 860 were affected. Looks like they were though. :rolleyes:
 
Oh. If you want to check the firmware level for them all, then smartctl will do it (one by one):

Bash:
# smartctl -a /path/to/the/device

It's listed in the "Firmware Version:" field near the start of the output.
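To check all drives in one go instead of one by one, a small loop can pull out just that field. This is a sketch under a few assumptions: it relies on smartctl's `-i` (information section) output containing a line that starts with "Firmware Version:", as it does for ATA and NVMe drives, and the device globs `/dev/sd?` and `/dev/nvme?n1` are assumptions about how your disks are named. `firmware_version` and `list_firmware` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: print the firmware version of every SATA/NVMe disk via smartctl.

# Reads `smartctl -i <device>` output on stdin and prints the value of the
# "Firmware Version:" line with the label and alignment padding removed.
firmware_version() {
  awk '/^Firmware Version:/ { sub(/^Firmware Version:[[:space:]]*/, ""); print; exit }'
}

# Loops over the (assumed) device names and prints "<device>: <firmware>".
# Needs root, because smartctl has to open the raw devices.
list_firmware() {
  local dev fw
  for dev in /dev/sd? /dev/nvme?n1; do
    [ -e "$dev" ] || continue   # skip a glob pattern that matched nothing
    fw=$(smartctl -i "$dev" | firmware_version)
    printf '%s: %s\n' "$dev" "${fw:-unknown}"
  done
}

# Usage (as root on the host):
#   list_firmware
```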
 
That RVT04B6Q is the new and improved firmware version yeah? :)

I don't know. I bought the disks, put them in the server and installed PVE. After configuration I put the server in the rack and forgot about it, and I never bothered with the disks' firmware. But yes, Samsung's page says so:
[Attachment: 1717757179010.png]
So maybe there was a failed write operation or something similar. Eh, forget it... :)
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.