Periodic randomly stuck service

next40

Member
May 31, 2021
Hi all, I have several servers running the latest Proxmox 6.4.
Periodically, at random, services get stuck (panic), and the only fix is a reset via IPMI.
Hardware:
server: AS-1014S-WTRT
RAM: 256 GB
OS installed on a small SATA DOM device, SSD-DM128-SMCMVN1 (fs: ext4)

On the IPMI console I see a rapidly repeating error:
sda - access beyond end of device

I tried replacing the boot device, but that did not fix the problem. At first I suspected a hardware fault, but the problem shows up on random servers, possibly the more heavily loaded ones.
Any ideas on how to troubleshoot or fix this problem?
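
One way to get more out of the console output is to look at the numbers the kernel prints next to this error. The following is only a minimal sketch: it assumes the kernel also logs a follow-up line of the form `sda: rw=0, want=..., limit=...` (not shown in this thread), and the helper name `classify_bad_access` is hypothetical. A limit of 0 would suggest the kernel no longer sees any capacity on the device (for example, the disk dropped off the bus), while a non-zero limit points more towards a filesystem or partition that is larger than the device.

```python
import re

# Assumed format of the kernel's follow-up line; it is not shown in the thread.
BAD_ACCESS = re.compile(
    r"(?P<dev>\w+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)"
)


def classify_bad_access(line):
    """Return (device, hint) for a matching log line, or None if it does not match."""
    m = BAD_ACCESS.search(line)
    if m is None:
        return None
    want, limit = int(m["want"]), int(m["limit"])
    if limit == 0:
        # The kernel sees no capacity at all on the device.
        hint = "device reports zero capacity (possibly dropped off the bus)"
    else:
        # The request ends beyond the last sector the kernel knows about.
        hint = f"request ends {want - limit} sectors past the end of the device"
    return m["dev"], hint
```

Lines copied from the IPMI console or from `dmesg` can be passed to the function one at a time; anything that does not match the assumed format is simply ignored.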
 

Attachments

  • 2022-01-05_03-21.png (337 KB)
  • 2022-01-05_03-23.png (396.4 KB)
Well, it looks like a hardware problem... AFAIU you did replace that disk? What about the cable/controller?
 
Yes, I replaced the problem disk on the first server that showed this issue, but after a random amount of time I got the same problem on a server with a new disk.
There is no cable for a SATADOM disk; it connects directly to the motherboard.
For example:
https://www.supermicro.com/products/nfo/SATADOM.cfm

Server platform spec:
https://www.supermicro.com/en/Aplus/system/1U/1014/AS-1014S-WTRT.cfm
 
The problem is not fixed. It occurs randomly 1-2 times per month. I have tested a lot of kernel versions, without success. Any ideas?
 
Replace your Supermicro SATA-DOM again... We had several that reported SMART errors very soon after first use, and some that are now years old. SATA-DOM is a cool small SATA SSD, but many of them just fail very early. So maybe you got 2 or 3 bad ones in a row... It happens sometimes.
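
To catch a failing DOM before the host locks up, the SMART health verdict can be polled from a script. This is a minimal sketch, assuming smartmontools 7.0 or newer is installed (JSON output via `-j` is not available in older releases) and that the boot disk is `/dev/sda` as in the error message above; the helper names `smart_passed` and `check_disk` are hypothetical.

```python
import json
import subprocess


def smart_passed(report):
    """Return True/False from smartctl's JSON report, or None if the field is missing."""
    return report.get("smart_status", {}).get("passed")


def check_disk(device="/dev/sda"):
    # -H asks for the overall health verdict, -j for JSON output.
    # smartctl uses non-zero exit codes as status bits, so check=True is not used.
    result = subprocess.run(
        ["smartctl", "-j", "-H", device], capture_output=True, text=True
    )
    return smart_passed(json.loads(result.stdout))
```

Running `check_disk()` from cron or a systemd timer and alerting whenever it does not return `True` is one possible use. A disk can still fail with a passing verdict, so this is a hint rather than a guarantee.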

Also, if you have 2 of them and 2 DOM connectors on your mainboard, you can go with a ZFS mirror and see whether ZFS detects errors, without your whole system crashing all the time...
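
If the OS ends up on a ZFS mirror, the pool state can be polled in a similar way. A minimal sketch, assuming the `zpool` command is available on the host; it relies on `zpool status -x` printing `all pools are healthy` when nothing is wrong, and the helper names `pools_healthy` and `check_pools` are hypothetical.

```python
import subprocess


def pools_healthy(status_output):
    """True if the output of `zpool status -x` reports no problems."""
    return status_output.strip() == "all pools are healthy"


def check_pools():
    # -x limits the output to pools that have errors or are otherwise unavailable.
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True
    )
    return pools_healthy(result.stdout)
```

Any other output (a degraded mirror, or no pools at all) makes `check_pools()` return `False`, which is the point at which the full `zpool status` output is worth reading.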
 
Ohh, OK. I will try replacing it again and report back afterwards.
 
