Node hangs after upgrade (apt-get dist-upgrade)

Xing

Member
Jan 23, 2019
Hello

I have a problem after upgrading Proxmox (apt-get dist-upgrade): my server node hangs after the message "No /etc/kernel/pve-efiboot-uuids found, skipping ESP sync."

I can't open the VM console, and the server node has become very slow.

I have attached the logs below.

Please help me.

Thank you,
 

Attachments

  • pveversion.txt
    1.3 KB · Views: 0
  • syslog1.txt
    51.6 KB · Views: 1
  • photo_2020-05-12_21-58-35.jpg
    98.5 KB · Views: 2
Hi,

From the syslog, it looks to me like a failing drive:

Code:
May 12 19:41:11 pxGAME47 kernel: [2260311.627269] sd 0:0:0:0: [sda] tag#131 Sense Key : Medium Error [current] 
May 12 19:41:11 pxGAME47 kernel: [2260311.627271] sd 0:0:0:0: [sda] tag#131 Add. Sense: Unrecovered read error - auto reallocate failed
May 12 19:41:11 pxGAME47 kernel: [2260311.627274] sd 0:0:0:0: [sda] tag#131 CDB: Read(10) 28 00 00 00 08 08 00 00 08 00
May 12 19:41:11 pxGAME47 kernel: [2260311.627276] blk_update_request: I/O error, dev sda, sector 2061 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
May 12 19:41:11 pxGAME47 kernel: [2260311.628199] Buffer I/O error on dev sda1, logical block 1, async page read
May 12 19:41:11 pxGAME47 kernel: [2260311.629068] ata7: EH complete
May 12 19:41:11 pxGAME47 kernel: [2260311.629079] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
May 12 19:41:14 pxGAME47 kernel: [2260314.430134] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
May 12 19:41:14 pxGAME47 kernel: [2260314.430139] sas: ata7: end_device-0:0: cmd error handler
May 12 19:41:14 pxGAME47 kernel: [2260314.430158] sas: ata7: end_device-0:0: dev error handler
May 12 19:41:14 pxGAME47 kernel: [2260314.430163] sas: ata8: end_device-0:1: dev error handler
May 12 19:41:14 pxGAME47 kernel: [2260314.430165] ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 12 19:41:14 pxGAME47 kernel: [2260314.430169] sas: ata9: end_device-0:2: dev error handler
May 12 19:41:14 pxGAME47 kernel: [2260314.430169] ata7.00: failed command: READ DMA
May 12 19:41:14 pxGAME47 kernel: [2260314.430174] ata7.00: cmd c8/00:08:08:08:00/00:00:00:00:00/e0 tag 29 dma 4096 in
May 12 19:41:14 pxGAME47 kernel: [2260314.430174]          res 51/40:00:0d:08:00/00:00:00:00:00/00 Emask 0x9 (media error)
May 12 19:41:14 pxGAME47 kernel: [2260314.430175] ata7.00: status: { DRDY ERR }
May 12 19:41:14 pxGAME47 kernel: [2260314.430176] ata7.00: error: { UNC }
May 12 19:41:14 pxGAME47 kernel: [2260314.485489] ata7.00: configured for UDMA/133
May 12 19:41:14 pxGAME47 kernel: [2260314.485511] sd 0:0:0:0: [sda] tag#143 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
May 12 19:41:14 pxGAME47 kernel: [2260314.485514] sd 0:0:0:0: [sda] tag#143 Sense Key : Medium Error [current] 
May 12 19:41:14 pxGAME47 kernel: [2260314.485517] sd 0:0:0:0: [sda] tag#143 Add. Sense: Unrecovered read error - auto reallocate failed
May 12 19:41:14 pxGAME47 kernel: [2260314.485521] sd 0:0:0:0: [sda] tag#143 CDB: Read(10) 28 00 00 00 08 08 00 00 08 00
May 12 19:41:14 pxGAME47 kernel: [2260314.485525] blk_update_request: I/O error, dev sda, sector 2061 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
May 12 19:41:14 pxGAME47 kernel: [2260314.487108] Buffer I/O error on dev sda1, logical block 1, async page read
May 12 19:41:14 pxGAME47 kernel: [2260314.488448] ata7: EH complete
May 12 19:41:14 pxGAME47 kernel: [2260314.488459] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1 tries: 1
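
To make the excerpt above easier to read, here is a minimal Python sketch that counts the block-layer I/O errors per device and sector, and decodes the Read(10) CDB shown in the log. The function names `count_io_errors` and `decode_read10` are made up for illustration, and the regular expression only assumes the `I/O error, dev ..., sector ...` wording seen in these lines; other kernel versions may word it differently.

```python
import re
from collections import Counter

# Matches kernel block-layer errors such as:
#   blk_update_request: I/O error, dev sda, sector 2061 op 0x0:(READ) ...
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def count_io_errors(lines):
    """Count I/O errors per (device, sector) pair in syslog lines."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


def decode_read10(cdb_hex):
    """Decode a SCSI Read(10) CDB given as space-separated hex bytes.

    Bytes 2-5 hold the starting LBA and bytes 7-8 the transfer length,
    both big-endian. Returns (start_lba, block_count).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    return int.from_bytes(cdb[2:6], "big"), int.from_bytes(cdb[7:9], "big")


# The two blk_update_request lines copied from the excerpt above.
sample = [
    "May 12 19:41:11 pxGAME47 kernel: [2260311.627276] blk_update_request: "
    "I/O error, dev sda, sector 2061 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0",
    "May 12 19:41:14 pxGAME47 kernel: [2260314.485525] blk_update_request: "
    "I/O error, dev sda, sector 2061 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0",
]

print(dict(count_io_errors(sample)))                   # {('sda', 2061): 2}
print(decode_read10("28 00 00 00 08 08 00 00 08 00"))  # (2056, 8)
```

The decoded read covers sectors 2056 to 2063, so the failing sector 2061 lies inside the request, and the same sector fails on both attempts. A repeated failure at one fixed location is what you would expect from a media defect rather than a random transfer glitch, although bad cabling can still not be ruled out from the log alone.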

Try checking the drive with smartctl, but my guess is that you will most likely need to fix or replace it.

Taking a dd or Clonezilla backup now can come in handy (you can probably still recover data from the drive).
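
To illustrate what such a backup has to cope with on a drive with unreadable sectors, here is a rough Python sketch of a block-by-block copy that zero-fills blocks it cannot read instead of aborting. It is not a replacement for dd or Clonezilla; `rescue_copy` and its parameters are hypothetical names, and it assumes a Unix system (it uses `os.pread`) and that the source size can be found by seeking to the end.

```python
import os


def rescue_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the list of byte offsets whose read raised an error.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a regular file or,
        # on Linux, of a block device.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            for offset in range(0, size, block_size):
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    # Unreadable block: remember where it was.
                    bad_offsets.append(offset)
                    data = b""
                # Pad with zeros so later blocks keep their offsets.
                # A short read is padded too; this sketch does not retry.
                dst.write(data.ljust(want, b"\0"))
    finally:
        os.close(src)
    return bad_offsets
```

Called as, for example, `rescue_copy("/dev/sda", "/mnt/backup/sda.img")` (the target path is only an example and must be on a different, healthy disk), it returns the offsets that could not be read, so you know which parts of the image are zero-filled.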
 
Hello

After running smartctl on the command line, it shows this:



Best Regards,
 

Attachments

  • smartctl.txt
    10.3 KB · Views: 3
Hello

And here is the log from before the upgrade. Before the upgrade, the disk looked OK.

Best Regards,
 

Attachments

  • syslogbeforeup.txt
    62.9 KB · Views: 1
There are still disk errors in this log, around 16:41.

smartctl also reports ATA Error Count: 8084, so it doesn't look too good.

Please check the cables (power and data) and make sure to take a backup.

I'd also look into getting a replacement disk.
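
If you want to keep an eye on that counter over time, a small Python sketch like the following could pull it out of saved smartctl output (for example text saved with `smartctl -a /dev/sda > smartctl.txt`). The function name `ata_error_count` is made up for illustration, and the pattern only assumes the `ATA Error Count: N` wording quoted above.

```python
import re

# Matches the summary line of the SMART error log, e.g.
#   ATA Error Count: 8084
ATA_ERROR_COUNT = re.compile(r"^ATA Error Count:\s*(\d+)", re.MULTILINE)


def ata_error_count(smartctl_text):
    """Return the ATA error count from smartctl text output.

    Returns None when the line is absent, which is assumed here to
    mean that the drive has no errors in its log.
    """
    match = ATA_ERROR_COUNT.search(smartctl_text)
    return int(match.group(1)) if match else None


# Only the line quoted in this thread; a real report is much longer.
print(ata_error_count("ATA Error Count: 8084\n"))  # 8084
```

A count that keeps growing between two runs is a stronger warning sign than the absolute number, since the drive's error log is cumulative.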
 
Hello

When I came to check yesterday, the node had recovered on its own. I have no idea why it is already fixed.

And when I check the disk health, it is back to normal, with no error count.

BTW, thank you for supporting me. I will replace the disk later on.

Thank you,
 
