Backup Issues - NVMe drive failing?

RNab

Member
Jun 20, 2021
Hi all,

I hope I can get some help here. Recently, backups of one of my VMs have been failing at 78% every time with error -125. I checked multiple times, and it doesn't seem to be related to resources (OOM or similar). I restored an old backup, and everything was fine for a month or two, until the same issue started happening again.

In the syslog, after the backup, I see these lines:


Code:
Jul 24 22:41:53 proxmox kernel: [25604.027744] blk_update_request: critical medium error, dev nvme0n1, sector 404283648 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Jul 24 22:41:53 proxmox kernel: [25604.029399] blk_update_request: critical medium error, dev nvme0n1, sector 404283520 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Jul 24 22:41:53 proxmox kernel: [25604.029473] blk_update_request: critical medium error, dev nvme0n1, sector 404283392 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Jul 24 22:41:53 proxmox kernel: [25604.031500] blk_update_request: critical medium error, dev nvme0n1, sector 404327936 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Jul 24 22:41:53 proxmox kernel: [25604.033304] blk_update_request: critical medium error, dev nvme0n1, sector 404327808 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Jul 24 22:41:53 proxmox kernel: [25604.033769] blk_update_request: critical medium error, dev nvme0n1, sector 404361600 op 0x0:(READ) flags 0x0 phys_seg 9 prio class 0

That doesn't sound like good news.

However, when I check the SMART values in the GUI, I get this:
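
In case it helps anyone reading along, here is a minimal Python sketch for pulling the affected sectors out of kernel lines like the ones above. The names `failing_sectors` and `span_bytes` and the `SAMPLE` string are just illustrative, and the regular expression assumes the exact message format shown in my syslog, which may differ on other kernel versions.

```python
import re

# Pattern for the block-layer error lines quoted above (assumed format).
MEDIUM_ERROR = re.compile(
    r"critical medium error, dev (?P<dev>[^,\s]+), sector (?P<sector>\d+)"
)

SECTOR_BYTES = 512  # the kernel reports these sectors in 512-byte units

# Two of the lines from the syslog excerpt, used as sample input.
SAMPLE = """\
Jul 24 22:41:53 proxmox kernel: [25604.027744] blk_update_request: critical medium error, dev nvme0n1, sector 404283648 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
Jul 24 22:41:53 proxmox kernel: [25604.029399] blk_update_request: critical medium error, dev nvme0n1, sector 404283520 op 0x0:(READ) flags 0x0 phys_seg 16 prio class 0
"""


def failing_sectors(log_text):
    """Map each device to the sorted, de-duplicated sectors it reported."""
    found = {}
    for line in log_text.splitlines():
        match = MEDIUM_ERROR.search(line)
        if match:
            sector = int(match.group("sector"))
            found.setdefault(match.group("dev"), set()).add(sector)
    return {dev: sorted(sectors) for dev, sectors in found.items()}


def span_bytes(sectors):
    """Distance in bytes between the lowest and highest failing sector."""
    return (max(sectors) - min(sectors)) * SECTOR_BYTES


for dev, sectors in failing_sectors(SAMPLE).items():
    print(dev, sectors, span_bytes(sectors), "bytes apart")
```

Feeding it the full syslog instead of `SAMPLE` would show whether the errors cluster in one region of the drive, which might explain why the backup always stops at the same percentage.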


Code:
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        52 Celsius
Available Spare:                    95%
Available Spare Threshold:          5%
Percentage Used:                    4%
Data Units Read:                    324,217,443 [165 TB]
Data Units Written:                 59,578,220 [30.5 TB]
Host Read Commands:                 2,711,635,111
Host Write Commands:                2,019,567,071
Controller Busy Time:               26,677
Power Cycles:                       61
Power On Hours:                     18,327
Unsafe Shutdowns:                   16
Media and Data Integrity Errors:    612
Error Information Log Entries:      671
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

I'm fairly new to all this, but it looks reasonably OK?
How can I fix this issue?

I've already started backing up everything in case of a hard failure, but I'm hoping I can still use this drive (it's barely 2 years old).

One thing I just remembered: my Proxmox server suffered one or two hard resets without warning, thanks to my 1-year-old daughter, who thought pressing the blinking button would be fun. These issues may or may not have started after that.
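
For what it's worth, here is a rough Python sketch of how the SMART text above could be checked automatically. The function names and the three checks are my own assumptions for illustration, not vendor guidance: it flags a non-zero "Critical Warning", an "Available Spare" at or below its threshold, and any non-zero "Media and Data Integrity Errors" count.

```python
# A few fields copied from the SMART output above, used as sample input.
SMART_TEXT = """\
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Available Spare:                    95%
Available Spare Threshold:          5%
Percentage Used:                    4%
Media and Data Integrity Errors:    612
Error Information Log Entries:      671
"""


def parse_smart(text):
    """Turn 'Name:   value' lines into a dict of raw strings."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # lines without a colon (e.g. the header) are skipped
            values[key.strip()] = value.strip()
    return values


def to_int(value):
    """Read the leading number: '95%' -> 95, '0x00' -> 0, '1,234 [x]' -> 1234."""
    token = value.split()[0].rstrip("%").replace(",", "")
    return int(token, 0)  # base 0 accepts both decimal and 0x-prefixed hex


def smart_warnings(values):
    """Return human-readable warnings for the illustrative checks above."""
    warnings = []
    if to_int(values["Critical Warning"]) != 0:
        warnings.append("Critical Warning is set: " + values["Critical Warning"])
    spare = to_int(values["Available Spare"])
    threshold = to_int(values["Available Spare Threshold"])
    if spare <= threshold:
        warnings.append(f"Available Spare {spare}% is at or below {threshold}%")
    media_errors = to_int(values["Media and Data Integrity Errors"])
    if media_errors > 0:
        warnings.append(f"Media and Data Integrity Errors: {media_errors}")
    return warnings


for warning in smart_warnings(parse_smart(SMART_TEXT)):
    print(warning)
```

With my values, only the media error check fires, so the wear and spare numbers look fine while the error counter matches the read errors in the syslog.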

Thanks
 
