RAID partially down

cglmicro

Member
Oct 12, 2020
Hi guys.

I know this may not be a PVE-specific question, but I'm taking a chance here just in case.

One of my PVE servers crashed with a RAID error, and I had to reboot it so the GUI would come back online and I could start migrating my VMs to a secondary server. Once it's empty, I'll be able to play with it and try to rebuild this RAID, but I may need some advice on how to do it. I know I could reformat, reinstall PVE and rejoin the cluster, but I want to learn how to repair this kind of problem in Linux.

Here is some information, with the limited knowledge I have of RAID; please correct me if I'm wrong.

This shows that I have 2 disks of equal size, nvme1n1 and nvme0n1, and also 2 RAID arrays, md2 and md4:
Code:
root@proxmox13s:~# fdisk -l
Disk /dev/nvme1n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: WDC CL SN720 SDAQNTW-1T00-2000
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: BD86A76A-6FF1-4636-A9AB-A43E47CDEB3B

Device            Start        End    Sectors   Size Type
/dev/nvme1n1p1     2048    1048575    1046528   511M EFI System
/dev/nvme1n1p2  1048576   42989567   41940992    20G Linux RAID
/dev/nvme1n1p3 42989568   45084671    2095104  1023M Linux swap
/dev/nvme1n1p4 45084672 2000394239 1955309568 932.4G Linux RAID


Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: WDC CL SN720 SDAQNTW-1T00-2000
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8E102BE3-D90D-47B4-A323-7B4772C33370

Device              Start        End    Sectors   Size Type
/dev/nvme0n1p1       2048    1048575    1046528   511M EFI System
/dev/nvme0n1p2    1048576   42989567   41940992    20G Linux RAID
/dev/nvme0n1p3   42989568   45084671    2095104  1023M Linux RAID
/dev/nvme0n1p4   45084672 2000394239 1955309568 932.4G Linux RAID
/dev/nvme0n1p5 2000406528 2000408575       2048     1M Linux filesystem


Disk /dev/md2: 20 GiB, 21473722368 bytes, 41940864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/md4: 932.36 GiB, 1001118433280 bytes, 1955309440 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-data: 928.36 GiB, 996818288640 bytes, 1946910720 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

This block shows that my RAID array md2 only contains the partition nvme0n1p2 of disk nvme0n1, so it's missing the partition nvme1n1p2 of disk nvme1n1:
Code:
root@proxmox13s:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 nvme1n1p4[1] nvme0n1p4[0]
      977654720 blocks [2/2] [UU]
      bitmap: 7/8 pages [28KB], 65536KB chunk

md2 : active raid1 nvme0n1p2[0]
      20970432 blocks [2/1] [U_]

unused devices: <none>

Details about this MD2 RAID array:
Code:
root@proxmox13s:~# mdadm --detail /dev/md2
/dev/md2:
           Version : 0.90
     Creation Time : Mon Oct 26 19:15:03 2020
        Raid Level : raid1
        Array Size : 20970432 (20.00 GiB 21.47 GB)
     Used Dev Size : 20970432 (20.00 GiB 21.47 GB)
      Raid Devices : 2
     Total Devices : 1
   Preferred Minor : 2
       Persistence : Superblock is persistent

       Update Time : Wed Aug  3 11:08:02 2022
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              UUID : 688c4347:5b16e05d:a4d2adc2:26fd5302
            Events : 0.8160

    Number   Major   Minor   RaidDevice State
       0     259        7        0      active sync   /dev/nvme0n1p2
       -       0        0        1      removed

Details about this NVME1N1 disk:
Code:
root@proxmox13s:~# smartctl /dev/nvme1n1 -a
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.39-2-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WDC CL SN720 SDAQNTW-1T00-2000
Serial Number:                      1851AF801711
Firmware Version:                   10109122
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity:           0
Controller ID:                      8215
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            001b44 8b441dc53a
Local Time is:                      Wed Aug  3 11:10:21 2022 EDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     80 Celsius
Critical Comp. Temp. Threshold:     85 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.00W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0       0
 2 +     3.00W       -        -    2  2  2  2        0       0
 3 -   0.1000W       -        -    3  3  3  3     4000   10000
 4 -   0.0025W       -        -    4  4  4  4     4000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        37 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    25%
Data Units Read:                    1,270,546,616 [650 TB]
Data Units Written:                 290,354,003 [148 TB]
Host Read Commands:                 5,258,258,506
Host Write Commands:                5,830,127,572
Controller Busy Time:               30,587
Power Cycles:                       27
Power On Hours:                     24,639
Unsafe Shutdowns:                   23
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Will I only need to remove and re-add the partition nvme1n1p2 of disk nvme1n1 to the array md2 with commands like these? Will that be enough to rebuild this partition?
Code:
mdadm /dev/md2 --manage --remove /dev/nvme1n1p2
mdadm /dev/md2 --manage --add /dev/nvme1n1p2
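
Before doing that, I suppose I could also look at what the dropped member's superblock still says; a quick diagnostic sketch (device names taken from the fdisk output above):
Code:
# Inspect the superblock of the member that was kicked out of md2:
mdadm --examine /dev/nvme1n1p2

# Compare with the surviving member to see how far the event counters have diverged:
mdadm --examine /dev/nvme0n1p2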

Is there anything telling you this drive is bad and needs to be replaced, or is it just a software issue that damaged the RAID?

I also saw somewhere on this forum something about a boot partition missing on the second disk, meaning that if the first disk fails, the server won't boot. Do you think that's also the case here, since the partition "/dev/nvme0n1p5 2000406528 2000408575 2048 1M Linux filesystem" isn't present on the nvme1n1 disk? How do I correct this?

Thanks A LOT for your help btw !!
 
From the output you provided, nothing appears wrong with the drive physically. Disks don't normally fail one partition at a time...

You should examine your system's logs and find the time when the disk (partition) was removed from, or not added to, the affected RAID. Maybe it was busy because something else had a hold of it.
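
For example, something along these lines (adjust the time range to when the crash happened; log paths may differ on your install):
Code:
# Kernel messages mentioning the arrays or the suspect disk:
journalctl -k --since "2022-08-01" | grep -iE 'md2|md4|nvme1n1'

# On a syslog-based setup the same events usually land here:
grep -iE 'md/raid1|nvme1n1p2' /var/log/syslog /var/log/kern.log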


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Also keep in mind that mdadm isn't supported by Proxmox. It will work and people will try to help here, but it's still not officially supported. There are some cases where mdadm SW RAID can cause data corruption, which is why it's recommended to use ZFS instead.
 
Hi bbgeek.
At the time of the crash, the mdadm notification I received was also showing md4 as degraded, but by the time I SSHed into the machine, it showed that array back in sync.
Code:
This is an automatically generated mail message from mdadm running on proxmox13s

A Fail event had been detected on md device /dev/md2.

It could be related to component device /dev/nvme1n1p2.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 nvme0n1p4[0] nvme1n1p4[2](F)
      977654720 blocks [2/1] [U_]
      bitmap: 8/8 pages [32KB], 65536KB chunk

md2 : active raid1 nvme1n1p2[2](F) nvme0n1p2[0]
      20970432 blocks [2/1] [U_]

unused devices: <none>

============

This is an automatically generated mail message from mdadm running on proxmox13s

A Fail event had been detected on md device /dev/md4.

It could be related to component device /dev/nvme1n1p4.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 nvme0n1p4[0] nvme1n1p4[2](F)
      977654720 blocks [2/1] [U_]
      bitmap: 6/8 pages [24KB], 65536KB chunk

md2 : active raid1 nvme1n1p2[2](F) nvme0n1p2[0]
      20970432 blocks [2/1] [U_]

unused devices: <none>

I know that mdadm does a RAID check once a month; I assume the arrays were marked degraded during this check.
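
For reference, I found that the state of that monthly check can be looked at (and triggered by hand) through sysfs; a minimal sketch with my array names:
Code:
# Show what the md layer is currently doing for each array (idle, check, resync, recover, ...):
cat /sys/block/md2/md/sync_action
cat /sys/block/md4/md/sync_action

# Kick off the same kind of consistency check manually:
echo check > /sys/block/md2/md/sync_action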

Thanks for your answer, I'll wait for advice on how to recover from this degraded array.
 
Thanks Dunuin. I know it isn't supported and that the people who help are doing their best; it's more than appreciated, and I try to help other users when I know the answer.
 
@cglmicro that seems to be just a summary of the event; there ought to be more messages about the sequence of events.
Without having seen any of those, my educated guess is that you should be able to repair/recover with what you have now. Search for mdadm troubleshooting articles and run some diagnostic commands to get a better idea.
Again, it's very unlikely (yet anything is possible) that this is a hardware problem.
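
For example (a generic starting point, not specific to your box):
Code:
# Any md/RAID related messages still in the kernel ring buffer:
dmesg -T | grep -iE 'md[0-9]|raid1'

# NVMe error log of the suspect disk (already clean in the smartctl output above):
smartctl -l error /dev/nvme1n1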


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks to all, my first problem is solved. To rebuild my RAID, I simply had to run "mdadm --add /dev/md2 /dev/nvme1n1p2".
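
In case it helps someone else landing here, the rebuild progress can be followed with something like:
Code:
# Live view of the resync/recovery:
watch -n 5 cat /proc/mdstat

# Or the summarized array state, which shows a "Rebuild Status" line while recovering:
mdadm --detail /dev/md2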

What about my other question:
I also saw somewhere on this forum something about a boot partition missing on the second disk, meaning that if the first disk fails, the server won't boot. Do you think that's also the case here, since the partition "/dev/nvme0n1p5 2000406528 2000408575 2048 1M Linux filesystem" isn't present on the nvme1n1 disk? How do I correct this?

Will my PVE be able to boot if the first disk crashes with this setup? If not, how do I correct this?
Code:
root@proxmox13s:~# fdisk -l
Disk /dev/nvme0n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: WDC CL SN720 SDAQNTW-1T00-2000
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 8E102BE3-D90D-47B4-A323-7B4772C33370

Device              Start        End    Sectors   Size Type
/dev/nvme0n1p1       2048    1048575    1046528   511M EFI System
/dev/nvme0n1p2    1048576   42989567   41940992    20G Linux RAID
/dev/nvme0n1p3   42989568   45084671    2095104  1023M Linux RAID
/dev/nvme0n1p4   45084672 2000394239 1955309568 932.4G Linux RAID
/dev/nvme0n1p5 2000406528 2000408575       2048     1M Linux filesystem


Disk /dev/nvme1n1: 953.87 GiB, 1024209543168 bytes, 2000409264 sectors
Disk model: WDC CL SN720 SDAQNTW-1T00-2000
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: BD86A76A-6FF1-4636-A9AB-A43E47CDEB3B

Device            Start        End    Sectors   Size Type
/dev/nvme1n1p1     2048    1048575    1046528   511M EFI System
/dev/nvme1n1p2  1048576   42989567   41940992    20G Linux RAID
/dev/nvme1n1p3 42989568   45084671    2095104  1023M Linux swap
/dev/nvme1n1p4 45084672 2000394239 1955309568 932.4G Linux RAID
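
In the meantime, here is how I plan to check whether the second disk is actually bootable; a rough sketch based on my own reading (it assumes a UEFI/GRUB setup, so corrections are welcome):
Code:
# Which boot entries the firmware knows about and which disk/partition they point to:
efibootmgr -v

# Check whether the ESP on the second disk contains any bootloader files at all:
mount /dev/nvme1n1p1 /mnt
ls -R /mnt/EFI
umount /mnt

# Figure out what the extra 1M partition on the first disk actually holds:
blkid /dev/nvme0n1p5

If the second ESP turns out to be empty, my understanding is that grub-install with --efi-directory pointed at that partition (plus a matching efibootmgr entry) would be the usual fix, but I'll confirm how this install boots before changing anything.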
 
