mdadm error since upgrading to 6.3.

Brian Read

Renowned Member
Jan 4, 2017
Can someone explain?

Code:
This is an automatically generated mail message from mdadm
running on pve

A DegradedArray event had been detected on md device /dev/md/0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 zd112p1[0]
      255936 blocks super 1.0 [2/1] [U_]
     
unused devices: <none>


Code:
2020-11-30
pve-manager/6.3-2/22f57405 (running kernel: 5.4.73-1-pve)
NAME                           USED  AVAIL     REFER  MOUNTPOINT
rpool                         2.38T  2.86T      139K  /rpool
rpool/ROOT                    22.3G  2.86T      160K  /rpool/ROOT
rpool/ROOT/pve-1              22.3G  2.86T     22.3G  /
rpool/data                    2.35T  2.86T      160K  /rpool/data
rpool/data/base-110-disk-0    3.68G  2.86T     3.68G  -
rpool/data/base-112-disk-0    2.47G  2.86T     4.74G  -
rpool/data/base-118-disk-0    4.57G  2.86T     4.57G  -
rpool/data/subvol-105-disk-1  1.34G  18.7G     1.34G  /rpool/data/subvol-105-disk-1
rpool/data/subvol-107-disk-0  2.98G  17.0G     2.98G  /rpool/data/subvol-107-disk-0
rpool/data/subvol-109-disk-1   812M  31.2G      812M  /rpool/data/subvol-109-disk-1
rpool/data/subvol-114-disk-0   305M  19.7G      305M  /rpool/data/subvol-114-disk-0
rpool/data/vm-100-disk-1      2.08T  2.86T     2.08T  -
rpool/data/vm-101-disk-0      6.44G  2.86T     6.94G  -
rpool/data/vm-102-disk-0      5.18G  2.86T     5.18G  -
rpool/data/vm-104-disk-1      84.6G  2.86T     84.6G  -
rpool/data/vm-106-disk-1      77.8G  2.86T     77.8G  -
rpool/data/vm-108-disk-1      5.48G  2.86T     5.48G  -
rpool/data/vm-111-disk-1      3.37G  2.86T     3.37G  -
rpool/data/vm-115-disk-0      4.85G  2.86T     7.48G  -
rpool/data/vm-120-disk-0      30.0G  2.86T     30.0G  -
rpool/data/vm-121-disk-0      39.6G  2.86T     39.6G  -
rpool/swap                    10.3G  2.86T     10.3G  -
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool  8.12T  3.57T  4.55T        -         -    22%    43%  1.00x    ONLINE  -
  pool: rpool
state: ONLINE
status: One or more devices could not be used because the label is missing or
    invalid.  Sufficient replicas exist for the pool to continue
    functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 0 days 10:53:42 with 0 errors on Sun Nov  8 11:17:43 2020
config:

    NAME        STATE     READ WRITE CKSUM
    rpool       ONLINE       0     0     0
      raidz1-0  ONLINE       0     0     0
        sda2    ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc2    ONLINE       0     0     0
    logs  
      sdd1      ONLINE       0     0     0
    cache
      sdd2      FAULTED      0     0     0  corrupted data
      sdd2      ONLINE       0     0     0

errors: No known data errors
 
somehow mdraid picked up one of your VM's disks?
 
somehow mdraid picked up one of your VM's disks?

None of my VMs have /dev/md/0 AFAICT.

Code:
root@pve:~# fdisk /dev/zd112p1
Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
The old linux_raid_member signature will be removed by a write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xa8178922.
Command (m for help): p
Disk /dev/zd112p1: 250 MiB, 262144000 bytes, 512000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 8192 bytes
I/O size (minimum/optimal): 8192 bytes / 8192 bytes
Disklabel type: dos
Disk identifier: 0xa8178922

I think it's more likely to be the "ghost" /dev/sdd2 (see the first message), which I have never been able to delete since the original /dev/sdd went faulty and I replaced it.

Some help deleting it would be great.
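(For reference, a stale or faulted cache vdev can usually be detached with zpool remove, addressing it by GUID when two entries share a name; the sketch below is illustrative only and the GUID shown is made up.)

Code:
root@pve:~# zpool status -g rpool                 # list vdevs by numeric GUID instead of name
root@pve:~# zpool remove rpool 1234567890123456   # detach the FAULTED cache entry by its GUID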
 
no, /dev/zd112p1 is definitely a partition on a zvol, not a vdev..
 
Perhaps it's left over from an earlier attempt at loading PVE on one of the three discs in the ZFS set. I seem to have two partitions and a full disc in it.
I may have had to install Debian and then PVE on top, rather than using the dedicated ISO.
I had a number of goes at it before getting something that worked for me, two years ago as you can see. I think I'll just live with it; stopping the raid array feels a bit too dangerous.

Code:
root@pve:~# mdadm --detail /dev/md0
/dev/md0:
           Version : 1.0
     Creation Time : Thu Nov 23 10:07:59 2017
        Raid Level : raid1
        Array Size : 255936 (249.94 MiB 262.08 MB)
     Used Dev Size : 255936 (249.94 MiB 262.08 MB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

       Update Time : Sat Nov 28 09:04:33 2020
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : localhost.localdomain:0
              UUID : a8dad329:be8f6a48:f913d6f8:d60ce5e6
            Events : 3910

    Number   Major   Minor   RaidDevice State
       0     230      113        0      active sync   /dev/zd112p1
       -       0        0        1      removed
root@pve:~#
 
you can find out which zvol it belongs to by looking at /dev/zvol/rpool/...
 
Code:
root@pve:~# ls /dev/zvol/rpool/
data/ swap 
root@pve:~# ls /dev/zvol/rpool/
data  swap
root@pve:~# ls /dev/zvol/rpool/data/
base-102-disk-0        vm-102-disk-0-part1  vm-113-disk-0
base-110-disk-0        vm-102-disk-0-part2  vm-113-disk-0-part1
base-110-disk-0-part1  vm-103-disk-0        vm-113-disk-0-part2
base-110-disk-0-part2  vm-103-disk-0-part1  vm-115-disk-0
base-112-disk-0        vm-103-disk-0-part2  vm-115-disk-0-part1
base-112-disk-0-part1  vm-104-disk-1        vm-115-disk-0-part2
base-112-disk-0-part2  vm-104-disk-1-part1  vm-116-disk-0
base-118-disk-0        vm-106-disk-1        vm-116-disk-0-part1
base-118-disk-0-part1  vm-106-disk-1-part1  vm-116-disk-0-part2
base-118-disk-0-part2  vm-106-disk-1-part2  vm-120-disk-0
vm-100-disk-1           vm-108-disk-1        vm-120-disk-0-part1
vm-100-disk-1-part1    vm-108-disk-1-part1  vm-120-disk-0-part2
vm-100-disk-1-part2    vm-108-disk-1-part2  vm-120-disk-0-part5
vm-101-disk-0           vm-111-disk-1        vm-121-disk-0
vm-101-disk-0-part1    vm-111-disk-1-part1  vm-121-disk-0-part1
vm-101-disk-0-part2    vm-111-disk-1-part2
root@pve:~#
 
if you do ls -lh you will see the link targets ;)
 
Code:
root@pve:~# ls -lh /dev/zvol/rpool/data/ | grep zd112p1
lrwxrwxrwx 1 root root 16 Nov 28 09:24 vm-100-disk-1-part1 -> ../../../zd112p1
root@pve:~#

VM 100 is my main email and shared-folder server. It certainly does have a degraded RAID array; that must be the boot partition.

Why is it visible to mdadm in PVE (i.e. the host)?

And why has it only started happening since 6.3 was installed?
 
see /etc/mdadm/mdadm.conf, where you can disable mdadm scanning. PVE does not touch any mdraid stuff in any way; we don't even support it.
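One way to do that is a DEVICE line that only matches the real disks, so the zvol partitions (/dev/zd*) are never scanned; this is a sketch only, and the glob is illustrative and must cover the actual system disks:

Code:
# /etc/mdadm/mdadm.conf (excerpt) -- limit scanning to physical disks,
# which keeps mdadm away from the /dev/zd* partitions backing the VMs
DEVICE /dev/sd*
# then run 'update-initramfs -u' so the copy inside the initramfs matches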
 
Code:
# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
ARRAY /dev/md/0  metadata=1.0 UUID=a8dad329:be8f6a48:f913d6f8:d60ce5e6 name=localhost.localdomain:0

# This configuration was auto-generated on Thu, 26 Nov 2020 16:54:47 +0000 by mkconf

Must have been a Debian update then. I have commented out that line. Thanks for the help.
 
I rebooted, and now get this:

Code:
This is an automatically generated mail message from mdadm
running on pve

A DegradedArray event had been detected on md device /dev/md127.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active (auto-read-only) raid1 zd80p1[0]
      255936 blocks super 1.0 [2/1] [U_]
      
unused devices: <none>

Code:
root@pve:~# mdadm --examine /dev/md127
mdadm: No md superblock detected on /dev/md127.
root@pve:~# mdadm --examine /dev/zd80p1
/dev/zd80p1:
          Magic : a92b4efc
        Version : 1.0
    Feature Map : 0x0
     Array UUID : a8dad329:be8f6a48:f913d6f8:d60ce5e6
           Name : localhost.localdomain:0
  Creation Time : Thu Nov 23 10:07:59 2017
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 511968 (249.98 MiB 262.13 MB)
     Array Size : 255936 (249.94 MiB 262.08 MB)
  Used Dev Size : 511872 (249.94 MiB 262.08 MB)
   Super Offset : 511984 sectors
   Unused Space : before=0 sectors, after=104 sectors
          State : clean
    Device UUID : d05e722e:ae25e67b:d24839a8:ab7640c0

    Update Time : Fri Dec  4 10:01:58 2020
  Bad Block Log : 512 entries available at offset -8 sectors
       Checksum : d9f353bd - correct
         Events : 3930


   Device Role : Active device 0
   Array State : A. ('A' == active, '.' == missing, 'R' == replacing)
root@pve:~#
root@pve:~# ls -lh /dev/zd80p1
brw-rw---- 1 root disk 230, 81 Dec  4 10:00 /dev/zd80p1
root@pve:~# ls -lh /dev/zvol/rpool/data/ | grep zd80p1
lrwxrwxrwx 1 root root 15 Dec  4 10:00 vm-100-disk-1-part1 -> ../../../zd80p1
root@pve:~#

Looks like it has picked up another one, however:

Code:
root@pve:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# !NB! Run update-initramfs -u after updating this file.
# !NB! This will ensure that initramfs has an uptodate copy.
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR root

# definitions of existing MD arrays
#ARRAY /dev/md/0  metadata=1.0 UUID=a8dad329:be8f6a48:f913d6f8:d60ce5e6 name=localhost.localdomain:0

# This configuration was auto-generated on Thu, 26 Nov 2020 16:54:47 +0000 by mkconf

So nothing in mdadm.conf.
 
see the comment in that file ;) I suggest uninstalling mdadm if you don't use it on the host.
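Concretely, something along these lines should cover it (a sketch, assuming no mdraid array is actually wanted on the host):

Code:
root@pve:~# mdadm --stop /dev/md127   # de-assemble the stray array on the host; does not write to the member device
root@pve:~# update-initramfs -u       # rebuild the initramfs so it carries the edited mdadm.conf
root@pve:~# apt purge mdadm           # or remove mdadm entirely, which also drops its auto-assembly udev rules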