Mdraid changing link in /dev/md/

Dunuin

Hi,

A while ago I installed PVE 7.2 and set up an LVM-Thin pool on top of a LUKS container on top of a mdadm raid1. The LUKS container gets unlocked automatically on boot via "/etc/crypttab" with a keyfile, and PVE's LVM-Thin storage then uses that thin pool. That worked totally fine so far, even 2 or 3 days ago when I last rebooted my PVE node.
But today I upgraded my PVE node (the packages released today or yesterday). It asked me to reboot again because of the new firmware package, so I did, but afterwards my LVM-Thin storage wasn't working anymore.
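
For context, the stack on top of the raid1 was set up roughly like this (shortened sketch from memory; device paths and sizes are examples, not the exact commands):
Code:
mdadm --create /dev/md/md_1 --level=1 --raid-devices=2 /dev/sdX5 /dev/sdY5
cryptsetup luksFormat /dev/md/md_1
cryptsetup luksAddKey /dev/md/md_1 /root/.keys/luks_raid1.key
cryptsetup open /dev/md/md_1 luks_raid1 --key-file /root/.keys/luks_raid1.key
pvcreate /dev/mapper/luks_raid1
vgcreate vg01 /dev/mapper/luks_raid1
lvcreate -L 29G --thinpool tpool vg01   # size is just an example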

APT upgrades since the last reboot where it still worked:
Code:
Start-Date: 2022-12-15  15:46:42
Commandline: apt-get dist-upgrade
Upgrade: pve-firmware:amd64 (3.5-6, 3.6-1), libproxmox-acme-perl:amd64 (1.4.2, 1.4.3), libproxmox-acme-plugins:amd64 (1.4.2, 1.4.3), pve-kernel-helper:amd64 (7.2-14, 7.3-1)
End-Date: 2022-12-15  15:48:51
Didn't change any host configs and didn't install anything new.

I checked my mdadm raid1 and it was healthy:
Code:
root@j3710:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md127 : active raid1 sda5[0] sdb5[1]
      31439872 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

lsblk showed that my LUKS container wasn't opened. So I checked my crypttab:
Code:
root@j3710:~# cat /etc/crypttab
# <target name> <source device>         <key file>      <options>
luks_raid1      /dev/md/j3710:md_1      /root/.keys/luks_raid1.key  luks

Then I checked the "/dev/md" folder:
Code:
root@j3710:~# ls -l /dev/md
total 0
lrwxrwxrwx 1 root root 8 Dec 15 16:16 md_1 -> ../md127

So what was previously always called "/dev/md/j3710:md_1" is now just called "/dev/md/md_1".
So I changed the crypttab to "/dev/md/md_1", rebooted, and everything was working again.
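
The crypttab line now looks like this:
Code:
# <target name> <source device>         <key file>      <options>
luks_raid1      /dev/md/md_1            /root/.keys/luks_raid1.key  luks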

The question is now, why has that changed? And will it in the future switch back to "/dev/md/j3710:md_1" and fail again?

I did a bit of research and found this post, which explains it:
https://unix.stackexchange.com/a/533941
HOMEHOST
The homehost line gives a default value for the --homehost= option to mdadm. There should normally be only one other word on the line. It should either be a host name, or one of the special words <system>, <none> and <ignore>. If <system> is given, then the gethostname(2) system call is used to get the host name. This is the default.
[...]
When arrays are created, this host name will be stored in the metadata. When arrays are assembled using auto-assembly, arrays which do not record the correct homehost name in their metadata will be assembled using a "foreign" name. A "foreign" name always ends with a digit string preceded by an underscore to differentiate it from any possible local name. e.g. /dev/md/1_1 or /dev/md/home_0.
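
As far as I understand, whether mdadm treats an array as local or foreign shows up in the Name line of the metadata, which can be checked like this (device paths are just examples):
Code:
mdadm --detail /dev/md127 | grep Name
# e.g. "Name : j3710:md_1  (local to host j3710)" when it is considered local
mdadm --examine /dev/sda5 | grep Name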

My mdadm.conf is indeed set to use "HOMEHOST <system>":
Code:
root@j3710:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# !NB! Run update-initramfs -u after updating this file.
# !NB! This will ensure that initramfs has an uptodate copy.
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR <redacted>

# definitions of existing MD arrays

# This configuration was auto-generated on Wed, 16 Nov 2022 18:27:10 +0100 by mkconf

And my hostname is still "j3710" and also was "j3710" when the array was created.

My array is still named "j3710:md_1":
Code:
root@j3710:~# mdadm --examine --brief --scan  --config=partitions
ARRAY /dev/md/md_1  metadata=1.2 UUID=2cdcdb2b:faa6069f:4d0b4501:842554f6 name=j3710:md_1

root@j3710:~# mdadm --detail /dev/md/md_1

/dev/md/md_1:
           Version : 1.2
     Creation Time : Wed Nov 16 18:29:13 2022
        Raid Level : raid1
        Array Size : 31439872 (29.98 GiB 32.19 GB)
     Used Dev Size : 31439872 (29.98 GiB 32.19 GB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent

       Update Time : Sat Dec  3 22:28:10 2022
             State : clean 
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

              Name : j3710:md_1  (local to host j3710)
              UUID : 2cdcdb2b:faa6069f:4d0b4501:842554f6
            Events : 71

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5

So why is this now called "/dev/md/md_1" instead of the previous "/dev/md/j3710:md_1"?

And could I fix this by using the above UUID in my crypttab, like this?:

Code:
# <target name> <source device>         <key file>      <options>
luks_raid1      UUID=2cdcdb2b:faa6069f:4d0b4501:842554f6      /root/.keys/luks_raid1.key  luks
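
Though I'm not sure that mdadm-style UUID is even the one crypttab matches - as far as I know, UUID= in crypttab resolves via /dev/disk/by-uuid/, i.e. whatever blkid reports for the assembled md device (here probably the LUKS header UUID). Something like this should show it:
Code:
blkid /dev/md127
ls -l /dev/disk/by-uuid/ | grep md127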

If that works, it will fix the problem on this host. But I've got other PVE hosts that also have a LUKS-encrypted swap on top of a mdadm raid1 (I know this is problematic, but no one could tell me a better solution for a mirrored swap), and there I wouldn't be able to use UUIDs, as cryptsetup reformats the swap partition on each reboot, so the UUID would change every time.
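
For reference, such a random-key swap setup typically looks roughly like this (names are examples, not my actual config), which is why a stable /dev/md/... path matters there:
Code:
# /etc/crypttab - swap gets set up with a fresh random key on every boot
swap_crypt  /dev/md/md_2  /dev/urandom  swap,cipher=aes-xts-plain64,size=512
# /etc/fstab
/dev/mapper/swap_crypt  none  swap  sw  0  0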

So the question is: why is it now called "/dev/md/md_1" when my mdadm.conf is set to neither "HOMEHOST <none>" nor "HOMEHOST <ignore>"?

@Stoiko Ivanov:
Thread is a continuation of this post.
 
could you post the complete journals (until the system is completely booted - e.g. when pve-guests.service has finished running) from before/after the change?

I think this should be the best way from here

When was the last time you rebooted that host before the issue occurred (i.e. when was the second-to-last reboot)?

* mdadm does not seem to have gotten any updates recently
* systemd (which should be the source of udev) neither ...

which makes me think it could be something with a driver which behaves differently due to different firmware
 
could you post the complete journals (until the system is completely booted - e.g. when pve-guests.service has finished running) from before/after the change?
Sure. Added them from start of boot to last "startall" of pve-guests.service.

When was the last time you rebooted that host before the issue occurred (i.e. when was the second-to-last reboot)?

* mdadm does not seem to have gotten any updates recently
* systemd (which should be the source of udev) neither ...

which makes me think it could be something with a driver which behaves differently due to different firmware
Last reboot with "/dev/md/j3710:md_1" working was "Dec 14 13:11:43": Dec 14 13:11:43 j3710 systemd[1]: Found device /dev/md/j3710:md_1.

Then I did the apt dist-upgrade and rebooted at "Dec 15 16:16:06" where I then got this: Dec 15 16:16:06 j3710 systemd[1]: Found device /dev/md/md_1.
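
For reference, those lines are easy to pull out of the journal with something like this (assuming a persistent journal):
Code:
journalctl --list-boots
journalctl -b -1 | grep 'Found device /dev/md'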

The "rpool/ROOT/pve-1" and "/rpool/data" are by the way encrypted, in case that is important. Partitioning looks like this:
Code:
root@j3710:~# lsblk
NAME                       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                          8:0    0 186.3G  0 disk 
├─sda1                       8:1    0  1007K  0 part 
├─sda2                       8:2    0   512M  0 part 
├─sda3                       8:3    0  31.5G  0 part 
├─sda4                       8:4    0     4G  0 part 
├─sda5                       8:5    0    30G  0 part 
│ └─md127                    9:127  0    30G  0 raid1
│   └─luks_raid1           253:0    0    30G  0 crypt
│     ├─vg01-tpool_tmeta   253:1    0    32M  0 lvm   
│     │ └─vg01-tpool-tpool 253:3    0  29.9G  0 lvm   
│     │   └─vg01-tpool     253:4    0  29.9G  1 lvm   
│     └─vg01-tpool_tdata   253:2    0  29.9G  0 lvm   
│       └─vg01-tpool-tpool 253:3    0  29.9G  0 lvm   
│         └─vg01-tpool     253:4    0  29.9G  1 lvm   
└─sda6                       8:6    0 120.3G  0 part 
sdb                          8:16   0 186.3G  0 disk 
├─sdb1                       8:17   0  1007K  0 part 
├─sdb2                       8:18   0   512M  0 part 
├─sdb3                       8:19   0  31.5G  0 part 
├─sdb4                       8:20   0     4G  0 part 
├─sdb5                       8:21   0    30G  0 part 
│ └─md127                    9:127  0    30G  0 raid1
│   └─luks_raid1           253:0    0    30G  0 crypt
│     ├─vg01-tpool_tmeta   253:1    0    32M  0 lvm   
│     │ └─vg01-tpool-tpool 253:3    0  29.9G  0 lvm   
│     │   └─vg01-tpool     253:4    0  29.9G  1 lvm   
│     └─vg01-tpool_tdata   253:2    0  29.9G  0 lvm   
│       └─vg01-tpool-tpool 253:3    0  29.9G  0 lvm   
│         └─vg01-tpool     253:4    0  29.9G  1 lvm   
└─sdb6                       8:22   0 120.3G  0 part
Partitions 1 to 3 were created by the PVE installer as a ZFS mirror (with the datasets of partition 3 encrypted later). Partition 4 isn't used yet; it should become encrypted swap later, but I haven't found a reliable way to mirror the swap yet. Partition 5 is my mdadm raid1 with LUKS on top and LVM-Thin on top of that. Partition 6 is another encrypted ZFS pool used as VM/LXC storage.
 

Attachments

  • apt_history.log
  • apt_term.log
  • syslog(removed spamming nut and filebeat lines).txt
  • journal_2022-12-14_13_02_27_to_2022-12-14_13_13_47.txt
  • journal_2022-12-15_15_52_59_to_2022-12-15_15_57_31.txt

Then I did the apt dist-upgrade and rebooted at "Dec 15 16:16:06" where I then got this: Dec 15 16:16:06 j3710 systemd[1]: Found device /dev/md/md_1.
this is not present in the journal of that boot ... however the timeout for /dev/md/j3710:md_1 is visible (this one gets created due to the crypttab entry referencing it)

the logs look quite similar (if you remove the timestamps and look at them in vimdiff)

from apt_term.log:
Code:
update-initramfs: Generating /boot/initrd.img-5.15.74-1-pve
cryptsetup: ERROR: Couldn't resolve device rpool/ROOT/pve-1
cryptsetup: WARNING: Couldn't determine root device
is this always printed when you upgrade (and initrd gets updated)?

sadly since this wasn't a new kernel the initrd got overwritten - it still might be worth checking what its contents are - see unmkinitramfs(8)
* does /etc/hostname look ok, /etc/mdadm/mdadm.conf ...

to be really sure - maybe even use the image from one of the ESPs (`proxmox-boot-tool status` should tell you the UUIDs for mounting with /dev/disk/by-uuid/<UUID>)
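
a rough sketch of what I mean (paths are examples, the extracted directory layout can differ):
Code:
# check what actually ended up inside the current initrd
mkdir /tmp/initrd
unmkinitramfs /boot/initrd.img-5.15.74-1-pve /tmp/initrd
cat /tmp/initrd/main/etc/mdadm/mdadm.conf   # there may be no main/ prefix, depending on the initrd
cat /tmp/initrd/main/etc/hostname

# or take the initrd straight from one of the ESPs
proxmox-boot-tool status
mount /dev/disk/by-uuid/<UUID> /mnt
# then extract the initrd for the running kernel found below /mnt the same way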

finally - since all other packages should play no role here ... try downgrading pve-firmware to the older version
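
e.g. something like this (version taken from your apt history; assuming the old version is still available):
Code:
# the old version needs to still be in the repository or in /var/cache/apt/archives
apt install pve-firmware=3.5-6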

(pve-kernel-helper got upgraded, because it shares the source with the kernel-meta-packages (pve-kernel-5.15), which got a new version due to the new kernel)

I hope this helps
 
this is not present in the journal of that boot ... however the timeout for /dev/md/j3710:md_1 is visible (this one gets created due to the crypttab entry referencing it)
Right, I cut that off. That was from the reboot I did after changing the crypttab from "/dev/md/j3710:md_1" to "/dev/md/md_1".
the logs look quite similar (if you remove the timestamps and look at them in vimdiff)

from apt_term.log:
Code:
update-initramfs: Generating /boot/initrd.img-5.15.74-1-pve
cryptsetup: ERROR: Couldn't resolve device rpool/ROOT/pve-1
cryptsetup: WARNING: Couldn't determine root device
is this always printed when you upgrade (and initrd gets updated)?
That is caused by the cryptsetup package. The ZFS documentation recommends removing the cryptsetup package to get rid of that error message:

Unlocking a ZFS encrypted root over SSH

To use this feature:
  1. Install the dropbear-initramfs package. You may wish to uninstall the cryptsetup-initramfs package to avoid warnings.
I removed that cryptsetup package once in October, when I wasn't using LUKS yet. Back then this error didn't show up when rebuilding the initramfs. But later in November I installed the cryptsetup package again because I wanted to add the LUKS-encrypted LVM-Thin. Then this error message started showing up again. But it was never a problem so far; I installed new pve-kernels without a problem, for example.

sadly since this wasn't a new kernel the initrd got overwritten - it still might be worth checking what its contents are - see unmkinitramfs(8)
I extracted the initramfs contents. But I'm not sure what I should look for.
* does /etc/hostname look ok, /etc/mdadm/mdadm.conf ...
/etc/hostname and mdadm.conf look normal:
Code:
cat /etc/hostname
j3710

root@j3710:~# cat /etc/mdadm/mdadm.conf
# mdadm.conf
#
# !NB! Run update-initramfs -u after updating this file.
# !NB! This will ensure that initramfs has an uptodate copy.
#
# Please refer to mdadm.conf(5) for information about this file.
#

# by default (built-in), scan all partitions (/proc/partitions) and all
# containers for MD superblocks. alternatively, specify devices to scan, using
# wildcards if desired.
#DEVICE partitions containers

# automatically tag new arrays as belonging to the local system
HOMEHOST <system>

# instruct the monitoring daemon where to send mail alerts
MAILADDR <redacted>

# definitions of existing MD arrays

# This configuration was auto-generated on Wed, 16 Nov 2022 18:27:10 +0100 by mkconf

According to the mtime, neither config file changed. But I've got a weekly backup of "/etc" and "/var" from last Sunday, in case I should check something there.
finally - since all other packages should play no role here ... try downgrading pve-firmware to the older version

(pve-kernel-helper got upgraded, because it shares the source with the kernel-meta-packages (pve-kernel-5.15), which got a new version due to the new kernel)

I hope this helps
Will try that later.
 
But later in November I installed the cryptsetup package again because I wanted to add the LUKS-encrypted LVM-Thin. Then this error message started showing up again. But it was never a problem so far; I installed new pve-kernels without a problem, for example.
Ok - as I expected - so probably unrelated

/etc/hostname and mdadm.conf look normal:
that would also have been what I'd have checked.

However - rereading what you posted:
A "foreign" name alway ends with a digit string preceded by an underscore to differentiate it from any possible local name. e.g. /dev/md/1_1 or /dev/md/home_0.

and
luks_raid1 /dev/md/j3710:md_1 /root/.keys/luks_raid1.key luks
I would read that as a foreign array?

as said - it has been a while since I dealt with md-raid - and when I did, I think these hostnames were either not present or not enabled by default - additionally, I think I usually had the arrays explicitly enumerated in /etc/mdadm.conf - thus I never ran into such issues.
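
just as an illustration, pinning the array explicitly would look something like this (ARRAY line taken from the --examine output you posted):
Code:
# append the array definition to /etc/mdadm/mdadm.conf, then rebuild the initramfs
echo 'ARRAY /dev/md/md_1  metadata=1.2 UUID=2cdcdb2b:faa6069f:4d0b4501:842554f6 name=j3710:md_1' >> /etc/mdadm/mdadm.conf
update-initramfs -u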
 
I would read that as a foreign array?
I labeled that array as "md_1" when creating it. So I guess a foreign array should be "md_1_1" or similar.

From my documentation:

Create array:
Code:
mdadm --create /dev/md/md_1 --level=1 --raid-devices=2 /dev/disk/by-id/ata-INTEL_SSDSC2BA200G4_BTHV5463013V200MGN-part5 /dev/disk/by-id/ata-INTEL_SSDSC2BA200G4_BTHV514406A4200MGN-part5 YES

Check that it was created:
Code:
mdadm --examine --brief --scan --config=partitions

Returns:
Code:
ARRAY /dev/md/md_1 metadata=1.2 UUID=2cdcdb2b:faa6069f:4d0b4501:842554f6 name=j3710:md_1
 
