[SOLVED] Proxmox 8 - systemd-shutdown[1] - Failed to get MD_LEVEL property

emunt6

Oct 3, 2022
Hi!

After upgrading a test server from PVE 7 to PVE 8, I found the following annoying bug:
> I issue a reboot (init 6), but the shutdown process hangs and keeps looping the following message:
"systemd-shutdown[1]: Failed to get MD_LEVEL property for /dev/mdX, ignoring: No such file or directory"

Because of its complexity, the storage is assembled at boot by a script.
I'm using stacked/layered/nested storage:
1st layer: a 24-bay disk shelf with a dual controller, connected to the HBA card (HDDs with dual SAS connectors)
2nd layer: a multipath device is created for each HDD ( /dev/dm-0, /dev/dm-1, ... )
3rd layer: "3-way RAID-1" arrays created with mdadm (8 separate RAID-1 arrays, each with 3 HDDs)
4th layer: striped LVM created on top of the RAID-1 arrays (storage pool)
5th layer: an ext4 filesystem created on top of the LVM volume
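Because the stack is assembled bottom-up, a clean shutdown has to take it apart top-down: unmount the ext4 filesystem, deactivate the LVM volume group, stop the md arrays, and only then flush the multipath maps (layer 1, the shelf, needs nothing). A dry-run sketch that only prints that order; the mount point, volume group and md device names are hypothetical placeholders, not taken from this setup.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the teardown order for the 5-layer stack described above.
# The mount point, VG name and md device names passed in are hypothetical placeholders.
teardown_plan() {
  local mountpoint="$1" vg="$2"
  shift 2
  echo "umount $mountpoint"      # layer 5: ext4 filesystem
  echo "vgchange -an $vg"        # layer 4: striped LVM on top of the arrays
  local md
  for md in "$@"; do
    echo "mdadm --stop $md"      # layer 3: the RAID-1 arrays
  done
  echo "multipath -F"            # layer 2: flush all unused multipath maps
}
```

For example, "teardown_plan /mnt/pool pool /dev/md0 /dev/md1" prints five commands, ending with "multipath -F". Printing instead of executing keeps it safe to try on a live host.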

This setup worked robustly and without errors up to and including Proxmox 7. Since the upgrade to Proxmox 8, shutdown no longer works, and the improper shutdown leaves the arrays in a "degraded state".

I found the following about this:
https://github.com/canonical/probert/issues/125
According to it, this is a udev or systemd bug:
"udevadm does not show the MD_LEVEL information on an inactive RAID"
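The same symptom is visible without udev: in /proc/mdstat an inactive array is listed without its RAID level (for example "md1 : inactive dm-3[0](S)"), so there is no level to report. A sketch that parses mdstat-format text and prints name, state and level, using "-" when no level is shown; the device names in the test data are made up.

```shell
#!/usr/bin/env bash
# Print "NAME STATE LEVEL" for each md array in an mdstat-format file
# (defaults to /proc/mdstat). Inactive arrays carry no level, so print "-".
list_md_arrays() {
  local file="${1:-/proc/mdstat}"
  awk '/^md[0-9]+ : / {
    state = $3
    level = "-"
    if (state == "active") {
      # skip a flag such as "(auto-read-only)" that sits before the level
      level = ($4 ~ /^\(/) ? $5 : $4
    }
    print $1, state, level
  }' "$file"
}
```

Running "list_md_arrays" during a hanging shutdown is not possible, but running it right before "init 6" shows whether any array is already inactive at that point.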


Is this not fixed/patched in the Debian 12 version?
Any help?


Code:
Hardware:
HPE ML350p gen8

~$ pveversion -v
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-5.15: 7.4-3
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx2
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.1
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
 
Update:
proxmox-pve8:
pve-kernel-6.2.16-3-pve 6.2.16-3
mdadm 4.2-5
systemd 252.6-1

Still investigating; I have narrowed it down to mdadm + udev with the pve-kernel (a bug?).
Problem: the Proxmox host cannot restart/reboot because the "MD: mdX stopped." message loops at high speed.
 

Attachment: export.jpg
I have the same problem after upgrading from PVE 7 to 8, and editing /usr/lib/systemd/system-shutdown/mdadm.shutdown didn't help...
 
Update: 2023-04-17

It seems the Proxmox developers solved it: they released new custom packages for PVE 8 (-pmx1):

Code:
$> dpkg --list | grep -i pmx1

ii  libnss-myhostname:amd64                                     252.12-pmx1                                          amd64        nss module providing fallback resolution for the current hostname
ii  libnss-systemd:amd64                                        252.12-pmx1                                          amd64        nss module providing dynamic user and group name resolution
ii  libpam-systemd:amd64                                        252.12-pmx1                                          amd64        system and service manager - PAM module
ii  libsystemd-shared:amd64                                     252.12-pmx1                                          amd64        systemd shared private library
ii  libsystemd0:amd64                                           252.12-pmx1                                          amd64        systemd utility library
ii  libudev-dev:amd64                                           252.12-pmx1                                          amd64        libudev development files
ii  libudev1:amd64                                              252.12-pmx1                                          amd64        libudev shared library
ii  systemd                                                     252.12-pmx1                                          amd64        system and service manager
ii  systemd-sysv                                                252.12-pmx1                                          amd64        system and service manager - SysV compatibility symlinks
ii  udev                                                        252.12-pmx1                                          amd64        /dev/ and hotplug management daemon
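To check whether a host already has these rebuilt packages, it is enough to keep the "ii" (installed) rows whose version contains "pmx". A small sketch that reads "dpkg --list" output on stdin; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Read "dpkg --list" output on stdin and print "package version" for
# installed ("ii") packages whose version string contains "pmx".
pmx_packages() {
  awk '$1 == "ii" && $3 ~ /pmx/ { print $2, $3 }'
}
# usage on a real host:  dpkg --list | pmx_packages
```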
 
After the update, it looked like the problem was solved on the first reboot, but after extensive testing, it's not. I still see these errors:
Code:
[   54.882310] watchdog: watchdog0: watchdog did not stop!
[   55.047108] systemd-shutdown[1]: Could not stop MD /dev/md1: No such device
[   55.047134] watchdog: watchdog0: watchdog did not stop!
[   55.055696] systemd-shutdown[1]: Failed to finalize MD devices, ignoring.
Or this one spamming the console:
Code:
[   43.811775] systemd-shutdown[1]: Failed to get MD_LEVEL property for /dev/md0, ignoring: No such file or directory
[   43.812342] systemd-shutdown[1]: Failed to get MD_LEVEL property for /dev/md0, ignoring: No such file or directory
[   43.812905] systemd-shutdown[1]: Failed to get MD_LEVEL property for /dev/md0, ignoring: No such file or directory
[   43.813468] systemd-shutdown[1]: Failed to get MD_LEVEL property for /dev/md0, ignoring: No such file or directory
[   43.814082] systemd-shutdown[1]: Failed to get MD_LEVEL property for /dev/md0, ignoring: No such file or directory
The server hangs on shutdown and needs a manual reset...
 

Hi!

I've been testing again, and I may have found the problem.
First, this package needs to be installed (the mdadm package's systemd service files depend on it, but the dependency is not declared in the package metadata):
Code:
$> apt-get install dracut-core

As far as I can tell, the "/etc/fstab" mount option "errors=remount-ro" is causing this loop (I use this option on every mount).
When systemd reaches "reboot.target" / "shutdown.target", it tries to execute "sd-remount.service" and "sd-mount.service", and this causes the loop (the mapper devices are already stopped/deleted, so it is impossible to mount/remount the filesystem).
This is a systemd bug.
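To see which mounts the suspected option applies to, a sketch that lists the mount points in an fstab-format file whose options include errors=remount-ro; comment and blank lines are skipped and the file is only read. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# List mount points in an fstab-format file (default /etc/fstab) whose
# mount options contain errors=remount-ro. Comment and blank lines are skipped.
remount_ro_mounts() {
  local fstab="${1:-/etc/fstab}"
  awk '$1 !~ /^#/ && NF >= 4 {
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++)
      if (opts[i] == "errors=remount-ro") { print $2; break }
  }' "$fstab"
}
```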

Can you show me your "/etc/fstab" file?
 
Sure, here it is. It does contain errors=remount-ro, but only for md2:
Code:
# / was on /dev/md2 during installation
UUID=45380162-b151-4e50-9af3-a9a549ca1757 /               ext4    discard,noatime,nodiratime,relatime,errors=remount-ro 0       1
# /boot was on /dev/md0 during installation
UUID=c5bc77dd-33b6-4691-bcf0-ba540d755bde /boot           ext4    discard,noatime,nodiratime,relatime 0       2
# swap was on /dev/md1 during installation
UUID=b6067ac4-c4fa-451e-835e-29a4e2deb606 none            swap    sw              0       0
 
Could you please post the full journal output of a failed shutdown? Thanks! Note that we don't officially support MDRAID (for reasons other than this issue, but it's still something you might want to be aware of!)
 
There is no log; it won't finish the reboot/shutdown process.

I attached a video from iLO (2 MB):
(download the file; the web browser player lowers the quality)
video-link

After recording the video, I'm testing with the following:
Code:
$> systemctl mask sd-mount.service
$> systemctl mask sd-remount.service
 

Attachment: Screenshot.png
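To confirm the mask took effect: "systemctl mask" links the unit name to /dev/null under /etc/systemd/system, and "systemctl is-enabled" then reports "masked". A sketch that checks the symlink directly, with the directory as a parameter so it can be tried on a scratch directory; the function name is hypothetical, and runtime masks under /run are not covered.

```shell
#!/usr/bin/env bash
# Return 0 if UNIT is masked in DIR (default /etc/systemd/system),
# i.e. DIR/UNIT is a symlink pointing at /dev/null, as "systemctl mask" creates.
unit_is_masked() {
  local unit="$1" dir="${2:-/etc/systemd/system}"
  [ -L "$dir/$unit" ] && [ "$(readlink "$dir/$unit")" = /dev/null ]
}
# on a real host, "systemctl is-enabled sd-mount.service" should print "masked"
```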
I am not sure, but in the video I see blk-availability being stopped at the same time; possibly that triggers a premature stopping of the MD devices? Without a full (textual ;)) log of the shutdown, it's hard to tell what's going on..
 

After some testing, it seems solved:

Code:
$> apt-get install dracut-core
$> systemctl edit blk-availability.service
[Unit]
After=mdadm-shutdown.service
Requires=mdadm-shutdown.service

Code:
$> systemctl edit mdadm-shutdown.service
[Unit]
After=multipathd.service
Requires=multipathd.service

Code:
$> systemctl daemon-reload

It seems this modification resolves the mdadm RAID problem.
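"systemctl edit UNIT" saves the text you enter as a drop-in, /etc/systemd/system/UNIT.d/override.conf, so the same two overrides can also be written non-interactively (for example from a provisioning script). A sketch under that assumption; the function name is made up, and the target directory is a parameter so it can be tested outside /etc.

```shell
#!/usr/bin/env bash
# Write DIR/UNIT.d/override.conf ordering UNIT after DEP and requiring it,
# mirroring the [Unit] snippets entered via "systemctl edit" above.
# DIR defaults to /etc/systemd/system, where "systemctl edit" stores drop-ins.
write_order_dropin() {
  local unit="$1" dep="$2" dir="${3:-/etc/systemd/system}"
  mkdir -p "$dir/$unit.d"
  printf '[Unit]\nAfter=%s\nRequires=%s\n' "$dep" "$dep" > "$dir/$unit.d/override.conf"
}

# The two overrides from this post (needs root, then: systemctl daemon-reload):
#   write_order_dropin blk-availability.service mdadm-shutdown.service
#   write_order_dropin mdadm-shutdown.service multipathd.service
```

After "systemctl daemon-reload", "systemctl cat blk-availability.service" shows the drop-in, and "systemctl show -p After blk-availability.service" should list mdadm-shutdown.service.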


There is another issue: this kernel and systemd version are "unstable", not ready for a production system:
Code:
-systemd 252.12-pmx1
-pve-kernel-6.2.16-4-pve 6.2.16-5

It has happened several times; it deadlocks:
Code:
root@proxmox2:~# systemctl
Failed to list units: Transport endpoint is not connected

root@proxmox2:~# systemctl daemon-reexec
Failed to reload daemon: Transport endpoint is not connected

root@proxmox2:~# systemctl daemon-reload
Failed to reload daemon: Transport endpoint is not connected
 
Great!
Could you post more info about that, e.g., the journal leading up to the problem? It sounds like an issue with dbus, but I am not sure..
 
Hi!

This happened after editing the same ".service" file several times and running "systemctl daemon-reload"; after that, the error message above appears. I did some more testing, but it is hard to reproduce.

Another problem: there is a .service file that runs properly about 8 out of 10 times, but not the rest of the time.
Code:
$> systemctl status pve-qm-start.service

● pve-qm-start.service - PVE-QM-START
     Loaded: loaded (/etc/systemd/system/pve-qm-start.service; enabled; preset: enabled)
     Active: inactive (dead)

The .service file:

Code:
[Unit]
Description=PVE-QM-START
After=pve-guests.service multipathd.service
Requires=pve-guests.service multipathd.service

[Service]
Type=oneshot
TimeoutSec=0
KillMode=
RemainAfterExit=yes
Restart=no
ExecStart=/bin/bash /etc/init.d/pve-qm start

[Install]
WantedBy=multi-user.target
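For a Type=oneshot unit with RemainAfterExit=yes, "inactive (dead)" after boot suggests it never ran; one possible cause is that a unit listed in Requires= failed to start. A sketch that pulls the Requires= entries out of the [Unit] section so each can be checked with "systemctl is-active"; it only handles the simple one-line form used in this file, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Print each unit named in Requires= lines of the [Unit] section of a unit file.
# Handles the simple "Requires=a.service b.service" form only.
unit_requires() {
  awk '
    /^\[/ { in_unit = ($0 == "[Unit]"); next }
    in_unit && /^Requires=/ {
      sub(/^Requires=/, "")
      n = split($0, deps)   # default FS: split on whitespace
      for (i = 1; i <= n; i++) print deps[i]
    }
  ' "$1"
}

# Usage on the host:
#   for u in $(unit_requires /etc/systemd/system/pve-qm-start.service); do
#     printf '%s: %s\n' "$u" "$(systemctl is-active "$u")"
#   done
```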
 
That service file is not provided by us; I have no idea what it does ;)
 
