Boot issues with 7.x: systemd-udevd "Failed to update device symlinks: Too many levels of symbolic links"

xed

Active Member
Just narrowed down my search for the culprit of my boot issues to this:


Code:
░░ Subject: A start job for unit systemd-rfkill.socket has finished successfully
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit systemd-rfkill.socket has finished successfully.
░░
░░ The job identifier is 169.
Oct 04 04:35:54 XXX kernel: ipmi_si IPI0001:00: IPMI kcs interface initialized
Oct 04 04:35:54 XXX kernel: ipmi_ssif: IPMI SSIF Interface driver
Oct 04 04:35:55 XXX systemd-udevd[10006]: sdh1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10017]: sdl1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10032]: sdm1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10043]: sda1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10037]: sdc1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10050]: sdb1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10011]: sde1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10027]: sdf1: Failed to update device symlinks: Too many levels of symbolic links
Oct 04 04:35:55 XXX systemd-udevd[10025]: sdd1: Failed to update device symlinks: Too many levels of symbolic links
...
Oct 04 04:37:54 XXX systemd[1]: ifupdown2-pre.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit ifupdown2-pre.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Oct 04 04:37:54 XXX systemd[1]: ifupdown2-pre.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit ifupdown2-pre.service has entered the 'failed' state with result 'exit-code'.
Oct 04 04:37:54 XXX systemd[1]: Failed to start Helper to synchronize boot up for ifupdown.
░░ Subject: A start job for unit ifupdown2-pre.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit ifupdown2-pre.service has finished with a failure.
░░
░░ The job identifier is 42 and the job result is failed.
Oct 04 04:37:54 XXX systemd[1]: Dependency failed for Network initialization.
░░ Subject: A start job for unit networking.service has failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ A start job for unit networking.service has finished with a failure.
░░
░░ The job identifier is 41 and the job result is dependency.
Oct 04 04:37:54 XXX systemd[1]: networking.service: Job networking.service/start failed with result 'dependency'.
Oct 04 04:37:54 XXX systemd[1]: ifupdown2-pre.service: Consumed 2.208s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support

16 SATA drives, 3 SAS3 drives, and 2 NVMe drives in this system.


Code:
# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-7
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-9
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-11
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.10-1
proxmox-backup-file-restore: 2.0.10-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-10
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-3
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-14
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

The rpool uses native ZFS encryption; everything else is a fresh install :(
 
Quick update: this might not be the culprit. The symptoms are that all requests for the Storage tabs time out (I cannot access anything at all) or report a 'communication failure'. No NFS is in use either.
 
These could be the same sort of symptoms that I reported in [1], but so far nobody else has reported the same issue.
I'm lucky that my host had no boot issues, just the messages, but after reading the bug report, which included reports of unbootable systems, I was cautious.
If this is the same issue, it seems to depend on processor speed, the number of cores, the number of block devices sharing the same label and UUID (ZFS), and perhaps more variables; there is a quick way to check for that below.
While searching I also read that systemd-udevd could be timing out and bringing down ifupdown2-pre.service with it.
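
If you want to see how many block devices on your host share a label/UUID, something like this should show it (just a generic sketch, not taken from the bug report):

Code:
# ZFS pool members typically all carry the pool name as LABEL and the
# pool GUID as UUID, so several partitions sharing both is expected.
lsblk -o NAME,FSTYPE,LABEL,UUID

# blkid prints the same information per partition:
blkid | grep zfs_member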

You could look at your boot times with systemd-analyze blame.
Or try the same workaround I used in /etc/udev/udev.conf, but you should then test which value works for you and watch whether other issues appear (a sketch follows).
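
Roughly what I mean, as a sketch; the thread in [1] has the exact change I made, so treat the udev.conf settings below as the usual candidates rather than a confirmed fix:

Code:
# See which units took the longest during boot; settle/udev-related units
# near the top point at slow device initialization.
systemd-analyze blame | head -n 20

# Possible /etc/udev/udev.conf tweaks (assumption: pick one, test a value
# that suits your hardware, reboot, and watch for side effects):
#event_timeout=300    # seconds before udev gives up on a hanging event (default 180)
#children_max=8       # limit the number of parallel udev workers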

[1] https://forum.proxmox.com/threads/a...ives-too-many-levels-of-symbolic-links.96565/
 
Hi! I'm on it and already reviewing the 'blame' output. I can confirm that udevadm settle is the real culprit, and it mostly occurs when someone leaves an iODD "multi image" USB3 disk plugged in. I think any drive that is dying/faulty will also cause this issue, as will any controller that responds in some non-standard way.
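
If you want to check the same thing on your host, this is roughly how I'd verify it (a generic sketch, not my exact session):

Code:
# Time how long the udev event queue takes to drain; a long wait or a
# hang here matches the ifupdown2-pre timeout seen during boot.
time udevadm settle

# Review udev and ifupdown2-pre messages from the current boot.
journalctl -b -u systemd-udevd -u ifupdown2-pre.service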

There is a workaround that I do not recommend without further investigation: masking the ifupdown2-pre service. That removes the requirement that 'udevadm settle' finish before the NICs are brought up, and boot continues as normal. Most folks will be fine doing this, but I don't recommend it purely out of caution, because 'udevadm settle' should finish just fine unless you have issues with your controllers. If you are using hardware RAID I also don't know the potential side effects; usually the RAID controller is already initialized at that point (its BIOS-launched internal firmware takes care of that), but I can't extrapolate that to all cases and scenarios.
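
For completeness, masking and unmasking look like this; treat it as a sketch and keep the caveats above in mind:

Code:
# Mask the unit so boot no longer waits for 'udevadm settle' before the
# network is brought up (the cautious-use workaround described above).
systemctl mask ifupdown2-pre.service

# Revert once the underlying storage problem is fixed.
systemctl unmask ifupdown2-pre.service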

tl;dr: this is often related to problems with drives or other storage components.
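
If you suspect a dying drive, a basic SMART check with smartmontools (already part of the PVE install) is a reasonable first step; /dev/sda below is just a placeholder for whichever disk udev complained about:

Code:
# Print SMART health, attributes and the error log for one drive.
smartctl -a /dev/sda

# Some SAS drives/HBAs need an explicit device type, e.g.:
smartctl -a -d scsi /dev/sda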

I'll keep testing.
 
