[SOLVED] After upgrade PVE from 6.4 to 7 systemd-udevd gives: Too many levels of symbolic links

janssensm

Hi,

Yesterday I upgraded from 6.4-1 to 7.0-2, and the host boots fine.
Before starting all VMs and CTs I started just 1 CT and 1 VM, and they work fine. The zpools report no issues.
In the log, however, I found multiple messages like the ones below. They never appeared in the logs on PVE 6.x or 5.x, so this started with 7:

Sep 20 10:06:05 pve systemd-udevd[7196]: sdg1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:05 pve systemd-udevd[7391]: sdh3: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:05 pve systemd-udevd[7386]: sde3: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:06 pve systemd-udevd[7178]: zd240p3: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:06 pve systemd-udevd[7377]: zd16p3: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:06 pve systemd-udevd[7395]: zd640p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:06 pve systemd-udevd[7396]: zd848p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7153]: sda1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7171]: sdd1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7200]: sdb1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7216]: sdk1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7390]: sdj1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7391]: sdh3: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7196]: sdg1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7174]: sdc1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7170]: zd192p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7398]: zd160p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7388]: zd464p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7175]: zd480p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7178]: zd240p3: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7377]: zd16p3: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7186]: zd880p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7177]: zd720p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:07 pve systemd-udevd[7395]: zd640p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:08 pve systemd-udevd[7171]: sdd1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:08 pve systemd-udevd[7200]: sdb1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:08 pve systemd-udevd[7216]: sdk1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7170]: zd192p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7388]: zd464p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7186]: zd880p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7177]: zd720p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7398]: zd160p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7175]: zd480p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7200]: sdb1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7171]: sdd1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7216]: sdk1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 10:06:09 pve systemd-udevd[7174]: sdc1: Failed to update device symlinks: Too many levels of symbolic links

The devices reporting these messages are not always the same every boot, so there is some randomness.
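
To compare boots it can help to tally the failures per device. The following is a minimal sketch, not part of the original post: the function name count_symlink_failures and the regular expression are assumptions based only on the log format shown above.

```python
import re
from collections import Counter

# Matches journal lines such as:
#   Sep 20 10:06:05 pve systemd-udevd[7196]: sdg1: Failed to update device symlinks: ...
# The capture group is the kernel device name (sdg1, zd240p3, ...).
FAILED_RE = re.compile(
    r"systemd-udevd\[\d+\]: (\S+): Failed to update device symlinks"
)


def count_symlink_failures(lines):
    """Return a Counter mapping device name -> number of failure messages."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Small sample taken from the log above.
sample = [
    "Sep 20 10:06:05 pve systemd-udevd[7196]: sdg1: Failed to update device symlinks: Too many levels of symbolic links",
    "Sep 20 10:06:06 pve systemd-udevd[7178]: zd240p3: Failed to update device symlinks: Too many levels of symbolic links",
    "Sep 20 10:06:07 pve systemd-udevd[7196]: sdg1: Failed to update device symlinks: Too many levels of symbolic links",
]
for device, n in count_symlink_failures(sample).most_common():
    print(n, device)
```

In practice the lines could come from the journal of the current boot (for example the output of journalctl -b -u systemd-udevd saved to a file), and the tallies of two boots can then be compared side by side.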
So I decided to boot with udev_log=debug set in /etc/udev/udev.conf.
Here is part of that debug log; I can provide more if necessary:

Sep 20 12:32:56 pve systemd-udevd[7007]: zd464p1: Found 'b230:465' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7007]: zd464p1: Found 'b230:193' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7249]: 19: Handling device node '/dev/vfio/19', devnum=c241:0
Sep 20 12:32:56 pve systemd-udevd[6976]: 20: sd-device-monitor: Passed 189 byte to netlink monitor
Sep 20 12:32:56 pve systemd-udevd[6976]: 21: Device (SEQNUM=4635, ACTION=add) is queued
Sep 20 12:32:56 pve systemd-udevd[6993]: zd880p1: Found 'b230:465' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7020]: zd192p1: Found 'b8:161' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[6993]: zd880p1: Found 'b230:193' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7018]: vcsa1: Preserve already existing symlink '/dev/char/7:129' to '../vcsa1'
Sep 20 12:32:56 pve systemd-udevd[7257]: 20: Processing device (SEQNUM=4634, ACTION=add)
Sep 20 12:32:56 pve systemd-udevd[7004]: vcsu: Device (SEQNUM=4631, ACTION=add) processed
Sep 20 12:32:56 pve systemd-udevd[6993]: zd880p1: Found 'b230:161' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7252]: zd480p1: Atomically replace '/dev/disk/by-label/tank1'
Sep 20 12:32:56 pve systemd-udevd[6993]: zd880p1: Found 'b8:49' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7257]: 20: Handling device node '/dev/vfio/20', devnum=c241:1
Sep 20 12:32:56 pve systemd-udevd[7249]: 19: Preserve permissions of /dev/vfio/19, uid=0, gid=0, mode=0600
Sep 20 12:32:56 pve systemd-udevd[7021]: vcsu1: Setting permissions /dev/vcsu1, uid=0, gid=5, mode=0660
Sep 20 12:32:56 pve systemd-udevd[7021]: vcsu1: Preserve already existing symlink '/dev/char/7:65' to '../vcsu1'
Sep 20 12:32:56 pve systemd-udevd[7242]: zd160p1: Found 'b8:145' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[6993]: zd880p1: Found 'b8:33' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7022]: sdk1: Found 'b8:33' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7004]: vcsu: sd-device-monitor: Passed 183 byte to netlink monitor
Sep 20 12:32:56 pve systemd-udevd[7252]: zd480p1: Found 'b230:881' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7011]: sdc1: Found 'b230:193' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7020]: zd192p1: Atomically replace '/dev/disk/by-label/tank1'
Sep 20 12:32:56 pve systemd-udevd[7257]: 20: Preserve permissions of /dev/vfio/20, uid=0, gid=0, mode=0600
Sep 20 12:32:56 pve systemd-udevd[7249]: 19: Preserve already existing symlink '/dev/char/241:0' to '../vfio/19'
Sep 20 12:32:56 pve systemd-udevd[7011]: sdc1: Found 'b230:161' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7022]: sdk1: Found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7243]: zd720p1: Found 'b230:161' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7007]: zd464p1: Found 'b230:161' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7018]: vcsa1: Device (SEQNUM=4630, ACTION=add) processed
Sep 20 12:32:56 pve systemd-udevd[7022]: sdk1: Found 'b8:145' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7018]: vcsa1: sd-device-monitor: Passed 186 byte to netlink monitor
Sep 20 12:32:56 pve systemd-udevd[6993]: zd880p1: Found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7242]: zd160p1: Found 'b8:1' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7252]: zd480p1: Found 'b230:721' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[6976]: 21: Device (SEQNUM=4635, ACTION=add) ready for processing
Sep 20 12:32:56 pve systemd-udevd[7022]: sdk1: Found 'b8:1' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
Sep 20 12:32:56 pve systemd-udevd[7249]: 19: sd-device: Created empty file '/run/udev/data/c241:0' for '/devices/virtual/vfio/19'
Sep 20 12:32:56 pve systemd-udevd[7242]: zd160p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:56 pve systemd-udevd[7011]: sdc1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:56 pve systemd-udevd[7020]: zd192p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:56 pve systemd-udevd[7007]: zd464p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[7252]: zd480p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[7243]: zd720p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[7022]: sdk1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[6993]: zd880p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[7242]: zd160p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[7007]: zd464p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[7020]: zd192p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:32:57 pve systemd-udevd[7252]: zd480p1: Failed to update device symlinks: Too many levels of symbolic links
Sep 20 12:33:01 pve systemd-udevd[7021]: Using default interface naming scheme 'v247'.
Sep 20 12:33:01 pve systemd-udevd[7021]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 20 12:33:01 pve systemd-udevd[7021]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 20 12:33:01 pve systemd-udevd[7021]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 20 12:33:01 pve systemd-udevd[7021]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 20 12:33:01 pve systemd-udevd[7021]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Sep 20 12:33:12 pve systemd-udevd[11477]: Using default interface naming scheme 'v247'.
Sep 20 12:33:12 pve systemd-udevd[11477]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
.....
Sep 20 12:37:20 pve systemd-journald[6883]: Suppressed 21507 messages from systemd-udevd.service

It seems that many devices are fighting over the same by-label symlink (here /dev/disk/by-label/tank1).
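
To see how many devices compete for one symlink, the "Found ... claiming ..." lines of the debug log can be grouped per link. This is a rough sketch under assumptions: the helper names are hypothetical, and the mapping from the escaped name under /run/udev/links/ to a path under /dev is inferred from the "Atomically replace '/dev/disk/by-label/tank1'" lines in the log, not taken from udev documentation.

```python
import re
from collections import defaultdict

# Example line:
#   zd464p1: Found 'b230:465' claiming '/run/udev/links/\x2fdisk\x2fby-label\x2ftank1'
CLAIM_RE = re.compile(r"Found '([^']+)' claiming '/run/udev/links/([^']+)'")


def decode_link_name(escaped):
    r"""Turn '\x2fdisk\x2fby-label\x2ftank1' into '/dev/disk/by-label/tank1'.

    Only single-byte \xNN escapes are handled, which is enough for ASCII labels.
    """
    path = re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        escaped,
    )
    return "/dev" + path


def claimants_per_symlink(lines):
    """Map each symlink path to the set of device numbers claiming it."""
    claims = defaultdict(set)
    for line in lines:
        match = CLAIM_RE.search(line)
        if match:
            devnum, escaped = match.groups()
            claims[decode_link_name(escaped)].add(devnum)
    return claims
```

Running claimants_per_symlink over the full debug log and printing len() of each set shows which symlinks have the most competing devices.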
I found a possibly related bug report for Debian [1], and one where this bug is reported against systemd [2].

Has anyone seen these messages in their logs too?
Please advise whether these messages can be safely ignored (most mounting and such is done by UUID), and if not, what a workaround could be.
I read in the reports that clearing the device label could be a possible workaround.

proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.4: 6.4-5
pve-kernel-5.11.22-4-pve: 5.11.22-8
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.4.128-1-pve: 5.4.128-2
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
pve-zsync: 2.2
qemu-server: 7.0-13
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=993738
[2] https://github.com/systemd/systemd/issues/20212
 

janssensm

Hmm, I was able to reproduce the messages inside a PVE 7 instance installed as a nested VM on the same host.
With just 1 VM and 1 CT inside, both not running, there were no messages. So I started cloning the VM multiple times to get more block devices.
Now with 55 block devices (blkid | wc -l) I get the same messages. In one boot they occurred just 6 times, in another boot 18 times, and for different block devices, so this is also a bit random.
On my host I have 133 block devices.

Just for completeness, the versions from the nested PVE instance, with the no-subscription repo:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.4: 6.4-6
pve-kernel-5.11.22-4-pve: 5.11.22-8
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-11
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-3
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 

janssensm

A small update.
I was starting to believe that it had something to do with the nested pve instance. In a way it has.
I set up the nested instance with an identical drive setup, so also zfs with pools: rpool and tank1 (same names as on the host).
That means that the virtual drives in the instance are formatted with same LABEL property as on the host and they get a /dev/zdX block device assigned.
Host pool tank1 has 6 drives, rpool has 4 drives, nested instance the same.
Without the nested instance: blkid on host gives 3 times LABEL "rpool", 6 times LABEL "tank1"
With the nested pve instance: blkid on host gives 7 times LABEL "rpool", 12 times LABEL "tank1"
The symlink messages seem to be all linked to LABEL property "tank1".

To rule out that the nested instance is the cause of these messages, I deleted all virtual drives from this instance.
The host now has 111 block devices instead of 133.
The messages did not disappear, but there are far fewer of them.
With the nested instance (so the extra vdisks), about 24-36 messages.
Without the nested instance, 6-8 messages.

My educated guess:
This is still pointing to the mentioned bug reports.
The reason that the messages are pointing to LABEL tank1 is that there are a lot more block devices with this label.
Perhaps I will test again with extra drives added to rpool.
 

janssensm

Another update.
I can rule out that it's related to tank1 pool, it's also reproducible on rpool.

For testing I used a nested pve instance again, but inside this instance just rpool raid10, no other pools configured, and no vm's.
So now on the host I had 7 block devices with LABEL rpool instead of 3 (because of the added zvols, so 4 extra block devices /dev/zdX).
Rebooting the host still gives just 10 udev messages, only linked to tank1 on the host. So nothing interesting yet.

So now I added 10 small virtual hard disks to the nested instance, booted the instance and added 5 extra mirrors with those 10 disks to rpool inside the nested instance.
Then I shut down the nested instance and rebooted the host.

Now instead of 7 block devices there are 17, seems right (10 more zvols), with LABEL rpool.
And now there are 58 udev messages, now linked to both tank1 and rpool. But interestingly also linked to rpool zfs members from the host.
To extend the test I added another 10 disks in the same way, so now 27 block devices with LABEL rpool.
After host reboot there are 109 udev messages, the first 12 linked to tank1. The remaining 97 linked to rpool.

Conclusion so far: it seems, the more block devices with the same LABEL property, the more of these messages.
The next step is to reproduce this again inside the nested instance and see whether any of the workarounds in the mentioned bug reports have an effect.
 

janssensm

I'm getting stuck now.

I ruled out that kernel 5.11 is the cause, by executing the same tests on pve 6.4 with kernel 5.11 package installed.
Even with 20 extra disks added in rpool, no messages about failing symlinks.

In my pve test vm I can reproduce the messages. Same behaviour, the more block devices in the same pool, the more messages.
So I tried workarounds mentioned in the bug report.
1. Changing the dm udev rules: no or very little effect. The pain is that the udev debug log is hard to decipher, or it's just me not reading it right.
2. Clearing the partition label: this does not seem possible. I think this is something ZFS-specific, as the bug report is about LVM.
parted and gdisk don't show a partition label rpool, but blkid does report LABEL="rpool".
So these workarounds don't apply here.

Now this is blkid output from my pve test vm.
root@pvetest01:~# blkid | grep rpool
/dev/sda3: LABEL="rpool" UUID="15963537336132728091" UUID_SUB="64361347852075173" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="00d8f6a5-5dd2-4c0b-8fb9-ac890729fa4a"
/dev/sdi1: LABEL="rpool" UUID="15963537336132728091" UUID_SUB="2229389602074200712" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-5951fc41d1a5f410" PARTUUID="496bc0db-2b12-3946-8923-d55a129de0b9"
/dev/sdh3: LABEL="rpool" UUID="15963537336132728091" UUID_SUB="8526412988097208054" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="50ca1418-2dda-4365-865f-e69ad1b9a46a"
/dev/sdj1: LABEL="rpool" UUID="15963537336132728091" UUID_SUB="11947469194290195371" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-abe3d7fc56f952a4" PARTUUID="ecebc962-9470-f142-9d2e-46a2f23b774b"
/dev/zd0p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="16430258557809525224" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-68e053fe10a062ca" PARTUUID="b427e9b8-223f-604a-ae2e-3ee8a9ee432c"
/dev/zd16p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="3465855802208914313" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-9a9da348e1d9936b" PARTUUID="0e6b149f-1000-8e42-849b-db8c98e1c0c6"
/dev/zd32p3: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="323646303060921881" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="f029d25b-5702-400c-9d94-e6e44d647e5a"
/dev/zd48p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="7628896676516681458" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-5374f3b9452a56fb" PARTUUID="8b76dc16-7e92-d540-90fa-218976513578"
/dev/zd64p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="2402272218613529280" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-c9488a36b9d8b34b" PARTUUID="10d070db-b5df-c14a-bdac-f97b0932c856"
/dev/zd80p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="1802517009106966335" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-84b9775c9babe75c" PARTUUID="773194e0-af2b-4b48-89a1-735f2c2efffe"
/dev/zd96p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="10150065923296955224" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-b69a07d7a0d36ce6" PARTUUID="b472f2f5-44e1-5247-991d-1f6c5aa14c4c"
/dev/zd112p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="11144845472099510612" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-72897e390b645da4" PARTUUID="3403d4c5-eeb9-5e43-a7f7-83aab2f74e4d"
/dev/zd128p3: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="6490435610840024359" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="71ddca6c-1ddc-4afe-9c60-0a0b6c0c03e8"
/dev/zd144p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="11793495064852435837" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-761a330ee090bdbf" PARTUUID="16791151-3708-dc4b-946a-4d1e30233e6f"
/dev/zd160p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="1828916353884903494" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-62a7423dd9bc6a2f" PARTUUID="83349eb9-28f3-244a-8e63-a4bc260bf958"
/dev/zd176p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="10790391302021209705" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-9ef85df4cac6e1a8" PARTUUID="131d10f9-6449-a341-8dec-d55a2e8a5984"
/dev/zd192p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="3600882472946922795" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-0da77ba3b076c4ab" PARTUUID="3e1ca865-a6ed-d742-809b-8831637bdb2a"
/dev/zd208p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="3993893083999477790" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-d1768516c33ca81d" PARTUUID="e0ec525d-6b0f-2f4a-8fd0-b7c0e20f80c5"
/dev/zd224p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="17679669644136509984" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-0ed4f9083d9880b8" PARTUUID="6f68df29-ddc8-9b45-8396-823bc52220a1"
/dev/zd240p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="1198089305393809664" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-49761a84ee37d39d" PARTUUID="69481546-ea37-0346-abcd-ae16737bc105"
/dev/zd256p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="2175727136936245016" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-f91ea2727b756fdb" PARTUUID="2c25a19b-87a9-d94a-8506-44a605213d6b"
/dev/zd272p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="5034861954546063597" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-1f9abf0d3168c670" PARTUUID="22e925e2-453c-434d-9d7b-14807d92f092"
/dev/zd288p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="1530538509635275866" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-45c561cc46da6fee" PARTUUID="54da6589-8537-2a46-95a2-4e3111844281"
/dev/zd304p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="5300682650130785178" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-4e10f5c78fb7ef81" PARTUUID="2b63d73a-4bf1-7647-b1d9-526188a6293c"
/dev/zd320p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="1422483502213026549" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-87239f27eb8568f4" PARTUUID="eafccf63-ba71-cb4d-adb6-64b3e9efccae"
/dev/zd336p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="13095444920378993911" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="51e1a565-3f3d-174f-9720-023522639637"
/dev/zd352p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="18267015628681465075" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-6275fee8224f4ba6" PARTUUID="e61c9ad0-6b5b-fb47-9b66-897e1d07ee37"
/dev/zd368p1: LABEL="rpool" UUID="1663152718640802657" UUID_SUB="10279324953265170538" BLOCK_SIZE="4096" TYPE="zfs_member" PARTLABEL="zfs-d52e6ed4a58528ad" PARTUUID="a6b54e85-1b29-094e-bab1-ef21ead50430"

And I see that not only LABEL rpool is set on every disk, but also the same UUID (1 UUID for the VM's rpool, 1 UUID for the nested VM's rpool with 20 extra disks).
This too appears to be something ZFS-specific, as you can find the same UUID via zpool get guid rpool.
In the udev debug log I also saw a lot of messages about claiming a UUID symlink, about half as many as for claiming label symlinks.
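
The grouping described above can be checked by counting blkid lines per (LABEL, UUID) pair. A minimal sketch, assuming the blkid output format shown in this post; the function names are made up for illustration.

```python
import re
from collections import Counter

# blkid prints KEY="value" pairs after the device name.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_blkid_line(line):
    """Split '/dev/sda3: LABEL="rpool" UUID="..."' into (device, {key: value})."""
    device, _, rest = line.partition(": ")
    return device, dict(PAIR_RE.findall(rest))


def count_label_uuid(lines):
    """Count block devices per (LABEL, UUID) pair; devices without LABEL are skipped."""
    counts = Counter()
    for line in lines:
        _, tags = parse_blkid_line(line)
        if "LABEL" in tags:
            counts[(tags["LABEL"], tags.get("UUID"))] += 1
    return counts
```

Every (LABEL, UUID) pair with a count above 1 means that many devices want the same /dev/disk/by-label and /dev/disk/by-uuid symlink, which matches what the debug log shows.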

So far I did not run this host in full production, still minimal, 1 vm, 1 ct.
If this is still linked to the mentioned systemd udev bug report, it appears that 2 commits are being reverted by the debian systemd package maintainer.
But it's hard to say when this fix will arrive in stable bullseye repo.

Perhaps one of the staff members could chime in and advise if my host system is still ready for production.
This host is on pve-enterprise repo.
 

janssensm

So after pulling some more hairs out (not many left :rolleyes:) I was able to reproduce the above scenarios on my laptop, which is more recent hardware (Intel i7-7500U instead of AMD) but has just 2 cores with hyper-threading, so 4 logical CPUs, instead of the 32 cores in my tests above.
I'm not very familiar with udev, but I found out that I can trigger the messages with the command udevadm trigger. During boot there are more messages, but this is a nice extra for faster testing.
I also kept an eye on systemd-analyze blame | grep udev during boot.

By coincidence I once set the number of cores for the test VM very low and saw that the messages almost disappeared.
systemctl status systemd-udevd | grep children shows the maximum number of children udev may use. This number appears to be calculated as (total number of CPUs)*2+16, so for my laptop with 4 logical CPUs that's 24 children max.
Perhaps this is something I can control with the children_max setting in /etc/udev/udev.conf.
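
As a quick worked example of that calculation: the sketch below only encodes the formula as quoted in this post. It has not been checked against the systemd source, and the built-in default may differ between systemd versions; the function name is hypothetical.

```python
import os


def estimated_children_max(cpu_count):
    """Formula as quoted in the post: (number of CPUs) * 2 + 16.

    Assumption: this reflects the observed value on the poster's systems,
    not a documented systemd guarantee.
    """
    return cpu_count * 2 + 16


# Laptop from the post: 2 cores with hyper-threading = 4 logical CPUs.
print(estimated_children_max(4))

# Estimate for the machine this script runs on.
print(estimated_children_max(os.cpu_count() or 1))
```

The value actually in effect is the one reported by systemctl status systemd-udevd, so that output remains the reference.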
Conclusion after testing this on both laptop and server:

Setting children_max=2 makes all failing udev symlink messages disappear.
Tested during boot and with udevadm trigger.
So this is a possible workaround.
systemd-analyze blame | grep udev times also got lower. No errors in journal.
Now I wonder, does udev really need that many children?
In other words, is it wise to set this value so low?
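
For anyone who wants to script the workaround, here is a small sketch that sets an option in the text of a udev.conf file. It is only an illustration: set_udev_option is a hypothetical helper, it works on a string and does not write /etc/udev/udev.conf itself (that needs root), and it assumes the simple key=value format of that file.

```python
import re


def set_udev_option(conf_text, key, value):
    """Return conf_text with `key=value` set.

    The first existing line for the key (active or commented out with '#')
    is replaced; otherwise the setting is appended at the end.
    """
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE
    )
    new_line = f"{key}={value}"
    if pattern.search(conf_text):
        return pattern.sub(new_line, conf_text, count=1)
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


example = "# see udev.conf(5)\n#udev_log=info\n#children_max=\n"
print(set_udev_option(example, "children_max", 2))
```

After changing the real file, a reboot (or at least a restart of systemd-udevd) is needed before testing again with udevadm trigger.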

I'm also wondering, now I can reproduce this on 2 different systems, why nobody else seems to have these messages.
 

janssensm

After testing on my affected host I decided to go with children_max=4.
That's the highest value on this system where the symlink failing messages stop appearing, during boot as well as with udevadm trigger.
For now I haven't seen any issues, boot time got somewhat faster, udev is indeed somewhat faster, but nothing is slower or timing out.

In my search I found an interesting discussion [1]
Seems that the default value for children_max is not suited for every system, and that is why the default has been changed over time.
But still in my case I see this as a workaround for now, perhaps it's time for me to consult the debian systemd package maintainer.

[1] https://lore.kernel.org/linux-lvm/dd395e820a4cf9b5e807e8de1e456786eebd2044.camel@suse.com/
 

janssensm

It's been a while. But yesterday Debian Bullseye point release 11.3 was released.
https://www.debian.org/News/2022/20220326

It includes an update for udev/systemd, version 247.3-7. The changelog says:
* Revert multipath symlink race fix.
Revert upstream commits which caused a regression in udev resulting in
long delays when processing partitions with the same label.
(Closes: #993738)
So I reverted my children_max setting in udev.conf and did a reboot to check whether the issue was still there.
And I could confirm it was still present: a lot of these messages again.

So I did apt full-upgrade to install all updates for debian 11.3. Reboot.
No more messages for symbolic links. So this seems fixed now.
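
To check whether a host already has the fixed package, the installed udev version can be compared with 247.3-7. The sketch below is deliberately simplified and is not a full Debian version comparison (dpkg --compare-versions is the authoritative tool); it only handles versions shaped like 247.3-7 or 247.3-7+deb11u1, and the helper names are hypothetical.

```python
import re

FIXED_VERSION = "247.3-7"  # version named in the changelog quoted above


def version_key(version):
    """Simplified sort key: numeric upstream parts plus leading revision number."""
    upstream, _, revision = version.partition("-")
    upstream_nums = tuple(int(n) for n in re.findall(r"\d+", upstream))
    rev_match = re.match(r"\d+", revision)
    return upstream_nums, int(rev_match.group()) if rev_match else 0


def has_fix(installed):
    """True if the installed version is at least FIXED_VERSION."""
    return version_key(installed) >= version_key(FIXED_VERSION)


print(has_fix("247.3-6"))  # version before the 11.3 point release
print(has_fix("247.3-7"))
```

The installed version string can be read from the output of dpkg -s udev or apt policy udev.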
 
