[SOLVED] disk/by-id Changing Randomly on USB Mass Storage Device

tths

Hi,

I have a backup system that uses a USB3 HDD bay with 5 HDDs.
The 5 HDDs were added to the system as a ZFS raidz1 pool.
A script on the Proxmox machine switches on the power supply of the HDD bay 5 minutes before the backup starts.
The pool is then imported and added as storage with the following commands:
Bash:
/usr/sbin/zpool import BACKUP
/usr/sbin/zfs mount BACKUP
/usr/sbin/pvesm add dir BACKUP -path /BACKUP -content backup

This worked perfectly for the past 2 years.
Since I updated Proxmox around a month ago, the ZFS pool has been failing. At first I thought one of the HDDs was faulty.
So I bought a new one and tried to replace it, only to find that ZFS was complaining about a different HDD.
Long story short, I connected the HDD bay to my Windows machine and checked all HDDs with HDDScan.
Everything worked perfectly fine (except that afterwards all data on the backup was gone).
So I went back to Proxmox and connected the device again.
Then I figured out that, at random, one of the devices gets the wrong name.
So if I restart the HDD bay, one of the HDDs suddenly has a different ID.
1677811581032.png
The ID under /dev/disk/by-id is wrong as well.
One of the devices gets the USB ID as its identifier.
Since this happens randomly (probably to the HDD that spins up first?), the device identifier is not reliable.
1677811465266.png
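As a way to catch this before the pool is imported, the power-on script could wait until the expected number of ata- links has appeared. This is only a sketch: the function names count_ids and wait_for_ids are made up for illustration, and it assumes that correctly identified disks show up with an "ata-" prefix while the misidentified one shows up with a "usb-" prefix.

```shell
#!/bin/bash
# Count entries in a by-id style directory whose names start with a given
# prefix, skipping partition links such as "...-part1".
count_ids() {
  local dir=$1 prefix=$2 n=0 entry
  for entry in "$dir/$prefix"*; do
    # An unmatched glob stays literal, so skip anything that does not exist.
    [ -e "$entry" ] || [ -L "$entry" ] || continue
    case $entry in *-part[0-9]*) continue ;; esac
    n=$((n + 1))
  done
  echo "$n"
}

# Poll once per second until at least `want` matching entries exist,
# giving up after `timeout` seconds.
wait_for_ids() {
  local dir=$1 prefix=$2 want=$3 timeout=$4 waited=0
  while [ "$(count_ids "$dir" "$prefix")" -lt "$want" ]; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Hypothetical use before the import (not executed here):
#   wait_for_ids /dev/disk/by-id ata- 5 120 && /usr/sbin/zpool import BACKUP
```

This does not fix the wrong ID. It only makes the script fail early instead of importing a pool with a missing member.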

Proxmox: 7.3-6
Kernel: Linux pvetmy1 5.15.85-1-pve #1 SMP PVE 5.15.85-1 (2023-02-01T00:00Z) x86_64 GNU/Linux

How can I fix this behavior?
How can I ensure that the HDD gets the correct device ID, and not the device ID of the USB bay?

Thank you.
 
Hi,

I found the issue.
It seems that the udev tool ata_id is randomly failing:
Code:
sdi: /usr/lib/udev/rules.d/60-persistent-storage.rules:60 Importing properties from results of 'ata_id --export /dev/sdi'
sdi: Starting 'ata_id --export /dev/sdi'
Successfully forked off '(spawn)' as PID 1164637.
sdi: Process 'ata_id --export /dev/sdi' failed with exit code 2.

This causes udev to fall back to the ID of the USB-SCSI converter.
I tried wrapping the ata_id call in a retry loop, but strangely, when it is started by udev, it sometimes also returns no output with exit code 0. (This doesn't happen if I run it manually after the device is already connected.)
So simply looping until it stops failing did not help either.
The only solution I found is to sleep first and then run it in a loop.
This seems to work, so the exit code 0 without output apparently only occurs when ata_id is executed too early.
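To confirm which identity a disk ended up with, one option is to look at its udev properties. The sketch below assumes that a successful ata_id run sets ID_BUS=ata and that the fallback leaves ID_BUS at some other value such as usb. The helper names get_prop and id_source are made up for illustration.

```shell
#!/bin/bash
# Print the value of one KEY=value property read from stdin.
get_prop() {
  local key=$1 line
  while IFS= read -r line; do
    case $line in
      "$key="*) printf '%s\n' "${line#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Classify a device by its ID_BUS property: "ata" when ata_id supplied the
# identity, anything else is reported as a fallback (assumed: the USB bridge).
id_source() {
  local bus
  bus=$(get_prop ID_BUS) || { echo unknown; return 0; }
  if [ "$bus" = ata ]; then echo ata; else echo "fallback:$bus"; fi
}

# Hypothetical use (not executed here):
#   udevadm info --query=property --name=/dev/sdi | id_source
```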

My current solution is to add a new rules file with the prefix 59, so that it is processed before 60-persistent-storage.rules, which executes the following bash script:
Code:
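#!/bin/bash
# Retry wrapper around ata_id; the udev rule passes the device node as $1.

DevPath=$1

# Give the drive time to spin up: if ata_id runs too early it can
# exit with code 0 but print nothing.
sleep 3

# Retry until ata_id succeeds.
until /lib/udev/ata_id --export "$DevPath"; do
  echo "# extra round"
  sleep 1
done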
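
The rules file itself is not shown above. A minimal sketch of what it might contain follows, written as a small helper that generates the file. Everything specific here is an assumption: the file name 59-ata-id-retry.rules, the wrapper path /usr/local/bin/ata_id_retry.sh, and the match keys, which are loosely modelled on how the stock rules select USB disks. Adjust them to the actual setup.

```shell
#!/bin/bash
# Write a one-line udev rule that imports the wrapper's output for USB disks.
# $1: rules file to create, $2: absolute path of the wrapper script.
write_retry_rule() {
  local rules_file=$1 wrapper=$2
  # $devnode is expanded by udev, not by the shell, hence the single quotes.
  printf 'ACTION!="remove", SUBSYSTEM=="block", KERNEL=="sd*[!0-9]", SUBSYSTEMS=="usb", IMPORT{program}="%s $devnode"\n' \
    "$wrapper" > "$rules_file"
}

# Hypothetical use (paths are assumptions, not executed here):
#   write_retry_rule /etc/udev/rules.d/59-ata-id-retry.rules /usr/local/bin/ata_id_retry.sh
#   udevadm control --reload
```

Because the file sorts before 60-persistent-storage.rules, the properties imported from the wrapper should already be set when the stock ata_id rule is reached.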