ZFS issues with disk images after reboot

tlex

Member
Mar 9, 2021
I have a ZFS pool (R1_1.6TB_SSD_EVO860) where I store my VM OS disk images.
Recently, after I had to reboot the host, some VMs would not start (while others from the same ZFS pool did) and I got the following log events in syslog (see below).
"timeout: no zvol device link for 'vm-1002-disk-0' found after 300 sec found."

The thing is, I can see the disk images in the GUI, and some VMs from the same zpool boot without any issue.
I tried rebooting again; sometimes one of the VMs will manage to mount its disks, other times it won't.
The only way I could work around this problem was by detaching the problematic disks from the VM and re-attaching them. After that, the VM will boot.
But that workaround does not seem to persist across reboots.
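For reference, a CLI sketch of that detach/re-attach (assuming `qm set` with `--delete` for the detach, and using my VM 1002 disk as an example):
Code:
# detach scsi0 from VM 1002 (the zvol stays on the pool and shows up as an unused disk)
qm set 1002 --delete scsi0
# re-attach the existing zvol as scsi0
qm set 1002 --scsi0 R1_1.6TB_SSD_EVO860:vm-1002-disk-0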
The only thing that changed for the problematic VM over time is that, a few weeks ago, I had to temporarily migrate the disk images to another ZFS pool before migrating them back. At the time, everything went smoothly.
Has anybody had the same problem? Any ideas?
The physical disks seem to be healthy and I didn't see any ZFS errors.
In the logs below, the problematic disk image is "R1_1.6TB_SSD_EVO860/vm-1002-disk-0"

Code:
tail -f /var/log/syslog
Mar 11 06:25:40 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1002-disk-0
Mar 11 06:25:40 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1004-disk-0
Mar 11 06:25:40 pve zvol_wait[4215]: RZ2-2_5-8_2TB/vm-1002-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: Still waiting on 3 zvol links ...
Mar 11 06:26:10 pve zvol_wait[4215]: No progress since last loop.
Mar 11 06:26:10 pve zvol_wait[4215]: Checking if any zvols were deleted.
Mar 11 06:26:10 pve zvol_wait[4215]: Remaining zvols:
Mar 11 06:26:10 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1002-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1004-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: RZ2-2_5-8_2TB/vm-1002-disk-0
Mar 11 06:27:19 pve pve-guests[16140]: timeout: no zvol device link for 'vm-1002-disk-0' found after 300 sec found.
Code:
zfs get mounted,mountpoint,canmount
NAME                                           PROPERTY    VALUE                                    SOURCE
R1_1.6TB_SSD_EVO860                            mounted     yes                                      -
R1_1.6TB_SSD_EVO860                            mountpoint  /R1_1.6TB_SSD_EVO860                     default
R1_1.6TB_SSD_EVO860                            canmount    on                                       default
R1_1.6TB_SSD_EVO860/base-2000-disk-0           mounted     -                                        -
R1_1.6TB_SSD_EVO860/base-2000-disk-0           mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/base-2000-disk-0           canmount    -                                        -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  mounted     -                                        -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  canmount    -                                        -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0          mounted     yes                                      -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0          mountpoint  /R1_1.6TB_SSD_EVO860/subvol-100-disk-0   default
R1_1.6TB_SSD_EVO860/subvol-100-disk-0          canmount    on                                       default
R1_1.6TB_SSD_EVO860/subvol-101-disk-0          mounted     yes                                      -
R1_1.6TB_SSD_EVO860/subvol-101-disk-0          mountpoint  /R1_1.6TB_SSD_EVO860/subvol-101-disk-0   default
R1_1.6TB_SSD_EVO860/subvol-101-disk-0          canmount    on                                       default
R1_1.6TB_SSD_EVO860/vm-1001-disk-0             mounted     -                                        -
R1_1.6TB_SSD_EVO860/vm-1001-disk-0             mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/vm-1001-disk-0             canmount    -                                        -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0             mounted     -                                        -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0             mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0             canmount    -                                        -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0             mounted     -                                        -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0             mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0             canmount    -                                        -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0             mounted     -                                        -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0             mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0             canmount    -                                        -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1             mounted     -                                        -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1             mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1             canmount    -                                        -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0             mounted     -                                        -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0             mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0             canmount    -                                        -
R1_1.6TB_SSD_EVO860/vm-102-disk-0              mounted     -                                        -
R1_1.6TB_SSD_EVO860/vm-102-disk-0              mountpoint  -                                        -
R1_1.6TB_SSD_EVO860/vm-102-disk-0              canmount    -                                        -
Code:
zfs list
NAME                                    USED   AVAIL  REFER  MOUNTPOINT
R1_1.6TB_SSD_EVO860                     273G   1.31T    96K  /R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/base-2000-disk-0   57.9G   1.35T  16.6G  -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  1.19G    830M  1.19G  /R1_1.6TB_SSD_EVO860/subvol-100-disk-0
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  1.22G   6.78G  1.22G  /R1_1.6TB_SSD_EVO860/subvol-101-disk-0
R1_1.6TB_SSD_EVO860/vm-1001-disk-0     33.0G   1.31T  30.6G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0     61.9G   1.34T  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0     33.0G   1.34T  4.59G  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0        3M   1.31T   192K  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1     33.0G   1.33T  8.42G  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0     10.3G   1.31T  4.72G  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0      41.3G   1.33T  16.4G  -
RZ2-1_1-4_4TB                           419G   6.51T   140K  /RZ2-1_1-4_4TB
RZ2-1_1-4_4TB/vm-1003-disk-0           55.3G   6.53T  36.7G  -
RZ2-1_1-4_4TB/vm-1003-disk-1            363G   6.51T   363G  -
RZ2-2_5-8_2TB                          2.16T   1.25T   140K  /RZ2-2_5-8_2TB
RZ2-2_5-8_2TB/vm-1002-disk-0           2.16T   1.94T  1.47T  -

find /dev | grep 1002
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0-part2
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0-part1
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part1
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part3
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part2
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0-part2
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0-part1
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part1
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part3
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part2

Code:
zpool list
NAME                  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
R1_1.6TB_SSD_EVO860  1.62T   118G  1.51T        -         -    16%     7%  1.00x  ONLINE  -
RZ2-1_1-4_4TB        14.5T   826G  13.7T        -         -     0%     5%  1.00x  ONLINE  -
RZ2-2_5-8_2TB        7.27T  3.04T  4.23T        -         -     2%    41%  1.00x  ONLINE  -

Code:
zpool status -v R1_1.6TB_SSD_EVO860
  pool: R1_1.6TB_SSD_EVO860
 state: ONLINE
  scan: scrub repaired 0B in 00:02:40 with 0 errors on Sun Feb 13 00:26:41 2022
config:

        NAME                                             STATE     READ WRITE CKSUM
        R1_1.6TB_SSD_EVO860                              ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_2TB_S597NJ0NB19827A  ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_2TB_S597NJ0NB19834W  ONLINE       0     0     0

errors: No known data errors

Code:
nano /etc/pve/qemu-server/1002.conf
agent: 1
bios: ovmf
boot: order=scsi0
cores: 16
hotplug: disk,network,usb,memory,cpu
machine: pc-q35-6.0
memory: 16384
name: BlueIris
net0: virtio=HIDDEN,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
scsi0: R1_1.6TB_SSD_EVO860:vm-1002-disk-0,size=60G
scsi1: RZ2-2_5-8_2TB:vm-1002-disk-0,backup=0,size=2000G
scsihw: virtio-scsi-pci
smbios1: uuid=HIDDEN
sockets: 1
startup: order=3,up=0
vga: memory=8
vmgenid: HIDDEN

[Attachment: zfs-storage.png]
[Attachment: vmdisks.png]
 
I installed the latest updates today, and that's when I noticed the problem.

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-9
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
 
hmm - tried to reproduce the issue here - but without success...
could you:
* paste the output (in code tags) of `zfs get all R1_1.6TB_SSD_EVO860/vm-1002-disk-0`
* try installing `pve-kernel-5.15` to see if this changes anything
* attach your journal since booting (journalctl -b)

do you have any customized/non-default setting on the system/the zpools/the zvols?
 
Same problem :( And no, I don't have anything customized/non-default for the zpools/system/zvols.
Code:
zfs get all R1_1.6TB_SSD_EVO860/vm-1002-disk-0
NAME                                PROPERTY              VALUE                  SOURCE
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  type                  volume                 -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  creation              Wed Mar  2 15:55 2022  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  used                  61.9G                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  available             1.34T                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  referenced            33.8G                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  compressratio         1.00x                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  reservation           none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volsize               60G                    local
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volblocksize          8K                     default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  checksum              on                     default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  compression           off                    inherited from R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  readonly              off                    default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  createtxg             381946                 -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  copies                1                      default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  refreservation        61.9G                  local
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  guid                  3858109402526119173    -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  primarycache          all                    default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  secondarycache        all                    default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbysnapshots       0B                     -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbydataset         33.8G                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbychildren        0B                     -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbyrefreservation  28.1G                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logbias               latency                default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  objsetid              121040                 -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  dedup                 off                    default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  mlslabel              none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  sync                  disabled               inherited from R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  refcompressratio      1.00x                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  written               33.8G                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logicalused           33.6G                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logicalreferenced     33.6G                  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volmode               default                default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapshot_limit        none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapshot_count        none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapdev               hidden                 default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  context               none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  fscontext             none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  defcontext            none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  rootcontext           none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  redundant_metadata    all                    default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  encryption            off                    default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  keylocation           none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  keyformat             none                   default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  pbkdf2iters           0                      default

pve-kernel-5.15 installed
Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.15.19-2-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.15: 7.1-11
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.19-2-pve: 5.15.19-3
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
reboot
 


After another reboot, one of the VMs that didn't boot at the previous reboot is now booting, but I have the same issue with other VMs using the same zpool.

Code:
zvol_wait
Testing 11 zvol links
Still waiting on 1 zvol links ...
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
 
hm - could you run the following:
Code:
for i in $(ls -1  /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i;  /lib/udev/zvol_id $i ; done  |grep -B1 <name of missing zvol>
(replace <name of missing zvol> with the name of a zvol that has the problem - in your previous outputs: R1_1.6TB_SSD_EVO860/vm-1002-disk-0)

EDIT: if the above produces output - e.g. /dev/zd128
please also run
Code:
udevadm trigger  /dev/zd128
and check if /dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0 exists
(easiest done by running `zvol_wait`)

additionally - could you please try booting an older kernel:
pve-kernel-5.13.19-3-pve (or older)

Thanks!
 
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i; /lib/udev/zvol_id $i ; done |grep -B1 RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zd0
RZ2-2_5-8_2TB/vm-1002-disk-0
 
sorry - you were too fast for me:
EDIT: if the above produces output - e.g. /dev/zd128
please also run
Code:
udevadm trigger /dev/zd128
and check if /dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0 exists
(easiest done by running `zvol_wait`)
 
Code:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i; /lib/udev/zvol_id $i ; done |grep -B1 RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zd0
RZ2-2_5-8_2TB/vm-1002-disk-0
root@pve:~# udevadm trigger /dev/zd128
root@pve:~# udevadm trigger /dev/zd0
root@pve:~# ls /dev/zvol/R1_1.6TB_SSD_EVO860
base-2000-disk-0        vm-1001-disk-0        vm-1002-disk-0        vm-1003-disk-0-part5  vm-1004-disk-1-part3  vm-102-disk-0
base-2000-disk-0-part1  vm-1001-disk-0-part1  vm-1002-disk-0-part3  vm-1004-disk-0        vm-1005-disk-0        vm-102-disk-0-part1
base-2000-disk-0-part2  vm-1001-disk-0-part2  vm-1003-disk-0        vm-1004-disk-1        vm-1005-disk-0-part1  vm-102-disk-0-part2
base-2000-disk-0-part3  vm-1001-disk-0-part3  vm-1003-disk-0-part1  vm-1004-disk-1-part1  vm-1005-disk-0-part2  vm-102-disk-0-part3
base-2000-disk-0-part4  vm-1001-disk-0-part4  vm-1003-disk-0-part2  vm-1004-disk-1-part2  vm-1005-disk-0-part3  vm-102-disk-0-part4


So in a second window I still had zvol_wait running, looping on RZ2-2_5-8_2TB/vm-1002-disk-0 like this:
Code:
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0

As soon as this timed out, I was able to start a problematic VM, and the VM can access the disks from that pool without any issue.

But strangely enough, now it's a different zpool than earlier...
 
Hmm - seems like the issue is similar to https://bugzilla.proxmox.com/show_bug.cgi?id=3572
(which boils down to an issue in systemd)

But strangely enough, now it's a different zpool than earlier...
If I understand the issue correctly it's got little to do with the zpool - it is rather an issue where udev messes up, resulting in the symlinks not getting created (this is non-deterministic)
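for reference, those /dev/zvol/<pool>/<name> links come from the zvol udev rule shipped with zfsutils-linux - it should look roughly like this (exact path and wording may differ a bit between versions):
Code:
# /lib/udev/rules.d/60-zvol.rules (shipped by zfsutils-linux; shown for reference)
# zvol_id maps a /dev/zd* block device back to its dataset name, and udev then
# creates the /dev/zvol/<pool>/<dataset> symlink from that program output
KERNEL=="zd*" SUBSYSTEM=="block" ACTION=="add|change" PROGRAM="/lib/udev/zvol_id $devnode" SYMLINK+="zvol/%c %c"
so if udev mishandles the uevent for a given /dev/zdN, the symlink never shows up even though the zvol itself is fine - which is why re-triggering the event with udevadm makes it appear.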

Unless the issue completely goes away with an older kernel (pve-kernel-5.13.19-3-pve or older),
currently I think that the following might help:
* try running (after a reboot): `systemctl restart systemd-udevd; udevadm trigger; udevadm settle` - then start your vms
* if this does not help - I think running:
Code:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
should work
 
ok, I will give it a try.
In the meantime, would you mind telling me how to revert to the previous kernel and remove pve-kernel-5.15?
 
So, while running the following command temporarily fixes the problem, I was wondering if anything can be done to fix this completely, or if a fix will be released in the near future, so the command doesn't have to be run at every reboot? Should a bug report be opened? I'm just asking because I would like to know where to look for updates on this while I put a temporary solution in place on my side.

Thanks for your help determining the problem btw :)

Code:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
 
I was wondering if there is any chance that this could be fixed in the near future?
For now, I don't know if this is the best thing to do, but I added a cron job that runs the following at each boot:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
If I don't, my VMs don't start.
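For anyone wanting to do the same, a root crontab entry for this could look roughly like the following (the sleep is only an assumption, to give the pools and udev a moment after boot; adjust as needed):
Code:
# root crontab (crontab -e): re-trigger udev for the zvol parent devices once per boot
# so the /dev/zvol/<pool>/<name> symlinks exist before the VMs are started
@reboot sleep 30; for i in $(ls -1 /dev/zd* | grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger "$i"; done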
I see that the similar issue https://bugzilla.proxmox.com/show_bug.cgi?id=3572 was fixed by installing libpve-storage-perl >= 7.0-12, but that is not the case here since I already run 7.1-1.

Any help / suggestion appreciated :)
 
Just updated to the latest Proxmox (7.2-11) and started having the same issue. The command described above fixes the issue for the current boot.
 
Same problem here:
proxmox-ve: 7.2-1 (running kernel: 5.15.64-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.15.60-1-pve: 5.15.60-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1


Hope it gets fixed soon...

kind regards
 
After today's installation of Proxmox + OS updates we have a similar situation in our cluster.
One of the VMs reported: TASK ERROR: timeout: no zvol device link for 'vm-121-disk-0' found after 300 sec found.

Code:
for i in $(ls -1  /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i;  /lib/udev/zvol_id $i ; done  |grep -B1 <name of missing zvol>
returned nothing

Code:
zvol_wait
Testing 18 zvol links
Still waiting on 1 zvol links ...
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
ZFSHDD/vm-121-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
ZFSHDD/vm-121-disk-0
^C

Fortunately, after a reboot everything was fine.

For debugging purposes, pveversion below:

Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.2-14 (running version: 7.2-14/65898fbc)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-7
libpve-guest-common-perl: 4.2-2
libpve-http-server-perl: 4.1-5
libpve-network-perl: 0.7.2
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.2
pve-cluster: 7.2-3
pve-container: 4.3-4
pve-docs: 7.2-3
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.1.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-10
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1