ZFS issues with disk images after reboot

tlex

I have a ZFS pool (R1_1.6TB_SSD_EVO860) where I store my VM OS disk images.
Recently, after I had to reboot the host, some VMs would not start (while others on the same ZFS pool did), and I got the following events in syslog (see below):
"timeout: no zvol device link for 'vm-1002-disk-0' found after 300 sec found."

The thing is, I can see the disk images in the GUI, and some VMs on the same zpool boot without any issue.
I tried rebooting again; sometimes one of the VMs finds and mounts its disks, other times it won't.
The only way I could work around this problem was to detach the problematic disks from the VM and re-attach them. After that the VM will boot.
But that workaround does not persist across a reboot.
The only thing different about the problematic VMs is that a few weeks ago I had to temporarily migrate their disk images to another ZFS pool and then migrate them back. At the time, everything went smoothly.
Has anybody had the same problem? Any ideas?
The physical disks seem to be healthy and I didn't see any ZFS errors.
In the logs below, the problematic disk image is "R1_1.6TB_SSD_EVO860/vm-1002-disk-0"

Code:
tail -f /var/log/syslog
Mar 11 06:25:40 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1002-disk-0
Mar 11 06:25:40 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1004-disk-0
Mar 11 06:25:40 pve zvol_wait[4215]: RZ2-2_5-8_2TB/vm-1002-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: Still waiting on 3 zvol links ...
Mar 11 06:26:10 pve zvol_wait[4215]: No progress since last loop.
Mar 11 06:26:10 pve zvol_wait[4215]: Checking if any zvols were deleted.
Mar 11 06:26:10 pve zvol_wait[4215]: Remaining zvols:
Mar 11 06:26:10 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1002-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1004-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: RZ2-2_5-8_2TB/vm-1002-disk-0
Mar 11 06:27:19 pve pve-guests[16140]: timeout: no zvol device link for 'vm-1002-disk-0' found after 300 sec found.
Code:
zfs get mounted,mountpoint,canmount
NAME  PROPERTY  VALUE  SOURCE
R1_1.6TB_SSD_EVO860  mounted  yes  -
R1_1.6TB_SSD_EVO860  mountpoint  /R1_1.6TB_SSD_EVO860  default
R1_1.6TB_SSD_EVO860  canmount  on  default
R1_1.6TB_SSD_EVO860/base-2000-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  mounted  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  mountpoint  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  canmount  -  -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  mounted  yes  -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  mountpoint  /R1_1.6TB_SSD_EVO860/subvol-100-disk-0  default
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  canmount  on  default
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  mounted  yes  -
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  mountpoint  /R1_1.6TB_SSD_EVO860/subvol-101-disk-0  default
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  canmount  on  default
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  canmount  -  -
Code:
zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
R1_1.6TB_SSD_EVO860  273G  1.31T  96K  /R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/base-2000-disk-0  57.9G  1.35T  16.6G  -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  1.19G  830M  1.19G  /R1_1.6TB_SSD_EVO860/subvol-100-disk-0
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  1.22G  6.78G  1.22G  /R1_1.6TB_SSD_EVO860/subvol-101-disk-0
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  33.0G  1.31T  30.6G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  61.9G  1.34T  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  33.0G  1.34T  4.59G  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  3M  1.31T  192K  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  33.0G  1.33T  8.42G  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  10.3G  1.31T  4.72G  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  41.3G  1.33T  16.4G  -
RZ2-1_1-4_4TB  419G  6.51T  140K  /RZ2-1_1-4_4TB
RZ2-1_1-4_4TB/vm-1003-disk-0  55.3G  6.53T  36.7G  -
RZ2-1_1-4_4TB/vm-1003-disk-1  363G  6.51T  363G  -
RZ2-2_5-8_2TB  2.16T  1.25T  140K  /RZ2-2_5-8_2TB
RZ2-2_5-8_2TB/vm-1002-disk-0  2.16T  1.94T  1.47T  -

Code:
find /dev | grep 1002
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0-part2
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0-part1
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part1
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part3
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part2
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0-part2
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0-part1
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part1
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part3
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part2

Code:
zpool list
NAME  SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
R1_1.6TB_SSD_EVO860  1.62T  118G  1.51T  -  -  16%  7%  1.00x  ONLINE  -
RZ2-1_1-4_4TB  14.5T  826G  13.7T  -  -  0%  5%  1.00x  ONLINE  -
RZ2-2_5-8_2TB  7.27T  3.04T  4.23T  -  -  2%  41%  1.00x  ONLINE  -

Code:
zpool status -v R1_1.6TB_SSD_EVO860
  pool: R1_1.6TB_SSD_EVO860
 state: ONLINE
  scan: scrub repaired 0B in 00:02:40 with 0 errors on Sun Feb 13 00:26:41 2022
config:

        NAME                                             STATE     READ WRITE CKSUM
        R1_1.6TB_SSD_EVO860                              ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_2TB_S597NJ0NB19827A  ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_2TB_S597NJ0NB19834W  ONLINE       0     0     0

errors: No known data errors

Code:
nano /etc/pve/qemu-server/1002.conf
agent: 1
bios: ovmf
boot: order=scsi0
cores: 16
hotplug: disk,network,usb,memory,cpu
machine: pc-q35-6.0
memory: 16384
name: BlueIris
net0: virtio=HIDDEN,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
scsi0: R1_1.6TB_SSD_EVO860:vm-1002-disk-0,size=60G
scsi1: RZ2-2_5-8_2TB:vm-1002-disk-0,backup=0,size=2000G
scsihw: virtio-scsi-pci
smbios1: uuid=HIDDEN
sockets: 1
startup: order=3,up=0
vga: memory=8
vmgenid: HIDDEN

Attached screenshots: zfs-storage.png, vmdisks.png
 
I installed the latest updates today, and that's when I noticed the problem.

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-9
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
 
hmm - tried to reproduce the issue here - but without success...
could you:
* paste the output (in code tags) of `zfs get all R1_1.6TB_SSD_EVO860/vm-1002-disk-0`
* try installing `pve-kernel-5.15` to see if this changes anything
* attach your journal since booting (journalctl -b; see the example below)
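For example, something along these lines should produce a file you can attach (the output path is just a suggestion):
Code:
# dump the journal of the current boot into a file
journalctl -b > /root/journal.txt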

do you have any customized/non-default settings on the system/the zpools/the zvols?
 
Same problem :( (and no, I don't have anything customized/non-default for the zpools/system/zvols)
Code:
zfs get all R1_1.6TB_SSD_EVO860/vm-1002-disk-0
NAME  PROPERTY  VALUE  SOURCE
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  type  volume  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  creation  Wed Mar 2 15:55 2022  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  used  61.9G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  available  1.34T  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  referenced  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  compressratio  1.00x  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  reservation  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volsize  60G  local
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volblocksize  8K  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  checksum  on  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  compression  off  inherited from R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  readonly  off  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  createtxg  381946  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  copies  1  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  refreservation  61.9G  local
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  guid  3858109402526119173  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  primarycache  all  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  secondarycache  all  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbysnapshots  0B  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbydataset  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbychildren  0B  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbyrefreservation  28.1G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logbias  latency  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  objsetid  121040  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  dedup  off  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  mlslabel  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  sync  disabled  inherited from R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  refcompressratio  1.00x  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  written  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logicalused  33.6G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logicalreferenced  33.6G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volmode  default  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapshot_limit  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapshot_count  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapdev  hidden  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  context  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  fscontext  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  defcontext  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  rootcontext  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  redundant_metadata  all  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  encryption  off  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  keylocation  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  keyformat  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  pbkdf2iters  0  default

pve-kernel-5.15 installed
Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.15.19-2-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.15: 7.1-11
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.19-2-pve: 5.15.19-3
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
reboot
 

Attachments

  • journal.txt (173 KB)
After another reboot, one of the VMs that didn't boot at the previous reboot is now booting, but I have the same issue with other VMs using the same zpool.

Code:
zvol_wait
Testing 11 zvol links
Still waiting on 1 zvol links ...
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
 
hm - could you run the following:
Code:
for i in $(ls -1  /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i;  /lib/udev/zvol_id $i ; done  |grep -B1 <name of missing zvol>
(replace <name of missing zvol> with the name of a zvol that has the problem; in your previous output: R1_1.6TB_SSD_EVO860/vm-1002-disk-0)

EDIT: if the above produces output - e.g. /dev/zd128
please also run
Code:
udevadm trigger  /dev/zd128
and check if /dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0 exists
(easiest done by running `zvol_wait`)

additionally - could you please try booting an older kernel:
pve-kernel-5.13.19-3-pve (or older)

Thanks!
 
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i; /lib/udev/zvol_id $i ; done |grep -B1 RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zd0
RZ2-2_5-8_2TB/vm-1002-disk-0
 
sorry - you were too fast for me:
EDIT: if the above produces output - e.g. /dev/zd128
please also run
Code:
udevadm trigger /dev/zd128
and check if /dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0 exists
(easiest done by running `zvol_wait`)
 
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i; /lib/udev/zvol_id $i ; done |grep -B1 RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zd0
RZ2-2_5-8_2TB/vm-1002-disk-0
root@pve:~# udevadm trigger /dev/zd128
root@pve:~# udevadm trigger /dev/zd0
root@pve:~# ls /dev/zvol/R1_1.6TB_SSD_EVO860
base-2000-disk-0 vm-1001-disk-0 vm-1002-disk-0 vm-1003-disk-0-part5 vm-1004-disk-1-part3 vm-102-disk-0 base-2000-disk-0-part1 vm-1001-disk-0-part1 vm-1002-disk-0-part3 vm-1004-disk-0 vm-1005-disk-0 vm-102-disk-0-part1 base-2000-disk-0-part2 vm-1001-disk-0-part2 vm-1003-disk-0 vm-1004-disk-1 vm-1005-disk-0-part1 vm-102-disk-0-part2 base-2000-disk-0-part3 vm-1001-disk-0-part3 vm-1003-disk-0-part1 vm-1004-disk-1-part1 vm-1005-disk-0-part2 vm-102-disk-0-part3 base-2000-disk-0-part4 vm-1001-disk-0-part4 vm-1003-disk-0-part2 vm-1004-disk-1-part2 vm-1005-disk-0-part3 vm-102-disk-0-part4


So in a second window I still had zvol_wait running and looping on RZ2-2_5-8_2TB/vm-1002-disk-0 like this:
Code:
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0

As soon as this timed out, I was able to start a problematic VM, and the VM can access the disks from that pool without any issue.

But strangely enough, now it's a different zpool than earlier...
 
Hmm - seems like the issue is similar to https://bugzilla.proxmox.com/show_bug.cgi?id=3572
(which boils down to an issue in systemd)

But strangely enough, now it's a different zpool than earlier...
If I understand the issue correctly, it's got little to do with the zpool - it is rather an issue where udev messes up, resulting in the symlinks not getting created (this is non-deterministic)

Unless the issue completely goes away with an older kernel (pve-kernel-5.13.19-3-pve or older),
I currently think that the following might help:
* try running (after a reboot): `systemctl restart systemd-udevd; udevadm trigger; udevadm settle` - then start your vms
* if this does not help - I think running:
Code:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
should work
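For background, the /dev/zvol/... symlinks are created by a udev rule shipped with zfsutils - roughly the following (a sketch of the stock 60-zvol.rules; the exact path and wording on your system may differ slightly):
Code:
# /lib/udev/rules.d/60-zvol.rules (approximate stock content)
# zvol_id maps a /dev/zdN node back to its <pool>/<dataset> name;
# udev then creates the /dev/zvol/<pool>/<dataset> (and /dev/<pool>/<dataset>) symlinks from that name.
KERNEL=="zd*", SUBSYSTEM=="block", ACTION=="add|change", PROGRAM="/lib/udev/zvol_id $devnode", SYMLINK+="zvol/%c %c"
If the corresponding udev event gets lost or mishandled during boot, the link is never created - which is why re-triggering the device with `udevadm trigger` brings it back.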
 
OK, I will give it a try.
In the meantime, would you mind telling me how to revert to the previous kernel and remove pve-kernel-5.15?
 
So while running the following code works as a temporary fix, I was wondering if anything can be done to fix this completely, or if a fix will be released in the near future so the command doesn't have to be run at every reboot? Should a bug report be opened? I'm just asking because I would like to know where to look for updates on this while I put a temporary solution in place on my side.

Thanks for your help determining the problem btw :)

Code:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
 
I was wondering if there was any chance that this could be fixed in the near future?
For now, I don't know if this is the best thing to do, but I added a cron job that runs the following at each boot:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
If I don't, my VMs don't start. (A rough sketch of the cron entry is below.)
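Such an entry could look roughly like this - a sketch only; the @reboot placement and the short sleep are assumptions on my part, not something verified here:
Code:
# root's crontab (edit with: crontab -e) - hypothetical @reboot workaround
# wait a moment so the /dev/zd* nodes exist, then re-trigger udev for every whole-disk zvol
@reboot sleep 30; for i in $(ls -1 /dev/zd* | grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done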
I see that the similar issue https://bugzilla.proxmox.com/show_bug.cgi?id=3572 was fixed by installing libpve-storage-perl >= 7.0-12, but that doesn't apply here since I already run 7.1-1.

Any help / suggestion appreciated :)
 
Just updated to the latest Proxmox (7.2-11) and started having the same issue. The command described above fixes the issue for the current boot only.
 
Same problem here:
proxmox-ve: 7.2-1 (running kernel: 5.15.64-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.15.60-1-pve: 5.15.60-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1


Hope it gets fixed soon...

kind regards
 
After today's installation of Proxmox + OS updates we have a similar situation in our cluster.
One of the VMs reported: TASK ERROR: timeout: no zvol device link for 'vm-121-disk-0' found after 300 sec found.

Code:
for i in $(ls -1  /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i;  /lib/udev/zvol_id $i ; done  |grep -B1 <name of missing zvol>
returned nothing

Code:
zvol_wait
Testing 18 zvol links
Still waiting on 1 zvol links ...
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
ZFSHDD/vm-121-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
ZFSHDD/vm-121-disk-0
^C

Fortunately, after a reboot everything was fine.

For debugging purposes, pveversion below:

Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.2-14 (running version: 7.2-14/65898fbc)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-7
libpve-guest-common-perl: 4.2-2
libpve-http-server-perl: 4.1-5
libpve-network-perl: 0.7.2
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.2
pve-cluster: 7.2-3
pve-container: 4.3-4
pve-docs: 7.2-3
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.1.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-10
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
 
