ZFS issues with disk images after reboot

tlex

I have a ZFS pool (R1_1.6TB_SSD_EVO860) where I store my VM OS disk images.
Recently, after I had to reboot the host, some VMs would not start (while others on the same ZFS pool did), and I got the following events in syslog (see below):
"timeout: no zvol device link for 'vm-1002-disk-0' found after 300 sec found."

The thing is, I can see the disk images in the GUI, and some VMs on the same zpool boot without any issue.
I tried rebooting again; sometimes one of the VMs finds and mounts its disks, other times it won't.
The only way I could work around this problem was to detach the problematic disks from the VM and re-attach them. After that the VM will boot.
But that workaround does not persist across a reboot.
The only thing different about the problematic VMs is that a few weeks ago I had to temporarily migrate their disk images to another ZFS pool and then migrate them back. At the time, everything went smoothly.
Has anybody had the same problem? Any ideas?
The physical disks seem to be healthy and I didn't see any ZFS errors.
In the logs below, the problematic disk image is "R1_1.6TB_SSD_EVO860/vm-1002-disk-0"

Code:
tail -f /var/log/syslog
Mar 11 06:25:40 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1002-disk-0
Mar 11 06:25:40 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1004-disk-0
Mar 11 06:25:40 pve zvol_wait[4215]: RZ2-2_5-8_2TB/vm-1002-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: Still waiting on 3 zvol links ...
Mar 11 06:26:10 pve zvol_wait[4215]: No progress since last loop.
Mar 11 06:26:10 pve zvol_wait[4215]: Checking if any zvols were deleted.
Mar 11 06:26:10 pve zvol_wait[4215]: Remaining zvols:
Mar 11 06:26:10 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1002-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: R1_1.6TB_SSD_EVO860/vm-1004-disk-0
Mar 11 06:26:10 pve zvol_wait[4215]: RZ2-2_5-8_2TB/vm-1002-disk-0
Mar 11 06:27:19 pve pve-guests[16140]: timeout: no zvol device link for 'vm-1002-disk-0' found after 300 sec found.
Code:
zfs get mounted,mountpoint,canmount
NAME  PROPERTY  VALUE  SOURCE
R1_1.6TB_SSD_EVO860  mounted  yes  -
R1_1.6TB_SSD_EVO860  mountpoint  /R1_1.6TB_SSD_EVO860  default
R1_1.6TB_SSD_EVO860  canmount  on  default
R1_1.6TB_SSD_EVO860/base-2000-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  mounted  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  mountpoint  -  -
R1_1.6TB_SSD_EVO860/base-2000-disk-0@__base__  canmount  -  -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  mounted  yes  -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  mountpoint  /R1_1.6TB_SSD_EVO860/subvol-100-disk-0  default
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  canmount  on  default
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  mounted  yes  -
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  mountpoint  /R1_1.6TB_SSD_EVO860/subvol-101-disk-0  default
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  canmount  on  default
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  canmount  -  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  mounted  -  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  mountpoint  -  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  canmount  -  -
Code:
zfs list
NAME  USED  AVAIL  REFER  MOUNTPOINT
R1_1.6TB_SSD_EVO860  273G  1.31T  96K  /R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/base-2000-disk-0  57.9G  1.35T  16.6G  -
R1_1.6TB_SSD_EVO860/subvol-100-disk-0  1.19G  830M  1.19G  /R1_1.6TB_SSD_EVO860/subvol-100-disk-0
R1_1.6TB_SSD_EVO860/subvol-101-disk-0  1.22G  6.78G  1.22G  /R1_1.6TB_SSD_EVO860/subvol-101-disk-0
R1_1.6TB_SSD_EVO860/vm-1001-disk-0  33.0G  1.31T  30.6G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  61.9G  1.34T  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1003-disk-0  33.0G  1.34T  4.59G  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-0  3M  1.31T  192K  -
R1_1.6TB_SSD_EVO860/vm-1004-disk-1  33.0G  1.33T  8.42G  -
R1_1.6TB_SSD_EVO860/vm-1005-disk-0  10.3G  1.31T  4.72G  -
R1_1.6TB_SSD_EVO860/vm-102-disk-0  41.3G  1.33T  16.4G  -
RZ2-1_1-4_4TB  419G  6.51T  140K  /RZ2-1_1-4_4TB
RZ2-1_1-4_4TB/vm-1003-disk-0  55.3G  6.53T  36.7G  -
RZ2-1_1-4_4TB/vm-1003-disk-1  363G  6.51T  363G  -
RZ2-2_5-8_2TB  2.16T  1.25T  140K  /RZ2-2_5-8_2TB
RZ2-2_5-8_2TB/vm-1002-disk-0  2.16T  1.94T  1.47T  -

Code:
find /dev | grep 1002
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0-part2
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0-part1
/dev/RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part1
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part3
/dev/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part2
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0-part2
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0-part1
/dev/zvol/RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part1
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part3
/dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0-part2

Code:
zpool list
NAME  SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
R1_1.6TB_SSD_EVO860  1.62T  118G  1.51T  -  -  16%  7%  1.00x  ONLINE  -
RZ2-1_1-4_4TB  14.5T  826G  13.7T  -  -  0%  5%  1.00x  ONLINE  -
RZ2-2_5-8_2TB  7.27T  3.04T  4.23T  -  -  2%  41%  1.00x  ONLINE  -

Code:
zpool status -v R1_1.6TB_SSD_EVO860
  pool: R1_1.6TB_SSD_EVO860
 state: ONLINE
  scan: scrub repaired 0B in 00:02:40 with 0 errors on Sun Feb 13 00:26:41 2022
config:

        NAME                                             STATE     READ WRITE CKSUM
        R1_1.6TB_SSD_EVO860                              ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_2TB_S597NJ0NB19827A  ONLINE       0     0     0
            ata-Samsung_SSD_860_EVO_2TB_S597NJ0NB19834W  ONLINE       0     0     0

errors: No known data errors

Code:
nano /etc/pve/qemu-server/1002.conf
agent: 1
bios: ovmf
boot: order=scsi0
cores: 16
hotplug: disk,network,usb,memory,cpu
machine: pc-q35-6.0
memory: 16384
name: BlueIris
net0: virtio=HIDDEN,bridge=vmbr0
numa: 1
onboot: 1
ostype: win10
scsi0: R1_1.6TB_SSD_EVO860:vm-1002-disk-0,size=60G
scsi1: RZ2-2_5-8_2TB:vm-1002-disk-0,backup=0,size=2000G
scsihw: virtio-scsi-pci
smbios1: uuid=HIDDEN
sockets: 1
startup: order=3,up=0
vga: memory=8
vmgenid: HIDDEN

Attached screenshots: zfs-storage.png, vmdisks.png
 
I installed the latest updates today, and that's when I noticed the problem.

Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-9
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
 
hmm - tried to reproduce the issue here - but without success...
could you:
* paste the output (in code tags) of `zfs get all R1_1.6TB_SSD_EVO860/vm-1002-disk-0`
* try installing `pve-kernel-5.15` to see if this changes anything
* attach your journal since booting (journalctl -b; see the example below)
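For example, something along these lines should produce a file you can attach (the output path is just a suggestion):
Code:
# dump the journal of the current boot into a file
journalctl -b > /root/journal.txt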

do you have any customized/non-default settings on the system/the zpools/the zvols?
 
Same problem :( (and no, I don't have anything customized/non-default for the zpools/system/zvols)
Code:
zfs get all R1_1.6TB_SSD_EVO860/vm-1002-disk-0
NAME  PROPERTY  VALUE  SOURCE
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  type  volume  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  creation  Wed Mar 2 15:55 2022  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  used  61.9G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  available  1.34T  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  referenced  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  compressratio  1.00x  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  reservation  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volsize  60G  local
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volblocksize  8K  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  checksum  on  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  compression  off  inherited from R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  readonly  off  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  createtxg  381946  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  copies  1  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  refreservation  61.9G  local
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  guid  3858109402526119173  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  primarycache  all  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  secondarycache  all  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbysnapshots  0B  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbydataset  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbychildren  0B  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  usedbyrefreservation  28.1G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logbias  latency  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  objsetid  121040  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  dedup  off  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  mlslabel  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  sync  disabled  inherited from R1_1.6TB_SSD_EVO860
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  refcompressratio  1.00x  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  written  33.8G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logicalused  33.6G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  logicalreferenced  33.6G  -
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  volmode  default  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapshot_limit  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapshot_count  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  snapdev  hidden  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  context  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  fscontext  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  defcontext  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  rootcontext  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  redundant_metadata  all  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  encryption  off  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  keylocation  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  keyformat  none  default
R1_1.6TB_SSD_EVO860/vm-1002-disk-0  pbkdf2iters  0  default

pve-kernel-5.15 installed
Code:
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.15.19-2-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.15: 7.1-11
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.19-2-pve: 5.15.19-3
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
reboot
 

Attachments

  • journal.txt (173 KB)
After another reboot, one of the VMs that didn't boot at the previous reboot is now booting, but I have the same issue with other VMs using the same zpool.

Code:
zvol_wait
Testing 11 zvol links
Still waiting on 1 zvol links ...
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
 
hm - could you run the following:
Code:
for i in $(ls -1  /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i;  /lib/udev/zvol_id $i ; done  |grep -B1 <name of missing zvol>
(replace <name of missing zvol> with the name of a zvol that has the problem; in your previous output: R1_1.6TB_SSD_EVO860/vm-1002-disk-0)

EDIT: if the above produces output - e.g. /dev/zd128
please also run
Code:
udevadm trigger  /dev/zd128
and check if /dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0 exists
(easiest done by running `zvol_wait`)

additionally - could you please try booting an older kernel:
pve-kernel-5.13.19-3-pve (or older)

Thanks!
 
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i; /lib/udev/zvol_id $i ; done |grep -B1 RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zd0
RZ2-2_5-8_2TB/vm-1002-disk-0
 
sorry - you were too fast for me:
EDIT: if the above produces output - e.g. /dev/zd128
please also run
Code:
udevadm trigger /dev/zd128
and check if /dev/zvol/R1_1.6TB_SSD_EVO860/vm-1002-disk-0 exists
(easiest done by running `zvol_wait`)
 
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i; /lib/udev/zvol_id $i ; done |grep -B1 RZ2-2_5-8_2TB/vm-1002-disk-0
/dev/zd0
RZ2-2_5-8_2TB/vm-1002-disk-0
root@pve:~# udevadm trigger /dev/zd128
root@pve:~# udevadm trigger /dev/zd0
root@pve:~# ls /dev/zvol/R1_1.6TB_SSD_EVO860
base-2000-disk-0 vm-1001-disk-0 vm-1002-disk-0 vm-1003-disk-0-part5 vm-1004-disk-1-part3 vm-102-disk-0 base-2000-disk-0-part1 vm-1001-disk-0-part1 vm-1002-disk-0-part3 vm-1004-disk-0 vm-1005-disk-0 vm-102-disk-0-part1 base-2000-disk-0-part2 vm-1001-disk-0-part2 vm-1003-disk-0 vm-1004-disk-1 vm-1005-disk-0-part1 vm-102-disk-0-part2 base-2000-disk-0-part3 vm-1001-disk-0-part3 vm-1003-disk-0-part1 vm-1004-disk-1-part1 vm-1005-disk-0-part2 vm-102-disk-0-part3 base-2000-disk-0-part4 vm-1001-disk-0-part4 vm-1003-disk-0-part2 vm-1004-disk-1-part2 vm-1005-disk-0-part3 vm-102-disk-0-part4


So in a second window I still had zvol_wait running and looping on RZ2-2_5-8_2TB/vm-1002-disk-0 like this:
Code:
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
RZ2-2_5-8_2TB/vm-1002-disk-0

As soon as this timed out, I was able to start a problematic VM, and the VM can access the disks from that pool without any issue.

But strangely enough, now it's a different zpool than earlier...
 
Hmm - seems like the issue is similar to https://bugzilla.proxmox.com/show_bug.cgi?id=3572
(which boils down to an issue in systemd)

But strangely enough, now it's a different zpool than earlier...
If I understand the issue correctly, it's got little to do with the zpool - it is rather an issue where udev messes up, resulting in the symlinks not getting created (this is non-deterministic)

Unless the issue completely goes away with an older kernel (pve-kernel-5.13.19-3-pve or older),
I currently think that the following might help:
* try running (after a reboot): `systemctl restart systemd-udevd; udevadm trigger; udevadm settle` - then start your vms
* if this does not help - I think running:
Code:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
should work
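For background, the /dev/zvol/... symlinks are created by a udev rule shipped with zfsutils - roughly the following (a sketch of the stock 60-zvol.rules; the exact path and wording on your system may differ slightly):
Code:
# /lib/udev/rules.d/60-zvol.rules (approximate stock content)
# zvol_id maps a /dev/zdN node back to its <pool>/<dataset> name;
# udev then creates the /dev/zvol/<pool>/<dataset> (and /dev/<pool>/<dataset>) symlinks from that name.
KERNEL=="zd*", SUBSYSTEM=="block", ACTION=="add|change", PROGRAM="/lib/udev/zvol_id $devnode", SYMLINK+="zvol/%c %c"
If the corresponding udev event gets lost or mishandled during boot, the link is never created - which is why re-triggering the device with `udevadm trigger` brings it back.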
 
OK, I will give it a try.
In the meantime, would you mind telling me how to revert to the previous kernel and remove pve-kernel-5.15?
 
So while running the following code works as a temporary fix, I was wondering if anything can be done to fix this completely, or if a fix will be released in the near future so the command doesn't have to be run at every reboot? Should a bug report be opened? I'm just asking because I would like to know where to look for updates on this while I put a temporary solution in place on my side.

Thanks for your help determining the problem btw :)

Code:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
 
I was wondering if there was any chance that this could be fixed in the near future?
For now, I don't know if this is the best thing to do, but I added a cron job that runs the following at each boot:
for i in $(ls -1 /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done
If I don't, my VMs don't start. (A rough sketch of the cron entry is below.)
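Such an entry could look roughly like this - a sketch only; the @reboot placement and the short sleep are assumptions on my part, not something verified here:
Code:
# root's crontab (edit with: crontab -e) - hypothetical @reboot workaround
# wait a moment so the /dev/zd* nodes exist, then re-trigger udev for every whole-disk zvol
@reboot sleep 30; for i in $(ls -1 /dev/zd* | grep -v '/dev/zd[0-9]*p[0-9]*'); do udevadm trigger $i; done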
I see that the similar issue https://bugzilla.proxmox.com/show_bug.cgi?id=3572 was fixed by installing libpve-storage-perl >= 7.0-12, but that doesn't apply here since I already run 7.1-1.

Any help / suggestion appreciated :)
 
Just updated to the latest Proxmox (7.2-11) and started having the same issue. The command described above fixes the issue for the current boot only.
 
Same problem here:
proxmox-ve: 7.2-1 (running kernel: 5.15.64-1-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.15.60-1-pve: 5.15.60-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1


Hope it gets fixed soon...

kind regards
 
After today's installation of Proxmox + OS updates we have a similar situation in our cluster.
One of the VMs reported: TASK ERROR: timeout: no zvol device link for 'vm-121-disk-0' found after 300 sec found.

Code:
for i in $(ls -1  /dev/zd* |grep -v '/dev/zd[0-9]*p[0-9]*'); do echo $i;  /lib/udev/zvol_id $i ; done  |grep -B1 <name of missing zvol>
returned nothing

Code:
zvol_wait
Testing 18 zvol links
Still waiting on 1 zvol links ...
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
ZFSHDD/vm-121-disk-0
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
ZFSHDD/vm-121-disk-0
^C

Fortunately, after a reboot everything was fine.

For debugging purposes, pveversion below:

Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.2-14 (running version: 7.2-14/65898fbc)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.60-2-pve: 5.15.60-2
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph-fuse: 15.2.14-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-7
libpve-guest-common-perl: 4.2-2
libpve-http-server-perl: 4.1-5
libpve-network-perl: 0.7.2
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.2
pve-cluster: 7.2-3
pve-container: 4.3-4
pve-docs: 7.2-3
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.1.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-10
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1
 
