[SOLVED] After reboot, one VM fails to start due to TASK ERROR: timeout: no zvol device link for 'vm-125-disk-0' found after 300 sec found.

After rebooting, one of my VMs fails to start with this message "TASK ERROR: timeout: no zvol device link for 'vm-125-disk-0' found after 300 sec found."

This is on 8.0.4 and here are some outputs I think might be relevant.

root@pve:~# zvol_wait
Testing 4 zvol links
Still waiting on 1 zvol links ...
Still waiting on 1 zvol links ...
No progress since last loop.
Checking if any zvols were deleted.
Remaining zvols:
nvme1tb/vm-125-disk-0
Still waiting on 1 zvol links ...

root@pve:~# zfs get all nvme1tb/vm-125-disk-0
NAME PROPERTY VALUE SOURCE
nvme1tb/vm-125-disk-0 type volume -
nvme1tb/vm-125-disk-0 creation Tue Oct 3 13:58 2023 -
nvme1tb/vm-125-disk-0 used 55.7G -
nvme1tb/vm-125-disk-0 available 128G -
nvme1tb/vm-125-disk-0 referenced 22.1G -
nvme1tb/vm-125-disk-0 compressratio 1.43x -
nvme1tb/vm-125-disk-0 reservation none default
nvme1tb/vm-125-disk-0 volsize 32G local
nvme1tb/vm-125-disk-0 volblocksize 8K default
nvme1tb/vm-125-disk-0 checksum on default
nvme1tb/vm-125-disk-0 compression lz4 inherited from nvme1tb
nvme1tb/vm-125-disk-0 readonly off default
nvme1tb/vm-125-disk-0 createtxg 577374 -
nvme1tb/vm-125-disk-0 copies 1 default
nvme1tb/vm-125-disk-0 refreservation 33.0G local
nvme1tb/vm-125-disk-0 guid 5051027053486940799 -
nvme1tb/vm-125-disk-0 primarycache all default
nvme1tb/vm-125-disk-0 secondarycache all default
nvme1tb/vm-125-disk-0 usedbysnapshots 582M -
nvme1tb/vm-125-disk-0 usedbydataset 22.1G -
nvme1tb/vm-125-disk-0 usedbychildren 0B -
nvme1tb/vm-125-disk-0 usedbyrefreservation 33.0G -
nvme1tb/vm-125-disk-0 logbias latency default
nvme1tb/vm-125-disk-0 objsetid 1809 -
nvme1tb/vm-125-disk-0 dedup off default
nvme1tb/vm-125-disk-0 mlslabel none default
nvme1tb/vm-125-disk-0 sync standard default
nvme1tb/vm-125-disk-0 refcompressratio 1.42x -
nvme1tb/vm-125-disk-0 written 0 -
nvme1tb/vm-125-disk-0 logicalused 32.2G -
nvme1tb/vm-125-disk-0 logicalreferenced 31.4G -
nvme1tb/vm-125-disk-0 volmode default default
nvme1tb/vm-125-disk-0 snapshot_limit none default
nvme1tb/vm-125-disk-0 snapshot_count none default
nvme1tb/vm-125-disk-0 snapdev hidden default
nvme1tb/vm-125-disk-0 context none default
nvme1tb/vm-125-disk-0 fscontext none default
nvme1tb/vm-125-disk-0 defcontext none default
nvme1tb/vm-125-disk-0 rootcontext none default
nvme1tb/vm-125-disk-0 redundant_metadata all default
nvme1tb/vm-125-disk-0 encryption off default
nvme1tb/vm-125-disk-0 keylocation none default
nvme1tb/vm-125-disk-0 keyformat none default
nvme1tb/vm-125-disk-0 pbkdf2iters 0 default

Any ideas how I can resolve this?
 
Hello

Let's see if there is anything in the logs with journalctl -b.
Let's also get some more information about your storage situation with lsblk and zpool status.
And let's take a look at your VM config in /etc/pve/qemu-server/125.conf, just to be sure.
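That is, please post the output of these commands:

Code:
journalctl -b
lsblk
zpool status
cat /etc/pve/qemu-server/125.conf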
 
Thanks. First, I should mention that I ran apt update && apt full-upgrade, which updated the kernel and ZFS versions, and then rebooted. After the reboot everything is working as expected, so I'm not sure whether there was a fix in either the kernel or ZFS, but it appears to have resolved the issue without any other changes.
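For reference, the upgrade was nothing special, just:

Code:
apt update && apt full-upgrade
reboot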

Here are the outputs you requested, although they are from after the reboot. The attached journalctl.txt covers the period from the previous reboot until this morning.

root@pve:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-15-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
proxmox-kernel-helper: 8.0.3
pve-kernel-5.15: 7.4-6
pve-kernel-5.13: 7.1-9
proxmox-kernel-6.2.16-15-pve: 6.2.16-15
proxmox-kernel-6.2: 6.2.16-15
proxmox-kernel-6.2.16-14-pve: 6.2.16-14
pve-kernel-5.15.116-1-pve: 5.15.116-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.26-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.3-1
proxmox-backup-file-restore: 3.0.3-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.4
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-2
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-6
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1

root@pve:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 931.5G 0 disk
├─sda1 8:1 0 1.9M 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 931G 0 part
├─pve-swap 253:0 0 8G 0 lvm [SWAP]
├─pve-root 253:1 0 96G 0 lvm /mnt/vzdump
│ /
├─pve-vm--9999--disk--0 253:2 0 2.2G 0 lvm
├─pve-vm--9999--cloudinit 253:3 0 4M 0 lvm
├─pve-vm--402--disk--0 253:4 0 50G 0 lvm
├─pve-var--tmp 253:5 0 97.7G 0 lvm
├─pve-vm--303--disk--0 253:6 0 125G 0 lvm
└─pve-vm--104--disk--0 253:7 0 125G 0 lvm
sdb 8:16 0 3.6T 0 disk
├─sdb1 8:17 0 3.6T 0 part
└─sdb9 8:25 0 8M 0 part
sdc 8:32 0 3.6T 0 disk
├─sdc1 8:33 0 3.6T 0 part
└─sdc9 8:41 0 8M 0 part
sdd 8:48 0 3.6T 0 disk
├─sdd1 8:49 0 3.6T 0 part
└─sdd9 8:57 0 8M 0 part
sde 8:64 0 3.6T 0 disk
├─sde1 8:65 0 3.6T 0 part
└─sde9 8:73 0 8M 0 part
sdf 8:80 0 3.6T 0 disk
├─sdf1 8:81 0 3.6T 0 part
└─sdf9 8:89 0 8M 0 part
sdg 8:96 0 3.6T 0 disk
├─sdg1 8:97 0 3.6T 0 part
└─sdg9 8:105 0 8M 0 part
sdh 8:112 0 3.6T 0 disk
├─sdh1 8:113 0 3.6T 0 part
└─sdh9 8:121 0 8M 0 part
sdi 8:128 0 3.6T 0 disk
├─sdi1 8:129 0 3.6T 0 part
└─sdi9 8:137 0 8M 0 part
sdj 8:144 0 3.6T 0 disk
├─sdj1 8:145 0 3.6T 0 part
└─sdj9 8:153 0 64M 0 part
zd0 230:0 0 125G 0 disk
├─zd0p1 230:1 0 1M 0 part
└─zd0p2 230:2 0 125G 0 part
zd16 230:16 0 32G 0 disk
zd32 230:32 0 42G 0 disk
├─zd32p1 230:33 0 40.4G 0 part
└─zd32p2 230:34 0 1.6G 0 part
zd48 230:48 0 60G 0 disk
├─zd48p1 230:49 0 579M 0 part
└─zd48p2 230:50 0 59.4G 0 part
nvme0n1 259:0 0 931.5G 0 disk
├─nvme0n1p1 259:1 0 931.5G 0 part
└─nvme0n1p9 259:2 0 8M 0 part

root@pve:~# zpool status
pool: backup1
state: ONLINE
scan: scrub repaired 0B in 02:56:52 with 0 errors on Fri Sep 22 07:56:53 2023
config:

NAME STATE READ WRITE CKSUM
backup1 ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_PBJ1TLWT ONLINE 0 0 0

errors: No known data errors

pool: backup2
state: ONLINE
scan: scrub repaired 0B in 04:34:10 with 0 errors on Sat Sep 23 09:34:11 2023
config:

NAME STATE READ WRITE CKSUM
backup2 ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_PCJL704B ONLINE 0 0 0

errors: No known data errors

pool: nvme1tb
state: ONLINE
config:

NAME STATE READ WRITE CKSUM
nvme1tb ONLINE 0 0 0
nvme-CT1000P5SSD8_21022C4CE431 ONLINE 0 0 0

errors: No known data errors

pool: tank
state: ONLINE
scan: scrub repaired 0B in 06:11:43 with 0 errors on Sun Oct 1 11:11:44 2023
config:

NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_PCJ4YN0X ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_PBKZKVJT ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_PCGL7GGB ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_P4GNESYB ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_P4GRKWMB ONLINE 0 0 0
ata-Hitachi_HUS724040ALE641_P4GMAKYB ONLINE 0 0 0

errors: No known data errors

pool: usbhotbackup
state: ONLINE
scan: scrub repaired 0B in 07:07:18 with 0 errors on Mon Sep 25 12:07:33 2023
config:

NAME STATE READ WRITE CKSUM
usbhotbackup ONLINE 0 0 0
ata-HITACHI_HUS724040ALE640_PBGKD40S ONLINE 0 0 0

errors: No known data errors

root@pve:~# cat /etc/pve/qemu-server/125.conf
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 4096
name: dockercompose01
net0: virtio=86:9A:BF:80:0E:6C,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: nvme-images:vm-125-disk-0,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=e2383a5b-c31c-4f15-955f-aee27cbea6ea
sockets: 1
vmgenid: 00e198a0-8f08-4404-b9a2-2c041af00f39
 

Attachments

  • journalctl.txt
    430.5 KB
Can you also send me the output of cat /etc/pve/storage.cfg and ls -l /dev/zvol/nvme1tb?
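That is:

Code:
cat /etc/pve/storage.cfg
ls -l /dev/zvol/nvme1tb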
 
root@pve:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content iso,vztmpl,backup

dir: isos
path /tank/isos
content iso
shared 0

dir: diskimages
path /tank/diskimages
content images
shared 0

dir: cloudstorage
path /tank/cloudstorage/
content images
shared 0

dir: tankcontainers
path /tank/containers
content rootdir
shared 0

pbs: backup
datastore backup
server 192.168.1.16
content backup
fingerprint e0:1d:c8:ab:de:e5:55:7d:dd:8a:08:9b:1b:9a:87:71:a3:60:3d:18:4d:d1:74:ed:01:7c:d6:d3:6b:e6:e0:14
prune-backups keep-all=1
username root@pam

lvm: local-lvm
vgname pve
content rootdir,images
shared 0

zfspool: nvme-images
pool nvme1tb
content images,rootdir
mountpoint /nvme1tb
sparse 0

This output includes all the links since this morning's reboot; previously the vm-103 and vm-125 entries were missing.
Code:
root@pve:~# ls -l /dev/zvol/nvme1tb/
total 0
lrwxrwxrwx 1 root root 10 Oct  4 09:02 vm-103-disk-0 -> ../../zd48
lrwxrwxrwx 1 root root 12 Oct  4 09:02 vm-103-disk-0-part1 -> ../../zd48p1
lrwxrwxrwx 1 root root 12 Oct  4 09:02 vm-103-disk-0-part2 -> ../../zd48p2
lrwxrwxrwx 1 root root  9 Oct  4 09:02 vm-104-disk-0 -> ../../zd0
lrwxrwxrwx 1 root root 11 Oct  4 09:02 vm-104-disk-0-part2 -> ../../zd0p2
lrwxrwxrwx 1 root root 10 Oct  4 09:02 vm-125-disk-0 -> ../../zd16
lrwxrwxrwx 1 root root 10 Oct  4 09:02 vm-400-disk-0 -> ../../zd32
lrwxrwxrwx 1 root root 12 Oct  4 09:02 vm-400-disk-0-part1 -> ../../zd32p1
lrwxrwxrwx 1 root root 12 Oct  4 09:02 vm-400-disk-0-part2 -> ../../zd32p2
 
Hello,
I am still looking through your log. Was that from before or after the reboot?

Anyway, I found a couple of udev errors I wanted to bring to your attention:

Code:
Oct 03 21:33:34 pve (udev-worker)[4034]: zd16: Failed to process device, ignoring: Unknown error 512
Oct 03 21:33:34 pve (udev-worker)[4033]: zd0p1: Failed to process device, ignoring: Unknown error 512

It's hard to say anything for certain right now, but if this is from after the reboot, it could mean that the root cause is not solved.
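If a zvol link goes missing again after a reboot, you could try re-triggering udev for the block devices by hand before resorting to another reboot. This is only a generic suggestion, not something I have verified on your setup:

Code:
# ask udev to re-process all block devices, which should recreate missing /dev/zvol symlinks
udevadm trigger --subsystem-match=block --action=change
udevadm settle
# then check whether the link is back
ls -l /dev/zvol/nvme1tb/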
 
The journalctl log was before the reboot, when I was having the issue.

The udev failures are interesting; zd16 in particular is the device that corresponds to vm-125. How can I determine the root cause of the udev failures?

Edit: Since the latest reboot, I still see one udev-worker failure:
Oct 04 09:02:41 pve (udev-worker)[4024]: zd0p1: Failed to process device, ignoring: Unknown error 512
 
Hello

I think you have been affected by this bug. As far as I understand, they think it's fixed now, but they are not 100% certain. You may want to check your log once in a while for some time, but I don't think it's a reason to worry too much.
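Something simple like this after a reboot should be enough to spot it if it comes back (just a quick grep, adjust as you like):

Code:
journalctl -b | grep -i "Failed to process device"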
 
Thanks! I found the same information yesterday but wasn't sure if that was related. I will keep an eye on this going forward, but it doesn't sound like a major issue.
 
