[SOLVED] VM Disk Missing and VM Does Not Boot

chrispapan

New Member
Jun 6, 2023
Hello everyone, today I found that my VM had shut down on its own and would not boot back up. When I look at the VM's disks, they have disappeared.
However, when I run zfs list I can see the zvol there quite clearly, and it has space allocated, so the disk is there, but for some reason it is not showing up and the VM does not see it, so it does not boot. The error is this: TASK ERROR: timeout: no zvol device link for 'vm-200-disk-0' found after 300 sec found.

I am also sometimes getting a second error: TASK ERROR: can't lock file '/var/lock/qemu-server/lock-200.conf' - got timeout
but I resolved that by deleting the lock file.
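For reference, clearing the stale lock looked like this; the path comes straight from the error message, and this is only safe when no other task is actually running for the VM:
Code:
rm /var/lock/qemu-server/lock-200.conf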

There is no chance of physical failure; both drives are recognized and have passed S.M.A.R.T. tests.

I am also encountering the same problem on an identical server that is in the same cluster as this one. Both are Dell PowerEdge 210 II machines.

If you need anything config-wise, I can send it; just tell me which command to run.

The problem is really serious: these are production servers and my work depends on them. Could anyone please help? It would be much appreciated. Thank you so much!
 
Hi,
the error states that the zvol device link was not found; it should be located under /dev/zvol/<zpool>. What do you find when you run ls -la /dev/zvol/<zpool>?

Please also share the output of cat /etc/pve/storage.cfg and zfs list, as well as the config of the VM in question (qm config <VMID>), all in code tags for better readability.
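For reference, the requested diagnostics in one place (a sketch; replace <zpool> and <VMID> with your actual pool name and VM ID):
Code:
cat /etc/pve/storage.cfg
zfs list
qm config <VMID>
ls -la /dev/zvol/<zpool>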
 
cat /etc/pve/storage.cfg output is
Code:
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

zfspool: Secure-Hybrid-ZFS
        pool Secure-Hybrid-ZFS
        content rootdir,images
        mountpoint /Secure-Hybrid-ZFS
        nodes nitrogen3

lvm: nitrogen4
        vgname pve
        content images,rootdir
        nodes nitrogen4
        shared 0

zfspool: local-zfs
        pool rpool
        content images,rootdir
        mountpoint /rpool
        nodes nitrogen2,nitrogen1
        sparse 0
and zfs list output is
Code:
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool                     14.8G   435G      104K  /rpool
rpool/ROOT                6.15G   435G       96K  /rpool/ROOT
rpool/ROOT/pve-1          6.15G   435G     6.15G  /
rpool/data                8.56G   435G       96K  /rpool/data
rpool/data/vm-200-disk-0  8.56G   435G     8.56G  -
and the qm config 200 is
Code:
boot: order=scsi0;ide2;net0
cores: 4
ide2: local:iso/ubuntu-22.04.1-live-server-amd64.iso,media=cdrom,size=1440306K
memory: 7600
meta: creation-qemu=7.1.0,ctime=1678803610
name: Web1
net0: virtio=E2:A9:F6:0A:CA:69,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-zfs:vm-200-disk-0,cache=writeback,discard=on,iothread=1,size=450G
scsihw: virtio-scsi-single
smbios1: uuid=4a63602c-fadb-49f3-95a8-22af08c625ce
sockets: 1
vmgenid: 088dcc58-210e-44ea-ab6b-42f5e121c4d4
When running ls -la /dev/zvol/vm-200-disk-0, the output is:

ls: cannot access '/dev/zvol/vm-200-disk-0': No such file or directory

Please advise
 
You did not query the right path; in your case it should be ls -la /dev/zvol/rpool/data/. I see you have a cluster. Is the VM config located on the correct host, meaning the host where the disk is available?
 
I think the VM config is on the correct host. How do I check?

Also, you were right; this is the output of ls -la /dev/zvol/rpool/data/:

Code:
total 0
drwxr-xr-x 2 root root 120 Jun 21 17:09 .
drwxr-xr-x 3 root root  60 Jun 21 17:09 ..
lrwxrwxrwx 1 root root  12 Jun 21 17:09 vm-200-disk-0 -> ../../../zd0
lrwxrwxrwx 1 root root  14 Jun 21 17:09 vm-200-disk-0-part1 -> ../../../zd0p1
lrwxrwxrwx 1 root root  14 Jun 21 17:09 vm-200-disk-0-part2 -> ../../../zd0p2
lrwxrwxrwx 1 root root  14 Jun 21 17:09 vm-200-disk-0-part3 -> ../../../zd0p3
 
You can check by running ls -lah /etc/pve/local/qemu-server/ on the node where the zvols are located. If the VM config is listed there, then the VM is on the same host.
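To check the whole cluster at once: /etc/pve/nodes/ holds every node's configs, so a quick search like this (assuming VMID 200) should show which node owns the config:
Code:
ls -la /etc/pve/nodes/*/qemu-server/200.conf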
 
This is the output, so I think it's the correct host.

Code:
total 512
drwxr-xr-x 2 root www-data   0 Mar 14 13:31 .
drwxr-xr-x 2 root www-data   0 Mar 14 13:31 ..
-rw-r----- 1 root www-data 482 Jun 21 17:15 200.conf
 
Code:
zfspool: local-zfs
        pool rpool
        content images,rootdir
        mountpoint /rpool
I think I found your issue: the pool parameter for local-zfs should be rpool/data, not rpool as it is now. With pool rpool, Proxmox expects the device link at /dev/zvol/rpool/vm-200-disk-0, while your zvol actually lives at rpool/data/vm-200-disk-0, which matches the timeout you see. Also, I am not sure you need the mountpoint parameter. Did you recently reconfigure the zpool?
 
No, I did not; I have not configured anything. So how do I go on from here? Sorry that I'm asking so much; I'm new to hypervisors.
 
You will have to manually adapt the config in /etc/pve/storage.cfg to reflect the changes.
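Based on the zfs list output above, the adapted entry would look roughly like this; only the pool line changes and the mountpoint line is dropped:
Code:
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        nodes nitrogen2,nitrogen1
        sparse 0
After saving, the VM should find its device link under /dev/zvol/rpool/data/ on the next start.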
 
I can't thank you enough. If there is any way i can express my gratitude please let me know. You truly saved me.
 
Glad I could help! Please mark the thread as solved so that others can find the solution more easily.
 
Hi,
I am getting the same error. Please see the details below for each command, as I cannot see vm-103-disk-0 in the ls -la /dev/zvol/rpool/data/ output.

This VM was working before without any issue.


ls -la /dev/zvol/rpool/data/:
Code:
drwxr-xr-x 2 root root 280 Dec  6 09:45 .
drwxr-xr-x 3 root root  60 Dec  6 09:44 ..
lrwxrwxrwx 1 root root  13 Dec  6 09:45 vm-108-disk-0 -> ../../../zd32
lrwxrwxrwx 1 root root  15 Dec  6 09:45 vm-108-disk-0-part1 -> ../../../zd32p1
lrwxrwxrwx 1 root root  15 Dec  6 09:45 vm-108-disk-0-part2 -> ../../../zd32p2
lrwxrwxrwx 1 root root  15 Dec  6 09:45 vm-108-disk-0-part3 -> ../../../zd32p3
lrwxrwxrwx 1 root root  13 Dec  6 09:45 vm-120-disk-0 -> ../../../zd16
lrwxrwxrwx 1 root root  15 Dec  6 09:45 vm-120-disk-0-part1 -> ../../../zd16p1
lrwxrwxrwx 1 root root  15 Dec  6 09:45 vm-120-disk-0-part2 -> ../../../zd16p2
lrwxrwxrwx 1 root root  15 Dec  6 09:45 vm-120-disk-0-part3 -> ../../../zd16p3
lrwxrwxrwx 1 root root  12 Dec  6 09:45 vm-136-disk-0 -> ../../../zd0
lrwxrwxrwx 1 root root  14 Dec  6 09:45 vm-136-disk-0-part1 -> ../../../zd0p1
lrwxrwxrwx 1 root root  14 Dec  6 09:45 vm-136-disk-0-part2 -> ../../../zd0p2
lrwxrwxrwx 1 root root  14 Dec  6 09:45 vm-136-disk-0-part3 -> ../../../zd0p3

cat /etc/pve/storage.cfg:
Code:
dir: local
    path /var/lib/vz
    content images,iso,vztmpl
    shared 0

zfspool: local-zfs
    pool rpool/data
    content rootdir,images
    sparse 1

rbd: datapool
    content images,rootdir
    krbd 0
    pool datapool

zfspool: localdata-zfs
    pool localdata-zfs
    content images,rootdir
    mountpoint /localdata-zfs
    nodes proxgpu

zfs list:
Code:
NAME                       USED  AVAIL     REFER  MOUNTPOINT
rpool                     79.2G   370G      104K  /rpool
rpool/ROOT                40.9G   370G       96K  /rpool/ROOT
rpool/ROOT/pve-1          40.9G   370G     40.9G  /
rpool/data                38.2G   370G       96K  /rpool/data
rpool/data/vm-108-disk-0  11.8G   370G     11.8G  -
rpool/data/vm-120-disk-0  7.20G   370G     7.20G  -
rpool/data/vm-136-disk-0  19.2G   370G     19.2G  -

qm config 103:
Code:
boot: order=scsi0;net0
cipassword: **********
ciuser: 
cores: 2
ipconfig0: 
memory: 12288
name: gitlab-internal
nameserver: 
net0: virtio=A2:39:8F:60:A8:E7,bridge=vmbr230
numa: 0
ostype: l26
scsi0: local-zfs:vm-103-disk-0,cache=writeback,discard=on,size=100G
scsihw: virtio-scsi-pci
searchdomain: stuxnet.lab
smbios1: uuid=bbce16b9-098d-4e78-b46d-e453ec19d215
sockets: 1
vmgenid: b8094370-2dc3-4078-8031-a4b78aedc318
 
Hi,
well, you have no disk related to the VM with ID 103 on your local-zfs storage. Did you maybe move the disk to another storage without the VM config reflecting the change?
 
Hi,
I did not move any disk; it was a disk attached to this VM and it suddenly disappeared. What do you recommend now, please?
 
Were there other actions taken which might explain why the disk disappeared? This will not just happen on its own. Is this node part of a cluster? Maybe the VM config is located on a different node where the disk is not present?

If it is not clear what changes were made or how/why the disk was removed, I would recommend restoring from backup.
 
Hi,
I am not the one who set this up. Can you tell me how to check whether the VM config is located on a different node? You can see from the qm config 103 output that vm-103-disk-0 is assigned to this VM.
If I need to restore from backup, do I first need to create a new disk for this VM and then restore the backup? Is there anything else I need to do?
Thank you so much
 
I recommend checking all the storages on all nodes for disks related to VMID 103. If you cannot find a disk, it is enough to restore the backup; this will recreate the disks and configuration as stored in the backup. You can also restore to a different VMID if you do not want to overwrite the current VM.
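As a sketch, you could list the volumes on each storage from your storage.cfg and grep for VMID 103, then restore; the backup file name below is only an example, use your actual vzdump archive:
Code:
# list volumes per configured storage
pvesm list local-zfs
pvesm list datapool
pvesm list localdata-zfs

# or query ZFS and Ceph directly
zfs list -t volume | grep 103
rbd ls datapool | grep 103

# restore the vzdump backup over VMID 103
qmrestore /path/to/vzdump-qemu-103-<timestamp>.vma.zst 103 --storage local-zfs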

See also https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_vzdump