After Proxmox upgrade from 6 to 7, all VMs are down because their disks are not found

The VMs are not finding their disks. They are on an LVM volume group that combines 4x 2TB disks into one large volume. Proxmox can see all the disks, but it is probably not activating the volume.

I saw one change in lvm.conf made by the upgrade procedure:
Code:
root@pve:/etc/lvm# diff lvm.conf lvm.conf.bak
175c175
< #    scan_lvs = 1
---
>     scan_lvs = 1
2194,2197d2193
< devices {
<      # added by pve-manager to avoid scanning LVM volumes
<      scan_lvs=0
< }

I am not sure why we have this new config and whether it is right or not.
 
I discovered that only the first volume group was mapped in /dev (/dev/pve). The biggest one (/dev/pve3/) is not there, and all the VMs are on it. I am trying to identify which task caused that during the upgrade.
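A hedged sketch of how this could be checked and activated by hand on the host (the VG name pve3 is taken from the post above; these are plain LVM commands, nothing Proxmox-specific):

Code:
# list all volume groups the host can currently see
vgs
# if pve3 is listed but its LVs are inactive, activate them manually
vgchange -ay pve3
# the device nodes should then reappear under /dev/pve3/
lvscan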
 
Do you use nested LVM (e.g., a PV and VG on top of an LV)? That is disabled by default in LVM now. If you need that, you might want to check out the global_filter option and configure it accordingly (disallow scanning LVs except the ones that you require for your nested setup).

You definitely don't want the host LVM to scan all LVs, as that can lead to LVs being active on the host and inside a VM at the same time, which can cause corruption.
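A minimal sketch of what such a filter could look like, assuming the nested PV lives on an LV at /dev/pve3/nested (that path is a placeholder, not taken from this thread). Adjust the devices section that pve-manager already appended rather than adding a second one, and double-check the patterns against lvm.conf(5) before relying on them:

Code:
# /etc/lvm/lvm.conf -- sketch only, the LV path is a placeholder
devices {
     # re-enable LV scanning (PVE 7 ships with scan_lvs=0)
     scan_lvs = 1
     # accept only the LV that carries the nested PV, reject the other LV device nodes;
     # plain partition PVs (e.g. /dev/sda3) match no pattern and stay accepted
     global_filter = [ "a|^/dev/pve3/nested$|", "r|^/dev/pve.*|", "r|^/dev/mapper/pve.*|", "r|^/dev/dm-.*|" ]
}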
 
I am also having the exact same issue. I have an install on a low-end AMD system that was running Proxmox VE 6.4, with a PBS server being the only VM running on it. The server has a single SSD with the default partition layout. I upgraded to 7.0, following the specified upgrade path without errors. After the reboot, I am no longer able to boot my VM; it states that it cannot find a boot device. I see no errors logged on the Proxmox side. I created a new VM and it also cannot access its disk. lvdisplay shows the logical volumes. If I move the disk from the local storage to an NFS share hosted on a Synology, the VM is able to start up. Moving the disk back to the local storage, it fails to access the boot volume again.

I can post a new thread if preferred.
 
Could you attach your full /etc/lvm/lvm.conf and the log of the failed VM start?
 
I have attached my /etc/lvm/lvm.conf file. It appears to be a standard default file.

I am not sure which logs you are asking for. Please point me to them and I will attach them. In the console, when I start a VM, all I see is "TASK OK". The syslog shows the following when I start the VM:
Jul 16 21:04:54 lonestar kernel: [718543.812660] device tap200i0 entered promiscuous mode
Jul 16 21:04:54 lonestar kernel: [718544.047105] fwbr200i0: port 1(fwln200i0) entered blocking state
Jul 16 21:04:54 lonestar kernel: [718544.047113] fwbr200i0: port 1(fwln200i0) entered disabled state
Jul 16 21:04:54 lonestar kernel: [718544.047337] device fwln200i0 entered promiscuous mode
Jul 16 21:04:54 lonestar kernel: [718544.047439] fwbr200i0: port 1(fwln200i0) entered blocking state
Jul 16 21:04:54 lonestar kernel: [718544.047444] fwbr200i0: port 1(fwln200i0) entered forwarding state
Jul 16 21:04:54 lonestar kernel: [718544.068802] vmbr0: port 3(fwpr200p0) entered blocking state
Jul 16 21:04:54 lonestar kernel: [718544.068809] vmbr0: port 3(fwpr200p0) entered disabled state
Jul 16 21:04:54 lonestar kernel: [718544.069030] device fwpr200p0 entered promiscuous mode
Jul 16 21:04:54 lonestar kernel: [718544.069121] vmbr0: port 3(fwpr200p0) entered blocking state
Jul 16 21:04:54 lonestar kernel: [718544.069125] vmbr0: port 3(fwpr200p0) entered forwarding state
Jul 16 21:04:54 lonestar kernel: [718544.088969] fwbr200i0: port 2(tap200i0) entered blocking state
Jul 16 21:04:54 lonestar kernel: [718544.088977] fwbr200i0: port 2(tap200i0) entered disabled state
Jul 16 21:04:54 lonestar kernel: [718544.089288] fwbr200i0: port 2(tap200i0) entered blocking state
Jul 16 21:04:54 lonestar kernel: [718544.089293] fwbr200i0: port 2(tap200i0) entered forwarding state

Here is my disk layout:
[root@lonestar ~]$ lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0 223.6G  0 disk
├─sda1                         8:1    0  1007K  0 part
├─sda2                         8:2    0   512M  0 part
└─sda3                         8:3    0 223.1G  0 part
  ├─pve-swap                 253:0    0     7G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0  55.8G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   1.4G  0 lvm
  │ └─pve-data-tpool         253:4    0 141.4G  0 lvm
  │   ├─pve-data             253:5    0 141.4G  1 lvm
  │   ├─pve-vm--105--disk--0 253:6    0    32G  0 lvm
  │   └─pve-vm--200--disk--1 253:8    0    32G  0 lvm
  └─pve-data_tdata           253:3    0 141.4G  0 lvm
    └─pve-data-tpool         253:4    0 141.4G  0 lvm
      ├─pve-data             253:5    0 141.4G  1 lvm
      ├─pve-vm--105--disk--0 253:6    0    32G  0 lvm
      └─pve-vm--200--disk--1 253:8    0    32G  0 lvm
sr0                           11:0    1  1024M  0 rom
[root@lonestar ~]$
 

Attachments

  • lvm.conf.txt
    101.1 KB
So the VM starts, but does not boot?

Please post the VM config and the output of lvs.
From which version did you upgrade?
 
Output of lvs:
[root@lonestar ~]$ lvs
  LV            VG  Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data          pve twi-aotz-- <141.43g             5.93   1.40
  root          pve -wi-ao----   55.75g
  swap          pve -wi-ao----    7.00g
  vm-105-disk-0 pve Vwi-a-tz--   32.00g data        26.20
  vm-200-disk-1 pve Vwi-a-tz--   32.00g data        0.00

VM Config:
agent: 1,fstrim_cloned_disks=1
boot: order=ide2;virtio0
cores: 2
ide2: none,media=cdrom
machine: pc-i440fx-5.2
memory: 6144
name: PBS2
net0: virtio=EE:69:FE:C9:A4:5C,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=60d93504-f029-4cd0-ad9e-f7e1b91a70d0
sockets: 1
startup: up=60
unused0: local-lvm:vm-105-disk-0
virtio0: HC3:105/vm-105-disk-0.qcow2,size=32G
vmgenid: 5b9ec4ce-4311-4115-b4df-4935f358a35b

I upgraded from the latest 6.4 release. It was fully updated and rebooted before the upgrade to 7.0.

You are correct, the VM boots, but it is unable to find the hard disk. I restored the VM from backup to a new VM ID (200) and that has the same problem. Moving the disk image to an NFS share via the "move disk" button in the GUI allows the VM to boot and access the hard drive. When the VM boots with the disk on the local-lvm storage, it says that the hard drive is inaccessible. Booting off a Parted Magic ISO, GParted says "Could not stat device /dev/mapper/no block devices found - no such file or directory.", followed by "Input/Output error during read on /dev/vda". Clicking Ignore dismisses the errors, and GParted shows /dev/vda with 32GB unallocated. Any operation on the disk, like creating a partition table, gives the "Input/Output error during read on /dev/vda" error message again.
 
But the block device on the host side is there - otherwise the lvs output would look different and the VM wouldn't even start. Is there anything else in the logs surrounding the VM start? There should be more than just the network device messages from the kernel.
 
I know it is odd, but that is what is happening. The VM sees the HDD, but it cannot read/write it, yet on NFS storage it can. I am unable to locate logs, other than syslog. I posted the syslog output from when the VM started.

Please be specific as to which logs you want to see, and I will post them.
 
Is anything visible if you start the (or a) VM in the foreground?
  • stop the VM
  • run qm showcmd <VMID> --pretty
  • remove the line with -daemonize
  • run the resulting command - the VM process should stay in the foreground and print errors to the terminal (see the sketch below)
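Put together, the procedure might look like this (VMID 200 is just the example from this thread; the temp file name is arbitrary):

Code:
qm stop 200
qm showcmd 200 --pretty > /tmp/vm200-start.sh
# edit /tmp/vm200-start.sh and delete the line containing '-daemonize \'
bash /tmp/vm200-start.sh    # the VM stays in the foreground and prints errors to this terminal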

Also, are all the affected VMs using virtio? Can you try with virtio-scsi?
 
This is the only VM on the Proxmox server. Its sole purpose is to run a PBS instance. I have created additional VMs to test, and none of them can access the block device.

I have taken the VM, detached the HDD, and re-added it as SCSI, IDE, SATA and VIRTIO. Each time I went into the Options tab to make sure that the block device is in the boot order. Each time, I get the same result: an inaccessible boot device.

The VM console shows this:
SeaBIOS (version rel-1.14.0.0-g155821a1990b-prebuilt.qemu.org)
Machine UUID 60d93504-f029-4cd0-ad9e-f7e1b91a790d0
Booting from DVD/CD...
Boot failed: Could not read from CDROM (code 0003)
Booting from Hard Disk...
Boot failed: not a bootable disk
No bootable device.  Retrying in 1 seconds.

Running the VM in the foreground makes no difference; there are no errors logged. I stopped the VM via the stop button in the GUI. This is the full console capture:

[root@lonestar ~]$ /usr/bin/kvm \
  -id 200 \
  -name PBS2 \
  -no-shutdown \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/200.qmp,server=on,wait=off' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/200.pid \
  -smbios 'type=1,uuid=60d93504-f029-4cd0-ad9e-f7e1b91a70d0' \
  -smp '2,sockets=1,cores=2,maxcpus=2' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc 'unix:/var/run/qemu-server/200.vnc,password=on' \
  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep \
  -m 6144 \
  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
  -device 'vmgenid,guid=61f30508-c3e1-440e-899f-a0254755a8d2' \
  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
  -device 'VGA,id=vga,bus=pci.0,addr=0x2' \
  -chardev 'socket,path=/var/run/qemu-server/200.qga,server=on,wait=off,id=qga0' \
  -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' \
  -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:5fb123b7e68' \
  -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=100' \
  -device 'ahci,id=ahci0,multifunction=on,bus=pci.0,addr=0x7' \
  -drive 'file=/dev/pve/vm-200-disk-1,if=none,id=drive-sata0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' \
  -device 'ide-hd,bus=ahci0.0,drive=drive-sata0,id=sata0,bootindex=101' \
  -netdev 'type=tap,id=net0,ifname=tap200i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
  -device 'virtio-net-pci,mac=EE:69:FE:C9:A4:5C,netdev=net0,bus=pci.0,addr=0x12,id=net0' \
  -machine 'type=pc+pve0'
[root@lonestar ~]$
 
Can you try modifying the "aio" setting of your disk to 'native' instead of 'io_uring'?
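One way to do that, as a sketch using the VMID 200 and sata0 drive slot from the posts above (qm set rewrites the whole drive entry, so re-add any other options that line had; the change takes effect after a full stop/start of the VM):

Code:
qm set 200 --sata0 local-lvm:vm-200-disk-1,aio=native
# alternatively, append ",aio=native" to the disk line in /etc/pve/qemu-server/200.conf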
 
The problem seems to have resolved itself. While the problem existed, I was updating packages daily, and if there was a kernel update, I rebooted. After several days of reboots, the VM started to boot off the local-lvm storage. I didn't make any changes; it just started working. For the existing VM 105, I moved the disk back from NFS to local-lvm and it booted.

I am at a loss as to what it could have been, unless it was some kernel or recent package bug.
 
