VM not booting after kernel/Proxmox package upgrade

junixbr

New Member
Jun 1, 2022
Hello guys.

After attempting an apt dist-upgrade, my Windows 11 VM stopped starting.

When I run "qm start 100", I first get this message:
Bash:
no such logical volume pve/data
Bus error
(but I do have a logical volume called pve/data)
Bash:
  --- Logical volume ---
  LV Name                data
  VG Name                pve
  LV UUID                xQYZIv-3tsZ-H03b-0DHi-RHyr-pHPu-WGKofm
  LV Write Access        read/write (activated read only)
  LV Creation host, time proxmox, 2022-02-14 21:32:35 -0300
  LV Pool metadata       data_tmeta
  LV Pool data           data_tdata
  LV Status              available
  # open                 0
  LV Size                <320.10 GiB
  Allocated pool data    41.43%
  Allocated metadata     2.43%
  Current LE             81945
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:7

Then... a lot of messages like this:
Bash:
dm-1: writeback error on inode ...

A few seconds after that, the messages change to:
Bash:
systemd-journald: Failed to open /var/log/journal/...: Input/output error

After these messages, I lose control of the shell.
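(Side note: to confirm which volume the dm-1 in those errors refers to, something like this should work; it is just a generic device-mapper lookup, not specific to my setup:)
Bash:
# map the kernel name dm-1 from the error back to its LVM volume name
ls -l /dev/mapper | grep 'dm-1$'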

My setup is:

Bash:
root@pve:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-1-pve)
pve-manager: 7.2-4 (running version: 7.2-4/ca9d43cc)
pve-kernel-5.15: 7.2-3
pve-kernel-helper: 7.2-3
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-2
libpve-storage-perl: 7.2-4
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.1-1
proxmox-backup-file-restore: 2.2.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

Bash:
root@pve:~# qm config 100
acpi: 1
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,kvm=off'
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 4
cpu: host,hidden=1
efidisk0: local-lvm:vm-100-disk-3,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:01:00,pcie=1,romfile=Ellesmere.rom
hostpci1: 0000:08:00.3,pcie=1
hostpci2: 0000:08:00.4,pcie=1
hostpci3: 0000:02:00,pcie=1
machine: pc-q35-6.1
memory: 10240
meta: creation-qemu=6.1.0,ctime=1644887265
name: windows-11
net0: virtio=92:FB:3F:1E:64:AF,bridge=vmbr0
numa: 0
ostype: win11
parent: win_11_20220409
scsi0: local-lvm:vm-100-disk-0,size=256G
scsi1: local-lvm2:vm-100-disk-0,size=1T
scsihw: virtio-scsi-pci
smbios1: uuid=5cbfc3bd-be64-436c-a76a-1650a91cd232
sockets: 1
tpmstate0: local-lvm:vm-100-disk-2,size=4M,version=v2.0
vga: none
vmgenid: 9a20e63e-e144-4bbd-b44a-c5cf772b4162

My disks:
Bash:
root@pve:~# fdisk -l
Disk /dev/sda: 447.13 GiB, 480103981056 bytes, 937703088 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 0DDE3CCC-A15F-4F12-81BF-DC4CD5BCD412

Device       Start       End   Sectors   Size Type
/dev/sda1       34      2047      2014  1007K BIOS boot
/dev/sda2     2048   1050623   1048576   512M EFI System
/dev/sda3  1050624 937703054 936652431 446.6G Linux LVM


Disk /dev/sdb: 1.82 TiB, 2000398934016 bytes, 3907029168 sectors
Disk model: ST2000DM008-2FR1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 200CB18C-50DC-C649-8D0D-4728FE991F97

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 3907029134 3907027087  1.8T Linux LVM

PS: The Proxmox installation and the main Windows 11 disk are on the same physical disk (I don't know if that is a problem).
I've tried running xfs_repair on /dev/mapper/pve-root, but it found nothing to fix (I believe it is not an XFS problem).
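I still need to check the disks' SMART status, since writeback/I/O errors usually point at hardware; assuming smartmontools (which pveversion shows is installed), something like this should do it:
Bash:
# health summary and attributes for the SSD holding pve-root and local-lvm
smartctl -H /dev/sda
smartctl -A /dev/sda
# same for the SATA disk backing the second VG
smartctl -H /dev/sdb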

Any tips?
 
Can you post your journal, e.g. from the last time you booted? (journalctl -b)
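For example (the exact boot index may differ; -b -1 is the previous boot, assuming persistent journaling is enabled):
Code:
journalctl -b > journal-current.txt
journalctl -b -1 > journal-previous.txt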
 
An observation...

When I run:
Code:
strace -x qm start 100
I notice that the process gets killed by SIGBUS and the disks are suspended, including the SATA hard disk that is attached to my second VM.

Bash:
...
rt_sigaction(SIGRT_32, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGABRT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGCHLD, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f4aed156140}, 8) = 0
rt_sigaction(SIGIO, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
--- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7f4ae24d55e0} ---
+++ killed by SIGBUS +++
Bus error

But I have no idea why this is happening.
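From what I understand, SIGBUS with BUS_ADRERR is typically raised when a memory-mapped file access fails at the backing device, which would match the writeback errors above. A generic check of the kernel log for block-device errors (not verified on this box) would be something like:
Bash:
# look for low-level disk errors around the time of the SIGBUS
dmesg -T | grep -iE 'i/o error|blk_update_request|ata[0-9].*error'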
 
Well, the logs look fine, no error in sight. Are you sure they're from the same system? Where did you see the errors?

Also, can you post the output of
Code:
pvs
vgs
lvs
mount
?
 
Something goes wrong when I start the Windows 11 VM, but the Linux VM is working normally.

Bash:
root@pve:~# pvs
  PV         VG    Fmt  Attr PSize    PFree
  /dev/sda3  pve   lvm2 a--  <446.63g 15.99g
  /dev/sdb1  sata1 lvm2 a--    <1.82t     0

Bash:
root@pve:~# vgs
  VG    #PV #LV #SN Attr   VSize    VFree
  pve     1   9   0 wz--n- <446.63g 15.99g
  sata1   1   5   0 wz--n-   <1.82t     0

Bash:
root@pve:~# lvs
  LV                                 VG    Attr       LSize    Pool  Origin        Data%  Meta%  Move Log Cpy%Sync Convert
  data                               pve   twi-aotz-- <320.10g                     41.43  2.43
  root                               pve   -wi-ao----   96.00g
  snap_vm-100-disk-0_win_11_20220409 pve   Vri---tz-k  256.00g data  vm-100-disk-0
  snap_vm-100-disk-2_win_11_20220409 pve   Vri---tz-k    4.00m data  vm-100-disk-2
  snap_vm-100-disk-3_win_11_20220409 pve   Vri---tz-k    4.00m data  vm-100-disk-3
  swap                               pve   -wi-ao----    8.00g
  vm-100-disk-0                      pve   Vwi-a-tz--  256.00g data                33.38
  vm-100-disk-2                      pve   Vwi-a-tz--    4.00m data                1.56
  vm-100-disk-3                      pve   Vwi-a-tz--    4.00m data                14.06
  data1                              sata1 twi-aotz--   <1.82t                     38.99  29.19
  snap_vm-100-disk-0_win_11_20220409 sata1 Vri---tz-k    1.00t data1 vm-100-disk-0
  vm-100-disk-0                      sata1 Vwi-a-tz--    1.00t data1               66.92
  vm-101-disk-0                      sata1 Vwi-a-tz--  256.00g data1               13.73
  vm-101-disk-2                      sata1 Vwi-a-tz--    4.00m data1               25.00

Bash:
root@pve:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=8129964k,nr_inodes=2032491,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=1632888k,mode=755,inode64)
/dev/mapper/pve-root on / type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=28472)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
/dev/sda2 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=1632884k,nr_inodes=408221,mode=700,inode64)
 
Only a wild guess, but is it possible that your PCI(e) device IDs have changed after the (kernel) upgrade and you are now passing the wrong devices through to your Windows VM?
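You could check by comparing the current PCI addresses/IDs against the hostpci entries in the VM config, e.g.:
Code:
lspci -nn                      # current PCI addresses and vendor:device IDs
qm config 100 | grep hostpci   # addresses the VM expects to pass through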