Proxmox not using new kernel

SpinningRust

Hello everyone,

for some mysterious reason, one of my nodes (3-node cluster with external VM storage) will not boot into any kernel newer than 5.4.78. While booting up, the only options on the proxmox-boot-tool screen are kernel versions 5.4.78 and 5.4.73. An ls in /boot, however, shows newer versions (see the second code block).
I just upgraded all nodes to PVE 7, and while the other two nodes adopted the new kernel without problems, this node didn't. All nodes share the exact same hardware and should also have the same software/configuration. During the upgrade from PVE 6 to 7, no errors occurred on any node.
I'll attach the package versions and would be happy about any advice on how to solve this weird error.

Regards

John Tanner


Code:
proxmox-ve: 7.0-2 (running kernel: 5.4.78-2-pve)
pve-manager: 7.0-9 (running version: 7.0-9/228c9caa)
pve-kernel-helper: 7.0-4
pve-kernel-5.11: 7.0-3
pve-kernel-5.4: 6.4-4
pve-kernel-5.11.22-1-pve: 5.11.22-2
pve-kernel-5.4.124-1-pve: 5.4.124-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve6
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.1.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-4
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-9
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-2
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.4-1
proxmox-backup-file-restore: 2.0.4-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-4
pve-cluster: 7.0-3
pve-container: 4.0-8
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-10
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.4-pve1


Code:
drwxr-xr-x  5 root root   23 Jul 16 10:22 ./
drwxr-xr-x 19 root root   25 Mar 18 13:27 ../
-rw-r--r--  1 root root 247K Jul  2 16:22 config-5.11.22-1-pve
-rw-r--r--  1 root root 232K Jun 23 13:47 config-5.4.124-1-pve
-rw-r--r--  1 root root 232K Nov 16  2020 config-5.4.73-1-pve
-rw-r--r--  1 root root 232K Dec  3  2020 config-5.4.78-2-pve
drwxr-xr-x  2 root root    2 Mar 12 14:32 efi/
drwxr-xr-x  5 root root    8 Jul 16 10:22 grub/
-rw-r--r--  1 root root  56M Jul 16 09:19 initrd.img-5.11.22-1-pve
-rw-r--r--  1 root root  47M Jul 15 08:48 initrd.img-5.4.124-1-pve
-rw-r--r--  1 root root  41M Mar 12 14:37 initrd.img-5.4.73-1-pve
-rw-r--r--  1 root root  41M Apr  9 10:56 initrd.img-5.4.78-2-pve
-rw-r--r--  1 root root 179K Aug 15  2019 memtest86+.bin
-rw-r--r--  1 root root 181K Aug 15  2019 memtest86+_multiboot.bin
drwxr-xr-x  2 root root    8 Jul 16 09:19 pve/
-rw-r--r--  1 root root 5.5M Jul  2 16:22 System.map-5.11.22-1-pve
-rw-r--r--  1 root root 4.6M Jun 23 13:47 System.map-5.4.124-1-pve
-rw-r--r--  1 root root 4.5M Nov 16  2020 System.map-5.4.73-1-pve
-rw-r--r--  1 root root 4.6M Dec  3  2020 System.map-5.4.78-2-pve
-rw-r--r--  1 root root  14M Jul  2 16:22 vmlinuz-5.11.22-1-pve
-rw-r--r--  1 root root  12M Jun 23 13:47 vmlinuz-5.4.124-1-pve
-rw-r--r--  1 root root  12M Nov 16  2020 vmlinuz-5.4.73-1-pve
-rw-r--r--  1 root root  12M Dec  3  2020 vmlinuz-5.4.78-2-pve
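
For reference, on a system managed by proxmox-boot-tool the kernels that are actually synced to the ESP(s) - as opposed to what merely sits in /boot - can be listed with the tool itself; a minimal sketch:

Code:
proxmox-boot-tool kernel list    # kernels selected for syncing to the configured ESP(s)
proxmox-boot-tool refresh        # re-copy kernels/initrds and regenerate the loader config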
 
could you post the output of proxmox-boot-tool status from a working and a non-working node?
 

"stuck" or non-working node:
Code:
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
WARN: /dev/disk/by-uuid/D6CF-1F28 does not exist - clean '/etc/kernel/proxmox-boot-uuids'! - skipping
D6D0-5D81 is configured with: uefi (versions: 5.11.22-1-pve, 5.4.124-1-pve, 5.4.78-2-pve)



working node:
Code:
Re-executing '/usr/sbin/proxmox-boot-tool' in new private mount namespace..
System currently booted with uefi
0ED0-BC90 is configured with: uefi (versions: 5.11.22-1-pve, 5.4.124-1-pve)
0ED1-4864 is configured with: uefi (versions: 5.11.22-1-pve, 5.4.124-1-pve)

The non-working node does indeed look unhealthy, but how do I fix this? The rpool in question is healthy.
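
The WARN line itself points at removing the stale UUID from /etc/kernel/proxmox-boot-uuids; a minimal sketch of that step, assuming the missing D6CF-1F28 partition really is gone for good:

Code:
cat /etc/kernel/proxmox-boot-uuids   # ESP UUIDs the tool currently tracks
proxmox-boot-tool clean              # drop entries whose vfat partition no longer exists
proxmox-boot-tool status             # verify only existing ESPs remain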
 
Please also provide the outputs of:
* `lsblk`
* ` blkid /dev/disk/by-id/*`
 
Code:
NAME                        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                           8:0    0   1.5T  0 disk 
└─35000cca050abd018         253:0    0   1.5T  0 mpath
  ├─35000cca050abd018-part1 253:2    0   1.5T  0 part 
  └─35000cca050abd018-part9 253:3    0     8M  0 part 
sdb                           8:16   0   1.5T  0 disk 
└─35000cca050abc524         253:1    0   1.5T  0 mpath
  ├─35000cca050abc524-part1 253:5    0   1.5T  0 part 
  └─35000cca050abc524-part9 253:6    0     8M  0 part 
sdc                           8:32   0 186.3G  0 disk 
└─sdc3                        8:35   0 185.8G  0 part 
sdd                           8:48   0 186.3G  0 disk 
├─sdd1                        8:49   0  1007K  0 part 
├─sdd2                        8:50   0   512M  0 part 
└─sdd3                        8:51   0 185.8G  0 part 
sde                           8:64   0  13.6T  0 disk 
└─hp-san                    253:4    0  13.6T  0 mpath
  ├─PLS-vm--103--disk--1    253:7    0     4M  0 lvm   
  ├─PLS-vm--100--disk--0    253:8    0   128G  0 lvm   
  ├─PLS-vm--103--disk--0    253:9    0   128G  0 lvm   
  ├─PLS-vm--101--disk--0    253:10   0     8G  0 lvm   
  ├─PLS-vm--104--disk--0    253:11   0    32G  0 lvm   
  ├─PLS-vm--105--disk--0    253:12   0    16G  0 lvm   
  ├─PLS-vm--102--disk--0    253:13   0    32G  0 lvm   
  ├─PLS-vm--106--disk--0    253:14   0  1000G  0 lvm   
  ├─PLS-vm--107--disk--0    253:15   0    32G  0 lvm   
  ├─PLS-vm--108--disk--0    253:16   0    64G  0 lvm   
  ├─PLS-vm--109--disk--0    253:17   0    32G  0 lvm   
  ├─PLS-vm--110--disk--0    253:18   0    32G  0 lvm   
  ├─PLS-vm--110--disk--1    253:19   0   128G  0 lvm   
  ├─PLS-vm--110--disk--2    253:20   0    32G  0 lvm   
  ├─PLS-vm--110--disk--3    253:21   0   128G  0 lvm   
  ├─PLS-vm--111--disk--0    253:22   0    32G  0 lvm   
  ├─PLS-vm--111--disk--1    253:23   0    32G  0 lvm   
  └─PLS-vm--110--disk--4    253:24   0    32G  0 lvm   
zd0                         230:0    0   128G  0 disk 
├─zd0p1                     230:1    0     1M  0 part 
├─zd0p2                     230:2    0     1G  0 part 
└─zd0p3                     230:3    0   127G  0 part 
zd16                        230:16   0   128G  0 disk 
├─zd16p1                    230:17   0     1M  0 part 
├─zd16p2                    230:18   0     1G  0 part 
└─zd16p3                    230:19   0   127G  0 part


Code:
root@middle:~# blkid /dev/disk/by-id/
root@middle:~#
 
The '*' was missing, hence no output.

Could you also provide the output of both the lsblk and blkid commands from the working node?

How is the system set up?
sda and sdb look like they're multipathed on an iSCSI/FC SAN?
Could it be that the system was set up with a ZFS RAID1 and at some point you replaced one of the disks?
 


Code:
sda                                   8:0    0   1.5T  0 disk 
└─hp-san                            253:0    0   1.5T  0 mpath
  ├─hp-san-part1                    253:2    0   1.5T  0 part 
  └─hp-san-part9                    253:3    0     8M  0 part 
sdb                                   8:16   0   1.5T  0 disk 
└─35000cca050abcb98                 253:1    0   1.5T  0 mpath
  ├─35000cca050abcb98-part1         253:5    0   1.5T  0 part 
  └─35000cca050abcb98-part9         253:6    0     8M  0 part 
sdc                                   8:32   0 186.3G  0 disk 
├─sdc1                                8:33   0  1007K  0 part 
├─sdc2                                8:34   0   512M  0 part 
└─sdc3                                8:35   0 185.8G  0 part 
sdd                                   8:48   0 186.3G  0 disk 
├─sdd1                                8:49   0  1007K  0 part 
├─sdd2                                8:50   0   512M  0 part 
└─sdd3                                8:51   0 185.8G  0 part 
sde                                   8:64   0  13.6T  0 disk 
└─3600c0ff000267d927f555b6001000000 253:4    0  13.6T  0 mpath
  ├─PLS-vm--103--disk--1            253:7    0     4M  0 lvm   
  ├─PLS-vm--100--disk--0            253:8    0   128G  0 lvm   
  ├─PLS-vm--103--disk--0            253:9    0   128G  0 lvm   
  ├─PLS-vm--101--disk--0            253:10   0     8G  0 lvm   
  ├─PLS-vm--104--disk--0            253:11   0    32G  0 lvm   
  ├─PLS-vm--105--disk--0            253:12   0    16G  0 lvm   
  ├─PLS-vm--102--disk--0            253:13   0    32G  0 lvm   
  ├─PLS-vm--106--disk--0            253:14   0  1000G  0 lvm   
  ├─PLS-vm--107--disk--0            253:15   0    32G  0 lvm   
  ├─PLS-vm--108--disk--0            253:16   0    64G  0 lvm   
  ├─PLS-vm--109--disk--0            253:17   0    32G  0 lvm   
  ├─PLS-vm--110--disk--0            253:18   0    32G  0 lvm   
  ├─PLS-vm--110--disk--1            253:19   0   128G  0 lvm   
  ├─PLS-vm--110--disk--2            253:20   0    32G  0 lvm   
  ├─PLS-vm--110--disk--3            253:21   0   128G  0 lvm   
  ├─PLS-vm--111--disk--0            253:22   0    32G  0 lvm   
  ├─PLS-vm--111--disk--1            253:23   0    32G  0 lvm   
  └─PLS-vm--110--disk--4            253:24   0    32G  0 lvm   
zd0                                 230:0    0   128G  0 disk 
├─zd0p1                             230:1    0     1M  0 part 
├─zd0p2                             230:2    0     1G  0 part 
└─zd0p3                             230:3    0   127G  0 part 
zd16                                230:16   0   128G  0 disk 
├─zd16p1                            230:17   0     1M  0 part 
├─zd16p2                            230:18   0     1G  0 part 
└─zd16p3                            230:19   0   127G  0 part


About the system setup:
All 3 nodes are running on a DL380 Gen9; sda and sdb are unused SSDs.
The external storage is a multipathed iSCSI target on an HP MSA 2040 and contains all VM/CT data.
And lastly, yes: the problematic node was set up with a ZFS RAID1, and one of the SSDs disappeared quite some time ago. After reinserting it and running a resilver, everything (seemingly) worked again.

The command outputs were too long for a single post, so please refer to the attached txt file.
 

Attachments

  • commands.txt (28.1 KB)
Did you follow the guide at:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#_zfs_administration
(changing a failed bootable device)?

I'd try to add the 512M vfat partition to the replaced disk and register it with proxmox-boot-tool format/init, as described there.

I hope this helps!
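
For reference, a rough sketch of the sequence from that chapter; the device names are placeholders, and the sgdisk steps apply only to a blank replacement disk - they must not be run against a disk whose ZFS partition is already back in rpool:

Code:
# Replicate the partition table from the healthy boot disk to a blank replacement
sgdisk /dev/sdX -R /dev/sdY          # sdX = healthy disk, sdY = replacement (placeholders)
sgdisk -G /dev/sdY                   # randomize the GUIDs on the copy
# Resilver the ZFS partition (skip if the pool is already healthy again)
zpool replace -f rpool <old-zfs-partition> /dev/sdY3
# Make the 512M vfat partition (partition 2 in the default layout) bootable and register it
proxmox-boot-tool format /dev/sdY2
proxmox-boot-tool init /dev/sdY2
proxmox-boot-tool status             # should now list both ESP UUIDs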
 
No, unfortunately I didn't follow that guide...
But how do I find out which drive was the failed one? I do not remember which one "failed" :(
* check the output of `zpool status`

I still think that sda and sdb are not local to the server (but I could be wrong)

I'd guess that sdc is the one that was swapped out, and sdd is the one that was in `rpool` from the beginning

As said - based on the command outputs these are just guesses - check with `zpool status` and then check the partition tables with `parted` or `fdisk`

I hope this helps!
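
A minimal sketch of those checks (purely read-only; sdc/sdd are taken from the lsblk output above and may need adjusting):

Code:
zpool status rpool                                    # shows which partitions back the mirror
lsblk -o NAME,SIZE,FSTYPE,PARTTYPE /dev/sdc /dev/sdd  # which disk is missing the 1007K/512M partitions
fdisk -l /dev/sdc /dev/sdd                            # compare the two partition tables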
 
OK, I think I have broken the pool now, as proxmox-boot-tool continuously gives me this error no matter which device path I choose:

Code:
invalid vdev specification
the following errors must be manually repaired:
/dev/disk/by-id/dm-name-35000cca050abc524-part3 is part of active pool 'rpool'
root@middle:~# proxmox-boot-tool format /dev/disk/by-id/dm-name-35000cca050abc524-part2
UUID="12523253704678969671" SIZE="536870912" FSTYPE="zfs_member" PARTTYPE="c12a7328-f81f-11d2-ba4b-00a0c93ec93b" PKNAME="" MOUNTPOINT=""
E: cannot determine parent device of '/dev/disk/by-id/dm-name-35000cca050abc524-part2' - please provide a partition, not a full disk.

EDIT: device path as in: the same device has several entries in /dev/disk/by-id/, and no matter which one I choose, it cannot find the parent device...
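
One hedged guess: proxmox-boot-tool reads the parent device from lsblk's PKNAME column, which is empty for the multipath (dm-name-*) aliases, as the output above shows. A purely diagnostic sketch for finding the underlying /dev/sdX path to point the tool at instead (nothing is modified):

Code:
multipath -ll 35000cca050abc524                              # which sdX devices back this multipath map
lsblk -s /dev/disk/by-id/dm-name-35000cca050abc524-part2     # inverse tree down to the parent device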
 
