PVE v6 - error attempt to read or write outside of disk 'hd0' after dist-upgrade

nutnut

New Member
May 19, 2020
Hello,

After installing a new kernel on one of our Proxmox servers, I get this error every time I try to boot into that kernel version:
Code:
error attempt to read or write outside of disk 'hd0'

I tried installing older kernels, but none of them work: they all print this error at boot time and I'm unable to boot the hypervisor. The odd part is that an older kernel that was already present on the server, pve-kernel-5.4.78-2-pve, still boots without a problem, but if I install an older kernel such as pve-kernel-5.4.44-2-pve, the server does not boot. I guess it has something to do with how the initramfs is built.

Here is some information about the system I'm currently using; if you need more, I'm more than happy to provide it.

Proxmox version:
Code:
root@hypervisor:~# pveversion --verbose
proxmox-ve: 6.4-1 (running kernel: 5.4.78-2-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-10
pve-kernel-helper: 6.4-10
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.151-1-pve: 5.4.151-1
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.124-1-pve: 5.4.124-2
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve1~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1

Disk layout:
Code:
root@hypervisor:~# lsblk
NAME                           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                              8:0    0  2.6T  0 disk
├─sda1                           8:1    0 1007K  0 part
├─sda2                           8:2    0  512M  0 part
└─sda3                           8:3    0  2.6T  0 part
  ├─pve-swap                   253:0    0   16G  0 lvm  [SWAP]
  ├─pve-root                   253:1    0   35G  0 lvm  /
  ├─pve-data_tmeta             253:2    0 15.8G  0 lvm 
  │ └─pve-data-tpool           253:4    0  2.3T  0 lvm 
  │   ├─pve-data               253:5    0  2.3T  0 lvm 
  │   └─pve-vm--30026--disk--0 253:6    0   80G  0 lvm 
  └─pve-data_tdata             253:3    0  2.3T  0 lvm 
    └─pve-data-tpool           253:4    0  2.3T  0 lvm 
      ├─pve-data               253:5    0  2.3T  0 lvm 
      └─pve-vm--30026--disk--0 253:6    0   80G  0 lvm

List of installed kernels:
Code:
root@hypervisor:~# dpkg -l |grep pve-kernel
ii  pve-firmware                         3.3-2                         all          Binary firmware code for the pve-kernel
ii  pve-kernel-5.3                       6.1-6                         all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.3.10-1-pve              5.3.10-1                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.3.18-3-pve              5.3.18-3                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.4                       6.4-10                        all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.4.119-1-pve             5.4.119-1                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.4.124-1-pve             5.4.124-2                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.4.143-1-pve             5.4.143-1                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.4.151-1-pve             5.4.151-1                     amd64        The Proxmox PVE Kernel Image
rc  pve-kernel-5.4.44-1-pve              5.4.44-1                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.4.44-2-pve              5.4.44-2                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.4.78-2-pve              5.4.78-2                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper                    6.4-10                        all          Function for various kernel maintenance tasks.

If anybody has a clue what is going on, your help would be much appreciated!

Thank you.

Best regards,
 

Attachments

  • Screenshot from 2021-11-29 12-35-23.png
Running
Code:
update-initramfs -u
also did not help.

Could this also occur on other versions of Proxmox VE, such as v7, or is it specific to v6?

Best regards,
 
This is usually a symptom of a BIOS or RAID controller firmware bug where the firmware presents wrong size information about the disk to the bootloader. If the kernel, initrd or GRUB files end up beyond a certain (perfectly valid!) area on the disk, booting fails.
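
As a rough illustration of the kind of limit involved: one common boundary (only assumed here, not confirmed for this controller) is the range addressable with 32-bit sector numbers, which is 2 TiB with 512-byte sectors. The disk in this thread is 2.6T, so part of it would lie beyond such a boundary. The sketch below only does the arithmetic; the function name `check_offset` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a byte offset on a disk lies beyond
# the range addressable with 32-bit sector numbers.
# ASSUMPTION: the firmware limit is 2^32 sectors; the thread does not say so.

# usage: check_offset <byte_offset> [sector_size_bytes]
check_offset() {
    offset=$1
    sector_size=${2:-512}
    # 2^32 sectors * sector size, e.g. 2199023255552 bytes (2 TiB) at 512 B
    limit=$((4294967296 * sector_size))
    if [ "$offset" -ge "$limit" ]; then
        echo "beyond"
    else
        echo "within"
    fi
}
```

For example, `check_offset 2400000000000` prints `beyond`. Note that with /boot on LVM, as in this setup, file offsets reported by filesystem tools are relative to the logical volume rather than the raw disk, so finding the real on-disk position of a boot file takes additional mapping.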
 
Thank you for your answer!

The strange thing is that older kernels on the server still work without a problem; only new ones show that error, meaning the ones whose initramfs was rebuilt. For example, pve-kernel-5.4.119-1-pve was working fine until I tried to reinstall it with apt-get reinstall pve-kernel-5.4.119-1-pve.
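
One way to check that correlation is to list the initrd images by modification time and compare the recently rewritten ones against the kernels that fail to boot. This is a minimal sketch; the function name `list_initrds_by_mtime` is made up for illustration, and it assumes the images follow the `initrd.img-*` naming seen in the output in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list initrd images, most recently modified first,
# so rewritten images can be compared with the kernels that no longer boot.

# usage: list_initrds_by_mtime [boot_dir]
list_initrds_by_mtime() {
    boot_dir=${1:-/boot}
    # ls -t sorts by modification time, newest first; -1 prints one per line
    ls -1t "$boot_dir"/initrd.img-* 2>/dev/null
}
```

Running `list_initrds_by_mtime` with no argument looks in /boot.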

Nothing strange about the output here:
Code:
root@hypervisor:~# apt-get reinstall pve-kernel-5.4.119-1-pve
Reading package lists... Done
Building dependency tree       
Reading state information... Done
0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded.
Need to get 0 B/60.6 MB of archives.
After this operation, 0 B of additional disk space will be used.
(Reading database ... 101365 files and directories currently installed.)
Preparing to unpack .../pve-kernel-5.4.119-1-pve_5.4.119-1_amd64.deb ...
Unpacking pve-kernel-5.4.119-1-pve (5.4.119-1) over (5.4.119-1) ...
Setting up pve-kernel-5.4.119-1-pve (5.4.119-1) ...
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 5.4.119-1-pve /boot/vmlinuz-5.4.119-1-pve
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 5.4.119-1-pve /boot/vmlinuz-5.4.119-1-pve
update-initramfs: Generating /boot/initrd.img-5.4.119-1-pve
run-parts: executing /etc/kernel/postinst.d/proxmox-auto-removal 5.4.119-1-pve /boot/vmlinuz-5.4.119-1-pve
run-parts: executing /etc/kernel/postinst.d/zz-proxmox-boot 5.4.119-1-pve /boot/vmlinuz-5.4.119-1-pve
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 5.4.119-1-pve /boot/vmlinuz-5.4.119-1-pve
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.157-1-pve
Found initrd image: /boot/initrd.img-5.4.157-1-pve
Found linux image: /boot/vmlinuz-5.4.143-1-pve
Found initrd image: /boot/initrd.img-5.4.143-1-pve
Found linux image: /boot/vmlinuz-5.4.124-1-pve
Found initrd image: /boot/initrd.img-5.4.124-1-pve
Found linux image: /boot/vmlinuz-5.4.119-1-pve
Found initrd image: /boot/initrd.img-5.4.119-1-pve
Found linux image: /boot/vmlinuz-5.4.78-2-pve
Found initrd image: /boot/initrd.img-5.4.78-2-pve
Found linux image: /boot/vmlinuz-5.4.44-2-pve
Found initrd image: /boot/initrd.img-5.4.44-2-pve
Found linux image: /boot/vmlinuz-5.3.18-3-pve
Found initrd image: /boot/initrd.img-5.3.18-3-pve
Found linux image: /boot/vmlinuz-5.3.10-1-pve
Found initrd image: /boot/initrd.img-5.3.10-1-pve
Found memtest86+ image: /boot/memtest86+.bin
Found memtest86+ multiboot image: /boot/memtest86+_multiboot.bin
done

Neither the BIOS nor the firmware on the RAID controller has been upgraded for years, so this hit us out of nowhere.

Any suggestions on what we should do next to overcome this problem?
 
If you have an empty slot, move /boot to a small disk directly attached to the motherboard if possible. The bug triggers randomly whenever any of the files needed to boot a given kernel image lands beyond a certain offset on the disk, so each rewrite of the initrd/kernel images or GRUB files is a risk.
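
A dry-run sketch of what such a move could look like is below. It only prints commands and executes nothing. The device names (`/dev/sdb`, `/dev/sdb1`), the ext4 filesystem and the function name `print_boot_move_plan` are all assumptions for illustration; the new disk would also need to be partitioned beforehand and selected as the boot device in the BIOS.

```shell
#!/bin/sh
# Dry-run sketch: PRINTS the commands for moving /boot to a separate disk.
# Nothing is executed. Device names are ASSUMPTIONS, not taken from the thread.

# usage: print_boot_move_plan <new_disk> <new_partition>
print_boot_move_plan() {
    disk=$1   # hypothetical new disk, e.g. /dev/sdb
    part=$2   # hypothetical partition on it for /boot, e.g. /dev/sdb1
    cat <<EOF
mkfs.ext4 $part
mkdir -p /mnt/newboot
mount $part /mnt/newboot
cp -a /boot/. /mnt/newboot/
umount /mnt/newboot
echo "UUID=\$(blkid -s UUID -o value $part) /boot ext4 defaults 0 2" >> /etc/fstab
mount /boot
grub-install $disk
update-grub
EOF
}
```

Calling `print_boot_move_plan /dev/sdb /dev/sdb1` prints the plan for review. Check every line against the actual hardware before running anything, ideally with a backup of /boot at hand.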
 
