[SOLVED] cluster with different versions of kernel

R0bin

Hi!
I have a PVE cluster with 3 nodes and Ceph, and everything works fine. Once or twice a month I run a dist-upgrade on all my nodes (one node after another). The hardware is the same on each bare-metal server.
Recently, my Zabbix monitoring alerted me that /boot was more than 80% full on each node after an upgrade.
I ran things like apt-get autoremove and dpkg --purge to remove old kernels, but when verifying, I saw that the running kernel is not the same on each node (the nodes are named occ-host-000{1..3}):
Code:
Linux occ-host-0001 5.13.19-4-pve #1 SMP PVE 5.13.19-9 (Mon, 07 Feb 2022 11:01:14 +0100) x86_64 GNU/Linux
Linux occ-host-0002 5.13.19-4-pve #1 SMP PVE 5.13.19-9 (Mon, 07 Feb 2022 11:01:14 +0100) x86_64 GNU/Linux
Linux occ-host-0003 5.15.35-1-pve #1 SMP PVE 5.15.35-3 (Wed, 11 May 2022 07:57:51 +0200) x86_64 GNU/Linux
0003 seems to be out of step with the other two (I don't know why...)

Code:
root@occ-host-0001:~# pveversion
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.13.19-4-pve)
root@occ-host-0002:~# pveversion
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.13.19-4-pve)
root@occ-host-0003:~# pveversion
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.35-1-pve)
PVE is at the same version on all three nodes.

Code:
root@occ-host-0001:~# df -h |grep "/boot"
/dev/sda2                                                        488M  271M  182M  60% /boot
root@occ-host-0002:~# df -h |grep "/boot"
/dev/sda2                                                        488M  394M   59M  88% /boot
root@occ-host-0003:~# df -h |grep "/boot"
/dev/sda2                                                        488M  332M  121M  74% /boot
The used space on /boot differs between nodes.

Code:
root@occ-host-0001:~# dpkg --list |grep pve-kernel
ii  pve-firmware                         3.4-2                          all          Binary firmware code for the pve-kernel
ii  pve-kernel-5.13                      7.1-9                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.13.19-4-pve             5.13.19-9                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.13.19-6-pve             5.13.19-15                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.15                      7.2-6                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.15.35-1-pve             5.15.35-3                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.15.39-1-pve             5.15.39-1                      amd64        Proxmox Kernel Image
ii  pve-kernel-helper                    7.2-6                          all          Function for various kernel maintenance tasks.

root@occ-host-0002:~# dpkg --list |grep pve-kernel
ii  pve-firmware                         3.4-2                          all          Binary firmware code for the pve-kernel
ii  pve-kernel-5.13                      7.1-9                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.13.19-2-pve             5.13.19-4                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.13.19-4-pve             5.13.19-9                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.13.19-6-pve             5.13.19-15                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.15                      7.2-6                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.15.35-1-pve             5.15.35-3                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.15.39-1-pve             5.15.39-1                      amd64        Proxmox Kernel Image
ii  pve-kernel-5.4                       6.4-7                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.4.143-1-pve             5.4.143-1                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper                    7.2-6                          all          Function for various kernel maintenance tasks.

root@occ-host-0003:~# dpkg --list |grep pve-kernel
ii  pve-firmware                         3.4-2                          all          Binary firmware code for the pve-kernel
ii  pve-kernel-5.13                      7.1-9                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.13.19-2-pve             5.13.19-4                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.13.19-6-pve             5.13.19-15                     amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.15                      7.2-6                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.15.35-1-pve             5.15.35-3                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-5.15.39-1-pve             5.15.39-1                      amd64        Proxmox Kernel Image
ii  pve-kernel-5.4                       6.4-7                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.4.143-1-pve             5.4.143-1                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper                    7.2-6                          all          Function for various kernel maintenance tasks.
The installed kernels are different on each node!
What's wrong? I suspect 0003 is using the last kernel from before the upgrade to PVE 7 (but I'm not sure). Is there a GRUB issue? How can I check that?
How can I free space on /boot safely?

Thank you for helping me. For now I'm scared to upgrade the kernels on occ-host-0002 again (too little space on /boot), and scared to reboot occ-host-0003 (I don't understand the age of its kernel).
 
First, having different kernels is not a problem per se, but you should keep them at least in the same series (e.g. 5.15).
You normally need only one fallback kernel (ideally the one you were running before the update), so you can remove the older kernels from the older series (5.13 and 5.4), which will free a lot of space.
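
As a rough sketch of how the removal candidates could be listed (an assumption, not a tool shipped with PVE): the hypothetical helper `removable_kernels` below reads installed package names on stdin and prints the versioned kernel packages that are neither the running kernel nor the newest installed one. It only prints names and removes nothing.

```shell
#!/bin/sh
# Hypothetical helper: list kernel packages that could be purged.
# stdin: installed package names, one per line
# $1:    running kernel release as printed by `uname -r` (e.g. 5.13.19-4-pve)
removable_kernels() {
    running="pve-kernel-$1"
    # Keep only versioned image packages such as pve-kernel-5.15.39-1-pve;
    # meta-packages (pve-kernel-5.15) and pve-kernel-helper do not match.
    pkgs=$(grep -E '^pve-kernel-[0-9]+\.[0-9]+\.[0-9]+-[0-9]+-pve$' | sort -V)
    # After a version sort, the last entry is the newest installed kernel.
    newest=$(printf '%s\n' "$pkgs" | tail -n 1)
    # Print everything except the running and the newest kernel.
    printf '%s\n' "$pkgs" | grep -v -x -F -e "$running" -e "$newest" || true
}

# Possible usage on a node (review the list before purging anything):
#   dpkg --list | awk '/^ii/ {print $2}' | removable_kernels "$(uname -r)"
```

The resulting names can then be checked with apt-get -s purge followed by the package names (the -s flag only simulates). apt may also propose removing a matching meta-package such as pve-kernel-5.13, so read the simulation output before running the real command.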
 
Thanks for the reply.
In the last code block of my previous post, we can see that the latest kernel is installed but not running (e.g. on host 0001, pve-kernel-5.15.39-1-pve is installed but 5.13.19-4-pve is running).
Is it possible to load the newest kernel (and remove older ones) without a reboot?
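
A small sketch of how that mismatch could be detected on a node, assuming the newest installed kernel is the one that will be booted next (GRUB settings can change that). The helper name `pending_kernel` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report a newer installed kernel that is not running yet.
# stdin: installed package names, one per line
# $1:    running kernel release as printed by `uname -r`
pending_kernel() {
    # Newest versioned kernel package, with the "pve-kernel-" prefix stripped
    # so that it can be compared with the output of `uname -r`.
    newest=$(grep -E '^pve-kernel-[0-9]+\.[0-9]+\.[0-9]+-[0-9]+-pve$' \
        | sort -V | tail -n 1 | sed 's/^pve-kernel-//')
    # Print it only when it differs from the running kernel.
    if [ -n "$newest" ] && [ "$newest" != "$1" ]; then
        printf '%s\n' "$newest"
    fi
}

# Possible usage on a node:
#   dpkg --list | awk '/^ii/ {print $2}' | pending_kernel "$(uname -r)"
```

An empty result would mean the node already runs the newest installed kernel; any output names the kernel that is waiting for a reboot.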
 
Is it possible to load the newest kernel (and remove older ones) without a reboot?
Yes and no. There are techniques to load a newer kernel without rebooting, and up to PVE 6 KernelCare could be used with PVE, but that seemingly is no longer the case (or at least I can't find information about it). I have not tried it myself; I always reboot. A reboot is also a good test of whether everything comes back up correctly, so you can have more trust in your machine.

Normally we do rolling upgrades of our cluster, node by node:
- distribute all running VMs across the other nodes
- run dist-upgrade
- reboot
- check that everything is working
- migrate one or more less important VMs back, check again, and possibly wait a few hours in case problems arise
- continue with the next node
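
The first three steps could be sketched roughly as below. This is a simplified illustration, not our actual script: the helper names `run` and `upgrade_node` are made up, the VM IDs are placeholders, and all VMs go to a single target node instead of being spread across the cluster. By default it only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to execute them (assumption: run as root on the node to upgrade).
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1: target node for the VMs; remaining arguments: IDs of VMs on this node
upgrade_node() {
    target=$1
    shift
    for vmid in "$@"; do
        # Live-migrate each running VM away before touching the node.
        run qm migrate "$vmid" "$target" --online
    done
    run apt-get update
    run apt-get dist-upgrade
    run reboot
}

# Example with placeholder VM IDs:
#   upgrade_node occ-host-0002 101 102
```

The checks after the reboot and the migration back are left as manual steps on purpose, since they depend on what "working" means for each cluster.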
 
I have done what you explained. Everything works fine (except Ceph doing clean+snaptrim, but I'm not worried about that).
After the reboot, the networking service was not available; I had to run service networking restart on each node.
To fix that I tried "systemctl enable networking", and I will verify whether it works at the next upgrade and reboot :)

Thank you for the answers.
 
