[SOLVED] /lib/modules filling up hard drive

apoc

Famous Member
Oct 13, 2017
Hello Proxmox Team,

I have a Proxmox VE server that applies updates automatically via a cron job and a shell script, so that I only need to reboot at some point.
My boot device is not particularly large, but there is plenty of free space in normal operation (about 2 GB), and /var/log is on a separate partition.

Now I have noticed that /lib/modules is filling up my root (/) partition. It has completely run out of space. Thankfully, the system behaves gracefully in this condition (probably because there is enough RAM).
It seems the contents are not deleted automatically by "apt-get clean" and "apt-get autoremove", and neither are the images in /boot.

I am wondering whether this is actually expected / wanted behavior and, if so, why this approach was chosen.
Currently I need to clean up my root partition about every two weeks, which is suboptimal.

Thanks for your help
With best regards
Thomas
 
I am wondering whether this is actually expected / wanted behavior and, if so, why this approach was chosen.

Yes, this is expected behavior. We do not remove older kernels automatically, to avoid running into a situation where you cannot boot. You need to remove old, unused/unwanted kernels manually.
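
As a rough sketch of what "remove manually" can look like on the command line: the helper below filters a list of installed package names down to the versioned kernel packages other than the running one. The function name filter_old_kernel_packages is made up for illustration, and the pve-kernel-<version> naming scheme is an assumption (other releases may name their kernel packages differently), so check the output before removing anything.

```shell
# Reads package names on stdin (one per line) and prints the versioned
# kernel packages that do NOT belong to the running kernel.
# $1 = running kernel version, e.g. the output of `uname -r`
# Assumption: packages are named pve-kernel-<x.y.z>-<n>-pve.
filter_old_kernel_packages() {
    awk -v keep="pve-kernel-$1" \
        '/^pve-kernel-[0-9]+\.[0-9]+\.[0-9]+-[0-9]+-pve$/ && $0 != keep'
}

# Typical use (review the list first, then remove what is no longer needed):
#   dpkg -l 'pve-kernel-*' | awk '/^ii/{print $2}' \
#       | filter_old_kernel_packages "$(uname -r)"
#   apt-get remove <package-from-the-list>
```

Removing kernels through the package manager, rather than deleting files by hand, keeps the package database consistent with what is actually on disk.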
 
Maybe some sort of compromise could be better. I do not see any reason to keep more than 2 older kernels (in addition to the latest one)...
 
Thanks Dietmar, not what I had hoped for, but at least it is expected.
When I have time, I will look into an automatic solution for this, as it is (as mentioned already) inconvenient.
 
It's an old thread, but I thought I'd share how I finally solved my issue.
I run the script via a cron job on a regular basis. It cleans up several log files as well as /lib/modules and the /boot directory.

Use at your own convenience and at your own risk!

/edit: Updated script to recent version (2022-08-10)
Changes:
- Completely reworked the logic; the goal was to make things more bulletproof after the report from @cglmicro.
- When run without a parameter, the script only displays what it would do (dry run).
- To actually clean up, use the -e parameter (e stands for execute).
- Introduced variables at the very beginning of the script to enable and disable individual sections; this should reduce the need to comment out or remove lines.


This works fine for me (Ubuntu 20.04, PVE 7.2), but please test it on a system on your end as well. I have not checked it against other languages, locales, etc. And again:

Use at your own convenience and at your own risk!
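
The attachment itself is not reproduced in this thread, so the following is not the actual script, only a minimal sketch of the pattern described in the change list: dry run by default, -e to execute, and switches at the top to enable or disable sections. All names (EXECUTE, CLEAN_KERNELS, CLEAN_LOGS, parse_args, run) are hypothetical.

```shell
# Section switches at the top of the script (hypothetical names):
# set one to 0 to skip that section without commenting out code.
CLEAN_KERNELS=1
CLEAN_LOGS=0

# Dry run is the default; only -e switches to execute mode.
EXECUTE=0

# Parses the arguments passed to it and sets EXECUTE accordingly.
parse_args() {
    OPTIND=1
    EXECUTE=0
    while getopts "e" opt; do
        case "$opt" in
            e) EXECUTE=1 ;;
            *) echo "usage: [-e]" >&2; return 2 ;;
        esac
    done
}

# Runs the given command in execute mode; otherwise only prints it.
run() {
    if [ "$EXECUTE" -eq 1 ]; then
        "$@"
    else
        echo "[dry-run] $*"
    fi
}

# Typical use inside the script:
#   parse_args "$@"
#   if [ "$CLEAN_LOGS" -eq 1 ]; then run rm -f /path/to/old.log; fi
```

Routing every destructive command through one wrapper like run means a forgotten flag results in a printed line rather than a deleted file.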
 

@tburger Thanks! That worked quite well! I have a Proxmox node running off a 4 GB memory stick. ESXi was on it previously and I was curious to see how it would go, but it got very full almost immediately after running some updates.
 
Thank you very much, you saved me here.
 
Thank you for this script. Can I still use it on my two Proxmox installations? One is still on 6.3 and the other is on the latest version 7.
 
Just a heads-up for everyone who uses this script.

I have noticed on a longer-running system that this script does not necessarily prevent the hard drive from filling up.

Situation: if new kernels get installed but the system has not been rebooted yet, the newer kernels and modules do not get cleaned up.
In my case there were 3 or 4 newer kernels installed, and this eventually filled up my small boot disk (8 GB).
A reboot and a re-run of the script cleared the situation for me.

Just so you all know. All the best.
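
For context, the cleanup loop in the 2020-10 version of the script walks the /lib/modules listing in order and stops as soon as it reaches the running kernel, so entries listed after it are never considered. One possible way around this, sketched below under the assumption that GNU sort and GNU head are available, is to select candidates by version instead: keep the running kernel plus the newest N and treat the rest as removable. The function name select_removable is made up for illustration, and the sketch only prints candidates; it deletes nothing.

```shell
# Reads kernel versions on stdin (one per line) and prints the ones that
# could be removed: everything except the running kernel and the newest
# $2 versions.
# $1 = running kernel version, $2 = number of newest versions to keep
# Relies on GNU sort -V and GNU head (negative line count).
select_removable() {
    sort -V | head -n "-$2" | awk -v running="$1" '$0 != running'
}

# Typical use:
#   ls -1 /lib/modules | select_removable "$(uname -r)" 1
```

With this selection, several kernels installed between two reboots no longer pile up, because everything between the running kernel and the newest one is a candidate as well.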
 
Hi Apoc.

Thanks for the code. First, I'm not accusing you of anything; I know it was at my own risk. I just want to know what I did wrong, and to prevent anybody else from making the same mistake.

I tried running a trimmed-down version of your file, just to take care of the kernels in /usr/lib/modules and /boot, and also to clean the repository.
I ended up with no files in /boot and in /usr/lib/modules; the script deleted all the kernel files in both folders, but didn't touch .

It was an (almost) empty test server, so I'm moving a VM off it before trying to reboot. I don't have much hope that it will boot, unless someone knows how to repopulate both folders. And what about the apt files it deleted? Do you think I should format it and rebuild it from scratch?

Here is the code I ran
Code:
#!/bin/bash
################################################
# Cleanup Linux Installation
# Author: Thomas Burger
# Version 1.4
# Date: 2020-10-23
################################################

counter=0

### Cleaning up
clear
echo "### CLEANING UP LINUX INSTALLATION"

# /lib/modules AND /boot
kernelVersion=`uname -r`
myPath="/lib/modules"
echo "Cleaning up >$myPath<"
myPathContent=$null
myPathContent=`ls -1 "$myPath"`
for item in $myPathContent ;
    do
        if [[ "$item" == "$kernelVersion" ]] ;
            then
                #echo "break the loop"
                break
            else
                #echo "delete >$myPath/$item<"
                if ! [[ -d "$file" ]] ;
                    then
                        sudo rm -rf "$myPath/$item"
                        counter=$(($counter+1))

                        #boot files
                        correspondingBootFiles=`sudo find /boot | grep $item`
                        for bootItem in $correspondingBootFiles ;
                            do
                                #echo "delete >$bootItem<"
                                if ! [[ -d "$file" ]] ;
                                    then
                                        sudo rm -f "$bootItem"
                                        counter=$(($counter+1))
                                fi
                        done
                fi
        fi
done


echo "Cleaning up configuration files from removed packages"
sudo dpkg -l | awk '/^rc/{print $2}' | sudo xargs apt-get purge -y


# update grub
echo "Updating grub"
sudo update-grub

# Cleanup apt cache
echo "Cleaning up apt-cache via apt-get"
sudo apt-get clean -y
sudo apt-get autoremove -y

When I ran the script with "sh ./clean_linux.sh", I received errors on these 3 lines:
Code:
if [[ "$item" == "$kernelVersion" ]] ;
if ! [[ -d "$file" ]] ;
if ! [[ -d "$file" ]] ;

Here are the errors I received:
Cleaning up >/lib/modules<
./clean_kernel.sh: 153: [[: not found
./clean_kernel.sh: 159: [[: not found
./clean_kernel.sh: 169: [[: not found
./clean_kernel.sh: 169: [[: not found
./clean_kernel.sh: 169: [[: not found
./clean_kernel.sh: 169: [[: not found
[the same six lines repeat 12 more times]

Thank you.
 
Ouch, that is ugly, @cglmicro.

I have never experienced anything like this, and I have been using this script block for literally years now, on different systems and different Linux distributions.
The output tells me that something is going seriously wrong with the script execution, though I have no idea why.
Did you use copy and paste to transfer the content of the script over to your machine? Or did you scp it over?

Have you modified the script in any way?
I am asking because the script block you provided must be incomplete: it has no lines 153, 159, or 169. I had used the originally provided script on my end for some months before I uploaded it here.

All these lines indicate that something is going wrong in the if/else clauses, which I really don't get. It seems that the shell interprets the if-clauses as actual commands, and these (of course) fail.

Could you please provide the script which gave you the errors?

A reboot will very likely not be successful, as all the kernel entries are missing.
 
Hi.

Yes, I commented out some lines before pasting it in this forum, for readability; see below for the complete script.
I opened your script in Notepad, commented out some lines, and copy-pasted it over SSH onto my machine. Do you think a Windows CRLF bug could be responsible?

Code:
#!/bin/bash
################################################
# Cleanup Linux Installation
# Author: Thomas Burger
# Version 1.4
# Date: 2020-10-23
################################################

counter=0

### Cleaning up
clear
echo "### CLEANING UP LINUX INSTALLATION"


# /var/log
#myPath="/var/log"
#if [[ -d "$myPath" ]] ;
#    then
#        echo "Cleaning up >$myPath<"
#        myPathContent=$null
#        myPathContent=`sudo find $myPath`
#        for file in $myPathContent ;
#            do
#                delete="false"
#                postdot=""
#                postdot=$(echo $file | rev | cut -d. -f1 | rev)
#                if [[ "$postdot" == "0" ]] ;
#                    then
#                        delete="true"
#                elif [[ "$postdot" == "1" ]] ;
#                    then
#                        delete="true"
#                elif [[ "$postdot" == "gz" ]] ;
#                    then
#                        delete="true"
#                fi
#
#                if [[ "$delete" == "true" ]] ;
#                    then
#                        #echo "delete >$file<"
#                        if ! [[ -d "$file" ]] ;
#                            then
#                                sudo rm -f "$file"
#                                counter=$(($counter+1))
#                        fi
#                fi
#        done
#fi


# /var/log/journal
#myPath="/var/log/journal"
#if [[ -d "$myPath" ]] ;
#    then
#        echo "Cleaning up >$myPath<"
#        myPathContent=$null
#        myPathContent=`sudo find "$myPath" -mtime +3`
#        for file in $myPathContent ;
#            do
#                #echo "delete >$file<"
#                if ! [[ -d "$file" ]] ;
#                    then
#                        sudo rm -f "$file"
#                        counter=$(($counter+1))
#                fi
#        done
#fi


# /var/log.save
#myPath="/var/log.save"
#if [[ -d "$myPath" ]] ;
#    then
#        echo "Cleaning up >$myPath<"
#        myPathContent=$null
#        myPathContent=`sudo find $myPath`
#        for file in $myPathContent ;
#            do
#                delete="false"
#                postdot=""
#                postdot=$(echo $file | rev | cut -d. -f1 | rev)
#                if [[ "$postdot" == "0" ]] ;
#                    then
#                        delete="true"
#                elif [[ "$postdot" == "1" ]] ;
#                    then
#                        delete="true"
#                elif [[ "$postdot" == "gz" ]] ;
#                    then
#                        delete="true"
#                fi
#
#                if [[ "$delete" == "true" ]] ;
#                    then
#                        #echo "delete >$file<"
#                        if ! [[ -d "$file" ]] ;
#                            then
#                                sudo rm -f "$file"
#                                counter=$(($counter+1))
#                        fi
#                fi
#        done
#fi


# /var/log.save/journal
#myPath="/var/log.save/journal"
#if [[ -d "$myPath" ]] ;
#    then
#        echo "Cleaning up >$myPath<"
#        myPathContent=$null
#        myPathContent=`sudo find "$myPath" -mtime +3`
#        for file in $myPathContent ;
#            do
#                #echo "delete >$file<"
#                if ! [[ -d "$file" ]] ;
#                    then
#                        sudo rm -f "$file"
#                        counter=$(($counter+1))
#                fi
#        done
#fi


# /var/cache/e2fsck
#myPath="/var/cache/e2fsck"
#if [[ -d "$myPath" ]] ;
#    then
#        echo "Cleaning up >$myPath<"
#        myPathContent=$null
#        myPathContent=`sudo find "$myPath" -mtime +3`
#        for file in $myPathContent ;
#            do
#                #echo "delete >$file<"
#                if ! [[ -d "$file" ]] ;
#                    then
#                        sudo rm -f "$file"
#                        counter=$(($counter+1))
#                fi
#        done
#fi


# /lib/modules AND /boot
kernelVersion=`uname -r`
myPath="/lib/modules"
echo "Cleaning up >$myPath<"
myPathContent=$null
myPathContent=`ls -1 "$myPath"`
for item in $myPathContent ;
    do
        if [[ "$item" == "$kernelVersion" ]] ;
            then
                #echo "break the loop"
                break
            else
                #echo "delete >$myPath/$item<"
                if ! [[ -d "$file" ]] ;
                    then
                        sudo rm -rf "$myPath/$item"
                        counter=$(($counter+1))

                        #boot files
                        correspondingBootFiles=`sudo find /boot | grep $item`
                        for bootItem in $correspondingBootFiles ;
                            do
                                #echo "delete >$bootItem<"
                                if ! [[ -d "$file" ]] ;
                                    then
                                        sudo rm -f "$bootItem"
                                        counter=$(($counter+1))
                                fi
                        done
                fi
        fi
done


echo "Cleaning up configuration files from removed packages"
sudo dpkg -l | awk '/^rc/{print $2}' | sudo xargs apt-get purge -y


# update grub
echo "Updating grub"
sudo update-grub


# Cleanup apt cache
echo "Cleaning up apt-cache via apt-get"
sudo apt-get clean -y
sudo apt-get autoremove -y


# /etc/apt/cache
#myPath="/etc/apt/cache"
#echo "Cleaning up >$myPath<"
#if [[ -d "$myPath" ]] ;
#    then
#        myPathContent=$null
#        myPathContent=`sudo find "$myPath/*"  -mtime +3`
#        for item in $myPathContent ;
#            do
#                #echo "delete >$item<"
#                if ! [[ -d "$file" ]] ;
#                    then
#                        sudo rm -f "$item"
#                        counter=$(($counter+1))
#                fi
#        done
#fi


# Closing down
echo "Cleaned up $counter elements."
#echo "     Current state:"
#sudo df -h /

Thank you.
 
Do you think a Windows CRLF bug could be responsible?
I don't think it is an LF/CRLF issue, but perhaps some encoding issue.

For whatever reason, the [[ is interpreted as an actual command, which of course is wrong. Sadly, I can't repair what has happened to your system. But I will rework the script's logic to hopefully prevent such an event.
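
One explanation that fits the reported symptoms: the script was started with "sh ./clean_linux.sh", which ignores the #!/bin/bash line and runs the file with whatever sh points to. On Debian-based systems such as PVE that is normally dash, which has no [[ keyword, so every [[ test fails with "not found". An "if [[ ... ]]" then falls through to its else branch, and an "if ! [[ ... ]]" turns the failure into success, so the delete commands run for every entry. The sketch below shows a guard for the top of a bash-only script and a portable form of the version comparison; the function names are made up for illustration.

```shell
# Succeeds only when the interpreter is bash.
running_under_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

# Guard for the top of a bash-only script: report the problem and return
# non-zero so the caller can stop before anything is deleted.
require_bash() {
    if ! running_under_bash; then
        echo "This script needs bash: start it with 'bash script.sh' or './script.sh'." >&2
        return 1
    fi
}

# Portable replacement for: [[ "$item" == "$kernelVersion" ]]
# The single-bracket test with = works in dash as well as in bash.
is_running_kernel() {
    [ "$1" = "$2" ]
}

# Typical use at the top of the script:
#   require_bash || exit 1
```

Starting the script as "bash ./clean_linux.sh", or making it executable and calling "./clean_linux.sh", would let the #!/bin/bash line or the explicit interpreter take effect.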
 
Thanks Apoc, and don't worry about the damage; as long as it can be prevented for someone else, that's what I was hoping for.

I tried to copy all files in both folders from another working PVE with rsync, but I don't know how to get back what was deleted in apt. I haven't tried to reboot yet; I'm waiting for a new server I ordered yesterday to replace this one.
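
On the question of getting the deleted files back: "apt-get clean" only empties the cache of downloaded .deb files, which apt fetches again when needed. The kernel files under /boot and /lib/modules belong to the kernel packages, so reinstalling those packages is one way to try to restore them. The sketch below is untested on a damaged system; the helper name is hypothetical, and it assumes the kernel packages are still registered as installed and follow the pve-kernel-* naming. It only prints the commands, so they can be reviewed first.

```shell
# Reads package names on stdin (one per line) and prints one reinstall
# command per name. Printing instead of executing keeps this a dry run.
print_reinstall_commands() {
    while IFS= read -r pkg; do
        if [ -n "$pkg" ]; then
            printf 'apt-get install --reinstall %s\n' "$pkg"
        fi
    done
}

# Typical use:
#   dpkg -l 'pve-kernel-*' | awk '/^ii/{print $2}' | print_reinstall_commands
```

After a reinstall, the package hooks should regenerate the initrd and the boot loader entries, but it is worth checking that /boot is populated again before rebooting.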
 
