Opt-in Linux 6.17 Kernel for Proxmox VE 9 available on test & no-subscription

The latest kernel update is very unstable on my machine(s) (see attached journalctl.log). The older kernels 6.14.11-4 and 6.14.8-2 don't have this issue.

The latest kernel update is very unstable on my machine(s) (see attached journalctl.log). The older kernels 6.14.11-4 and 6.14.8-2 don't have this issue.

Please open a new thread for this and post more details about your HW, e.g. the server vendor and model.

This might be related to your CPU not implementing VT-d correctly, with the newer kernel, e.g., exposing some feature that was previously unused, or previously failing to detect that it was broken. If you do not rely on PCI pass-through, you can try disabling the IOMMU, e.g. by adding intel_iommu=off as a kernel parameter.
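For reference, a minimal sketch of adding that parameter on a GRUB-booted node. This demonstrates the edit on a scratch copy with stand-in content; on a real system the file is /etc/default/grub, and on systemd-boot/proxmox-boot-tool setups you would append to /etc/kernel/cmdline instead:

```shell
# Demo on a scratch copy; on a real node edit /etc/default/grub instead.
grubfile=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > "$grubfile"  # stand-in content

# Append intel_iommu=off inside the existing quoted value.
sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT=".*\)"$/\1 intel_iommu=off"/' "$grubfile"
cat "$grubfile"
# → GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=off"

# On the real file, follow up with one of:
#   update-grub                # GRUB-booted systems
#   proxmox-boot-tool refresh  # systemd-boot / proxmox-boot-tool systems
```

Then reboot and check `cat /proc/cmdline` to verify the parameter is active.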
 
Today I upgraded 2 of the 3 nodes in my cluster to 9.1 with kernel 6.17, and both failed to boot with the 6.17 kernel, ending in a bootloop.
  • The machine boots into GRUB, and after selecting the 6.17 kernel it just reboots.
  • For some reason `systemd-boot-efi` seems to be installed, but it is also present on my last remaining 6.14 node.
  • Both servers are identical Dell R630s (see fastfetch below).
  • They are booted in BIOS mode.
  • systemd-boot is NOT installed.
  • Using ZFS AND Ceph.
  • All 3 nodes were previously upgraded from Proxmox 8, if that is relevant.
  • I would rule out any BIOS/EFI issues, since the machine does reach GRUB.
  • Once I select the 6.17 kernel in GRUB via the iDRAC remote viewer, I get a black screen with an "_" at character position (0,0), and then it reboots. I struggle to give any more clues.

I therefore booted the 6.14 kernel again:
root@triton:~# fastfetch
(Proxmox ASCII logo trimmed)
root@triton
-----------
OS: Proxmox VE 9.1.1 x86_64
Host: PowerEdge R630
Kernel: Linux 6.14.11-4-pve
Uptime: 22 mins
Packages: 890 (dpkg)
Shell: bash 5.2.37
Display (VGA-1): 1024x768 @ 60 Hz
Terminal: /dev/pts/0
CPU: Intel(R) Xeon(R) E5-2643 v3 (12) @ 3.70 GHz
GPU: Matrox Electronics Systems Ltd. G200eR2
Memory: 3.86 GiB / 31.25 GiB (12%)
Swap: 0 B / 8.00 GiB (0%)
Disk (/): 96.65 GiB / 641.23 GiB (15%) - zfs
Disk (/rpool): 128.00 KiB / 544.58 GiB (0%) - zfs
Local IP (vmbr0): 192.168.180.100/24
Locale: en_US.UTF-8
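Until this is resolved, one way to make sure an affected node keeps booting the working kernel is to pin it with proxmox-boot-tool. A sketch; the version string below is the 6.14 kernel from this post, adjust it to whatever `proxmox-boot-tool kernel list` shows on your node:

```shell
# Pin the known-good kernel so the node keeps booting it (sketch).
pin="6.14.11-4-pve"  # version taken from the fastfetch output above
if command -v proxmox-boot-tool >/dev/null 2>&1; then
    proxmox-boot-tool kernel pin "$pin"  # persists across kernel upgrades
    proxmox-boot-tool kernel list        # confirm the pinned entry
else
    echo "run on the PVE host: proxmox-boot-tool kernel pin $pin"
fi
```

Later, `proxmox-boot-tool kernel unpin` returns the node to booting the newest installed kernel.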
 
I'm also running Podman in LXC, ran into this issue, and therefore can't use 6.17. It's good that there's a fix, but you haven't reported this anywhere but here yet, @jaminmc?
Yes!!! I am not crazy, or the only one this has happened to! I have an LXC container in which I compile the kernel with my patch. Here is how to do it: create a Debian 13 container, then paste the following steps into it.
Bash:
# 1. Update base system
apt update && apt upgrade -y

# 2. Add Proxmox repo and key
wget -q https://enterprise.proxmox.com/debian/proxmox-release-trixie.gpg \
    -O /etc/apt/trusted.gpg.d/proxmox-release-trixie.gpg

cat > /etc/apt/sources.list.d/pve-src.sources <<EOF
Types: deb
URIs: http://download.proxmox.com/debian/pve
Suites: trixie
Components: pve-no-subscription
Signed-By: /etc/apt/trusted.gpg.d/proxmox-release-trixie.gpg
EOF

# 3. Append Debian deb-src if missing
if ! grep -q "deb-src" /etc/apt/sources.list.d/debian.sources 2>/dev/null; then
    cat >> /etc/apt/sources.list.d/debian.sources <<EOF

Types: deb-src
URIs: http://deb.debian.org/debian
Suites: trixie trixie-updates
Components: main contrib non-free non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg

Types: deb-src
URIs: http://security.debian.org/debian-security
Suites: trixie-security
Components: main contrib non-free non-free-firmware
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
EOF
fi

# 4. Update
apt update

# 5. Install build tools
apt install -y build-essential git git-email debhelper devscripts fakeroot \
    libncurses-dev bison flex libssl-dev libelf-dev bc cpio kmod pahole dwarves \
    rsync python3 python3-pip pve-doc-generator python-is-python3 dh-python \
    sphinx-common quilt libtraceevent-dev libunwind-dev libzstd-dev pkg-config equivs

# 6. Clone and prepare repo
git clone https://git.proxmox.com/git/pve-kernel.git
cd pve-kernel
git checkout master  # Latest kernel + patches

# 7. Prep and deps
make distclean
make build-dir-fresh

cd proxmox-kernel-*/ ; mk-build-deps -i -r -t "apt-get -o Debug::pkgProblemResolver=yes --no-install-recommends -y" debian/control ; cd ..

# 8. Add patch (plain > so re-running this step doesn't append it twice)
cat > patches/kernel/0014-apparmor-fix-NULL-pointer-dereference-in-aa_file.patch <<'EOF'
diff --git a/security/apparmor/file.c b/security/apparmor/file.c
--- a/security/apparmor/file.c
+++ b/security/apparmor/file.c
@@ -777,6 +777,9 @@ static bool __unix_needs_revalidation(struct file *file, struct aa_label *label
         return false;
     if (request & NET_PEER_MASK)
         return false;
+    /* sock and sock->sk can be NULL for sockets being set up or torn down */
+    if (!sock || !sock->sk)
+        return false;
     if (sock->sk->sk_family == PF_UNIX) {
         struct aa_sk_ctx *ctx = aa_sock(sock->sk);
EOF
make build-dir-fresh

# 9. Build
make

echo "=== BUILD COMPLETE ==="
ls -lh *.deb 


# For updates,
git reset --hard HEAD
git clean -df
git pull
git submodule update --init --recursive
#  Then do Step 8 & 9
After it is all built, since the container is on ZFS, I just run this on my Proxmox host to replace the kernel with the one I compiled:

apt --reinstall install /rpool/data/subvol-114-disk-0/root/pve-kernel/proxmox-{kernel,headers}-6.17.2-1-pve_6.17.2-1*.deb

Replace 114 with your container's VMID. That reinstalls the kernel packages with the patched ones. Alternatively, scp the created .deb files to your Proxmox servers and install them from there.
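The path above can also be assembled from the container's VMID; a small sketch, assuming the default rpool/data storage layout (adjust both the VMID and the pool path to your setup):

```shell
# Build the host-side path to the .debs inside the container's ZFS subvolume.
vmid=114                                          # your build container's VMID
debs="/rpool/data/subvol-${vmid}-disk-0/root/pve-kernel"
echo "$debs"
# → /rpool/data/subvol-114-disk-0/root/pve-kernel

# On the PVE host, reinstall kernel + headers from there:
#   apt install --reinstall "$debs"/proxmox-{kernel,headers}-6.17.*.deb
```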

Check https://git.proxmox.com/?p=pve-kernel.git;a=summary for kernel updates
 
Thanks, but compiling is not the issue. I'd rather have this fixed upstream so I don't have to compile every new version after 6.14 from now on, but for that it'd either need to be reported, or you could upstream your patch?
 
Thanks, but compiling is not the issue. I'd rather have this fixed upstream so I don't have to compile every new version after 6.14 from now on, but for that it'd either need to be reported, or you could upstream your patch?
Prior to the release of 6.17, I raised this issue with the developers. Fiona even suggested that I submit the patches to https://bugzilla.proxmox.com/. However, in the subsequent post it was pointed out that I had used Cursor for debugging, and that the 2nd patch in that post could be handled with an AppArmor change instead of at the kernel level. But the main patch I have been using cannot be worked around at the user level, and a NULL pointer should not be able to take down a whole system. Consequently, the issue was ignored. I assume that @fiona is the same individual as https://git.proxmox.com/?p=pve-kernel.git;a=search;s=Fiona+Ebner;st=author in the git repository. I have recently submitted a bug report: https://bugzilla.proxmox.com/show_bug.cgi?id=7083, so hopefully it will get the attention needed.

Podman should be a better way to run Docker/OCI containers than running them directly in an LXC container. It also supports docker-compose, which the LXC OCI template cannot do. With a Debian 13 container being based on the same OS as Proxmox, it should be fully compatible, especially since Podman is designed to run rootless.

Despite my raising this issue repeatedly after each new release of kernel 6.17, Podman on a Debian 13 LXC on ZFS consistently causes a severe kernel panic, rendering the entire system unresponsive and requiring a manual reset to reboot the server. This should never happen at the kernel level, and the fix is a straightforward two-line code change. So I have been compiling the kernel with the fix myself until it gets fixed officially.
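For anyone trying to reproduce this, the trigger is nothing exotic; roughly the following, inside a Debian 13 LXC whose rootfs is on ZFS-backed storage (a hedged sketch; the image choice is arbitrary, and on an unpatched 6.17 kernel this can panic the whole host, so only try it on a test machine):

```shell
# Reproducer sketch: run inside the Debian 13 LXC on ZFS-backed storage.
# WARNING: on an unpatched 6.17 kernel this can take down the entire host.
img="docker.io/library/alpine:latest"   # arbitrary small test image
if command -v podman >/dev/null 2>&1; then
    podman run --rm "$img" echo "still alive"
else
    echo "podman not installed; inside the LXC run: apt install -y podman"
fi
```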
 
I have recently submitted a bug report: https://bugzilla.proxmox.com/show_bug.cgi?id=7083, so hopefully it will get the attention needed.
Yes, thank you for this! Having clear reproducer steps and the issue filed stand-alone goes a long way. This way it's clear that it's not related to the other issue, and other developers who don't stumble upon it by chance in the forum will be aware of it too. Personally, I had too many other things to look at in the context of the Proxmox VE 9.1 release. AFAIK no other user has reported the same issue as of yet.
 
Yes, thank you for this! Having clear reproducer steps and the issue filed stand-alone goes a long way. This way it's clear that it's not related to the other issue, and other developers who don't stumble upon it by chance in the forum will be aware of it too. Personally, I had too many other things to look at in the context of the Proxmox VE 9.1 release. AFAIK no other user has reported the same issue as of yet.
That was FAST! And the patch that Fabian Grünbichler came up with was only a one-line change, so I guess it is more efficient :) There is a 6.17.2-2-pve kernel in the Proxmox testing repo. I installed it, and Podman is humming along now! I no longer have to compile the kernel myself to get Podman to work on my system.

@sambuka will be happy to see this!

So the new kernel in the test repo passes on my end.