I am only using ceph-csi-rbd
Also, I'd be curious if all of us are using Kasten for backups.
I can't run rbd trash purge without it failing with:

Removing images: 29% complete...failed.
rbd: some expired images could not be removed
Ensure that they are closed/unmapped, do not have snapshots (including trashed snapshots with linked clones), are not in a group and were moved to the trash successfully.
rbd -c /etc/pve/ceph.conf --cluster ceph --pool <pool> ls <pool> |grep snap | xargs -l rbd -c /etc/pve/ceph.conf --cluster ceph --pool <pool> snap purge
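A rough sketch of a more thorough cleanup, in case the one-liner above is not enough: purge the snapshots of every image in the pool (not just the ones with "snap" in the name) and then retry the purge. <pool> is a placeholder for your pool name, and protected snapshots that still have linked clones will refuse to be purged until the clones are flattened or removed.

POOL=<pool>                            # placeholder: replace with your pool name
for img in $(rbd -p "$POOL" ls); do
    rbd -p "$POOL" snap purge "$img"   # drops all (unprotected) snapshots of the image
done
rbd -p "$POOL" trash purge             # retry the trash purge once the snapshots are gone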
Setting the ReplicationSource copyMethod to Direct doesn't help in the long run. Reported this upstream in the meantime: https://tracker.ceph.com/issues/72713
This is quite tricky to track down, unfortunately; but hey, we've got a bit of an idea now at least.
Just out of curiosity, are there any users here that are only using either one of the Ceph CSI drivers, but not both? (So either only ceph-csi-rbd or only ceph-csi-cephfs.)
To create temporary backups of my Kubernetes workloads, I set up a Debian Bookworm container within Proxmox, in which I installed ceph-mgr and added it to the cluster. The version is also 19.2.3, matching the Proxmox Ceph cluster. The manager running in the Debian Bookworm container does not experience these segfault crashes, allowing me to temporarily back up my workloads.
# add the upstream Ceph Squid repository for Bookworm and install ceph-mgr
apt-get install software-properties-common
apt-add-repository 'deb https://download.ceph.com/debian-squid/ bookworm main'
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E84AC2C0460F3994
apt update
apt install ceph-mgr
# copy the cluster config and admin keyring over from one of the PVE nodes
cd /etc/ceph/
scp source-pve-ip:/etc/ceph/ceph.client.admin* .
scp source-pve-ip:/etc/ceph/ceph.conf .
# create an auth key for the new mgr
export name=cephmgr1
ceph auth get-or-create mgr.$name mon 'allow profile mgr' osd 'allow *' mds 'allow *'
mkdir /var/lib/ceph/mgr/ceph-cephmgr1
nano /var/lib/ceph/mgr/ceph-cephmgr1/keyring #<-- paste key
#...
[mgr.cephmgr1]
key = xxxxxxxxxxxxxxxxxxxxxxxxxx
#...
# start mgr daemon
ceph-mgr -i $name
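To confirm the new manager actually joined the cluster, the standard status commands are enough (nothing here is specific to this setup):

ceph mgr stat       # shows which mgr is currently active
ceph -s | grep mgr  # lists the active mgr and the standbys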
# crontab entries: every hour at :10, purge the RBD trash for the k8s-prod pool, then
# reset-failed and restart the mgr (which tends to crash during the purge); at :15,
# restart it once more for good measure
10 * * * * /usr/bin/rbd trash purge k8s-prod && sleep 60 && /usr/bin/systemctl reset-failed && /usr/bin/systemctl restart ceph-mgr.target
15 * * * * /usr/bin/systemctl reset-failed && /usr/bin/systemctl restart ceph-mgr.target
# archive the accumulated crash reports and prune them so the cluster health clears
ceph crash archive-all
ceph crash prune 0
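To see whether the mgr is still producing new crashes after all this, the plain ceph crash tooling works as a quick check:

ceph crash ls-new      # crash reports that have not been archived yet
ceph health detail     # RECENT_CRASH warnings also show up here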
I had the same issues with the old remaining standby mgr instances, so I destroyed them and have had no problems since. I will re-add them once the problem with the mgr is sorted.

I tried adding a manager from a Debian VM install as well, and it has stayed alive for the last couple of days. My original 3 managers show they're still in standby and the new one is active and working. If one of the old ones is the active manager, it crashes, and the new one crashes as well when it tries to take over.
The calls go through librados and librbd via Ceph's Python-to-C/C++ bindings. When the function is finally called inside librbd, the Python interpreter itself chokes and dies.

I appreciate the work and root cause analysis on this! I have built a script that intercepts the dangerous calls, band-aiding the issue while we wait for the patch. Use at your own risk, but it works fine in my cluster.
#!/bin/bash
#
# LD_PRELOAD Fix for Ceph MGR Segfault
# Intercepts C library calls to bypass progress callbacks
#
set -euo pipefail
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'
echo -e "${BLUE}LD_PRELOAD Ceph MGR Fix${NC}"
echo "========================"
# Check for root
if [[ $EUID -ne 0 ]]; then
echo -e "${RED}This script must be run as root${NC}"
exit 1
fi
# Create working directory
WORK_DIR="/tmp/ceph-fix-$$"
mkdir -p "${WORK_DIR}"
cd "${WORK_DIR}"
# Create the C interception library
echo -e "${YELLOW}Creating LD_PRELOAD interceptor...${NC}"
cat > rbd_callback_fix.c << 'EOF'
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
// Intercept rbd_trash_remove_with_progress
int rbd_trash_remove_with_progress(void *ioctx, const char *id, int force,
void *cb, void *arg) {
// Try to find the non-progress version first
static int (*real_func)(void*, const char*, int) = NULL;
if (!real_func) {
real_func = dlsym(RTLD_NEXT, "rbd_trash_remove");
if (!real_func) {
// Fallback: call original with NULL callback
static int (*orig_func)(void*, const char*, int, void*, void*) = NULL;
if (!orig_func) {
orig_func = dlsym(RTLD_NEXT, "rbd_trash_remove_with_progress");
}
if (orig_func) {
return orig_func(ioctx, id, force, NULL, NULL);
}
return -1;
}
}
return real_func(ioctx, id, force);
}
// Intercept rbd_remove_with_progress
int rbd_remove_with_progress(void *ioctx, const char *name, void *cb, void *arg) {
static int (*real_func)(void*, const char*) = NULL;
if (!real_func) {
real_func = dlsym(RTLD_NEXT, "rbd_remove");
if (!real_func) {
static int (*orig_func)(void*, const char*, void*, void*) = NULL;
if (!orig_func) {
orig_func = dlsym(RTLD_NEXT, "rbd_remove_with_progress");
}
if (orig_func) {
return orig_func(ioctx, name, NULL, NULL);
}
return -1;
}
}
return real_func(ioctx, name);
}
// Intercept rbd_flatten_with_progress
int rbd_flatten_with_progress(void *image, void *cb, void *arg) {
static int (*real_func)(void*) = NULL;
if (!real_func) {
real_func = dlsym(RTLD_NEXT, "rbd_flatten");
if (!real_func) {
static int (*orig_func)(void*, void*, void*) = NULL;
if (!orig_func) {
orig_func = dlsym(RTLD_NEXT, "rbd_flatten_with_progress");
}
if (orig_func) {
return orig_func(image, NULL, NULL);
}
return -1;
}
}
return real_func(image);
}
// Intercept migration functions
int rbd_migration_execute_with_progress(void *ioctx, const char *name,
void *cb, void *arg) {
static int (*real_func)(void*, const char*) = NULL;
if (!real_func) {
real_func = dlsym(RTLD_NEXT, "rbd_migration_execute");
if (!real_func) {
static int (*orig_func)(void*, const char*, void*, void*) = NULL;
if (!orig_func) {
orig_func = dlsym(RTLD_NEXT, "rbd_migration_execute_with_progress");
}
if (orig_func) {
return orig_func(ioctx, name, NULL, NULL);
}
return -1;
}
}
return real_func(ioctx, name);
}
// Constructor - runs when library is loaded
__attribute__((constructor))
void init(void) {
fprintf(stderr, "RBD callback fix loaded - intercepting progress callbacks\n");
}
EOF
# Compile the library
echo -e "${YELLOW}Compiling interceptor library...${NC}"
gcc -shared -fPIC -o librbd_fix.so rbd_callback_fix.c -ldl || {
echo -e "${RED}Failed to compile interceptor${NC}"
exit 1
}
# Install the library
echo -e "${YELLOW}Installing interceptor...${NC}"
cp librbd_fix.so /usr/local/lib/
ldconfig
# Create systemd override for MGR
echo -e "${YELLOW}Configuring MGR to use interceptor...${NC}"
mkdir -p /etc/systemd/system/ceph-mgr@.service.d/
cat > /etc/systemd/system/ceph-mgr@.service.d/ld-preload.conf << 'EOF'
[Service]
Environment="LD_PRELOAD=/usr/local/lib/librbd_fix.so"
EOF
# Reload systemd
systemctl daemon-reload
# Restart MGR
HOSTNAME=$(hostname -s)
echo -e "${YELLOW}Restarting MGR with LD_PRELOAD fix...${NC}"
systemctl restart ceph-mgr@${HOSTNAME}
# Wait for MGR to start
echo "Waiting for MGR to become active..."
sleep 10
# Check status
for i in {1..6}; do
if ceph mgr stat 2>/dev/null | grep -q active; then
echo -e "${GREEN}SUCCESS: MGR is running with LD_PRELOAD fix!${NC}"
# Test RBD operations
echo -e "${YELLOW}Testing RBD operations...${NC}"
TEST_IMAGE="ld-preload-test-$$"
# Create and delete test image
if rbd create rbd/${TEST_IMAGE} --size 10M 2>/dev/null; then
rbd trash mv rbd/${TEST_IMAGE} 2>/dev/null
# Get image ID from trash
IMAGE_ID=$(rbd trash ls rbd --format json | jq -r ".[0].id" 2>/dev/null || echo "")
if [[ -n "${IMAGE_ID}" ]]; then
if rbd trash rm rbd/${IMAGE_ID} 2>/dev/null; then
echo -e "${GREEN}RBD operations successful - no segfault!${NC}"
fi
fi
fi
# Clean up
cd /
rm -rf "${WORK_DIR}"
echo ""
echo -e "${GREEN}LD_PRELOAD fix successfully applied!${NC}"
echo ""
echo "The fix intercepts these functions at the C library level:"
echo " - rbd_trash_remove_with_progress"
echo " - rbd_remove_with_progress"
echo " - rbd_flatten_with_progress"
echo " - rbd_migration_*_with_progress"
echo ""
echo "To verify the fix is active:"
echo " cat /proc/\$(pgrep ceph-mgr)/environ | tr '\\0' '\\n' | grep LD_PRELOAD"
echo ""
echo "To remove the fix:"
echo " rm /etc/systemd/system/ceph-mgr@.service.d/ld-preload.conf"
echo " rm /usr/local/lib/librbd_fix.so"
echo " systemctl daemon-reload"
echo " systemctl restart ceph-mgr@${HOSTNAME}"
exit 0
fi
echo "Waiting... (attempt $i/6)"
sleep 5
done
echo -e "${RED}MGR failed to start with LD_PRELOAD fix${NC}"
echo "Check logs: journalctl -u ceph-mgr@${HOSTNAME} -n 50"
exit 1
The (v2 of the) patch has been applied; the updated packages should now be available on our testing repositories. If anyone's got the testing repo enabled, feel free to test it. Everything seemed stable for me, even after multiple apply-delete-apply cycles with kubectl, so should anyone be doing some testing, please let me know how it goes.

Which package(s) should we look for in the testing repo? I just enabled the repo, and I didn't see any newer version of Ceph-related packages:
root@r630-1:~# apt list --upgradable
libnvpair3linux/stable 2.3.4-pve1 amd64 [upgradable from: 2.3.3-pve1]
libpve-common-perl/stable 9.0.10 all [upgradable from: 9.0.9]
libpve-network-api-perl/stable 1.1.7 all [upgradable from: 1.1.6]
libpve-network-perl/stable 1.1.7 all [upgradable from: 1.1.6]
librrd8t64/stable 1.7.2-4.2+pve3 amd64 [upgradable from: 1.7.2-4.2+pve2]
librrds-perl/stable 1.7.2-4.2+pve3 amd64 [upgradable from: 1.7.2-4.2+pve2]
libuutil3linux/stable 2.3.4-pve1 amd64 [upgradable from: 2.3.3-pve1]
libzfs6linux/stable 2.3.4-pve1 amd64 [upgradable from: 2.3.3-pve1]
libzpool6linux/stable 2.3.4-pve1 amd64 [upgradable from: 2.3.3-pve1]
lxc-pve/stable 6.0.5-1 amd64 [upgradable from: 6.0.4-2]
proxmox-kernel-6.14/stable 6.14.11-2 all [upgradable from: 6.14.8-2]
proxmox-kernel-helper/stable 9.0.4 all [upgradable from: 9.0.3]
pve-container/stable 6.0.12 all [upgradable from: 6.0.10]
pve-firmware/stable 3.16-4 all [upgradable from: 3.16-3]
pve-i18n/stable 3.6.0 all [upgradable from: 3.5.2]
pve-manager/stable 9.0.10 all [upgradable from: 9.0.6]
pve-yew-mobile-i18n/stable 3.6.0 all [upgradable from: 3.5.2]
qemu-server/stable 9.0.21 amd64 [upgradable from: 9.0.19]
rrdcached/stable 1.7.2-4.2+pve3 amd64 [upgradable from: 1.7.2-4.2+pve2]
zfs-initramfs/stable 2.3.4-pve1 all [upgradable from: 2.3.3-pve1]
zfs-zed/stable 2.3.4-pve1 amd64 [upgradable from: 2.3.3-pve1]
zfsutils-linux/stable 2.3.4-pve1 amd64 [upgradable from: 2.3.3-pve1]
root@r630-1:~# apt-cache policy ceph-mgr
ceph-mgr:
Installed: 19.2.3-pve1
Candidate: 19.2.3-pve1
Version table:
*** 19.2.3-pve1 500
500 https://enterprise.proxmox.com/debian/ceph-squid trixie/enterprise amd64 Packages
500 https://enterprise.proxmox.com/debian/pve trixie/pve-enterprise amd64 Packages
1 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
100 /var/lib/dpkg/status
19.2.2-pve5 500
500 https://enterprise.proxmox.com/debian/ceph-squid trixie/enterprise amd64 Packages
19.2.2-pve2 500
500 https://enterprise.proxmox.com/debian/pve trixie/pve-enterprise amd64 Packages
1 http://download.proxmox.com/debian/pve trixie/pve-test amd64 Packages
18.2.7+ds-1 500
500 http://ftp.cl.debian.org/debian trixie/main amd64 Packages
You should be seeing 19.2.3-pve2. Since it doesn't show up: do you have the Ceph test repo enabled? That's separate from pve-test. Otherwise, the entire .sources entry can be found in our docs.
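For reference, a minimal sketch of what such a deb822 .sources entry could look like, written out as shell commands; the repository URL, the "test" component and the keyring path here are assumptions based on the usual Proxmox repository layout, so check the official docs for the authoritative entry.

# Assumed repo details below; verify against the Proxmox documentation before use.
cat > /etc/apt/sources.list.d/ceph-squid-test.sources << 'EOF'
Types: deb
URIs: http://download.proxmox.com/debian/ceph-squid
Suites: trixie
Components: test
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg
EOF
apt update && apt-cache policy ceph-mgr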
Deployed a patch from Ceph test repo to both my test environments.
Issue is gone.
Everything is working as it should.
Is there a place where I could keep tracking progress on resolving the underlying issue?