Proxmox 8 Ceph Quincy monitor no longer working on AMD Opteron 2427

jdancer

Well-Known Member
May 26, 2019
Yeah, yeah, I know. EOL CPU.

Ceph was working fine under Proxmox 7 using the same CPU.

I did a pve7to8 upgrade and also a clean install of Proxmox 8.

Both resulted in a 'Caught signal (illegal instruction)' error when attempting to start a Ceph monitor.

This points to either a bad binary or the need for a recompile.

I've already posted my findings in the Proxmox VE 8.0 released! thread.

Has anyone done a clean install of Proxmox 8 Ceph on any other CPU?
 
As mentioned in the other thread, your CPU is over 14 years old, and quite probably neither the compiler projects, the library developers, Ceph, nor we test on HW that old, so glitches there can go unnoticed. I'd recommend installing the newest BIOS/firmware and CPU microcode available for that platform; with a bit of luck this makes the CPU compatible enough.

If that doesn't help, I'd recommend trying to debug this on your own to get more detail on which instruction causes this, e.g. by checking the kernel log and maybe using gdb; maybe you can find something to relay to the compiler devs to get this fixed.
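For example, a rough sketch of that debugging approach (assuming systemd-coredump is available and the monitor crashes reproducibly on start; the service name is just an illustration):
Code:
# the kernel usually logs an invalid-opcode trap for the crashing process
dmesg | grep -i 'invalid opcode'

# capture a core dump of the crash and inspect the faulting instruction
apt install systemd-coredump gdb
systemctl restart ceph-mon@<mon-id>.service   # replace <mon-id> with your monitor's ID
coredumpctl dump ceph-mon --output=/tmp/ceph-mon.core
gdb /usr/bin/ceph-mon /tmp/ceph-mon.core -batch -ex 'bt' -ex 'x/i $pc'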
Otherwise, you can still run Proxmox VE 7 on that system if it worked there; it's supported for roughly another year.
And after that, the security state of the OS probably won't matter much in this case: with the age and EOL state of that HW, it's a leaky bucket anyway.
 
I hope there will be some improvements here, as it seems to be the same issue with the AMD N40L/N54L:
AMD Turion(tm) II Neo N40L Dual-Core Processor and AMD Turion(tm) II Neo N54L Dual-Core Processor

No new firmware is available anymore.
 
FWIW, the error seems to come from the GF-Complete library's initialization, namely:
https://github.com/ceph/gf-complete/blob/a6862d10c9db467148f20eef2c6445ac9afd94d8/src/gf.c#L474

This does a lot of lower-level things, including some inline assembler, but the C(++) code could also be compiled down differently by the newer compiler from Debian Bookworm.

Currently, we have no high-level enterprise support request for this, so we cannot allocate any time to investigating this more closely, I'm afraid. But if the community finds something, maybe even a fix, we'll gladly incorporate it.
 
Looking a little deeper, it appears that modern GCC may over-optimise gf-complete and emit instructions that may not be present on older AMD64 CPUs: Debian bug https://bugs.debian.org/1012935.

The fix adopted by Debian is apparently to ensure that GCC is invoked with -O1: https://salsa.debian.org/openstack-...mmit/7751c075f868bf95873c6739d0d942f2a668c58f

The trick will be to inject the option into the Ceph CMake build system in the right place. By inspection of https://github.com/ceph/ceph/blob/4.../src/erasure-code/jerasure/CMakeLists.txt#L70, it looks like you might be able to achieve this by patching CMakeLists.txt to inject the value "-O1" into the list of compiler flags?

(Of course, this might have performance implications, although reportedly the gf-complete library selects optimisations dynamically at runtime?)

The Ceph developers are busy doing the final go/no-go release process for Reef, but perhaps once that's settled down, they might be interested in accepting a patch upstream…
 
Same issue with Intel(R) Xeon(R) CPU E5320 @ 1.86GHz
That CPU is from 2006, and we do not have any test HW that old around. Our oldest test HW is "already" based on the Nehalem microarchitecture, which already has SSE4.2 support and thus should not be affected. We'll see if we get a bit of spare time to hook it up and boot it with the newest Proxmox VE release to verify whether it works there.
Is there a way to test if my CPU is compatible?
As a rough heuristic: if it was released in the last decade (i.e., 2010 or later) it should be OK.
Slightly more specifically, if it supports SSE 4.1 it should be fine; FWICT, for Intel that would be the Penryn microarchitecture (~2007) and for AMD the Bulldozer microarchitecture (~2011).
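As a quick check (just a heuristic sketch, not an exhaustive compatibility test), you can grep the CPU flags:
Code:
# no output here means the CPU lacks SSE 4.1/4.2 and the stock Ceph build will likely crash
grep -o -w -e sse4_1 -e sse4_2 /proc/cpuinfo | sort -u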

I currently only know of one possible workaround: reducing the optimization level on compilation to -O1, which would mean quite a high performance impact for the majority of our users, something we'd rather avoid. Ideally this would be fixed (or the fix backported) in GCC 12.
 
Hello.
Just registered to report that this happened to me with an "AMD Phenom II X6 1090T Black Edition" too (2010/2011). Stars microarchitecture, right before Bulldozer.

Unfortunately, it happened on a single-node Ceph setup. I am aware this is not the best scenario, but I am OK with that since the failure domain is equivalent to a simple NAS.
Fortunately, this is still a test machine with no critical data, mostly duplicated from my old main NAS, and I do have backups.

Despite all the crash tests I've run, I've never managed to lose any data; Ceph is so resilient and rock solid.
But I never thought a major update would break it this badly.

To continue my test as if it were a real-life scenario, I will try to recover this node, at least partially. A downgrade seems impossible or way too risky, and I don't want to worsen my test case right away.

I currently only know of one possible workaround: reducing the optimization level on compilation to -O1, which would mean quite a high performance impact for the majority of our users, something we'd rather avoid. Ideally this would be fixed (or the fix backported) in GCC 12.
I agree this should be avoided. Maybe you can at least indicate a way to rebuild Ceph with -O1 for this minority of users? Is the GitHub readme enough? Especially regarding the settings needed to run the self-compiled version in place of the stock version and use the existing node configuration.
I would like to be able to run Ceph, at least temporarily, to prove that the data is recoverable even in such a doomed case.

Best regards.
 
Maybe you can at least indicate a way to rebuild Ceph with -O1 for this minority of users? Is the GitHub readme enough?
Our main source mirror is on git.proxmox.com; GitHub is at best a read-only mirror w.r.t. our projects.

So you'd need to clone the current branch for Ceph 17.2 Quincy for Proxmox VE 8:
Code:
git clone -b quincy-stable-8 git://git.proxmox.com/git/ceph.git

Then add something like this change:
https://salsa.debian.org/openstack-...mmit/03e0314af5e814a7ef74dcf4f9416d60c6322e51

Then install all the build dependencies; this can be automated via:
Code:
apt install devscripts
mk-build-deps -ir

Start the build with:
Code:
# Disable building the dbgsym packages (they're huge)
export DEB_BUILD_OPTIONS=noautodbgsym
make deb

A few tips and disclaimers:

  • I did not build with the above linked d/rules change; it should be enough, but I cannot guarantee it, I'm afraid.
  • Ceph builds are huge and use up lots of resources. Memory usage in particular scales with the parallelism level, i.e., you need almost 2 GB per thread for build and linkage.
  • Depending on your HW a build can take quite a bit of time; here it takes somewhere between 30 and 45 minutes with 56 cores assigned from a dual EPYC 7351 system.

Oh, and also worth a try: adding -O1 as a compile flag to just the ceph/src/erasure-code/jerasure/CMakeLists.txt file before hitting make deb. That would limit the worse optimization level to just erasure coding, but off the top of my head I'm not 100% sure there won't be any other fallout.
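A minimal, untested sketch of that idea (assuming a directory-level add_compile_options lands after the global -O2 on the compiler command line, so the later -O1 wins; not verified against the actual CMakeLists.txt layout):
Code:
# prepend an add_compile_options(-O1) line so only the jerasure/gf-complete code is de-optimized
sed -i '1i add_compile_options(-O1)' ceph/src/erasure-code/jerasure/CMakeLists.txt
make clean && make deb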
 
Thank you @t.lamprecht for this guideline. I am currently trying to follow it (embedded SW dev speaking, not used to such high-level builds).
Just to add a new limitation: I wanted to prepare the build environment in a Debian 12 VM on the node itself (using local storage, as Ceph is down), setting the CPU type to "host" to be sure GCC picks up the actual CPU type. But at some point I deleted and recreated the VM, forgot to set "host", and left "x86-64-v2-AES", which led me to a new error regarding the SSE instruction set:
[screenshot of the SSE-related build error]

Just wanted to let you know.
 
I forgot to set "host" and left "x86-64-v2-AES", which led me to a new error regarding the SSE instruction set:
Yes, this is known. When we updated the default CPU type selected in the web UI's VM creation wizard from the very old, and now deprecated, kvm64 to something newer with the Proxmox VE 8.0 release two months ago, we had to make a call with trade-offs about which x86 psABI level to use. After quite some discussion we decided on the v2 level, as all but relatively ancient CPUs support it. While it's a trade-off, VMs now get AES and SSE support enabled by default, if the host CPU supports it, which increases performance a lot for a broad range of applications and was something that (especially newer) users overlooked when starting out.
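For reference, switching an existing VM back to full host CPU passthrough can be done in the web UI (Hardware -> Processors) or on the CLI, e.g. for VM ID 100:
Code:
# expose all host CPU flags (SSE 4.x, AES, ...) to the guest
qm set 100 --cpu host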
 
Hello,
you need almost 2 GB per thread for build and linkage.
Depending on your HW a build can take quite a bit of time; here it takes somewhere between 30 and 45 minutes with 56 cores assigned from a dual EPYC 7351 system.
I was unable to finish the build in my VM, even with only one thread and 8 GB allocated to the VM on this 6-core / 16 GB node... I don't know exactly why, but the OOM killer terminated my QEMU process each time, even though nothing else is running but Proxmox itself.

Then I decided to stop playing and take a more serious path. I used a fraction of an EPYC 7742 I had around, with 32 cores allocated to a VMware guest and 4 GB per thread. The build is a lot faster for sure, but most importantly, it finished! Then I prepared several variants:
  • a default-settings build, just to be sure the build itself is OK...
  • inserting -O1 into CMakeLists.txt, for CFLAGS only
  • inserting -O1 into debian/rules, for CFLAGS only
  • inserting -O1 into debian/rules, for CFLAGS, CXXFLAGS, and CPPFLAGS
I am almost sure I saw -O1 among the parameters on the output console, at least for the last variant.

Now I may need a bit more advice on how to install the resulting .deb packages properly, and especially on how to be sure that my own build is the one being used, because I suspect apt re-uses its cache or even re-downloads the packages. I also tried installing with dpkg -i, and even tried manually unpacking the ceph-mon binary alone.
In each case the behavior stays the same: ceph-mon gets an illegal instruction signal.

It was fun and I am so close, please help with the very last step. :)
 
Hmm, can you please post the full diff of your changes here in [code][/code] tags? Otherwise it is hard to tell if there might be an error on your side or mine.

What you can also try is adding the following to ceph/debian/rules:
Code:
export DEB_CFLAGS_MAINT_APPEND = -O1
export DEB_CXXFLAGS_MAINT_APPEND = -O1

Note also that the ceph directory is only copied the first time; after that you need to run make clean before retrying a build with another change.
 
you need to run make clean before retrying a build with another change
Restarted with your last proposal, from scratch: make clean, git reset --hard, git clean, and this time I used make deb 2>&1 | tee ~/make.log. I think make clean between trials was the missing part.

After analyzing the log, it appears that the optimization flag is taken into account: I can see the usual *FLAGS with -O2 followed by the appended -O1. Normally the last one is used.
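If you want to double-check that assumption for the compiler in use, a quick (bash-only) sketch is to compare what GCC reports for both flag orders; empty diff output means the trailing -O1 fully overrides the earlier -O2:
Code:
diff <(gcc -O1 -Q --help=optimizers) <(gcc -O2 -O1 -Q --help=optimizers) && echo '-O1 wins'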

Then I tried apt reinstall ./ceph-mon_17.2.6-pve1+3_amd64.deb, which confirmed the right path was used. I don't remember exactly, but it didn't work as-is.
Finally I just brute-forced the thing with rm ./*-dbg*.deb and apt reinstall ./*.deb, which failed at some point because of a non-existing /home/cephadm/.ssh, so I ran mkdir -p /home/cephadm and reinstalled the packages again.

And... that's it. ceph-mon is alive. BTW I had this issue too, "solved" by disabling the restful module. After letting the missed scrub job finish its pass, Ceph is now up and healthy. I can access the data like nothing happened.
Pretty dirty, but as it was only for fun, I am happy and had a lot of fun. :) Now I can tell that Ceph is really, really bulletproof.
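(For anyone hitting the same manager issue: the module can presumably be toggled with the standard mgr commands.)
Code:
# disable the problematic restful mgr module; re-enable it later once things are fixed
ceph mgr module disable restful
ceph mgr module enable restful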

Next I have to find a newer test machine (maybe my FX8350? pretty old too...) as replacement, and find what to do with the older one.

Oh, and it would be nice if pve7to8 could catch this problem BEFORE people pull the upgrade trigger.

Last thing, if needed, I can do more tests with this machine, even reinstall pve8 from scratch or do a pve7 to 8 upgrade and so on...

Thank you very much @t.lamprecht for the follow-up, it was very appreciable and instructive.
 
And... that's it. ceph-mon is alive. BTW I had this issue too, "solved" by disabling the restful module. After letting the missed scrub job finish its pass, Ceph is now up and healthy. I can access the data like nothing happened.
Great to hear! So what way did you set the flag in the end? For all of ceph or just the erasure-coding/gf-complete parts?

Then I tried apt reinstall ./ceph-mon_17.2.6-pve1+3_amd64.deb, which confirmed the right path was used. I don't remember exactly, but it didn't work as-is.
Finally I just brute-forced the thing with rm ./*-dbg*.deb and apt reinstall ./*.deb, which failed at some point because of a non-existing /home/cephadm/.ssh, so I ran mkdir -p /home/cephadm and reinstalled the packages again.

If you need to do such a thing in the future, here's how you can make a local repo which you can use for a normal upgrade (a concrete sketch follows the list):
  1. bump the version in the changelog. Normally that's in debian/changelog, but here for Ceph we take over most of the upstream packaging, so the changelog is located in changelog.Debian. The quickest way is adding a +1 to the end of the version in the first line at the top.
  2. do a clean build
  3. copy the packages to the host where they should be installed
  4. cd into that directory and run dpkg-scanpackages . >Packages
  5. Add a repo entry for this to e.g. /etc/apt/sources.list like:
    deb [trusted=yes] file:///path/to/pkg-archive ./
  6. do a standard apt update followed by apt full-upgrade
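Put together, a sketch of that sequence (assuming the rebuilt .debs were copied to /root/ceph-local, a made-up example path, and dpkg-dev is installed for dpkg-scanpackages):
Code:
cd /root/ceph-local
dpkg-scanpackages . > Packages
# a dedicated sources.list.d snippet works the same as an entry in sources.list
echo 'deb [trusted=yes] file:///root/ceph-local ./' > /etc/apt/sources.list.d/ceph-local.list
apt update
apt full-upgrade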

Oh, and it would be nice if pve7to8 could catch this problem BEFORE people pull the upgrade trigger.
Yes, it definitely would be good to do that until we can fix this more cleanly.
Would you mind opening an enhancement request for this over at our Bugzilla so we can keep track of it:
https://bugzilla.proxmox.com/
 
So what way did you set the flag in the end? For all of ceph or just the erasure-coding/gf-complete parts?
The last one, adding this to ceph/debian/rules:
Code:
export DEB_CFLAGS_MAINT_APPEND = -O1
export DEB_CXXFLAGS_MAINT_APPEND = -O1
BUT, for the sake of completeness, and now that I have more experience with local package management thanks to your valuable help, I will retry by restoring the official packages, limiting the flag to ceph/src/erasure-code/jerasure/CMakeLists.txt, bumping the version, and so on.

Would you mind opening an enhancement request for this over at our Bugzilla so we can keep track of it:
https://bugzilla.proxmox.com/
Sure, I am checking this out right now!
EDIT: #4953
 
So what way did you set the flag in the end? For all of ceph or just the erasure-coding/gf-complete parts?
Finally did the test by adding -O1 to the COMPILE_FLAGS in ceph/src/erasure-code/jerasure/CMakeLists.txt only. I made sure to have a clean build and deployment this time. Obviously I restored the upstream packages first and confirmed the issue came back before this test.

It seems to be enough; Ceph is working with this tiny fix.

I did play a little bit by adding a seventh OSD, letting it rebalance, and reading and writing to CephFS and Ceph RBD; so far so good. However, I do not use many advanced features.
Can't tell about the performance decrease; I don't really care, and the system was already relatively slow before.
 
I hope there will be some improvements here, as it seems to be the same issue with the AMD N40L/N54L:
AMD Turion(tm) II Neo N40L Dual-Core Processor and AMD Turion(tm) II Neo N54L Dual-Core Processor

No new firmware is available anymore.
The issue on the HP N54L is resolved with Ceph 17.2.7.
 
