5.15.x Kernel and Issues

adamb

Hey guys. We have a 7-node cluster with roughly 600 VMs running. The cluster has been in place since Proxmox 5.

3x DL 380 Gen9
1x DL 380 Gen10
2x DL 560 Gen10
1x Supermicro quad-socket server with 3rd Gen Intel Xeon Scalable

We recently upgraded from Proxmox 7.1 to 7.3, which was also a move from the 5.13.x kernel to the 5.15.x kernel.

Last night one of our front ends hit a CPU lockup, crashed, and was fenced. This is the first time that has ever happened. It was one of the DL 560s with quad sockets.

It was running 5.15.74-1-pve, so I decided to at least get the nodes moved to 5.15.83-1-pve.

While doing so, I realized that live migrating between front ends with different CPUs results in the VM locking up; affected guests need to be stopped and then started to recover. All guests are running CentOS 7.9.

All of the guests are configured to use IvyBridge as a common CPU type. They have been using IvyBridge since we set the cluster up on Proxmox 5 many years ago.

I tested SandyBridge and Skylake; both result in the same issue.

The only one that doesn't is kvm64.
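
For anyone wanting to reproduce the test quickly: the CPU type can be switched per guest from the CLI, roughly like below (100 is a placeholder VM ID; the change only applies after a cold stop/start):

Code:
# show the guest's current CPU type
qm config 100 | grep '^cpu:'
# switch it to kvm64 for testing; takes effect on the next cold start
qm set 100 --cpu kvm64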

Anyone else running into this issue?

I want to like Proxmox 7, but this has to be the worst release I've been a part of in the 10+ years I've been in this community. I really hope it gets better.
 
It sounds like you are running a serious production environment; since you are a subscriber, I would open a support ticket.

In the meantime, could this be related to this bug https://bugzilla.proxmox.com/show_bug.cgi?id=4073 ?
Also for reference https://bugzilla.redhat.com/show_bug.cgi?id=1351442

I thought the fix was released, but I don't know for sure.

PS: it looks like you need a 5.19 kernel if this is your issue.
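
A quick sanity check before chasing this further is to confirm which kernel each node is actually booted with, e.g.:

Code:
# kernel currently booted on this node
uname -r
# kernel packages as seen by PVE
pveversion -v | grep -i kernel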


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 

I like to check with the community as a first step; it's never led me wrong in the 10+ years we have been using Proxmox.

I did find those bugs as well. Getting chrony on all my nodes doesn't seem to help.
 
The first step would be to examine the VM logs and confirm whether the messages match the bug. If they do, then it's an interaction between the hypervisor kernel/QEMU and the VM kernel while handling CPU flags. It's unlikely to be influenced by userland time synchronization; I think the chrony info in the bug is a red herring.
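
As a rough starting point inside an affected guest (the grep patterns here are only generic guesses; the exact messages to compare against are in the bug reports linked above):

Code:
# kernel messages from the current boot
dmesg -T | grep -iE 'lockup|stall|clocksource|tsc'
# messages from the previous boot, if the guest had to be stop/started and the journal is persistent
journalctl -k -b -1 | grep -iE 'lockup|stall|clocksource|tsc'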


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
It's also worth mentioning:

DL 380 Gen9 = E5-2670
DL 560 Gen10 (2nd Gen Intel) = Xeon(R) Gold 6254
Supermicro (3rd gen Intel) = Xeon(R) Gold 6348H

I can live migrate from the E5-2670 to both the 2nd and 3rd gen Intels. I can also live migrate between the 2nd and 3rd gen Intels with no issue.

What I can't do is live migrate from either the 2nd gen Intel or the 3rd gen Intel back to the E5-2670.
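
For reference, the CLI equivalent of what I'm doing is roughly this (VM ID and node names are placeholders):

Code:
# old -> new works fine
qm migrate 100 dl560-gen10 --online
# new -> old is what locks the guest up on arrival
qm migrate 100 dl380-gen9 --online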
 
What I can't do is live migrate from either the 2nd gen Intel or the 3rd gen Intel back to the E5-2670.
This is probably: https://forum.proxmox.com/threads/p...x-freeze-on-windows.109645/page-2#post-488479

But it's worth noting that these are 2nd and 3rd gen Scalable, not to be confused with Xeon v2/v3, and that the difference between those models, released over seven years apart, is huge; that's why the regression hits you here. For production setups, though, you really get the best experience using homogeneous HW.
I want to like Proxmox 7, but this has to be the worst release I've been a part of in the 10+ years I've been in this community. I really hope it gets better.
At this point this has to be a meme repeated at every major software release. But yeah, this one is a very unfortunate one, especially as it was triggered by a not-so-great backport by Intel/Ubuntu engineers for a CPU enablement that would not be released for yet another year, and it put us a bit in a corner, as fixing it has severe implications for currently unaffected platforms.
 

Do you think this is something that will get resolved? I understand using homogeneous HW makes sense, but in the real world it often isn't feasible. The Gen9s are still solid servers.
 
Do you think this is something that will get resolved?
Cannot guarantee anything short term; one would need to implement some sort of opt-in behavior for the fix, either automatically or explicitly. But as the test backport of the fix was far from trivial, and we didn't really have any reports about this from enterprise-covered setups yet IIRC (or they were satisfied with switching to the 6.1 opt-in kernel), the initiative for investing possibly a lot of time in a risky fix hasn't been taken yet.
I understand using homogeneous HW makes sense, but in the real world it often isn't feasible. The Gen9s are still solid servers.
Sure does; note that I never said you should throw away the old ones or the like, but you can and should group matching servers together, and they can still be in the same PVE cluster.
 
Appreciate the input.

I am going to do some testing with this.

https://pve.proxmox.com/wiki/Manual:_cpu-models.conf

Mainly because kvm64 is working 100%, which makes me think it's a specific flag I might be able to narrow down and simply do away with.
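
As a sketch of what I plan to try (the file path is from the wiki article above; the model name and the dropped flag are just placeholders to iterate on, not a known fix):

Code:
# append a custom CPU model to the cluster-wide config
cat >> /etc/pve/virtual-guest/cpu-models.conf <<'EOF'
cpu-model: ivybridge-test
    flags -xsaveopt
    reported-model IvyBridge
EOF
# custom models are referenced with a "custom-" prefix
qm set 100 --cpu custom-ivybridge-test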
 
Mainly because kvm64 is working 100%, which makes me think it's a specific flag I might be able to narrow down and simply do away with.
Hmm, tbh I'm not 100% sure that it'd be that bug then. On the one hand, I have some faint recollection that this was guest vCPU model independent; on the other, 1) it's been quite a while since I checked it out more closely, and 2) when actually re-checking my linked thread I did not find anyone mentioning it happening with kvm64...

It would be great if you could report any findings back; maybe it helps to bring up some new idea of how to fix it, or to clarify that your issue is a different one.
 
I will definitely update this thread with what I find. Going to go watch Avatar and enjoy the weekend. Really appreciate the input!
 
Well, I got a bit ahead of myself. Doing more testing with kvm64, it has the same issue as the other CPU types.

I can migrate from hosts with older CPUs to hosts with newer CPUs, but not from newer CPUs back to older ones.

Working on testing 5.19.x now.
 
Can confirm that both the 5.19.x and 6.1 kernels correct the issue 100%.

I am a bit confused at this point.

@t.lamprecht


What is your stance on this? In another thread and in this one you indicate homogeneous hardware is the best bet, but the fact is some of my clusters are now a decade old and have never had an issue with this.

You guys released 7.3.x based on the 5.15 kernel, and in my personal opinion that was a mistake. This isn't just a small bug; I am going to go out on a limb, but this bug directly affects some of the largest production clusters running Proxmox. I highly doubt the larger clusters are all running homogeneous hardware.

At this point I think you guys need to take a serious look at moving away from 5.15.x; live migration is critically important to production environments, which is what most of us are paying for.

I hope you guys take this seriously; this bug alone should be enough to move away from 5.15.x.
 
In another thread and in this one you indicate homogeneous hardware is the best bet,
It isn't only the best bet, it's the only one that can really be guaranteed. All mixed variants have some non-zero chance of incompatibility, and a bigger difference between HW means a higher chance of incompatibility. Also note: 1) new kernels will enable newer features in HW (one just cannot virtualize away all (side) effects), and 2) they also have a higher (even though not so big) likelihood that decade-old HW gets regressions, as most kernel devs just don't use that anymore. So just because it worked for some time doesn't mean it always has to; otherwise we'd be stuck forever at the same level w.r.t. performance and features.
You guys released 7.3.x based on the 5.15 kernel, and in my personal opinion that was a mistake.
Note that this wasn't an issue with the initial 5.15; as said, it snuck in with the Intel Sapphire Rapids enablement, and nobody complained while it moved slowly through PVE's upgrade channels, from test to no-subscription. That means this was hardly affecting a significant number of setups, as otherwise the forum would have been full of threads reporting it overnight.
If you have an interest in reducing the chance of your heterogeneous setup breaking, I'd recommend setting up a test lab using the pvetest repository, as such things can then be caught earlier - there are so many HW & use-case combinations out there that we just really cannot cover them all.
I highly doubt the larger clusters are all running homogeneous hardware.
We know for a fact (e.g., through our enterprise support channel) that most do, and that those setups with mixed HW group it by generation - e.g., with HA you can use HA groups to avoid any automatic live migration between two (or more) partitions of nodes.
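As a rough example (group and node names made up), a restricted HA group keeps HA-managed guests on one hardware generation:

Code:
# restricted group containing only the newer-generation nodes
ha-manager groupadd gen10-only --nodes dl560-a,dl560-b,smc-gen3 --restricted 1
# pin an HA-managed VM to that group
ha-manager add vm:100 --group gen10-only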
live migration is critically important to production environments, which is what most of us are paying for.
The subscription just cannot do wonders to guarantee that mixing HW multiple generations apart will always work; that's why it's recommended to use homogeneous enterprise HW.
I hope you guys take this seriously,
We do take it seriously, but as said, it sadly wasn't caught in the safety nets, and until now we hadn't really had any case reported via enterprise support channels IIRC, so this was classified as low impact in practice for enterprise setups.
this bug alone should be enough to move away from 5.15.x.
You can, if you want, by using the official but opt-in 6.1-based kernel.
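
Switching is just installing the opt-in meta-package from the regular repositories and rebooting, roughly like this (package names from memory, double-check against the repo):

Code:
apt update
apt install pve-kernel-6.1   # or pve-kernel-5.19
reboot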

But actually, I still had one idea left lingering for a while, and built a kernel with that applied; it's downloadable here:
http://download.proxmox.com/temp/pve-kernel-5.15.83-fpu-amx-revert/

sha256 sums:
Code:
19046eec93537e47cbb41b006e74ef14b58c6d9bdef8e1f76345ef2c5cd3385b  linux-tools-5.15_5.15.83-1_amd64.deb
950fe6a4ccaad419e1319b84322d6aeba6eca7b9c4cf5feb2f9c079715e31a56  pve-headers-5.15.83-fpu-amx-revert-1-pve_5.15.83-1_amd64.deb
17577bcc242cef680a83133043fe84b960aedfa23c4b37b133a827a7182016ca  pve-kernel-5.15.83-fpu-amx-revert-1-pve_5.15.83-1_amd64.deb
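
In case it helps anyone testing, verifying and installing the downloaded kernel package would be roughly:

Code:
# compare against the sums above
sha256sum pve-kernel-5.15.83-fpu-amx-revert-1-pve_5.15.83-1_amd64.deb
# install the local .deb and reboot into the test kernel
apt install ./pve-kernel-5.15.83-fpu-amx-revert-1-pve_5.15.83-1_amd64.deb
reboot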

Testing it would help to drive a possible solution forward, so I'd appreciate having feedback on that.
Note that I'm mostly interested in migrations between nodes with a big generation difference where both have booted the linked kernel, but information beyond that, e.g. about the behavior of guests coming (or going) from/to nodes booted with other kernels (e.g. plain 5.15), is naturally appreciated too.
 

It's simply not what we were sold on with Proxmox, or at least not my experience, seeing as I have been here since Proxmox 2.x. This is definitely a new position for you guys within the last couple of years, but it's not a position you were in 5-7 years ago. The idea was that we could run without homogeneous hardware.

We have been solid for almost a decade on our setup. The link you provided in a previous post is enough to know that there are plenty of production environments out there without homogeneous hardware. I highly doubt companies can just replace entire clusters when new models come out. HA groups are great, but they do not solve all the issues (for example, we have one older model left in a cluster).

I understand it wasn't caught in your safety nets at first, but now it's very obvious, and it seems you guys aren't making the proper moves to correct the issue. To me 5.15.x should be dumped, as live migration between generations is critical to production.

I will do some testing with the 5.15.x kernel you linked now. I am all for helping you guys, as I always have been (feel free to look at my post history and ticket history).

However, this is still the biggest bomb we have ever experienced with Proxmox, and I don't think we are average customers. We have been here for a very long time; like I said, clusters that have been around for almost a decade at this point.
 
It's simply not what we were sold on with Proxmox
The idea was that we could run without homogeneous hardware.
We never guaranteed live migration between arbitrary HW - where were you sold on that? And yes, you can run it on non-homogeneous HW just like you can on commodity/consumer HW, and for 99% of Proxmox VE's huge array of features it won't matter, but you cannot expect live migration to always work out of the box across an arbitrary CPU generation difference. Nobody is able to guarantee that for all possible HW combinations; anyone who does guarantee it is either lying or has no idea.

What I'm saying is: if live migration is a must-have feature and you want to be certain it works, then using HW that is as identical as possible is the best bet; and as said, this does not mean abandoning old HW. I certainly never communicated otherwise (quite the contrary, actually), and I have worked on Proxmox VE for ~8 years.
We have been solid for almost a decade on our setup.
The 6348H CPU you mentioned was only released in 2020 Q2, so if you mean that setup it just could not have worked for over a decade.
I highly doubt companies can just replace entire clusters when new models come out
No, that's also nothing I ever stated! As said, bigger enterprise users group their clusters by HW, and if they're small/mid-size they have those nodes in the same cluster but ensure live migration only happens between identical nodes, or at least nodes from roughly the same generation. If you only have one older model left, that's naturally a bummer.
To me 5.15.x should be dumped as live
If we'd always "dump" a kernel due to a handful of reports out of our high-six-figure PVE 7.x user base, we'd probably have no kernel at all.
as live migration between generations is critical to production.
Sure, but production-critical systems need to consist of fitting HW for that to be guaranteed; otherwise it's best effort, and yes, since it works surprisingly well most of the time even when CPUs about a decade apart are mixed, it can also come as a surprise when it doesn't.
I will do some testing with the 5.15.x kernel you linked now. I am all for helping you guys, as I always have been (feel free to look at my post history and ticket history).

However, this is still the biggest bomb we have ever experienced with Proxmox, and I don't think we are average customers. We have been here for a very long time; like I said, clusters that have been around for almost a decade at this point.
Please don't get me wrong, I certainly recognize your username and appreciate you using Proxmox VE for such a long time; that's why I spent extra time to communicate my point w.r.t. homogeneous HW, which I wouldn't otherwise bother with in the community forum, and that's why I also implemented the other idea I had lingering in my mind on Monday and provided it for testing. Ideally we get this working again on 5.15 without causing fall-out on currently unaffected platforms when upgrading to a fixed version; note that I never stated that we don't want to do so.

Can confirm that pve-kernel-5.15.39-3-pve-guest-fpu_5.15.39-3_amd64.deb also solves the issue for me.
This is the one from the other thread, not the one I posted on Monday in this thread - the latter would be much simpler to upgrade to, so getting feedback on that would be great.
 
