Is the fix (from this linux-mm thread) included in pve-kernel-6.2.16-5-pve? I've still seen SIGSEGVs for pverados after upgrading...

Code:
[ 277.638986] pverados[19363]: segfault at 5588cfa55010 ip 00005588cb7fc09d sp 00007ffe2d1e6360 error 7 in perl[5588cb721000+195000] likely on CPU 28 (core 5, socket 0)
[ 277.638999] Code: 0f 95 c2 c1 e2 05 08 55 00 41 83 47 08 01 48 8b 53 08 22 42 23 0f b6 c0 66 89 45 02 49 8b 07 8b 78 60 48 8b 70 48 44 8d 6f 01 <44> 89 68 60 41 83 fd 01 0f 8f 4d 04 00 00 48 8b 56 08 49 63 c5 48

No, it's not. That version was bumped even before the issue was debugged. It will come in via stable patches eventually or if we backport it. But it's only a cosmetic issue, no actual problems except the faulty messages.

fyi, still segfaults on 6.2.16-6-pve.

Yes, it might take a while until the fix comes in via stable backports. It's not a crucial issue after all, only cosmetic.

Thanks for the quick reply @fiona.
Can you explain why I only see this on clusters that have been upgraded all the way up from 6.x to 8, but not on clusters that were born as 7.x? I am just curious.
//edit: sorry, I have to correct myself. I also see this on clusters that came from 7.x.
regards.

The potential for the wrong logging is there in all kernels with this commit, i.e. starting from 6.2.16-4-pve. It is racy, so if you don't see it on certain machines, you might just be lucky.

6.2.16-8-pve still segfaults.

6.2.16-10-pve too, but no functional consequences indeed. Still, monitoring is complaining about all the "segfaults" in the kern.log.

FYI, we did backport the fix and it will be included in the next kernel version (the one after 6.2.16-10-pve): https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=762b8cebe9fc4cc39f34808d2820a95ea13adfae

With 6.2.16-10-pve I started additionally getting these messages:

Code:
2023-08-31T05:11:25.094116+03:00 kettu ceph-crash[559]: WARNING:ceph-crash:post /var/lib/ceph/crash/2023-08-31T02:00:53.493690Z_85057cb5-e910-495b-bf56-082c2af27a95 as client.crash.kettu failed: Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')

We are using a package called logcheck to stay aware of unexpected events on our systems. While this new message is easy to filter out, the original one is not, as it is a multi-line message, and we would rather not filter out SIGSEGV messages in general.
So while technically this is a "cosmetic" issue only, it is still impacting our daily ops, and we are looking forward to it being fixed.
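
To illustrate why the two-line report is awkward for line-based filters, here is a minimal Python sketch of a pre-filter that drops only the pverados segfault line and the "Code:" line that follows it, leaving every other SIGSEGV report untouched. This is not a logcheck rule and not something from this thread; the function name and both regular expressions are assumptions based on the log excerpt quoted earlier.

```python
import re

# Assumption: the first line of the report names the pverados process, as in
# the kernel log excerpt quoted in this thread.
SEGFAULT_RE = re.compile(r"\bpverados\[\d+\]: segfault at ")
# Assumption: the continuation line carries "] Code: " after the timestamp.
CODE_RE = re.compile(r"\] Code: ")


def drop_pverados_segfaults(lines):
    """Return the lines without pverados segfault reports and their Code: line."""
    kept = []
    skip_code_line = False
    for line in lines:
        if SEGFAULT_RE.search(line):
            # Remember that the next "Code:" line belongs to this report.
            skip_code_line = True
            continue
        if skip_code_line and CODE_RE.search(line):
            skip_code_line = False
            continue
        skip_code_line = False
        kept.append(line)
    return kept
```

Segfault reports from any other process, including their own "Code:" lines, pass through unchanged, so SIGSEGV messages in general are still visible.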

Oh, sorry didn't realize this was a different issue at first. That has nothing to do with the kernel upgrade. Please see https://bugzilla.proxmox.com/show_bug.cgi?id=4759 for more information.

Oh, neither did I. Thank you for following up, Fiona.

I have a fresh installation of Proxmox 8 and I face the same issue.
Thanks for replying.

Hi,
please make sure you are using a kernel >= 6.2.16-11, i.e. install upgrades:
https://pve.proxmox.com/wiki/Package_Repositories
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#system_software_updates

If I check it with the correct command, I see that it is using 6.2.16-8. It is a running cluster. If we have to upgrade it to 6.2.16-11, what is the correct way to do this without disturbing the cluster?

Since you have three nodes, you can upgrade+reboot each node individually. Just make sure the reboot of a node is finished and all services, e.g. for Ceph, are started before you reboot the next one.
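
The kernel check mentioned above (>= 6.2.16-11) can be sketched in Python as follows. The function names are hypothetical and the parsing assumes release strings shaped like the ones in this thread (e.g. 6.2.16-8-pve); comparing the raw strings would wrongly rank "-8" above "-11", so the parts are compared as integers.

```python
import re


def kernel_tuple(release):
    """Turn a release string such as '6.2.16-8-pve' into (6, 2, 16, 8)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(release, minimum="6.2.16-11"):
    """True if the release is at least the minimum named in this thread."""
    return kernel_tuple(release) >= kernel_tuple(minimum)
```

On a node, the running release is what `uname -r` prints (or `platform.release()` in Python); passing that string to `has_fix` after each reboot tells you whether that node is already on a kernel with the backported fix.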