pverados segfault

FYI, still segfaults on 6.2.16-6-pve.

Code:
[  277.638986] pverados[19363]: segfault at 5588cfa55010 ip 00005588cb7fc09d sp 00007ffe2d1e6360 error 7 in perl[5588cb721000+195000] likely on CPU 28 (core 5, socket 0)
[  277.638999] Code: 0f 95 c2 c1 e2 05 08 55 00 41 83 47 08 01 48 8b 53 08 22 42 23 0f b6 c0 66 89 45 02 49 8b 07 8b 78 60 48 8b 70 48 44 8d 6f 01 <44> 89 68 60 41 83 fd 01 0f 8f 4d 04 00 00 48 8b 56 08 49 63 c5 48
 
Yes, it might take a while until the fix comes in via stable backports. It's not a crucial issue after all, only cosmetic.
 
Thanks for the quick reply @fiona.

Can you explain why I only see this on clusters that have been upgraded all the way from 6.x to 8, but not on clusters that were born as 7.x? I am just curious.

//edit: Sorry, I have to correct myself. I also see this on clusters that came from 7.x.

Regards.
 
The potential for the wrong logging is there in all kernels with this commit, i.e. starting from 6.2.16-4-pve. It is racy, so if you don't see it on certain machines, you might just be lucky.
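A quick way to check which kernel a node is actually running (just a sketch using standard tooling; uname and pveversion are both available on any PVE host):

Code:
# kernel the node is currently booted into
uname -r

# installed Proxmox package versions, including the kernel packages
pveversion -v | grep -i kernel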
 
With 6.2.16-10-pve I started additionally getting these messages:

Code:
2023-08-31T05:11:25.094116+03:00 kettu ceph-crash[559]: WARNING:ceph-crash:post
 /var/lib/ceph/crash/2023-08-31T02:00:53.493690Z_85057cb5-e910-495b-bf56-082c2af27a95
 as client.crash.kettu failed: Error initializing cluster client:
 ObjectNotFound('RADOS object not found (error calling conf_read_file)')

We are using a package called logcheck to stay aware of unexpected events on our systems. While this new message is easy to filter out, the original one is not, as it is a multi-line message, and we would rather not filter out SIGSEGV messages in general.

So while technically this is a "cosmetic" issue only, it is still impacting our daily ops, and we are looking forward to it being fixed.
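In case it helps anyone else using logcheck: the new ceph-crash warning can be suppressed with a local ignore rule along these lines (only a sketch; the file name is made up, it assumes the warning arrives as a single syslog line, and the pattern may need adjusting to your exact log prefix):

Code:
# /etc/logcheck/ignore.d.server/local-ceph-crash  (example file name)
# logcheck ignore rules are extended regexes, one per line, matched against whole log lines
^.* ceph-crash\[[0-9]+\]: WARNING:ceph-crash:post /var/lib/ceph/crash/.* failed: Error initializing cluster client: ObjectNotFound.*$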
 
Hi,
FYI, we did backport the fix and it will be included in the next kernel version (the one after 6.2.16-10-pve): https://git.proxmox.com/?p=pve-kernel.git;a=commit;h=762b8cebe9fc4cc39f34808d2820a95ea13adfae
 
Oh, sorry, I didn't realize at first that this was a different issue. It has nothing to do with the kernel upgrade. Please see https://bugzilla.proxmox.com/show_bug.cgi?id=4759 for more information.
 
Following is the error. I was not able to use the df command; when I start df, it gets stuck partway through.

Code:
[    2.743699] AppArmor: AppArmor Filesystem Enabled
[ 2.813210] ERST: Error Record Serialization Table (ERST) support is initialized.
[ 3.287863] RAS: Correctable Errors collector initialized.
[ 6.529780] EXT4-fs (dm-2): mounted filesystem 98630d6b-8864-4cdf-be53-bc0da31b6525 with ordered data mode. Quota mode: none.
[ 7.496684] ACPI Error: No handler for Region [SYSI] (0000000096bc81c9) [IPMI] (20221020/evregion-130)
[ 7.496789] ACPI Error: Region IPMI (ID=7) has no handler (20221020/exfldio-261)
[ 7.496894] ACPI Error: Aborting method \_SB.PMI0._GHL due to previous error (AE_NOT_EXIST) (20221020/psparse-529)
[ 7.496998] ACPI Error: Aborting method \_SB.PMI0._PMC due to previous error (AE_NOT_EXIST) (20221020/psparse-529)
[ 7.655095] ZFS: Loaded module v2.1.12-pve1, ZFS pool version 5000, ZFS filesystem version 5
[ 57.248652] usb 1-1.5: Failed to suspend device, error -71
[ 730.770999] pverados[8198]: segfault at 55b0f8c0a030 ip 000055b0f8c0a030 sp 00007ffdeebc9228 error 14 in perl[55b0f8bde000+195000] likely on CPU 61 (core 10, socket 1)
[ 750.313475] pverados[8280]: segfault at 55b0f8c0a030 ip 000055b0f8c0a030 sp 00007ffdeebc9228 error 14 in perl[55b0f8bde000+195000] likely on CPU 57 (core 8, socket 1)
[ 6950.517683] pverados[33951]: segfault at 55b0f8c0a030 ip 000055b0f8c0a030 sp 00007ffdeebc9228 error 14 in perl[55b0f8bde000+195000] likely on CPU 60 (core 10, socket 0)
 
Thanks for replying.

I checked the running kernel version; it shows the following output:

Code:
root@171:~# uname -a
Linux 171 6.2.16-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-8 (2023-08-02T12:17Z) x86_64 GNU/Linux

root@172:~# uname -a
Linux i172 6.2.16-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-8 (2023-08-02T12:17Z) x86_64 GNU/Linux

root@173:~# uname -a
Linux 173 6.2.16-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-8 (2023-08-02T12:17Z) x86_64 GNU/Linux

If I check it with the correct command, I see that it is using 6.2.16-8. It is a running cluster. If we have to upgrade it to 6.2.16-11, what is the correct way to do this without disturbing the cluster?
 
Since you have three nodes, you can upgrade and reboot each node individually. Just make sure the reboot of a node has finished and all services, e.g. for Ceph, are started before you reboot the next one.
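For reference, a rough per-node sequence (just a sketch with standard apt/Ceph/PVE commands; adapt it to your setup and only continue once the cluster is healthy again):

Code:
# optional: keep Ceph from rebalancing while the node reboots
ceph osd set noout

# on the node being upgraded
apt update
apt full-upgrade      # pulls in the newer pve-kernel
reboot

# after the node is back up, before touching the next one
ceph -s               # wait for HEALTH_OK, all OSDs/monitors up
pvecm status          # quorum re-established
systemctl --failed    # no failed services left

# once all nodes are done
ceph osd unset noout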
 

After the upgrade, it seems to be fixed. I upgraded all 3 Proxmox nodes to the latest version:

Code:
171 6.5.11-8-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-8 (2024-01-30T12:27Z) x86_64 GNU/Linux
 
