Random segmentation faults

baudinpr

Member
Feb 14, 2011
Hi all,

I'm running PVE version 1.7 with the latest updates before version 1.8.
The server has 2 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz (8 cores in total) and 16 GB of RAM.
The kernel version is: 2.6.32-4-pve
About the datastore, this is the situation:

Filesystem             1K-blocks     Used  Available Use% Mounted on
/dev/mapper/pve-root    99083868  1563196   92487508   2% /
tmpfs                    8202660        0    8202660   0% /lib/init/rw
udev                       10240      708       9532   7% /dev
tmpfs                    8202660        0    8202660   0% /dev/shm
/dev/mapper/pve-data   601871232  3017620  598853612   1% /var/lib/vz
/dev/mapper/pve2-data 3756925768 39512176 3526572760   2% /var/lib/vz2
(I have added a pve2-data to the default pve-data...)

There is 1 virtual machine running WIN2008 Ent 32 bit with 4 GB of RAM and 4 cores, and the VM is working well.

The problem is:
At random, I get segmentation faults on the PVE Linux host...
This is a running example:
hostname:~# find
Segmentation fault

Doing a strace of "find":
hostname:~# strace find
execve("/usr/bin/find", ["find"], [/* 15 vars */]) = 0
brk(0) = 0x1970000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f58d7174000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f58d7172000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=15989, ...}) = 0
mmap(NULL, 15989, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f58d716e000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/librt.so.1", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@#\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=35784, ...}) = 0
mmap(NULL, 2132968, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f58d6d51000
mprotect(0x7f58d6d59000, 2093056, PROT_NONE) = 0
mmap(0x7f58d6f58000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7000) = 0x7f58d6f58000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libm.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P>\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=534736, ...}) = 0
mmap(NULL, 2629848, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f58d6ace000
mprotect(0x7f58d6b50000, 2093056, PROT_NONE) = 0
mmap(0x7f58d6d4f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x81000) = 0x7f58d6d4f000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\342\1\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1375536, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f58d716d000
mmap(NULL, 3482232, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f58d677b000
mprotect(0x7f58d68c5000, 2093056, PROT_NONE) = 0
mmap(0x7f58d6ac4000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x149000) = 0x7f58d6ac4000
mmap(0x7f58d6ac9000, 17016, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f58d6ac9000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320W\0\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=130114, ...}) = 0
mmap(NULL, 2208624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f58d655f000
mprotect(0x7f58d6575000, 2097152, PROT_NONE) = 0
mmap(0x7f58d6775000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f58d6775000
mmap(0x7f58d6777000, 13168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f58d6777000
close(3) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

I don't understand why this could happen.
The same hardware ran Debian Lenny for 2 years before the PVE installation without any problems.

Any help would be much appreciated.

Best regards
Piero Baudino
 
Hi Udo,

thanks for your reply.

About the disks I'm using an external SAS DAS, and the array and disks are ok.
About the memory I'm not 100% sure.

"dmesg" shows no strange kernel messages, and the syslogs are clean...

Piero
 
Hi,
are find and its libraries OK?

Is the checksum the same?
Code:
md5sum /usr/bin/find
1813093bd4ebb21b37501f8d96833226  /usr/bin/find

ldd /usr/bin/find
        linux-vdso.so.1 =>  (0x00007ffff9bc9000)
        librt.so.1 => /lib/librt.so.1 (0x00007f0b0f48e000)
        libm.so.6 => /lib/libm.so.6 (0x00007f0b0f20b000)
        libc.so.6 => /lib/libc.so.6 (0x00007f0b0eeb8000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x00007f0b0ec9c000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f0b0f697000)

Udo
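
A minimal sketch of how this comparison could be scripted for the binary and every library it loads, assuming a second, known-good host with the same package versions is available. The function names and the manifest file name good.md5 are made up for illustration; the library paths are the ones that appear in the strace output above.

```shell
#!/bin/sh
# Sketch only: function names and the manifest file name are hypothetical.

# make_manifest FILE...
# Print one "checksum  path" line per file (run this on the known-good host).
make_manifest() {
    md5sum "$@"
}

# check_manifest MANIFEST
# Re-check every file listed in MANIFEST on the suspect host.
# Prints only the entries that do not verify and returns non-zero if any exist.
check_manifest() {
    # LC_ALL=C keeps the ": OK" suffix untranslated so grep can filter it.
    if LC_ALL=C md5sum -c "$1" 2>/dev/null | grep -v ': OK$'; then
        return 1
    fi
    return 0
}

# Usage (not executed here):
#   known-good host:
#     make_manifest /usr/bin/find /lib/librt.so.1 /lib/libm.so.6 \
#                   /lib/libc.so.6 /lib/libpthread.so.0 > good.md5
#   suspect host, after copying good.md5 over:
#     check_manifest good.md5
```

md5sum follows symlinks, so a path such as /lib/libm.so.6 is checked against the contents of the real library file it points to.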
 
Hi Udo,

hostname:~# md5sum /usr/bin/find
1813093bd4ebb21b37501f8d96833226 /usr/bin/find
hostname:~# ldd /usr/bin/find
/usr/bin/ldd: line 117: 29907 Segmentation fault LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= LD_LIBRARY_VERSION=$verify_out LD_VERBOSE= "$@"

Tnx
Piero
 
Hi,
This looks strange... is something wrong with the libraries?! Is it a normal Proxmox installation? Or installed on top of Lenny? Or some "homebrew" attempt with Squeeze?
Have you changed the sources.list?

Udo
 
Hi Udo,

I have just found what caused the segfaults...
The file /lib/libm-2.7.so had the right size and date/time, but its md5sum differed from that of the same file on another Proxmox server.
I replaced it and now everything works...
The question is how this could have happened...

The PVE installation is 1.7 with the latest updates before version 1.8.
The only extra packages I have installed are:
- mdadm (I have a 3.5 TB RAID5 MD array mounted for data backup)
- samba (used only occasionally to download data from the backup).
This is the output of "df":
Filesystem             1K-blocks       Used  Available Use% Mounted on
/dev/mapper/pve-root    99083868    1563688   92487016   2% /
tmpfs                    8202660          0    8202660   0% /lib/init/rw
udev                       10240        708       9532   7% /dev
tmpfs                    8202660          0    8202660   0% /dev/shm
/dev/mapper/pve-data   601871232    3017620  598853612   1% /var/lib/vz
/dev/mapper/pve2-data 3756925768   39544404 3526540532   2% /var/lib/vz2
/dev/sdg1                 516040      42084     447744   9% /boot
/dev/md0              4807165280 3553221284 1009754012  78% /backup

Bye
Piero
 
Hi,
I would guess a hard disk problem (but you wrote that you have a RAID?!).

Udo
 
Hi Udo,

Yes, I have a RAID1 array for /dev/pve-root.
Before replacing /lib/libm-2.7.so with a non-corrupt version, I renamed it to /lib/libm-2.7.so.old.
The curious thing is that today /lib/libm-2.7.so and /lib/libm-2.7.so.old are identical, with the same md5sum :-)

Could this be something LVM-related?

Regards
Piero
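
When a suspect copy such as /lib/libm-2.7.so.old is kept next to a known-good one, a byte-level comparison shows more than a checksum does. It reports how many bytes differ and where, which can help tell a few isolated changes apart from whole damaged blocks. A minimal sketch, assuming both files have the same size as described above; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# count_diff_bytes A B
# Print the number of byte positions at which A and B differ.
# If the files differ in length, only the common prefix is compared.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# show_diff_bytes A B [N]
# Print the first N differences (default 20), one per line:
# byte offset in decimal, then the byte value in A and in B in octal.
show_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | head -n "${3:-20}"
}

# Usage (not executed here):
#   count_diff_bytes /lib/libm-2.7.so.old /lib/libm-2.7.so
#   show_diff_bytes  /lib/libm-2.7.so.old /lib/libm-2.7.so 10
```

A count of 0 means the two files are byte-for-byte identical at the moment they are read, which matches what identical md5sums indicate.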