The story of a hard system segfault
March 2015.
The problem was easy to reproduce. Using vi, vim, or many other programs would trigger the fault.
Let's open the dump and see who's the culprit.
# gdb vim vim.core
[...]
[New process 101653]
Core was generated by `vim'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x000000080142c39a in kill () from /lib/libc.so.7
(gdb) bt
#0 0x000000080142c39a in kill () from /lib/libc.so.7
#1 0x00000000004ceeac in ?? ()
#2 0x00007ffffffff193 in ?? ()
#3 0x00000000004cdb50 in ?? ()
#4 0x0000000000000000 in ?? ()
Huh? A problem in libc? The system had been working perfectly for a while, and its system binaries are mounted read-only, so there's no way the file was changed. How can there be a problem there?
I tried rebuilding both the system and all the ports (twice, once with gcc and once with clang, in case it was a compiler problem). Nothing changed.
After days of searching and compiling, I found the culprit: OpenSSL 1.0.2. I uninstalled the port version and rebuilt my ports to use the system one (which was up to date, since I had recompiled the base system earlier in this story). Everything worked fine again afterwards.
But then why would vi crash? It's not linked to OpenSSL in any way.
It was hard to pin down that problem, but once you know the answer, it's absolutely obvious.
I was using LDAP to sync my users across my systems. OpenSSL is used by OpenLDAP, which in turn is used by NSS_LDAP to look up information about the owners of files. That's how OpenSSL and vi end up in the same context.
I think there's a lesson here.
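
To make the chain above more concrete, here is a minimal Python sketch of the kind of lookup involved. It is not what vi itself does internally. It only shows that resolving a file's owner goes through the C library's getpwuid(), which is where the system's name service (NSS) configuration, and therefore a module such as NSS_LDAP, can come into play. The helper name owner_name is hypothetical, chosen for illustration.

```python
import os
import pwd


def owner_name(path):
    """Return the login name of the owner of `path`.

    Falls back to the numeric uid (as a string) when the system has
    no passwd entry for that uid.
    """
    uid = os.stat(path).st_uid
    try:
        # pwd.getpwuid() wraps the C library's getpwuid(), so the lookup
        # is answered by whatever sources the system's NSS setup lists
        # (local files, LDAP, ...).
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No matching entry in any configured source.
        return str(uid)


print(owner_name("/"))
```

On a machine configured as described in this post, a call like this could presumably end up loading the LDAP client libraries, and OpenSSL with them, into a program that was never linked against either.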
Edit (2015-09-01): this bug is related: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=198788.