conflict in symbols exposed by tcmalloc and glibc - glibc

I was recently debugging a crash in a product and identified the cause to be a conflict in the memory allocation symbols exposed by glibc and tcmalloc. I wrote the following sample code for exposing this issue:
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <assert.h>
#include <stdlib.h>
int main()
{
struct addrinfo hints = {0}, *res = NULL;
hints.ai_family = AF_UNSPEC;
hints.ai_socktype = SOCK_STREAM;
int rc = getaddrinfo("myserver", NULL, &hints, &res);
assert(rc == 0);
return 0;
}
I compiled it using the following command:
g++ temp.cpp -g -lresolv
I executed the program using the following command:
LD_PRELOAD=/path/to/libtcmalloc_minimal.so.4 ./a.out
The program crashes with the following stack:
#0 0x00007ffff6c7c875 in *__GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff6c7de51 in *__GI_abort () at abort.c:92
#2 0x00007ffff6cbd8bf in __libc_message (do_abort=2, fmt=0x7ffff6d8c460 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:186
#3 0x00007ffff6cc30c8 in malloc_printerr (action=2, str=0x7ffff6d88fec "free(): invalid pointer", ptr=) at malloc.c:6282
#4 0x00007ffff6cc810c in *__GI___libc_free (mem=) at malloc.c:3733
#5 0x00007ffff6839e89 in _nss_dns_gethostbyname4_r (name=0x400814 "myserver", pat=0x7fffffffdfa8, buffer=0x7fffffffd9b0 "myserver.mydomain.com", buflen=1024, errnop=0x7fffffffdfbc, herrnop=0x7fffffffdf98, ttlp=0x0) at nss_dns/dns-host.c:341
#6 0x00007ffff6d11917 in gaih_inet (name=0x400814 "myserver", service=0x7fffffffdf88, req=0x7fffffffe1d0, pai=0x7fffffffe160, naddrs=0x7fffffffe168) at ../sysdeps/posix/getaddrinfo.c:880
#7 0x00007ffff6d14301 in *__GI_getaddrinfo (name=0x400814 "myserver", service=0x0, hints=0x7fffffffe1d0, pai=0x7fffffffe200) at ../sysdeps/posix/getaddrinfo.c:2452
#8 0x00000000004006f0 in main () at temp.cpp:12
The reason for this is that the free() function called by _nss_dns_gethostbyname4_r() from libnss_dns.so is from libc.so while the corresponding malloc() was called from libresolv.so from libtcmalloc_minimal.so. The addresses of tcmalloc's malloc() and free() functions are getting into the GOT of libresolv.so leading to this crash. The crash goes away if I don't link my program to libresolv.so.
Now for my question. Is there any documentation which explains how to safely use tcmalloc to avoid crashes like this ?

glibc has some documentation for interposing malloc:
Replacing malloc
Something else must be going here, though. Typical builds of glibc and glibc will get this right (even in fairly old versions of either package).
My best guess is you are using some SUSE glibc variant, which uses RTLD_DEEPBIND for NSS modules. This results in a known issue with malloc interposition. SUSE suggests setting the RTLD_DEEPBIND=0 environment variable as a workaround.

Related

Why calling dlopen sometimes breaks my application by damaging class variables content?

I am trying to load library with dlopen().
But call to this dlopen() function sometimes (not always) damages my class variables and then app goes to segmentation fault.
Below is not precise code (pseudocode), but explanation what happens:
class MyClass {
public:
int MyVar;
void Print() { printf("Simply breakpoint\n"); };
void LoadLibrary() { dlopen("/usr/lib/x86_64-linux-gnu/libavcodec.so.58.54.100",RTLD_LAZY); };
MyClass() {
MyVar = 12345;
printf("MyVar address %p\n",&MyVar);
Print();
LoadLibrary();
};
}
void main()
{
MyClass obj;
}
I do debug it with gdb following way:
>gdb MyApp
>break Print
>run
when it stops at Print function breakpoint I see printed address of variable MyVar.
MyVar address 0x7fff900bc2bc
Also I can check its content.
Then I do
>watch *0x7fff900bc2bc
Hardware watchpoint 2: *0x7fff900bc2bc
>cont
When it continues it breaks on unexpected writing to my variable MyVar:
Thread 1 "MyApp" hit Hardware watchpoint 2: *0x7fff900bc2bc
Old value = 12345
New value = 32767
memmove () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:356
356 ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) backtrace
#0 memmove () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:356
#1 0x00007ffff7fde759 in _dl_map_object_deps (map=map#entry=0x7fff90145110, preloads=preloads#entry=0x0,
npreloads=npreloads#entry=0, trace_mode=trace_mode#entry=0, open_mode=open_mode#entry=-2147483648)
at dl-deps.c:446
#2 0x00007ffff7fe4db0 in dl_open_worker (a=a#entry=0x7fffa6fd80f0) at dl-open.c:571
#3 0x00007ffff53dd928 in __GI__dl_catch_exception (exception=<optimized out>, operate=<optimized out>,
args=<optimized out>) at dl-error-skeleton.c:208
#4 0x00007ffff7fe460a in _dl_open (file=0x42d8ee0 "/usr/lib/x86_64-linux-gnu/libavcodec.so.58.54.100",
mode=-2147483646, caller_dlopen=<optimized out>, nsid=-2, argc=2, argv=0x7fffffffea88, env=0x54037d0)
at dl-open.c:837
#5 0x00007ffff57bc34c in dlopen_doit (a=a#entry=0x7fffa6fd8310) at dlopen.c:66
#6 0x00007ffff53dd928 in __GI__dl_catch_exception (exception=exception#entry=0x7fffa6fd82b0,
operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:208
#7 0x00007ffff53dd9f3 in __GI__dl_catch_error (objname=0x7fff900d8770, errstring=0x7fff900d8778,
mallocedp=0x7fff900d8768, operate=<optimized out>, args=<optimized out>) at dl-error-skeleton.c:227
#8 0x00007ffff57bcb59 in _dlerror_run (operate=operate#entry=0x7ffff57bc2f0 <dlopen_doit>,
args=args#entry=0x7fffa6fd8310) at dlerror.c:170
#9 0x00007ffff57bc3da in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#10 0x000000000209ec5b in MyClass::LoadLibrary() ()
.......
From stack backtrace I see that MyVar is damaged by call to dlopen()
But why?
What I am doing wrong?
How to resolve?
Unfortunately I cannot show all source code because it is huge and involves many different components, many threads, many 3rd party libraries.
I cannot simply dynamically link my app with libavcodec because it is already statically linked in 3rd party library but 3rd party library is built without required features unfortunately (without VAAPI support). Dynamic linking makes symbol conflicts.
That is why I was decided try to load libavcodec manually by dlopen() and get all required function pointers from dlsym().
But why? What I am doing wrong? How to resolve?
You didn't say which version of GLIBC you are using (or which distribution).
The code in GLIBC-2.27 dl-deps.c reads:
struct link_map **l_initfini = (struct link_map **)
malloc ((2 * nneeded + 1) * sizeof needed[0]);
if (l_initfini == NULL)
_dl_signal_error (ENOMEM, map->l_name, NULL,
N_("cannot allocate dependency list"));
l_initfini[0] = l;
memcpy (&l_initfini[1], needed, nneeded * sizeof needed[0]);
memcpy (&l_initfini[nneeded + 1], l_initfini,
nneeded * sizeof needed[0]); // line 446
You also didn't say whether MyClass is heap or stack allocated.
One way that the GLIBC code could write over your variable is when you have already corrupted heap earlier. This is especially likely if MyClass is in fact heap-allocated (which it appears to be given the 0x7fff900bc2bc address).
The fact that this "write over" happens only some of the time is also symptomatic of heap corruption.
As the very first step, I would run the program under Valgrind and make sure that no heap corruption (heap buffer overflow, free unallocated, double-free, etc.) is detected before LoadLibrary() runs.

KLEE does not find uninitialized variable error

I am learning KLEE now and I wrote a simple code:
#include "klee/klee.h"
#include <stdio.h>
#include <stdlib.h>
int test(int *p)
{
int *q = (int *) malloc(sizeof(int));
if ((*p) == (*q)) {
printf("reading uninitialized heap memory");
}
return 0;
}
int main()
{
int *p = (int *) malloc(sizeof(int));
test(p);
return 0;
}
First, I generate LLVM bitcode, and then I execute KLEE to the bitcode.
Following is all output:
KLEE: output directory is "/Users/yjy/WorkSpace/Test/klee-out-13"
Using STP solver backend
KLEE: WARNING: undefined reference to function: printf
KLEE: WARNING ONCE: calling external: printf(140351601907424)
reading uninitialized heap memory
KLEE: done: total instructions = 61
KLEE: done: completed paths = 4
KLEE: done: generated tests = 4
I suppose that KLEE should give me an error that the q pointer is not initialized, but it doesn't. Why KLEE does not give me an error or warning about this? KLEE can not detect this error? Thanks in advance!
TLTR: KLEE has not implemented this feature. Clang can check this directly.
KLEE currently support add/sub/mul/div overflow checking. To use this feature, you have to compile the source code with clang -fsanitize=signed-integer-overflow or clang -fsanitize=unsigned-integer-overflow .
The idea is that a function call is inserted into the bytecode (e.g. __ubsan_handle_add_overflow) when you use the clang sanitizer. Then KLEE will handle the overflow checking once it meets the function call.
Clang support
MemorySanitizer, AddressSanitizer UndefinedBehaviorSanitizer. They are defined in projects/compiler-rt/lib directory. MemorySanitizer is the one you are looking for, which is a detector of uninitialized reads.
You can remove the KLEE function call and check with clang directly.
➜ ~ clang -g -fsanitize=memory st.cpp
➜ ~ ./a.out
==16031==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x490954 (/home/hailin/a.out+0x490954)
#1 0x7f21b72f382f (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#2 0x41a1d8 (/home/hailin/a.out+0x41a1d8)
SUMMARY: MemorySanitizer: use-of-uninitialized-value (/home/hailin/a.out+0x490954)
Exiting

Segmentation fault on using boost syslog and cpp-netlib

After adding boost syslog into source code, segmentation fault appears inside cpp-netlib library.
I was able to prepare minimum working code snippet to reproduce the problem.
#include <boost/network/protocol/http/client.hpp>
#include <boost/log/utility/setup/file.hpp>
#include <boost/log/sinks/syslog_backend.hpp>
#include <iostream>
using namespace boost::network;
using namespace boost::network::http;
namespace sinks = boost::log::sinks;
int main()
{
client::request request_("http://www.boost.org/");
client client_;
client::response response_ = client_.get(request_);
std::string body_ = body(response_);
std::cout << "body: " << body_;
using syslog_sinkT = sinks::synchronous_sink <sinks::syslog_backend>;
boost::shared_ptr <sinks::syslog_backend> backend = boost::make_shared <sinks::syslog_backend> ();
boost::shared_ptr<syslog_sinkT> sink = boost::make_shared <syslog_sinkT> (backend);
}
When last 2 lines are commented, segmentation fault disappears and everything works fine.
gdb stack trace (approximately, may vary):
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xf7c29b40 (LWP 19874)]
0x00000000 in ?? ()
(gdb) where
#0 0x00000000 in ?? ()
#1 0x083c376c in boost::asio::detail::task_io_service_operation::complete (bytes_transferred=260, ec=..., owner=...,
this=0xf6900710)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/task_io_service_o$
eration.hpp:38
#2 boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., this=<optimized out>)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/impl/task_io_serv$
ce.ipp:372
#3 boost::asio::detail::task_io_service::run (ec=..., this=0x84ea280)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/detail/impl/task_io_serv$
ce.ipp:149
#4 boost::asio::io_service::run (this=0x84e9a94)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/asio/impl/io_service.ipp:59
#5 0x083b5766 in boost::_mfi::mf0<unsigned int, boost::asio::io_service>::operator() (p=<optimized out>, this=<optimized out>)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/mem_fn_template.hpp:49
#6 boost::_bi::list1<boost::_bi::value<boost::asio::io_service*> >::operator()<unsigned int, boost::_mfi::mf0<unsigned int, boost::a$
io::io_service>, boost::_bi::list0> (a=<synthetic pointer>, f=..., this=0x84ea94c)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/bind.hpp:249
#7 boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::list1<boost::_bi::value<boo$
t::asio::io_service*> > >::operator() (this=0x84ea944)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/bind/bind.hpp:1222
#8 boost::detail::thread_data<boost::_bi::bind_t<unsigned int, boost::_mfi::mf0<unsigned int, boost::asio::io_service>, boost::_bi::$
ist1<boost::_bi::value<boost::asio::io_service*> > > >::run (this=0x84ea828)
at /home/kostidov/prj/third_party-master/boost/boost_1_60_0/__public__/v0/Linux-libc6/include/boost/thread/detail/thread.hpp:116
#9 0x0840a4f8 in boost::(anonymous namespace)::thread_proxy (param=0x84ea828) at libs/thread/src/pthread/thread.cpp:167
#10 0xf7de1f70 in start_thread () from /lib/i386-linux-gnu/libpthread.so.0
#11 0xf7d18bee in clone () from /lib/i386-linux-gnu/libc.so.6
Problem exists on Ubuntu 14.04 with cpp-netlib 0.11.2 and both boost versions 1_58_0 and 1_60_0. Boost, cpp-netlib and my application are compiled with -std=c++11.
Note 1. Segmentation fault appears inside cpp-netlib before reaching syslog_backend creation. Only presence of last 2 lines guarantees SIGSEGV reproduction.
Note 2. Reproduces only with syslog_backend. Any other logging targets (file, consol) work fine.
The best idea I have is the problem may lay inside boost during static variables initialisation, but I have no proves regarding this version.
Any suggestions?
Seems like I used too many compile options for building both boost and cpp-netlib.
I prepared new build for both boost and cpp-netlib once again, but this time I used as less additional options as possible.
And it works fine.
EDIT: I found the key which causes the error. It's BOOST_ASIO_ENABLE_HANDLER_TRACKING It was defined during the boost compilation, but wasn't defined during compilation of cpp-netlib and my application.
https://svn.boost.org/trac/boost/ticket/11945

Problems doing syscall hooking

I use the following module code to hooks syscall, (code credited to someone else, e.g., Linux Kernel: System call hooking example).
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/unistd.h>
#include <asm/semaphore.h>
#include <asm/cacheflush.h>
void **sys_call_table;
asmlinkage int (*original_call) (const char*, int, int);
asmlinkage int our_sys_open(const char* file, int flags, int mode)
{
printk(KERN_ALERT "A file was opened\n");
return original_call(file, flags, mode);
}
int set_page_rw(long unsigned int _addr)
{
struct page *pg;
pgprot_t prot;
pg = virt_to_page(_addr);
prot.pgprot = VM_READ | VM_WRITE;
return change_page_attr(pg, 1, prot);
}
int init_module()
{
// sys_call_table address in System.map
sys_call_table = (void*)0xffffffff804a1ba0;
original_call = sys_call_table[1024];
set_page_rw(sys_call_table);
sys_call_table[1024] = our_sys_open;
return 0;
}
void cleanup_module()
{
// Restore the original call
sys_call_table[1024] = original_call;
}
When insmod the compiled .ko file, terminal throws "Killed". When looking into 'cat /proc/modules' file, I get the Loading status.
my_module 10512 1 - Loading 0xffffffff882e7000 (P)
As expected, I can not rmmod this module, as it complains its in use. The system is rebooted to get a clean-slate status.
Later on, after commenting two code lines in the above source sys_call_table[1024] = our_sys_open; and sys_call_table[1024] = original_call;, it can insmod successfully. More interestingly, when uncommenting these two lines (change back to the original code), the compiled module can be insmod successfully. I dont quite understand why this happens? And is there any way to successfully compile the code and insmod it directly?
I did all this on Redhat with linux kernel 2.6.24.6.
I think you should take a look to the kprobes API, which is well documented in Documentation/krpobes.txt. It gives you the ability to install handler on every address (e.g. syscall entry) so that you can do what you want. Added bonus is that your code would be more portable.
If you're only interested in tracing those syscalls you can use the audit subsystem, coding your own userland daemon which will be able to receive events on a NETLINK socket from the audit kthread. libaudit provides a simple API to register/read events.
If you do have a good reason with not using kprobes/audit, I would suggest that you check that the value you are trying to write to is not above the page that you set writable. A quick calculation shows that:
offset_in_sys_call_table * sizeof(*sys_call_table) = 1024 * 8 = 8192
which is two pages after the one you set writable if you are using 4K pages.

Solaris pstack output: what does "SYS#0" mean?

I encountered "SYS#0" at the top of a stack and cannot find any documentation as to what that means.
Compiler: g++
OS: Solaris 9
Arch: SPARC
Memory Manager libhoard_32.so from Hoard 3.5.1
We used "gcore" to generate a core file. Looking at the output of running the "pstack" command against the core file, the only thread that was doing anything interesting had the following at the very top of its call stack:
ff309858 SYS#0 ()
ff309848 void MyHashMap<const void*,unsigned,AlignedMmapInstance<65536U>::SourceHeap>::set(const void*,unsigned) (ff31eed4, 9bf20000, 10000, 40, 9bf1fff0, ff31e738) + 134
...
pflags for that LWP shows:
/8: flags = PR_STOPPED|PR_ISTOP|PR_ASLEEP
why = PR_REQUESTED
sigmask = 0xfffffeff,0x00003fff
I could not find any mention of this syntax in the Sun documentation.
Edit: The process appears to have hung sometime prior to doing the gcore. Is "SYS#0" somehow interrelated with process hangs?
Edit: Added next stack frame and link to Hoard, pflags output
Edit: The accepted answer is correct. In addition, at least on SPARC, the g1 register should contain the system call number, but this did not appear to be the case in our core file.
The topic "what is an indirect system call?" is probably good material for another question.
Try this:
$ cat foo.c
#include <stdio.h>
int main(int argc, char *argv[]) {
char buf[1024];
proc_sysname(0, buf, 1024);
printf("%s\n", buf);
}
$ gcc -ofoo -lproc foo.c
$ ./foo
SYS#0
$
SYS#0 is therefore the string that represents system call zero. If you look in <sys/syscall.h> (the system call table) you will find the following:
/* syscall enumeration MUST begin with 1 */
/*
* SunOS/SPARC uses 0 for the indirect system call SYS_syscall
* but this doesn't count because it is just another way
* to specify the real system call number.
*/
#define SYS_syscall 0
The indirect system call syscall(SYS_syscall, foo, bar, ...) is equivalent to the direct call syscall(foo, bar, ...).

Resources