Comprehensive list of GLIBC functions that can execute a file (execv, execve, fexecve, posix_spawn, ...)

I am writing an LD_PRELOAD utility that wraps all calls to exec() type functions.
But holy cow, there are a lot of them. So far I have found:
execv, execvp, execve, execvpe
fexecve, execveat,
execl, execlp, execle, execlpe,
posix_spawn, posix_spawnp
Is there a comprehensive list somewhere of all the lib functions that execute another program (and aren't just wrappers to one of these functions)?
As an example, I just discovered that whatever the perl library IPC::Open3 uses is not on the list above, so I don't see the exec that happens. (strace sees it, but it claims that everything on the list above is just 'execve' which is not actually true.)

At least on my (Debian GNU/Linux) system, the execve system call comes from Perl calling execvp:
#0 0x00007ffff7d35787 in execve () at ../sysdeps/unix/syscall-template.S:120
#1 0x00007ffff7d3603b in __execvpe_common (file=0x5555559ee710 "some", argv=0x5555558e31f0, envp=0x5555558c6980, exec_script=<optimized out>) at execvpe.c:136
#2 0x00005555556bcf50 in Perl_do_aexec5 ()
#3 0x00005555556b14ef in Perl_pp_exec ()
#4 0x0000555555652cf6 in Perl_runops_standard ()
#5 0x00005555555c6a6c in perl_run ()
#6 0x000055555559c472 in main ()
(gdb) fr 2
#2 0x00005555556bcf50 in Perl_do_aexec5 ()
(gdb) x/i $pc-5
0x5555556bcf4b <Perl_do_aexec5+427>: callq 0x55555559b2c0 <execvp@plt>
Using ltrace also confirms that:
ltrace -f --library=libc.so.6 perl foo.pl |& grep execv
[pid 1789012] perl->execvp(0x55cd3c4a4bc0, 0x55cd3c3a28e0, 0x7fff82e61770, 0x7fd4f3317212) = 0xffffffff
...
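Since the backtrace shows Perl entering glibc through execvp, an LD_PRELOAD shim has to interpose that symbol itself. Wrapping only execve would likely miss it, because glibc typically reaches execve internally rather than through the PLT. The following is a minimal sketch, not a complete wrapper: the log format and the file names shim.c and shim.so are assumptions, and every other entry point on the list would need the same treatment.

```c
#define _GNU_SOURCE           /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

typedef int (*execvp_fn)(const char *, char *const[]);

/* Interposed execvp: log the call, then forward to the next definition
 * in the lookup order, which is normally glibc's own execvp. */
int execvp(const char *file, char *const argv[])
{
    static execvp_fn real_execvp;

    if (real_execvp == NULL)
        real_execvp = (execvp_fn)dlsym(RTLD_NEXT, "execvp");

    fprintf(stderr, "[shim] execvp(\"%s\")\n", file);

    if (real_execvp == NULL) {  /* no further definition was found */
        errno = ENOSYS;
        return -1;
    }
    return real_execvp(file, argv);
}
```

Built as a shared object (for example gcc -shared -fPIC -o shim.so shim.c -ldl) and loaded with LD_PRELOAD=./shim.so perl foo.pl, the shim should log the execvp call that ltrace reports above.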

Related

File doesn't open in Fortran [duplicate]

I get the following error when I execute Fortran code compiled with gfortran. The error is followed by a backtrace pointing to memory locations.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x2b2f8e39da2d in ???
#1 0x2b2f8e39cc63 in ???
#2 0x311823256f in ???
#3 0x311827a7be in ???
#4 0x2b2f8e39cff2 in ???
#5 0x2b2f8e4adde6 in ???
#6 0x2b2f8e4ae047 in ???
#7 0x2b2f8e4a62d7 in ???
#8 0x40482a in instrument_
at /home/user/model/instrument.f:90
#9 0x406c1e in funcdet
at /home/user/model/funcsynth.f:67
#10 0x406c98 in main
at /home/user/model/funcsynth.f:78
Segmentation fault (core dumped)
I would like to know where the error first arises: is it the first line of the backtrace or the last line? I would also appreciate strategies that might help me debug the issue.
Update:
Upon backtracing, the line 90 of instrument involves opening a file like so:
out_file3 = 'new_file'
OPEN(unit=3,file=out_file3,status='unknown')
To find the underlying issue, I added error checking like so:
OPEN(unit=3,file=out_file3,status='unknown',iostat=io_status, err=100)
100 write(STDOUT,*) 'io status=', io_status
This fails with the error:
Error: Invalid value for ERR specification at (1). How do I determine the appropriate value for the ERR specification? This led me to suspect that unit=3 might be the cause of the error. I've changed the unit value, but every time I get the "Segmentation fault (core dumped)" error.
Update 2:
OPEN(unit=3,file=out_file3,status='unknown',err=17)
17 write(STDOUT,*) 'Problem'
I still get the Segmentation fault (core dumped) error at the line corresponding to OPEN. I can only guess that unit=3 is the root cause of the problem.
Update 3:
Attempt at a self-contained example:
character*280 testfile,finalfile,outfile
DIR = '/storage/work/user/'
testfile = 'test.dat'
CALL getenv(DIR,outfile)
CALL sappend(outfile,testfile,finalfile)
OPEN(unit=3,file=finalfine,status='new')
write(3,*) 'Test'
END

XIO: fatal IO error 11 caused by 32-bit libxcb

Yes, this question has been asked before, but reading the answers didn't enlighten me much.
I wrote a C program that crashes after a few days of use. An important point is that it does NOT generate a core file, even though everything is set up so that it should (core_pattern, ulimit -c unlimited, etc. I can trigger a core dump fine with kill -SIGQUIT).
The program extensively logs what it does, but there's no hint about the crash in the log.
The only message displayed at the crash (or before?) is:
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
after 2322 requests (2322 known processed) with 0 events remaining.
So two questions:
- How is it possible for a program to crash (return $?=1) without a core dump?
- What is this error message about, and what can I do?
System is RedHat Enterprise 6.4
Edit:
I managed to force a core dump by calling abort() from inside an atexit() callback:
(gdb) bt
#0 0x00bc8424 in __kernel_vsyscall ()
#1 0x0085a861 in raise () from /lib/libc.so.6
#2 0x0085c13a in abort () from /lib/libc.so.6
#3 0x0808f5cf in Unexpected () at MyCode.c:1378
#4 0x0085de9f in exit () from /lib/libc.so.6
#5 0x00c85701 in _XDefaultIOError () from /usr/lib/libX11.so.6
#6 0x00c85797 in _XIOError () from /usr/lib/libX11.so.6
#7 0x00c84055 in _XReply () from /usr/lib/libX11.so.6
#8 0x00c68b8f in XGetImage () from /usr/lib/libX11.so.6
#9 0x004fd6a7 in ?? () from /usr/local/lib/libcvi.so
#10 0x00478ad5 in ?? () from /usr/local/lib/libcvi.so
...
#29 0x001eed9d in ?? () from /usr/local/lib/libcvi.so
#30 0x001eee41 in RunUserInterface () from /usr/local/lib/libcvi.so
#31 0x0808fab4 in main (argc=2, argv=0xbfbdc984) at MyCode.c:1540
Can anyone enlighten me about this X11 problem? libcvi.so is not mine, only MyCode.c (LabWindows/CVI).
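The atexit() trick mentioned above can be sketched as follows. The idea is that a library calling exit(1), as _XDefaultIOError does in the backtrace, ends up in the handler, which turns the quiet exit into SIGABRT and therefore a core dump. The names abort_on_exit and signal_from_exit_in_child are hypothetical, and the fork is only there so the effect can be observed from the parent.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs whenever any code in the process, including a library, calls exit(). */
static void abort_on_exit(void)
{
    abort();  /* raises SIGABRT; dumps core if core dumps are enabled */
}

/* Fork a child that registers the handler and then calls exit(1).
 * Returns the signal that terminated the child, 0 if it exited
 * normally, or -1 if fork/waitpid failed. */
int signal_from_exit_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        atexit(abort_on_exit);
        exit(1);              /* stands in for the exit(1) inside Xlib */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In a real program only the atexit(abort_on_exit) call near the start of main() would be needed; the resulting core then shows which code path reached exit().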
Edit 2014-12-05:
Here's an even more precise backtrace. Things definitely happen in X11, but I'm no X11 programmer, so looking at the X source code at the indicated lines tells me only that the X server (?) is temporarily unavailable. Is there any way to simply tell it to ignore this error if it's only temporary?
#4 0x00965eaf in __run_exit_handlers (status=1) at exit.c:78
#5 exit (status=1) at exit.c:100
#6 0x00c356b1 in _XDefaultIOError (dpy=0x88aeb80) at XlibInt.c:1292
#7 0x00c35747 in _XIOError (dpy=0x88aeb80) at XlibInt.c:1498
#8 0x00c340a6 in _XReply (dpy=0x88aeb80, rep=0xbf82fa90, extra=0, discard=0) at xcb_io.c:708
#9 0x00c18c0f in XGetImage (dpy=0x88aeb80, d=27263845, x=0, y=0, width=60, height=20, plane_mask=4294967295, format=2) at GetImage.c:75
#10 0x005f46a7 in ?? () from /usr/local/lib/libcvi.so
Corresponding lines:
XlibInt.c: _XDefaultIOError()
1292: exit(1);
XlibInt.c: _XIOError
1498: _XDefaultIOError(dpy);
xcb_io.c: _XReply()
708: if(!reply) _XIOError(dpy);
GetImage.c: XGetImage()
74: if (_XReply (dpy, (xReply *) &rep, 0, xFalse) == 0 || ...
OK, I finally found the cause (thanks to someone at National Instruments), a better diagnostic and a workaround.
The bug is in many versions of libxcb and is a 32-bit counter rollover problem that has been known for a few years: https://bugs.freedesktop.org/show_bug.cgi?id=71338
Not all versions of libxcb are affected: libxcb-1.9-5 has it, libxcb-1.5-1 doesn't. Going by the bug report, a 64-bit OS shouldn't be affected, but I managed to trigger it on at least one version.
Which brings me to a better diagnostic. The following program will crash in less than 15 minutes on affected libraries (better than the entire week it previously took):
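// Compile with: gcc test.c -lX11 && time ./a.out
#include <X11/Xlib.h>
int main(void) {
    Display *d = XOpenDisplay(NULL);
    if (d)
        for (;;)
            XNoOp(d);   /* each request advances the sequence counter */
    return 1;           /* only reached if the display could not be opened */
}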
And one final thing: the above program compiled and run on a 64-bit system works fine, and compiled and run on an old 32-bit system it also works fine, but if I transfer the 32-bit build to the 64-bit system, it crashes after a few minutes.
I just had a program that acted exactly like this, with exactly the same error message. I would expect the counter error to process 2^32 events before crashing.
The program was structured so that a worker thread has a separate X connection to the X thread so that it can send messages to the X thread to update the window.
In the end I traced the problem to the function that sends events to the window to redraw it. It was called from multiple threads without a mutex, and since Xlib calls on the same X connection are not re-entrant, the program crashed with this exact error. I put a mutex around the function and have had no problems since.

grep -f on OS X produces segfault

If you've got a Mac, try this:
echo 'abcd*' > grepfile
echo 'abc$' >> grepfile
echo '^abc' >> grepfile
echo "fojeiwuroiuwet\nljfajsljkfabcdddjlfkajlkj\nabcaaa\nzzzabc\n" | grep -f grepfile
Here's the version:
$ grep --v
grep (BSD grep) 2.5.1-FreeBSD
This is a machine (a rMBP of the 2012 flavor) that's kept up with Apple's software updates, so I am on 10.8.4.
I verified that GNU grep compiled from source does not suffer from this problem. Indeed it is version 2.14, which is a whole lot of versions past 2.5.1.
But how might one achieve the task of testing some input against a series of regexes otherwise, without some vastly inefficient loop that forks a grep for each regex?
Edit: The approach I took to work around this was something akin to: while read REGEX; do [[ ... =~ $REGEX ]] ... done < regexfile.
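For comparison, the same "test one line against a list of regexes, without forking grep" idea can be sketched in C with the POSIX regcomp/regexec interface. This is only an illustration under assumptions: the function name matches_any is hypothetical, the patterns are treated as basic regular expressions (as grep does by default), and a real tool would compile each pattern once rather than on every call.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Return 1 if line matches at least one of the n POSIX basic regular
 * expressions, 0 if it matches none, or -1 if a pattern fails to compile. */
int matches_any(const char *line, const char *const *patterns, size_t n)
{
    assert(line != NULL && patterns != NULL);

    for (size_t i = 0; i < n; i++) {
        regex_t re;

        /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
        if (regcomp(&re, patterns[i], REG_NOSUB) != 0)
            return -1;

        int rc = regexec(&re, line, 0, NULL, 0);
        regfree(&re);

        if (rc == 0)
            return 1;
    }
    return 0;
}
```

Called with the three patterns from grepfile, this reports a match for the lines containing abc and no match for fojeiwuroiuwet.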
Question: Is this a known bug with this version of grep? How can we set up our systems so they will work properly with a file of regexes to grep?
Update: Looks like some folks are reporting it works fine (even with this particular FreeBSD 2.5.1 grep). What are some steps I can take to try to figure out which .so/.dylib's it might be using? Can someone do a shasum /usr/bin/grep and tell me if it works for you? (I'm not sure if that would provide much information, but what I'm after is whether my computer's configuration is screwed up, or if this is some actual existing issue with this version of the software.)
$ shasum /usr/bin/grep
eac59389d09642decbb8551e2c975f795934bfbf /usr/bin/grep
Here is more info:
$ codesign -dvvv /usr/bin/grep
Executable=/usr/bin/grep
Identifier=com.apple.zgrep
Format=Mach-O thin (x86_64)
CodeDirectory v=20100 size=224 flags=0x0(none) hashes=6+2 location=embedded
Hash type=sha1 size=20
CDHash=93b823c000188f8737653d8333c90a6db9361d70
Signature size=4064
Authority=Software Signing
Authority=Apple Code Signing Certification Authority
Authority=Apple Root CA
Info.plist=not bound
Sealed Resources=none
Internal requirements count=2 size=208
Further investigation:
$ gdb /usr/bin/grep
GNU gdb 6.3.50-20050815 (Apple version gdb-1824) (Thu Nov 15 10:42:43 UTC 2012)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .... done
(gdb) start -f grepfile
Function "main" not defined.
Make breakpoint pending on future shared library load? (y or [n])
Starting program: /usr/bin/grep -f grepfile
Reading symbols for shared libraries +++.............................. done
abc
abc
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000101000000
0x00007fff89b5d1b3 in memchr ()
(gdb) where
#0 0x00007fff89b5d1b3 in memchr ()
#1 0x00007fff89b8e45a in __sfvwrite ()
#2 0x00007fff89b8e861 in fwrite ()
#3 0x0000000100003138 in _mh_execute_header ()
#4 0x0000000100002988 in _mh_execute_header ()
#5 0x0000000100001c28 in _mh_execute_header ()
#6 0x00007fff8e2d57e1 in start ()
(gdb)
I have rebooted the machine as well. It repeatably does the same thing in gdb.
I've got OS X 10.8.4 on a MacBook Air, and your example doesn't crash by default, only when I add the --color parameter.
Explanation
This crash usually happens when you mix a wildcard (the asterisk) with terminal colours; it is a software bug.
Also check another simpler example:
echo 'abc*' | grep --color=auto -e ".*" -e a
Here it seems that --color=auto makes the difference (without it, or with it set to never, grep doesn't crash).
So I assume that your grep is using colors in the terminal by default.
Solution
Make sure that your grep is not aliased to grep with colors enabled, and that colors are not enabled by default.
You can always try running grep with --color=never.
For a permanent solution, I've filed a bug report:
http://www.freebsd.org/cgi/query-pr.cgi?pr=181263
so the issue can be fixed in the software itself.
If you'd like to access more detailed log of your crash, go to Console, Show Log List and find the crash log under User Diagnostic Reports.
E.g.:
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_c.dylib 0x00007fff8a8da1b3 memchr + 15
1 libsystem_c.dylib 0x00007fff8a90b45a __sfvwrite + 578
2 libsystem_c.dylib 0x00007fff8a90b861 fwrite + 114
3 grep 0x000000010a4a3138 0x10a4a0000 + 12600
4 grep 0x000000010a4a2988 0x10a4a0000 + 10632
5 grep 0x000000010a4a1c28 0x10a4a0000 + 7208
6 libdyld.dylib 0x00007fff8daf77e1 start + 1
Here is some more detailed explanation about the crash:
A quick test
$ env -i bsdgrep -Fi without_nls usr.bin/grep/grep.c
$ env -i gnugrep -Fi without_nls usr.bin/grep/grep.c
#ifndef WITHOUT_NLS
#ifndef WITHOUT_NLS
#ifndef WITHOUT_NLS
shows that bsd fgrep already fails to ignore case. And if you throw
a few more options to the mix it'd crash, e.g.
$ env -i LC_CTYPE=en_US.UTF-8 TERM=xterm bsdgrep --color -Fir without_nls usr.bin/grep/
[...]
Program received signal SIGSEGV, Segmentation fault.
0x0000000801007ff2 in memchr (s=0x61167a, c=10, n=18446744073707490297) at /usr/src/lib/libc/string/memchr.c:48
48 if (*p++ == (unsigned char)c)
(gdb) bt
#0 0x0000000801007ff2 in memchr (s=0x61167a, c=10, n=18446744073707490297) at /usr/src/lib/libc/string/memchr.c:48
#1 0x0000000801007b03 in __sfvwrite (fp=0x801247770, uio=0x7fffffffd8f0) at /usr/src/lib/libc/stdio/fvwrite.c:170
#2 0x0000000801007698 in fwrite (buf=0x608c03, size=18446744073709551606, count=1, fp=0x801247770)
at /usr/src/lib/libc/stdio/fwrite.c:95
#3 0x0000000000405498 in printline (line=0x7fffffffdb70, sep=58, matches=0x7fffffffd990, m=9)
at /usr/src/usr.bin/grep/util.c:500
#4 0x0000000000404f51 in procline (l=0x7fffffffdb70, nottext=0) at /usr/src/usr.bin/grep/util.c:381
#5 0x000000000040489f in procfile (fn=0x80140b600 "usr.bin/grep/nls/es_ES.ISO8859-1.msg") at /usr/src/usr.bin/grep/util.c:239
#6 0x00000000004044d7 in grep_tree (argv=0x7fffffffdd30) at /usr/src/usr.bin/grep/util.c:163
#7 0x0000000000403ea9 in main (argc=5, argv=0x7fffffffdd10) at /usr/src/usr.bin/grep/grep.c:689
Source: http://lists.freebsd.org/pipermail/freebsd-current/2011-August/026502.html
Also, it seems that different OS X installations ship different grep binaries even when the reported version is the same:
$ grep --v
grep (BSD grep) 2.5.1-FreeBSD
$ shasum /usr/bin/grep
350ee11e1868e18c9707ea7035184a114f40badf /usr/bin/grep
$ codesign -dvvv /usr/bin/grep
Executable=/usr/bin/grep
Identifier=com.apple.zgrep
Format=Mach-O thin (x86_64)
CodeDirectory v=20100 size=224 flags=0x0(none) hashes=6+2 location=embedded
Hash type=sha1 size=20
CDHash=1537b3ed49878d5d18482859a37318164a2a40f1
Signature size=4064
Authority=Software Signing
Authority=Apple Code Signing Certification Authority
Authority=Apple Root CA
Info.plist=not bound
Sealed Resources=none
Internal requirements count=2 size=176

Get instruction pointer of running application on Unix

Is there a way to get the instruction pointer of a running application on Unix?
I have a running process (C++) and want to get its current location, and thereafter in GDB (on a different machine) map the location to source location ('list' command).
On Linux, there is /proc/[pid]/stat.
From "man proc":
stat Status information about the process. This is used by
ps(1). It is defined in /usr/src/linux/fs/proc/array.c.
...
kstkeip %lu
The current EIP (instruction pointer).
AFAICT, the 30th field of the output (the 29th if you count from zero) corresponds to the current instruction pointer of the process. For example:
gdb date
GNU gdb Red Hat Linux (6.0post-0.20040223.20rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...(no debugging symbols found)...Using host libthread_db library "/lib64/tls/libthread_db.so.1".
(gdb) set stop-on-solib-events 1
(gdb) run
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...[Thread debugging using libthread_db enabled]
[New Thread 182896391360 (LWP 27968)]
(no debugging symbols found)...Stopped due to shared library event
(gdb) c
[Switching to Thread 182896391360 (LWP 27968)]
Stopped due to shared library event
(gdb) where
#0 0x00000036b060bb20 in _dl_debug_state_internal () from /lib64/ld-linux-x86-64.so.2
#1 0x00000036b060b51c in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#2 0x00000036b0600f72 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#3 0x0000000000000001 in ?? ()
#4 0x0000007fbff62728 in ?? ()
#5 0x0000000000000000 in ?? ()
(gdb) shell cat /proc/27968/stat
27968 (date) T 27839 27968 8955 34817 27839 4194304 42 0 330 0 0 0 0 0 18 0 0 0 1881668573 6144000 78 18446744073709551615 4194304 4234416 548680739552 18446744073709551615 234887363360 0 0 0 0 18446744071563322838 0 0 17 0 0 0 0 0 0 0
(gdb) p/a 234887363360 <--- the value of the 30th field (index 29 from zero)
$1 = 0x36b060bb20 <_dl_debug_state_internal>
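The lookup done by hand above can also be sketched in C. The function below takes one line in the /proc/[pid]/stat format and returns kstkeip, which man proc numbers as field 30 (index 29 when counting from zero). The name stat_line_kstkeip is hypothetical, and the sketch assumes the field layout shown in the session above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return field 30 (kstkeip) of one /proc/[pid]/stat line,
 * or 0 if the line is too short or has no ')' after comm. */
unsigned long long stat_line_kstkeip(const char *line)
{
    assert(line != NULL);

    /* Field 2 (comm) may itself contain spaces or parentheses,
     * so start parsing after the last ')'. */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return 0;
    p++;

    /* The next token is field 3 (state); skip fields 3 through 29. */
    for (int field = 3; field < 30; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            return 0;
        while (*p != ' ' && *p != '\0')
            p++;
    }

    /* strtoull skips the leading blank and parses field 30. */
    return strtoull(p, NULL, 10);
}
```

Fed the stat line printed in the gdb session, it returns 234887363360, the same value passed to p/a.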
The instruction pointer can be retrieved on Linux with the following code:
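#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
pid_t traced_process;              /* PID of the target, set by the caller */
struct user_regs_struct regs;
ptrace(PTRACE_ATTACH, traced_process, NULL, NULL);
waitpid(traced_process, NULL, 0);  /* wait until the target has stopped */
ptrace(PTRACE_GETREGS, traced_process, NULL, &regs);
printf("EIP: %lx\n", regs.eip);    /* 32-bit x86; use regs.rip on x86-64 */
ptrace(PTRACE_DETACH, traced_process, NULL, NULL);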
You will need to temporarily stop the process or thread in order to get its current instruction pointer. You can do it by attaching to the process with ptrace() or (on HP-UX) ttrace() and accessing the registers.
If you're using gdb anyway, you can simply attach yourself to a running process like this:
gdb program 1234
where program is the name of the executable you're debugging, and 1234 is the PID. You can then use all of the facilities of gdb to debug the process.

Ruby/Glibc coredump (double free or corruption)

I am using a distributed continuous integration tool which I have written by myself in Ruby. It uses a fork of Mike Perham's "politics" for distribution of the tasks. The "politics" module is using threads for the mDNS part.
Every now and then I encounter a core dump which I don't understand:
*** glibc detected *** ruby: double free or corruption (fasttop): 0x086d8600 ***
======= Backtrace: =========
/lib/libc.so.6[0xb7cef494]
/lib/libc.so.6[0xb7cf0b93]
/lib/libc.so.6(cfree+0x6d)[0xb7cf3c7d]
/usr/lib/libruby18.so.1.8[0xb7e8adf8]
/usr/lib/libruby18.so.1.8(ruby_xmalloc+0x85)[0xb7e8b395]
/usr/lib/libruby18.so.1.8[0xb7e5065e]
...
/usr/lib/libruby18.so.1.8[0xb7e717f4]
/usr/lib/libruby18.so.1.8[0xb7e74296]
/usr/lib/libruby18.so.1.8(rb_yield+0x27)[0xb7e7fb57]
======= Memory map: ========
...
I am running on Gentoo and have rebuilt Ruby and Glibc with "-ggdb" and turned off stripping to get a meaningful core:
...
Core was generated by `ruby /home/develop/dcc/bin/dcc-worker'.
Program terminated with signal 6, Aborted.
#0 0xb7f20410 in __kernel_vsyscall ()
(gdb) bt
#0 0xb7f20410 in __kernel_vsyscall ()
#1 0xb7cacb60 in *__GI___open_catalog (cat_name=0x6 <Address 0x6 out of bounds>, nlspath=0xbf9d6f00 " ", env_var=0x0, catalog=0x1) at open_catalog.c:237
#2 0xb7cae498 in __sigdelset (set=0x6) from /lib/libc.so.6
#3 *__GI_sigfillset (set=0x6) at ../signal/sigfillset.c:42
#4 0xb7ce952d in freopen64 (filename=0x2 <Address 0x2 out of bounds>, mode=0xb7db02c8 "\" total=\"%zu\" count=\"%zu\"/>\n", fp=0x9) at freopen64.c:47
#5 0xb7cef494 in _IO_str_init_readonly (sf=0x86d8600, ptr=0xb7eef5a9 "te\213V\b\205\322\017\204\220", size=-1210273804) at strops.c:88
#6 0xb7cf0b93 in mALLINFo (av=0xb) at malloc.c:5865
#7 0xb7cf3c7d in __libc_calloc (n=141395456, elem_size=3214793136) at malloc.c:4019
#8 0xb7e8adf8 in ?? () at gc.c:1390 from /usr/lib/libruby18.so.1.8
#9 0x086d8600 in ?? ()
#10 0xb7e89400 in rb_gc_disable () at gc.c:256
#11 0xb7e8b395 in add_freelist () at gc.c:1087
#12 gc_sweep () at gc.c:1186
#13 garbage_collect () at gc.c:1524
#14 0xb7e5065e in ?? () from /usr/lib/libruby18.so.1.8
#15 0x00000340 in ?? ()
#16 0x00000000 in ?? ()
(gdb)
Hmm. To me this looks entirely internal to Ruby. In other "double free or corruption" questions here on Stack Overflow I have seen that threads may be part of the problem.
Also, the problem does not occur at exactly the same position. I have another backtrace which is much longer; the crash is also in garbage_collect, but with a slightly different path:
(gdb) bt
#0 0xffffe430 in __kernel_vsyscall ()
#1 0xf7c8b8c0 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2 0xf7c8d1f5 in *__GI_abort () at abort.c:88
#3 0xf7cc7e35 in __libc_message (do_abort=2, fmt=0xf7d8daa8 "*** glibc detected *** %s: %s: 0x%s ***\n") at ../sysdeps/unix/sysv/linux/libc_fatal.c:170
#4 0xf7ccdd24 in malloc_printerr (action=2, str=0xf7d8dbec "double free or corruption (fasttop)", ptr=0x911f5d0) at malloc.c:6197
#5 0xf7ccf403 in _int_free (av=0xf7daa380, p=0x911f5c8) at malloc.c:4750
#6 0xf7cd24ad in *__GI___libc_free (mem=0x911f5d0) at malloc.c:3716
#7 0xf7e68768 in obj_free () at gc.c:1366
#8 gc_sweep () at gc.c:1174
#9 garbage_collect () at gc.c:1524
#10 0xf7e68be5 in rb_newobj () at gc.c:436
#11 0xf7eb9840 in str_alloc (klass=0) at string.c:67
... (150 lines of rb_eval/call/yield etc.)
Does anyone have a suggestion for how to isolate and maybe solve this problem?
Quick, easy, and not as helpful: export MALLOC_CHECK_=2. This makes glibc do extra checking during free() to detect heap corruption. It will abort() and give a core dump as soon as it detects corruption, instead of waiting until the corruption causes a visible problem.
Not quite as quick and easy, but much more helpful (if you get it working): valgrind.
Valgrind makes it easy to find heap corruption issues. There are some spurious errors reported when using Ruby 1.8 under valgrind, but they can be eliminated using this ruby patch (and configuring with --enable-valgrind) or using a valgrind suppression file. To run your ruby program under valgrind, just prefix the command with valgrind:
valgrind ruby /home/develop/dcc/bin/dcc-worker
If the crashing process is a child of the process you are running, use valgrind --trace-children=yes. Look in particular for invalid writes, which are a sign of heap corruption.
I got this very same error in a simple C program called rd_test; it just reads a given number of bytes using read(2) from a given input file (which could be a device file).
The actual bug turned out to be a buffer overflow of 1 byte, as I did
...
buf[n]='\0';
...
where 'n' is the number of bytes read into the buffer 'buf', so a read that filled the buffer put the terminator one byte past its end.
Silly me.
But the thing is, I never caught that until I ran it with valgrind!
So IMHO valgrind is definitely worth running in cases like this.
The 'double free or corruption' error went away as soon as I got rid of the offending bug.
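A minimal sketch of the corrected pattern, assuming the buffer is meant to hold a NUL-terminated string: read at most one byte less than the buffer size so that buf[n] always stays inside the buffer. The function name read_string is hypothetical and not taken from rd_test.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Read at most cap - 1 bytes from fd into buf and NUL-terminate the result.
 * Returns the number of bytes read, or -1 if read(2) fails. */
ssize_t read_string(int fd, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);

    ssize_t n = read(fd, buf, cap - 1);  /* reserve one byte for '\0' */
    if (n < 0)
        return -1;

    buf[n] = '\0';  /* n <= cap - 1, so this write is always in bounds */
    return n;
}
```

With a 4-byte buffer and 5 bytes available, this reads 3 bytes and writes the terminator into the fourth, where the original code could have written a fifth byte.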
I got the same error message, not in Ruby but in a zenity program.
I discovered it had something to do with my closing an open pipe twice.
Check that you don't free the same heap memory two or more times, or close files or pipes that are already closed.
Good luck.
