I would like to trace all function calls for a given library in a process, but the process is going to exit and re-open regularly, and I want to keep tracing.
I am doing this now:
oneshot$target:LIBRARY::entry
{
printf("%s\n", probefunc);
}
However, this only lets me provide one pid at a time. Can I keep this going?
I want something like:
*:LIBRARY::entry
/execname == "foo"/
but that * doesn't work there.
Thanks!
I don't think you can do this with just a single dtrace script. You'd need two (at least...). And you need to have the ability to run the destructive system() action, which most likely means root access.
Assume you want to run this script on any new ls process:
#!/usr/sbin/dtrace -s
pid$1:libc::entry
{
printf( "func: %s\n", probefunc );
}
Assuming the path to that script is /root/dtrace/tracelibc.d, the following script will start dtrace on any new ls process that gets started. Note that you need #pragma D option destructive to be able to start dtrace on the new process:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
proc:::exec-success
/ "ls" == basename( execname ) /
{
printf( "tracing process %d\n", pid );
system( "/root/dtrace/tracelibc.d %d", pid );
}
That should work, but in this case ls is such a short-lived process that something like this happens quite often:
dtrace: failed to compile script /root/dtrace/tracelibc.d: line 10:
failed to grab process 12289
The process is gone by the time dtrace gets going. If you're tracing long-lived processes and don't care that you might miss the first few probes because dtrace takes a while to attach, you're done.
But, if you want to trace short-lived processes, you need to stop the process right when it starts, then restart it after dtrace attaches:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
proc:::exec-success
/ "ls" == basename( execname ) /
{
printf( "stopping process %d\n", pid );
system( "/root/dtrace/tracelibc.d %d", pid );
stop();
}
and start it back up in tracelibc.d:
#!/usr/sbin/dtrace -s
#pragma D option destructive
BEGIN
{
system( "prun %d", $1 );
}
pid$1:libc::entry
{
printf( "func: %s\n", probefunc );
}
Note that I'm using Solaris prun to restart the stopped process. You'd have to look at the Mac dtrace documentation for the stop() call to get the Mac equivalent of Solaris prun.
But... ooops. The two scripts above combine to produce:
stopping process 12274
dtrace: failed to compile script /root/dtrace/tracelibc.d: line 10:
probe description pid12274:libc::entry does not match any probes
Why does this say pid12274:libc::entry doesn't match any probes? Oh, yeah - when exec returns, the libc.so shared object hasn't been loaded into memory yet. We need a probe that's guaranteed to exist in the target process, and that gets called after libc.so is loaded, but before any processing gets done. main should suffice. So the main script to start it all becomes:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
proc:::exec-success
/ "ls" == basename( execname ) /
{
printf( "stopping process %d\n", pid );
system( "/root/dtrace/tracemain.d %d", pid );
stop();
}
That starts the tracemain.d script, that restarts the process, loads the tracelibc.d script, and stops the process again:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
BEGIN
{
system( "prun %d", $1 );
}
pid$1::main:entry
{
system( "/root/dtrace/tracelibc.d %d", $1 );
stop();
/* this instance of dtrace is now done */
exit( 0 );
}
And tracelibc.d adds its own system( "prun %d", $1 ); in the BEGIN probe, and it looks like:
#!/usr/sbin/dtrace -s
#pragma D option destructive
BEGIN
{
system( "prun %d", $1 );
}
pid$1:libc::entry
{
printf( "func: %s\n", probefunc );
}
Those three really slow up the ls process, but they do produce the expected output - and there's a lot of it, as expected.
Related
The following C snippet is supposed to be run by Windows IIS, as a CGI .exe program.
It outputs three character "a, b, c" with a 10 second delay between them.
However, if I use a browser to access the program, and then reloads the browser page to access the program again - then I get two processes running in parallell on the IIS.
At the browser I will of course only see the output of process 2, as the TCP connection to process 1 has been closed after the first "a" was received.
On the Windows server process 2 happily runs to completion, but processes 1 runs only until it outputs the second character "b".
The WriteFile that outputs that "b" is successful, and also the following log write "Done" is also excuted (thus, there is no fatil exception in WriteFile).
But then, suddenly, process 1 is terminated.
My theory is that IIS detects that some output is received from process 1, and that IIS then forcibly terminates it (as the client is disconnected)
If I add a 10ms sleep (commented below) after the WriteFile, then process 1 does not even execute the log write "Done".
I suppose that this is due to the fact that IIS needs a little time to perform that Terminate call, and without the Sleep the process has time to execute at least the log write "Done" before IIS terminates.
Does anybody recognize this?
And how do I stop IIS from terminating the process (except by beginning by forking it into a new process, that is not owned by IIS)
I really would like to run process 1 all the way to the end, even if no client is "listening" to it...
#include <stdio.h>
#include <windows.h>
void out(char *text)
{
int i;
int written;
char buf[1000];
FILE *fp;
for(i = 0; text[i] != '\0'; i++)
buf[i] = (text[i] == '\n' ? '^' : text[i]);
buf[i] = '\0';
if((fp = fopen("/temp/testkill.txt", "a")) != NULL) {
fprintf(fp, "%d: Write %s\n", _getpid(), buf);
fclose(fp);
}
if(WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), text, strlen(text), &written, NULL) == 0)
written = -1;
// Sleep(10);
if((fp = fopen("/temp/testkill.txt", "a")) != NULL) {
fprintf(fp, "%d: Done! %s (%d)\n", _getpid(), buf, written);
fclose(fp);
}
}
main()
{
out("Content-Type: text/html\n\n<html><body>\n");
out("a");
Sleep(10000);
out("b");
Sleep(10000);
out("c");
}
Background
I have some very complex application. It is composition of couple libraries.
Now QA team found the some problem (something reports an error).
Fromm logs I can see that application is leaking a file descriptors (+1000 after 7 hours of automated tests).
QA team has delivered rapport "opened files and ports" from "Activity monitor" and I know exactly to which server connection is not closed.
From full application logs I can see that leak is quite systematic (there is no sudden burst), but I was unable to reproduce issue to see even a small leak of file descriptors.
Problem
Even thou I'm sure for which server connection is never closed, I'm unable to find code responsible.
I'm unable reproduce issue.
In logs I can see that all resources my library maintains are properly freed, still server address suggest this is my responsibility or NSURLSession (which is invalidated).
Since there are other libraries and application code it self there is small chance that leak is caused by third party code.
Question
How to locate code responsible for leaking file descriptor?
Best candidate is use dtruss which looks very promising.
From documentation I can see it can print stack backtraces -s when system API is used.
Problem is that I do not know how to use this in such way that I will not get flooded with information.
I need only information who created opened file descriptor and if it was closed destroyed.
Since I can't reproduce issue I need a script which could be run by QA team so the could deliver me an output.
If there are other ways to find the source of file descriptor leak please let me know.
There is bunch of predefined scripts which are using dtruss, but I don't see anything what is matching my needs.
Final notes
What is strange the only code I'm aware is using problematic connection, do not use file descriptors directly, but uses custom NSURLSession (configured as: one connection per host, minimum TLS 1.0, disable cookies, custom certificate validation). From logs I can see NSURLSession is invalidated properly. I doubt NSURLSession is source of leak, but currently this is the only candidate.
OK, I found out how to do it - on Solaris 11, anyway. I get this output (and yes, I needed root on Solaris 11):
bash-4.1# dtrace -s fdleaks.d -c ./fdLeaker
open( './fdLeaker' ) returned 3
open( './fdLeaker' ) returned 4
open( './fdLeaker' ) returned 5
falloc fp: ffffa1003ae56590, fd: 3, saved fd: 3
falloc fp: ffffa10139d28f58, fd: 4, saved fd: 4
falloc fp: ffffa10030a86df0, fd: 5, saved fd: 5
opened file: ./fdLeaker
leaked fd: 3
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x2b
fdLeaker`_start+0x72
opened file: ./fdLeaker
leaked fd: 4
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x64
fdLeaker`_start+0x72
The fdleaks.d dTrace script that finds leaked file descriptors:
#!/usr/sbin/dtrace
/* this will probably need tuning
note there can be significant performance
impacts if you make these large */
#pragma D option nspec=4
#pragma D option specsize=128k
#pragma D option quiet
syscall::open*:entry
/ pid == $target /
{
/* arg1 might not have a physical mapping yet so
we can't call copyinstr() until open() returns
and we don't have a file descriptor yet -
we won't get that until open() returns anyway */
self->path = arg1;
}
/* arg0 is the file descriptor being returned */
syscall::open*:return
/ pid == $target && arg0 >= 0 && self->path /
{
/* get a speculation ID tied to this
file descriptor and start speculative
tracing */
openspec[ arg0 ] = speculation();
speculate( openspec[ arg0 ] );
/* this output won't appear unless the associated
speculation id is commited */
printf( "\nopened file: %s\n", copyinstr( self->path ) );
printf( "leaked fd: %d\n\n", arg0 );
ustack();
/* free the saved path */
self->path = 0;
}
syscall::close:entry
/ pid == $target && arg0 >= 0 /
{
/* closing the fd, so discard the speculation
and free the id by setting it to zero */
discard( openspec[ arg0 ] );
openspec[ arg0 ] = 0;
}
/* Solaris uses falloc() to open a file and associate
the fd with an internal file_t structure
When the kernel closes file descriptors that the
process left open, it uses the closeall() function
which walks the internal structures then calls
closef() using the file_t *, so there's no way
to get the original process file descritor in
closeall() or closef() dTrace probes.
falloc() is called on open() to associate the
file_t * with a file descriptor, so this
saves the pointers passed to falloc()
that are used to return the file_t * and
file descriptor once they're filled in
when falloc() returns */
fbt::falloc:entry
/ pid == $target /
{
self->fpp = args[ 2 ];
self->fdp = args[ 3 ];
}
/* Clause-local variables to make casting clearer */
this int fd;
this uint64_t fp;
/* array to associate a file descriptor with its file_t *
structure in the kernel */
int fdArray[ uint64_t fp ];
fbt::falloc:return
/ pid == $target && self->fpp && self->fdp /
{
/* get the fd and file_t * values being
returned to the caller */
this->fd = ( * ( int * ) self->fdp );
this->fp = ( * ( uint64_t * ) self->fpp );
/* associate the fd with its file_t * */
fdArray[ this->fp ] = ( int ) this->fd;
/* verification output */
printf( "falloc fp: %x, fd: %d, saved fd: %d\n", this->fp, this->fd, fdArray[ this->fp ] );
}
/* if this gets called and the dereferenced
openspec array element is a still-valid
speculation id, the fd associated with
the file_t * passed to closef() was never
closed by the process itself */
fbt::closef:entry
/ pid == $target /
{
/* commit the speculative tracing since
this file descriptor was leaked */
commit( openspec[ fdArray[ arg0 ] ] );
}
First, I wrote this little C program to leak fds:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main( int argc, char **argv )
{
int ii;
for ( ii = 0; ii < argc; ii++ )
{
int fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
close( fd );
}
return( 0 );
}
Then I ran it under this dTrace script to figure out what the kernel does to close orphaned file descriptors, dtrace -s exit.d -c ./fdLeaker:
#!/usr/sbin/dtrace -s
#pragma D option quiet
syscall::rexit:entry
{
self->exit = 1;
}
syscall::rexit:return
/ self->exit /
{
self->exit = 0;
}
fbt:::entry
/ self->exit /
{
printf( "---> %s\n", probefunc );
}
fbt:::return
/ self->exit /
{
printf( "<--- %s\n", probefunc );
}
That produced a lot of output, and I noticed closeall() and closef() functions, examined the source code, and wrote the dTrace script.
Note also that the process exit dTrace probe on Solaris 11 is the rexit one - that probably changes on OSX.
The biggest problem on Solaris is getting the file descriptor for the file in the kernel code that closes orphaned file descriptors. Solaris doesn't close by file descriptor, it closes by struct file_t pointers in the kernel open files structures for the process. So I had to examine the Solaris source to figure out where the fd is associated with the file_t * - and that's in the falloc() function. The dTrace script associates a file_t * with its fd in an associative array.
None of that is likely to work on OSX.
If you're lucky, the OSX kernel will close orphaned file descriptors by the file descriptor itself, or at least provide something that tells you the fd is being closed, perhaps an auditing function.
My project is to implement a simple shell program with background processing by way of ending an arglist with &, as in most UNIX shells. My problem is how to debug the shell in GDB when background processing requires child processes to be created.
My child processing code goes like
int id;
int child=-1;
int running=0;
if ((strcmp(args[0], "&")==0){
if ((id==fork())==-1)
perror("Couldn't start the background process");
else if (id==0){ //start the child process
running++;
printf("Job %d started, PID: %d\n", running, getpid());
signal(SIGINT, SIG_DFL);
signal(SIGQUIT, SIG_DFL);
execvp(args[0], args);
perror("Can't execute command);
exit(1);
else {
int jobNum= running-(running-1);
if ( (waitpid(-1, &child, WNOHANG) == -1)
perror("Child Wait");
else
printf("[%d] exited with status %d\n", jobNum, child>>8);
}
When I try to run a command, like ps &, and set the breakpoint to the function parser, the command executes without hitting the breakpoint. This is confusing and renders the debugger useless in this instance. What can I do about it?
I think you want
set follow-fork-mode child
also note that the line
if ((id==fork())==-1)
is comparing an uninitialized value against the return value of fork().
I believe you wanted an assignment.
I'm writing a fake shell, where I create a child process and then call execvp(). In the normal shell, when I enter an unknown command such as 'hello' it returns 'hello: Command not found.' However, when I pass hello into execvp(), it doesn't return any error by default and just continues running the rest of my program like nothing happened. What's the easiest way to find out if nothing was actually run? here's my code:
if(fork() == 0)
{
execvp(cmd, args);
}
else
{
int status = 0;
int corpse = wait(&status);
printf(Child %d exited with a status of %d\n", corpse, status);
}
I know that if corpse < 0, then it's an unknown command, but there are other conditions in my code not listed where I don't want to wait (such as if & is entered at the end of a command). Any suggestions?
All of the exec methods can return -1 if there was an error (errno is set appropriately). You aren't checking the result of execvp so if it fails, the rest of your program will continue executing. You could have something like this to prevent the rest of your program from executing:
if (execvp(cmd, args) == -1)
exit(EXIT_FAILURE);
You also want to check the result of fork() for <0.
You have two independent concerns.
1) is the return value of execvp. It shouldn't return. If it does there is a problem. Here's what I get execvp'ing a bad command. You don't want to wait if execvp fails. Always check the return values.
int res = execvp(argv[1], argv);
printf ("res is %i %s\n", res, strerror(errno));
// => res is -1 No such file or directory
2) The other concern is background processes and such. That's the job of a shell and you're going to need to figure out when your program should wait immediately and when you want to save the pid from fork and wait on it later.
I'm writing a tool that calls through to DTrace to trace the program that the user specifies.
If my tool uses dtrace -c to run the program as a subprocess of DTrace, not only can I not pass any arguments to the program, but the program runs with all the privileges of DTrace—that is, as root (I'm on Mac OS X). This makes certain things that should work break, and obviously makes a great many things that shouldn't work possible.
The other solution I know of is to start the program myself, pause it by sending it SIGSTOP, pass its PID to dtrace -p, then continue it by sending it SIGCONT. The problem is that either the program runs for a few seconds without being traced while DTrace gathers the symbol information or, if I sleep for a few seconds before continuing the process, DTrace complains that objc<pid>:<class>:<method>:entry matches no probes.
Is there a way that I can run the program under the user's account, not as root, but still have DTrace able to trace it from the beginning?
Something like sudo dtruss -f sudo -u <original username> <command> has worked for me, but I felt bad about it afterwards.
I filed a Radar bug about it and had it closed as a duplicate of #5108629.
Well, this is a bit old, but why not :-)..
I don't think there is a way to do this simply from command line, but as suggested, a simple launcher application, such as the following, would do it. The manual attaching could of course also be replaced with a few calls to libdtrace.
int main(int argc, char *argv[]) {
pid_t pid = fork();
if(pid == 0) {
setuid(123);
seteuid(123);
ptrace(PT_TRACE_ME, 0, NULL, 0);
execl("/bin/ls", "/bin/ls", NULL);
} else if(pid > 0) {
int status;
wait(&status);
printf("Process %d started. Attach now, and click enter.\n", pid);
getchar();
ptrace(PT_CONTINUE, pid, (caddr_t) 1, 0);
}
return 0;
}
This script takes the name of the executable (for an app this is the info.plist's CFBundleExecutable) you want to monitor to DTrace as a parameter (you can then launch the target app after this script is running):
string gTarget; /* the name of the target executable */
dtrace:::BEGIN
{
gTarget = $$1; /* get the target execname from 1st DTrace parameter */
/*
* Note: DTrace's execname is limited to 15 characters so if $$1 has more
* than 15 characters the simple string comparison "($$1 == execname)"
* will fail. We work around this by copying the parameter passed in $$1
* to gTarget and truncating that to 15 characters.
*/
gTarget[15] = 0; /* truncate to 15 bytes */
gTargetPID = -1; /* invalidate target pid */
}
/*
* capture target launch (success)
*/
proc:::exec-success
/
gTarget == execname
/
{
gTargetPID = pid;
}
/*
* detect when our target exits
*/
syscall::*exit:entry
/
pid == gTargetPID
/
{
gTargetPID = -1; /* invalidate target pid */
}
/*
* capture open arguments
*/
syscall::open*:entry
/
((pid == gTargetPID) || progenyof(gTargetPID))
/
{
self->arg0 = arg0;
self->arg1 = arg1;
}
/*
* track opens
*/
syscall::open*:return
/
((pid == gTargetPID) || progenyof(gTargetPID))
/
{
this->op_kind = ((self->arg1 & O_ACCMODE) == O_RDONLY) ? "READ" : "WRITE";
this->path0 = self->arg0 ? copyinstr(self->arg0) : "<nil>";
printf("open for %s: <%s> #%d",
this->op_kind,
this->path0,
arg0);
}
If the other answer doesn't work for you, can you run the program in gdb, break in main (or even earlier), get the pid, and start the script? I've tried that in the past and it seemed to work.
Create a launcher program that will wait for a signal of some sort (not necessarily a literal signal, just an indication that it's ready), then exec() your target. Now dtrace -p the launcher program, and once dtrace is up, let the launcher go.
dtruss has the -n option where you can specify name of process you want to trace, without starting it (Credit to latter part of #kenorb's answer at https://stackoverflow.com/a/11706251/970301). So something like the following should do it:
sudo dtruss -n "$program"
$program
There exists a tool darwin-debug that ships in Apple's CLT LLDB.framework which will spawn your program and pause it before it does anything. You then read the pid out of the unix socket you pass as an argument, and after attaching the debugger/dtrace you continue the process.
darwin-debug will exec itself into a child process <PROGRAM> that is
halted for debugging. It does this by using posix_spawn() along with
darwin specific posix_spawn flags that allows exec only (no fork), and
stop at the program entry point. Any program arguments <PROGRAM-ARG> are
passed on to the exec as the arguments for the new process. The current
environment will be passed to the new process unless the "--no-env"
option is used. A unix socket must be supplied using the
--unix-socket=<SOCKET> option so the calling program can handshake with
this process and get its process id.
See my answer on related question "How can get dtrace to run the traced command with non-root priviledges?" [sic].
Essentially, you can start a (non-root) background process which waits 1sec for DTrace to start up (sorry for race condition), and snoops the PID of that process.
sudo true && \
(sleep 1; cat /etc/hosts) &; \
sudo dtrace -n 'syscall:::entry /pid == $1/ {#[probefunc] = count();}' $! \
&& kill $!
Full explanation in linked answer.