I am using htop on osx and I can't seem to find out what a 'C' status in the 'S' status column means for a process status.
What does a C process status mean in htop?
Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:
D Uninterruptible sleep (usually IO)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being traced.
W Paging (not valid since the 2.6.xx kernel)
X Dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reaped by its parent.
Source: man ps
I recently came across a second list:
R Running
S Sleeping in an interruptible wait
D Waiting in uninterruptible disk sleep
Z Zombie
T Stopped (on a signal) or (before Linux 2.6.33) trace stopped
t Tracing stop (Linux 2.6.33 onward)
W Paging (only before Linux 2.6.0)
X Dead (from Linux 2.6.0 onward)
x Dead (Linux 2.6.33 to 3.13 only)
K Wakekill (Linux 2.6.33 to 3.13 only)
W Waking (Linux 2.6.33 to 3.13 only)
P Parked (Linux 3.9 to 3.13 only)
http://man7.org/linux/man-pages/man5/proc.5.html in the "/proc/[pid]/stat" section:
htop author here. I am not aware of such status code in the htop codebase.
Keep in mind that htop is written for Linux only, so there is no support for macOS/OSX. When I hear of people running it on OSX they are often using an outdated, unsupported fork (the latest version of htop is 2.0.1, including macOS support).
I've got the same question recently. We can try to look it up in the htop sources:
process->state =
ProcessList_decodeState( p->p_stat == SZOMB ? SZOMB : ki->state );
static int
ProcessList_decodeState( int st ) {
switch ( st ) {
case SIDL:
return 'C';
case SRUN:
return 'R';
case SSLEEP:
return 'S';
case SSTOP:
return 'T';
case SZOMB:
return 'Z';
default:
return '?';
}
}
So we go to Unix definition of process states at /usr/include/sys/proc.h:
/* Status values. */
#define SIDL 1 /* Process being created by fork. */
#define SRUN 2 /* Currently runnable. */
#define SSLEEP 3 /* Sleeping on an address. */
#define SSTOP 4 /* Process debugging or suspension. */
#define SZOMB 5 /* Awaiting collection by parent. */
So, 'C' status is meant to be 'Process being created by fork'. What is it? According to old unix sources, it's a transient state that happens when there's not enough memory when forking a process and the parent needs to be swapped out.
What??
Back to htop source. Where do we get the ki->state?
// For all threads in process:
error = thread_info( ki->thread_list[j], THREAD_BASIC_INFO,
( thread_info_t ) & ki->thval[j].tb,
&thread_info_count );
tstate = ProcessList_machStateOrder( ki->thval[j].tb.run_state,
ki->thval[j].tb.sleep_time );
if ( tstate < ki->state )
ki->state = tstate;
// Below...
static int
ProcessList_machStateOrder( int s, long sleep_time ) {
switch ( s ) {
case TH_STATE_RUNNING:
return 1;
case TH_STATE_UNINTERRUPTIBLE:
return 2;
case TH_STATE_WAITING:
return ( sleep_time > 20 ) ? 4 : 3;
case TH_STATE_STOPPED:
return 5;
case TH_STATE_HALTED:
return 6;
default:
return 7;
}
}
// In mach/thread_info.h:
#define TH_STATE_RUNNING 1 /* thread is running normally */
#define TH_STATE_STOPPED 2 /* thread is stopped */
#define TH_STATE_WAITING 3 /* thread is waiting normally */
#define TH_STATE_UNINTERRUPTIBLE 4 /* thread is in an uninterruptible wait */
#define TH_STATE_HALTED 5 /* thread is halted at a clean point */
We have the following (messed up) mapping:
Thread state | Mapped to | htop state| 'top' state | 'ps' state
----------------------------------------------------------------------------
TH_STATE_RUNNING | SIDL(1) | 'C' | running | 'R'
TH_STATE_UNINTERRUPTIBLE | SRUN(2) | 'R' | stuck | 'U'
TH_STATE_WAITING (short) | SSLEEP(3) | 'S' | sleeping | 'S'
TH_STATE_WAITING (long) | SSTOP(4) | 'T' | idle | 'I'
TH_STATE_STOPPED | SZOMB(5) | 'Z' | stopped | 'T'
So, the real answer is: 'C' means that the process is currently running.
How did it happen? Seems that the ki->state handling was borrowed from ps source and wasn't adjusted for Unix process codes.
Update: this bug is fixed. Yay open source!
Related
I would like to trace all function calls for a given library in a process, but the process is going to exit and re-open regularly, and I want to keep tracing.
I am doing this now:
oneshot$target:LIBRARY::entry
{
printf("%s\n", probefunc);
}
However, this only lets me provide one pid at a time. Can I keep this going?
I want something like:
*:LIBRARY::entry
/execname == "foo"/
but that * doesn't work there.
Thanks!
I don't think you can do this with just a single dtrace script. You'd need two (at least...). And you need to have the ability to run the destructive system() action, which most likely means root access.
Assume you want to run this script on any new ls process:
#!/usr/sbin/dtrace -s
pid$1:libc::entry
{
printf( "func: %s\n", probefunc );
}
Assuming the path to that script is /root/dtrace/tracelibc.d, the following script will start dtrace on any new ls process that gets started. Note that you need #pragma D option destructive to be able to start dtrace on the new process:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
proc:::exec-success
/ "ls" == basename( execname ) /
{
printf( "tracing process %d\n", pid );
system( "/root/dtrace/tracelibc.d %d", pid );
}
That should work, but in this case ls is such a short-lived process that something like this happens quite often:
dtrace: failed to compile script /root/dtrace/tracelibc.d: line 10:
failed to grab process 12289
The process is gone by the time dtrace gets going. If you're tracing long-lived processes and don't care that you might miss the first few probes because dtrace takes a while to attach, you're done.
But, if you want to trace short-lived processes, you need to stop the process right when it starts, then restart it after dtrace attaches:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
proc:::exec-success
/ "ls" == basename( execname ) /
{
printf( "stopping process %d\n", pid );
system( "/root/dtrace/tracelibc.d %d", pid );
stop();
}
and start it back up in tracelibc.d:
#!/usr/sbin/dtrace -s
#pragma D option destructive
BEGIN
{
system( "prun %d", $1 );
}
pid$1:libc::entry
{
printf( "func: %s\n", probefunc );
}
Note that I'm using Solaris prun to restart the stopped process. You'd have to look at the Mac dtrace documentation for the stop() call to get the Mac equivalent of Solaris prun.
But... ooops. The two scripts above combine to produce:
stopping process 12274
dtrace: failed to compile script /root/dtrace/tracelibc.d: line 10:
probe description pid12274:libc::entry does not match any probes
Why does this say pid12274:libc::entry doesn't match any probes? Oh, yeah - when exec returns, the libc.so shared object hasn't been loaded into memory yet. We need a probe that's guaranteed to exist in the target process, and that gets called after libc.so is loaded, but before any processing gets done. main should suffice. So the main script to start it all becomes:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
proc:::exec-success
/ "ls" == basename( execname ) /
{
printf( "stopping process %d\n", pid );
system( "/root/dtrace/tracemain.d %d", pid );
stop();
}
That starts the tracemain.d script, that restarts the process, loads the tracelibc.d script, and stops the process again:
#!/usr/sbin/dtrace -s
#pragma D option destructive
#pragma D option quiet
BEGIN
{
system( "prun %d", $1 );
}
pid$1::main:entry
{
system( "/root/dtrace/tracelibc.d %d", $1 );
stop();
/* this instance of dtrace is now done */
exit( 0 );
}
And tracelibc.d adds its own system( "prun %d", $1 ); in the BEGIN probe, and it looks like:
#!/usr/sbin/dtrace -s
#pragma D option destructive
BEGIN
{
system( "prun %d", $1 );
}
pid$1:libc::entry
{
printf( "func: %s\n", probefunc );
}
Those three really slow up the ls process, but they do produce the expected output - and there's a lot of it, as expected.
Background
I have some very complex application. It is composition of couple libraries.
Now QA team found the some problem (something reports an error).
Fromm logs I can see that application is leaking a file descriptors (+1000 after 7 hours of automated tests).
QA team has delivered rapport "opened files and ports" from "Activity monitor" and I know exactly to which server connection is not closed.
From full application logs I can see that leak is quite systematic (there is no sudden burst), but I was unable to reproduce issue to see even a small leak of file descriptors.
Problem
Even thou I'm sure for which server connection is never closed, I'm unable to find code responsible.
I'm unable reproduce issue.
In logs I can see that all resources my library maintains are properly freed, still server address suggest this is my responsibility or NSURLSession (which is invalidated).
Since there are other libraries and application code it self there is small chance that leak is caused by third party code.
Question
How to locate code responsible for leaking file descriptor?
Best candidate is use dtruss which looks very promising.
From documentation I can see it can print stack backtraces -s when system API is used.
Problem is that I do not know how to use this in such way that I will not get flooded with information.
I need only information who created opened file descriptor and if it was closed destroyed.
Since I can't reproduce issue I need a script which could be run by QA team so the could deliver me an output.
If there are other ways to find the source of file descriptor leak please let me know.
There is bunch of predefined scripts which are using dtruss, but I don't see anything what is matching my needs.
Final notes
What is strange the only code I'm aware is using problematic connection, do not use file descriptors directly, but uses custom NSURLSession (configured as: one connection per host, minimum TLS 1.0, disable cookies, custom certificate validation). From logs I can see NSURLSession is invalidated properly. I doubt NSURLSession is source of leak, but currently this is the only candidate.
OK, I found out how to do it - on Solaris 11, anyway. I get this output (and yes, I needed root on Solaris 11):
bash-4.1# dtrace -s fdleaks.d -c ./fdLeaker
open( './fdLeaker' ) returned 3
open( './fdLeaker' ) returned 4
open( './fdLeaker' ) returned 5
falloc fp: ffffa1003ae56590, fd: 3, saved fd: 3
falloc fp: ffffa10139d28f58, fd: 4, saved fd: 4
falloc fp: ffffa10030a86df0, fd: 5, saved fd: 5
opened file: ./fdLeaker
leaked fd: 3
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x2b
fdLeaker`_start+0x72
opened file: ./fdLeaker
leaked fd: 4
libc.so.1`__systemcall+0x6
libc.so.1`__open+0x29
libc.so.1`open+0x84
fdLeaker`main+0x64
fdLeaker`_start+0x72
The fdleaks.d dTrace script that finds leaked file descriptors:
#!/usr/sbin/dtrace
/* this will probably need tuning
note there can be significant performance
impacts if you make these large */
#pragma D option nspec=4
#pragma D option specsize=128k
#pragma D option quiet
syscall::open*:entry
/ pid == $target /
{
/* arg1 might not have a physical mapping yet so
we can't call copyinstr() until open() returns
and we don't have a file descriptor yet -
we won't get that until open() returns anyway */
self->path = arg1;
}
/* arg0 is the file descriptor being returned */
syscall::open*:return
/ pid == $target && arg0 >= 0 && self->path /
{
/* get a speculation ID tied to this
file descriptor and start speculative
tracing */
openspec[ arg0 ] = speculation();
speculate( openspec[ arg0 ] );
/* this output won't appear unless the associated
speculation id is commited */
printf( "\nopened file: %s\n", copyinstr( self->path ) );
printf( "leaked fd: %d\n\n", arg0 );
ustack();
/* free the saved path */
self->path = 0;
}
syscall::close:entry
/ pid == $target && arg0 >= 0 /
{
/* closing the fd, so discard the speculation
and free the id by setting it to zero */
discard( openspec[ arg0 ] );
openspec[ arg0 ] = 0;
}
/* Solaris uses falloc() to open a file and associate
the fd with an internal file_t structure
When the kernel closes file descriptors that the
process left open, it uses the closeall() function
which walks the internal structures then calls
closef() using the file_t *, so there's no way
to get the original process file descritor in
closeall() or closef() dTrace probes.
falloc() is called on open() to associate the
file_t * with a file descriptor, so this
saves the pointers passed to falloc()
that are used to return the file_t * and
file descriptor once they're filled in
when falloc() returns */
fbt::falloc:entry
/ pid == $target /
{
self->fpp = args[ 2 ];
self->fdp = args[ 3 ];
}
/* Clause-local variables to make casting clearer */
this int fd;
this uint64_t fp;
/* array to associate a file descriptor with its file_t *
structure in the kernel */
int fdArray[ uint64_t fp ];
fbt::falloc:return
/ pid == $target && self->fpp && self->fdp /
{
/* get the fd and file_t * values being
returned to the caller */
this->fd = ( * ( int * ) self->fdp );
this->fp = ( * ( uint64_t * ) self->fpp );
/* associate the fd with its file_t * */
fdArray[ this->fp ] = ( int ) this->fd;
/* verification output */
printf( "falloc fp: %x, fd: %d, saved fd: %d\n", this->fp, this->fd, fdArray[ this->fp ] );
}
/* if this gets called and the dereferenced
openspec array element is a still-valid
speculation id, the fd associated with
the file_t * passed to closef() was never
closed by the process itself */
fbt::closef:entry
/ pid == $target /
{
/* commit the speculative tracing since
this file descriptor was leaked */
commit( openspec[ fdArray[ arg0 ] ] );
}
First, I wrote this little C program to leak fds:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main( int argc, char **argv )
{
int ii;
for ( ii = 0; ii < argc; ii++ )
{
int fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
fd = open( argv[ ii ], O_RDONLY );
fprintf( stderr, "open( '%s' ) returned %d\n", argv[ ii ], fd );
close( fd );
}
return( 0 );
}
Then I ran it under this dTrace script to figure out what the kernel does to close orphaned file descriptors, dtrace -s exit.d -c ./fdLeaker:
#!/usr/sbin/dtrace -s
#pragma D option quiet
syscall::rexit:entry
{
self->exit = 1;
}
syscall::rexit:return
/ self->exit /
{
self->exit = 0;
}
fbt:::entry
/ self->exit /
{
printf( "---> %s\n", probefunc );
}
fbt:::return
/ self->exit /
{
printf( "<--- %s\n", probefunc );
}
That produced a lot of output, and I noticed closeall() and closef() functions, examined the source code, and wrote the dTrace script.
Note also that the process exit dTrace probe on Solaris 11 is the rexit one - that probably changes on OSX.
The biggest problem on Solaris is getting the file descriptor for the file in the kernel code that closes orphaned file descriptors. Solaris doesn't close by file descriptor, it closes by struct file_t pointers in the kernel open files structures for the process. So I had to examine the Solaris source to figure out where the fd is associated with the file_t * - and that's in the falloc() function. The dTrace script associates a file_t * with its fd in an associative array.
None of that is likely to work on OSX.
If you're lucky, the OSX kernel will close orphaned file descriptors by the file descriptor itself, or at least provide something that tells you the fd is being closed, perhaps an auditing function.
I'm building a device driver of sorts that consumes data from a keyboard emulating device.
The device is a card swipe, so its behavior is as follows:
User walks up, swipes card
I get a string of characters (key codes, really, including modifier keys for capital letters)
I don't know how many characters I'm going to get
I don't know when I'm getting something
Since I don't know how many characters I'm going to get, blocking reads on the keyboard tty aren't useful - I'd end up blocking after the last character. What I'm doing is, in Ruby, using the IO module to perform async reads against the keyboard device, and using a timeout to determine that the end of data was reached. This works fine logically (even a user swiping his or her card fast will do so slower than the send rate between characters).
The issue is that sometimes, I lose data from the middle of the string. My hunch is that there's some sort of buffer overflow happening because I'm reading the data too slowly. Trying to confirm this, I inserted small waits in between each key process. Longer waits (20ms+) do exacerbate the problem. However, a wait of around 5ms actually makes it go away? The only explanation I can come up with is that the async read itself is expensive (because Ruby), and doing them without a rate limit is actually slower than doing them with a 5ms delay.
Does this sound rational? Are there other ideas on what this could be?
The ruby is actually JRuby 9000. The machine is Ubuntu LTS 16.
Edit: here's a snippet of the relevant code
private def read_swipe(buffer_size, card_reader_input, pause_between_reads, seconds_to_complete)
limit = Time.now + seconds_to_complete.seconds
swipe_data = ''
begin
start_time = Time.now
sleep pause_between_reads
batch = card_reader_input.read_nonblock(buffer_size)
swipe_data << batch
rescue IO::WaitReadable
IO.select([card_reader_input], nil, nil, 0.5)
retry unless limit < start_time
end while start_time < limit
swipe_data
end
where card_reader_input = File.new(event_handle, 'rb')
I am not sure about Ruby code but you can use linux sysfs to access the characters coming out of keyboard 'like' device, and if feasible you can call C code from ruby application. I had done this for barcode reader and following is the code:
static int init_barcode_com(char* bcr_portname)
{
int fd;
/* Open the file descriptor in non-blocking mode */
fd = open(bcr_portname, O_RDONLY | O_NOCTTY | O_NDELAY);
cout << "Barcode Reader FD: " << fd <<endl;
if (fd == -1)
{
cerr << "ERROR: Cannot open fd for barcode communication with error " << fd <<endl;
}
fcntl(fd, F_SETFL, 0);
/* Set up the control structure */
struct termios toptions;
/* Get currently set options for the tty */
tcgetattr(fd, &toptions);
/* Set custom options */
/* 9600 baud */
cfsetispeed(&toptions, B9600);
cfsetospeed(&toptions, B9600);
/* 8 bits, no parity, no stop bits */
toptions.c_cflag &= ~PARENB;
toptions.c_cflag &= ~CSTOPB;
toptions.c_cflag &= ~CSIZE;
toptions.c_cflag |= CS8;
/* no hardware flow control */
toptions.c_cflag &= ~CRTSCTS;
/* enable receiver, ignore status lines */
toptions.c_cflag |= CREAD | CLOCAL;
/* disable input/output flow control, disable restart chars */
toptions.c_iflag &= ~(IXON | IXOFF | IXANY);
/* disable canonical input, disable echo,
* disable visually erase chars,
* disable terminal-generated signals */
toptions.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG);
/* disable output processing */
toptions.c_oflag &= ~OPOST;
/* wait for n (in our case its 1) characters to come in before read returns */
/* WARNING! THIS CAUSES THE read() TO BLOCK UNTIL ALL */
/* CHARACTERS HAVE COME IN! */
toptions.c_cc[VMIN] = 0;
/* no minimum time to wait before read returns */
toptions.c_cc[VTIME] = 100;
/* commit the options */
tcsetattr(fd, TCSANOW, &toptions);
/* Wait for the Barcode to reset */
usleep(10*1000);
return fd;
}
static int read_from_barcode_reader(int fd, char* bcr_buf)
{
int i = 0, nbytes = 0;
char buf[1];
/* Flush anything already in the serial buffer */
tcflush(fd, TCIFLUSH);
while (1) {
nbytes = read(fd, buf, 1); // read a char at a time
if (nbytes == -1) {
return -1; // Couldn't read
}
if (nbytes == 0) {
return 0;
}
if (buf[0] == '\n' || buf[0] == '\r') {
return 0;
}
bcr_buf[i] = buf[0];
i++;
}
return 0;
}
Now that you do not know how many characters your going to get you can use VMIN and VTIME combination to address your concern. This document details various possibilities with VMIN and VTIME.
Consider the kernel module testhrarr.c given in debugging - Watch a variable (memory address) change in Linux kernel, and print stack trace when it changes? - Stack Overflow, which tracks write accesses to a memory location.
Now, I'm trying to track read/write accesses, which I've done by changing this line in the testhrarr_init function:
attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
First subquestion here is this: on my platform (Linux kernel 2.6.38); HW_BREAKPOINT_W on its own works, and HW_BREAKPOINT_W | HW_BREAKPOINT_R for read/write works too - however trying only read access with HW_BREAKPOINT_R doesn't work; as I get "Breakpoint registration failed" in /var/log/syslog, and insmod fails with "insmod: error inserting './testhrarr.ko': -1 Invalid parameters". Does anyone have any idea why?
Anyways, the problem here is that once I'm watching for read/write access with HW_BREAKPOINT_W | HW_BREAKPOINT_R, I cannot tell in my handler whether the stack trace I'm getting is due to a read or a write access. I have gone through:
struct perf_event_attr in include/linux/perf_event.h
struct hw_perf_event include/linux/perf_event.h
include/linux/hw_breakpoint.h for the HW_BREAKPOINT_* enum definitions
... and I couldn't find anywhere an explicit technique to retrieve from the hw breakpoint handler, whether it was a read or write access that triggered. I only found something in kernel/trace/trace_ksym.c, but trace_ksym has been deprecated (not even found in my 2.6.38), and besides it simply reads attr.bp_type, which is what we ourselves set in the kernel module.
So, after some brute-forcing, I realized that the hw_perf_event->interrupts may contain this information; so I modified the handler callback as:
static void sample_hbp_handler(struct perf_event *bp,
struct perf_sample_data *data,
struct pt_regs *regs)
{
struct perf_event_attr attr = bp->attr;
struct hw_perf_event hw = bp->hw;
char hwirep[8];
//it looks like printing %llu, data->type here causes segfault/oops when `cat` runs?
// apparently, hw.interrupts changes depending on read/write access (1 or 2)
// when only HW_BREAKPOINT_W, getting hw.interrupts == 1 always;
// only HW_BREAKPOINT_R - fails for me
// when both, hw.interrupts is either 1 or 2
// defined in include/linux/hw_breakpoint.h:
// HW_BREAKPOINT_R = 1, HW_BREAKPOINT_W = 2,
if (hw.interrupts == HW_BREAKPOINT_R) {
strcpy(hwirep, "_R");
} else if (hw.interrupts == HW_BREAKPOINT_W) {
strcpy(hwirep, "_W");
} else {
strcpy(hwirep, "__");
}
printk(KERN_INFO "+--- %s value is accessed (.bp_type %d, .type %d, state %d htype %d hwi %llu / %s ) ---+\n", ksym_name, attr.bp_type, attr.type, hw.state, hw.info.type, hw.interrupts, hwirep);
dump_stack();
printk(KERN_INFO "|___ Dump stack from sample_hbp_handler ___|\n");
}
... and this seems to work somewhat; as I get the following in syslog:
$ grep "testhrarr_arr_first value" /var/log/syslog
kernel: [ 200.887620] +--- testhrarr_arr_first value is accessed (.bp_type 3, .type 5, state 0 htype 131 hwi 1 / _R ) ---+
kernel: [ 200.892163] +--- testhrarr_arr_first value is accessed (.bp_type 3, .type 5, state 0 htype 131 hwi 1 / _R ) ---+
kernel: [ 200.892634] +--- testhrarr_arr_first value is accessed (.bp_type 3, .type 5, state 0 htype 131 hwi 2 / _W ) ---+
kernel: [ 200.912192] +--- testhrarr_arr_first value is accessed (.bp_type 3, .type 5, state 0 htype 131 hwi 1 / _R ) ---+
kernel: [ 200.912713] +--- testhrarr_arr_first value is accessed (.bp_type 3, .type 5, state 0 htype 131 hwi 2 / _W ) ---+
kernel: [ 200.932138] +--- testhrarr_arr_first value is accessed (.bp_type 3, .type 5, state 0 htype 131 hwi 1 / _R ) ---+
... however, if attr.bp_type is just HW_BREAKPOINT_W, then just three hw.interrupts == 1 are reported (which would be then wrongly reported as _R with the above code).
So then, if I just invert the meanings of _R and _W, I may get something that matches what I guess should occur - but this is quite obviously a shot in the dark, since I have no idea what hw_perf_event->interrupts should actually stand for.
So - does anyone know the proper way of determining the "direction" (read or write) of access to a hardware-watched memory location?
Edit: the answer for my first subquestion: for my architecture, x86, there is this piece of code:
http://lxr.free-electrons.com/source/arch/x86/kernel/hw_breakpoint.c?v=2.6.38#L252
static int arch_build_bp_info(struct perf_event *bp)
{
struct arch_hw_breakpoint *info = counter_arch_bp(bp);
info->address = bp->attr.bp_addr;
/* Type */
switch (bp->attr.bp_type) {
case HW_BREAKPOINT_W:
info->type = X86_BREAKPOINT_WRITE;
break;
case HW_BREAKPOINT_W | HW_BREAKPOINT_R:
info->type = X86_BREAKPOINT_RW;
break;
case HW_BREAKPOINT_X:
info->type = X86_BREAKPOINT_EXECUTE;
// ...
default:
return -EINVAL;
}
...
}
... which clearly states that X86 has either _WRITE or _RW breakpoints; so if we try to set up just for HW_BREAKPOINT_R, the process would fail returning -EINVAL.
So, I guess, I'd need the answer here primarily for X86, although if there is a generic portable mechanism for determining read/write access, I'd rather know about that...
I'm writing a tool that calls through to DTrace to trace the program that the user specifies.
If my tool uses dtrace -c to run the program as a subprocess of DTrace, not only can I not pass any arguments to the program, but the program runs with all the privileges of DTrace—that is, as root (I'm on Mac OS X). This makes certain things that should work break, and obviously makes a great many things that shouldn't work possible.
The other solution I know of is to start the program myself, pause it by sending it SIGSTOP, pass its PID to dtrace -p, then continue it by sending it SIGCONT. The problem is that either the program runs for a few seconds without being traced while DTrace gathers the symbol information or, if I sleep for a few seconds before continuing the process, DTrace complains that objc<pid>:<class>:<method>:entry matches no probes.
Is there a way that I can run the program under the user's account, not as root, but still have DTrace able to trace it from the beginning?
Something like sudo dtruss -f sudo -u <original username> <command> has worked for me, but I felt bad about it afterwards.
I filed a Radar bug about it and had it closed as a duplicate of #5108629.
Well, this is a bit old, but why not :-)..
I don't think there is a way to do this simply from command line, but as suggested, a simple launcher application, such as the following, would do it. The manual attaching could of course also be replaced with a few calls to libdtrace.
int main(int argc, char *argv[]) {
pid_t pid = fork();
if(pid == 0) {
setuid(123);
seteuid(123);
ptrace(PT_TRACE_ME, 0, NULL, 0);
execl("/bin/ls", "/bin/ls", NULL);
} else if(pid > 0) {
int status;
wait(&status);
printf("Process %d started. Attach now, and click enter.\n", pid);
getchar();
ptrace(PT_CONTINUE, pid, (caddr_t) 1, 0);
}
return 0;
}
This script takes the name of the executable (for an app this is the info.plist's CFBundleExecutable) you want to monitor to DTrace as a parameter (you can then launch the target app after this script is running):
string gTarget; /* the name of the target executable */
dtrace:::BEGIN
{
gTarget = $$1; /* get the target execname from 1st DTrace parameter */
/*
* Note: DTrace's execname is limited to 15 characters so if $$1 has more
* than 15 characters the simple string comparison "($$1 == execname)"
* will fail. We work around this by copying the parameter passed in $$1
* to gTarget and truncating that to 15 characters.
*/
gTarget[15] = 0; /* truncate to 15 bytes */
gTargetPID = -1; /* invalidate target pid */
}
/*
* capture target launch (success)
*/
proc:::exec-success
/
gTarget == execname
/
{
gTargetPID = pid;
}
/*
* detect when our target exits
*/
syscall::*exit:entry
/
pid == gTargetPID
/
{
gTargetPID = -1; /* invalidate target pid */
}
/*
* capture open arguments
*/
syscall::open*:entry
/
((pid == gTargetPID) || progenyof(gTargetPID))
/
{
self->arg0 = arg0;
self->arg1 = arg1;
}
/*
* track opens
*/
syscall::open*:return
/
((pid == gTargetPID) || progenyof(gTargetPID))
/
{
this->op_kind = ((self->arg1 & O_ACCMODE) == O_RDONLY) ? "READ" : "WRITE";
this->path0 = self->arg0 ? copyinstr(self->arg0) : "<nil>";
printf("open for %s: <%s> #%d",
this->op_kind,
this->path0,
arg0);
}
If the other answer doesn't work for you, can you run the program in gdb, break in main (or even earlier), get the pid, and start the script? I've tried that in the past and it seemed to work.
Create a launcher program that will wait for a signal of some sort (not necessarily a literal signal, just an indication that it's ready), then exec() your target. Now dtrace -p the launcher program, and once dtrace is up, let the launcher go.
dtruss has the -n option where you can specify name of process you want to trace, without starting it (Credit to latter part of #kenorb's answer at https://stackoverflow.com/a/11706251/970301). So something like the following should do it:
sudo dtruss -n "$program"
$program
There exists a tool darwin-debug that ships in Apple's CLT LLDB.framework which will spawn your program and pause it before it does anything. You then read the pid out of the unix socket you pass as an argument, and after attaching the debugger/dtrace you continue the process.
darwin-debug will exec itself into a child process <PROGRAM> that is
halted for debugging. It does this by using posix_spawn() along with
darwin specific posix_spawn flags that allows exec only (no fork), and
stop at the program entry point. Any program arguments <PROGRAM-ARG> are
passed on to the exec as the arguments for the new process. The current
environment will be passed to the new process unless the "--no-env"
option is used. A unix socket must be supplied using the
--unix-socket=<SOCKET> option so the calling program can handshake with
this process and get its process id.
See my answer on related question "How can get dtrace to run the traced command with non-root priviledges?" [sic].
Essentially, you can start a (non-root) background process which waits 1sec for DTrace to start up (sorry for race condition), and snoops the PID of that process.
sudo true && \
(sleep 1; cat /etc/hosts) &; \
sudo dtrace -n 'syscall:::entry /pid == $1/ {#[probefunc] = count();}' $! \
&& kill $!
Full explanation in linked answer.