Tracing all functions calls and printing out their parameters (userspace) - debugging

I want to see what functions are called in my user-space C99 program and in what order. Also, which parameters are given.
Can I do this with DTrace?
E.g. for program
int g(int a, int b) { puts("I'm g"); }
int f(int a, int b) { g(5+a,b);g(8+b,a);}
int main() {f(5,2);f(5,3);}
I wand see a text file with:
main(1,{"./a.out"})
f(5,2);
g(10,2);
puts("I'm g");
g(10,5);
puts("I'm g");
f(5,3);
g(10,3);
puts("I'm g");
g(11,5);
puts("I'm g");
I want not to modify my source and the program is really huge - 9 thousand of functions.
I have all sources; I have a program with debug info compiled into it, and gdb is able to print function parameters in backtrace.
Is the task solvable with DTrace?
My OS is one of BSD, Linux, MacOS, Solaris. I prefer Linux, but I can use any of listed OS.

Here's how you can do it with DTrace:
script='pid$target:a.out::entry,pid$target:a.out::return { trace(arg1); }'
dtrace -F -n "$script" -c ./a.out
The output of this command is like as follows on FreeBSD 14.0-CURRENT:
dtrace: description 'pid$target:a.out::entry,pid$target:a.out::return ' matched 17 probes
I'm g
I'm g
I'm g
I'm g
dtrace: pid 39275 has exited
CPU FUNCTION
3 -> _start 34361917680
3 -> handle_static_init 140737488341872
3 <- handle_static_init 2108000
3 -> main 140737488341872
3 -> f 2
3 -> g 2
3 <- g 32767
3 -> g 5
3 <- g 32767
3 <- f 0
3 -> f 3
3 -> g 3
3 <- g 32767
3 -> g 5
3 <- g 32767
3 <- f 0
3 <- main 0
3 -> __do_global_dtors_aux 140737488351184
3 <- __do_global_dtors_aux 0
The annoying thing is that I've not found a way to print all the function arguments (see How do you print an associative array in DTrace?). A hacky workaround is to add trace(arg2), trace(arg3), etc. The problem is that for nonexistent arguments there will be garbage printed out.

Yes, you can do this with dtrace. But you probably will never be able to do it on linux. I've tried multiple versions of the linux port of dtrace and it's never done what I wanted. In fact, it once caused a CPU panic. Download the dtrace toolkit from http://www.brendangregg.com/dtrace.html. Then set your PATH accordingly. Then execute this:
dtruss -a yourprogram args...

Your question is exceedingly likely to be misguided. For any non-trivial program, printing the sequense of all function calls executed with their parameters will result in multi-MB or even multi-GB output, that you will not be able to make any sense of (too much detail for a human to understand).
That said, I don't believe you can achieve what you want with dtrace.
You might begin by using GCC -finstrument-functions flag, which would easily allow you to print function addresses on entry/exit to every function. You can then trivialy convert addresses into function names with addr2line. This gives you what you asked for (except parameters).
If the result doesn't prove to be too much detail, you can set a breakpoint on every function in GDB (with rb . command), and attach continue command to every breakpoint. This will result in a steady stream of breakpoints being hit (with parameters), but the execution will likely be at least 100 to 1000 times slower.

Related

How can I implement the following loop in Erlang?

I have the following pseudocode:
for ( int i = 0; i < V ; i++ )
{
for( int j = 0 ; j < V ; j++ )
{
if( ( i != j ) && ( tuple {i,j} belong to E ) )
{
R[i] := {i,j};
}
}
}
I want to parallelise this code using erlang. How can I achieve the same thing using Erlang? I am new to Erlang...
Edit:
I know that the following code runs both the calls to say/2 concurrently:
-module(pmap).
-export([say/2]).
say(_,0) ->
io:format("Done ~n");
say(Value,Times) ->
io:format("Hello ~n"),
say(Value,Times-1).
start_concurrency(Value1, Value2) ->
spawn(pmap, say, [Value1, 3]),
spawn(pmap, say, [Value2, 3]).
However, here we are hardcoding the functions. So, suppose I want to call say 1000 times, do I need to write spawn(pmap, say, [Valuex, 3]) 1000 times? I can use recursion, but won't it be giving a sequential performance?
Edit:
I tried the following code, where I aim to create 3 threads, where each thread wants to run a say function. I want to run these 3 say functions concurrently(Please comment in the box for more clarification):
-module(pmap).
-export([say/1,test/1,start_concurrency/1]).
say(0) ->
io:format("Done ~n");
say(Times) ->
io:format("Hello ~p ~n",[Times]),
say(Times-1).
test(0) ->
spawn(pmap, say, [3]);
test(Times) ->
spawn(pmap, say, [3]),
test(Times-1).
start_concurrency(Times) ->
test(Times).
Is this code correct?
I want to run these 3 say functions concurrently. Is this code
correct?
You can get rid of your start_concurrency(N) function because it doesn't do anything. Instead, you can call test(N) directly.
I aim to create 3 threads
In erlang, you create processes.
In erlang, indenting is 4 spaces--not 2.
Don't put blank lines between multiple function clauses for a function definition.
If you want to see concurrency in action, then there has to be some waiting in the tasks you are running concurrently. For example:
-module(a).
-compile(export_all).
say(0) ->
io:format("Process ~p finished.~n", [self()]);
say(Times) ->
timer:sleep(rand:uniform(1000)), %%perhaps waiting to receive data from an http request
io:format("Hello ~p from process ~p~n",[Times, self()]),
say(Times-1).
loop(0) ->
spawn(a, say, [3]);
loop(Times) ->
spawn(a, say, [3]),
loop(Times-1).
In the shell:
3> c(a).
a.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,a}
4> a:loop(3).
<0.84.0>
Hello 3 from process <0.82.0>
Hello 3 from process <0.81.0>
Hello 2 from process <0.82.0>
Hello 3 from process <0.83.0>
Hello 2 from process <0.81.0>
Hello 3 from process <0.84.0>
Hello 2 from process <0.83.0>
Hello 1 from process <0.81.0>
Process <0.81.0> finished.
Hello 1 from process <0.82.0>
Process <0.82.0> finished.
Hello 2 from process <0.84.0>
Hello 1 from process <0.83.0>
Process <0.83.0> finished.
Hello 1 from process <0.84.0>
Process <0.84.0> finished.
5>
If there is no random waiting in the tasks that you are running concurrently, then the tasks will complete sequentially:
-module(a).
-compile(export_all).
say(0) ->
io:format("Process ~p finished.~n", [self()]);
say(Times) ->
%%timer:sleep(rand:uniform(1000)),
io:format("Hello ~p from process ~p~n",[Times, self()]),
say(Times-1).
loop(0) ->
spawn(a, say, [3]);
loop(Times) ->
spawn(a, say, [3]),
loop(Times-1).
In the shell:
5> c(a).
a.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,a}
6> a:loop(3).
Hello 3 from process <0.91.0>
Hello 3 from process <0.92.0>
Hello 3 from process <0.93.0>
Hello 3 from process <0.94.0>
<0.94.0>
Hello 2 from process <0.91.0>
Hello 2 from process <0.92.0>
Hello 2 from process <0.93.0>
Hello 2 from process <0.94.0>
Hello 1 from process <0.91.0>
Hello 1 from process <0.92.0>
Hello 1 from process <0.93.0>
Hello 1 from process <0.94.0>
Process <0.91.0> finished.
Process <0.92.0> finished.
Process <0.93.0> finished.
Process <0.94.0> finished.
7>
When there is no random waiting in the tasks that you are running concurrently, then concurrency provides no benefit.

Performance gain on call chaining?

Do you gain any performance, even if it's minor, by chaining function calls as shown below or is it just coding style preference?
execute() ->
step4(step3(step2(step1())).
Instead of
execute() ->
S1 = step1(),
S2 = step2(S1),
S3 = step3(S2),
step4(S3).
I was thinking whether in the 2nd version the garbage collector has some work to do for S1, S2, S3. Should that apply for the 1st version as well?
They are identical after compilation. You can confirm this by running the erl file through erlc -S and reading the generated .S file:
$ cat a.erl
-module(a).
-compile(export_all).
step1() -> ok.
step2(_) -> ok.
step3(_) -> ok.
step4(_) -> ok.
execute1() ->
step4(step3(step2(step1()))).
execute2() ->
S1 = step1(),
S2 = step2(S1),
S3 = step3(S2),
step4(S3).
$ erlc -S a.erl
$ cat a.S
{module, a}. %% version = 0
...
{function, execute1, 0, 10}.
{label,9}.
{line,[{location,"a.erl",9}]}.
{func_info,{atom,a},{atom,execute1},0}.
{label,10}.
{allocate,0,0}.
{line,[{location,"a.erl",10}]}.
{call,0,{f,2}}.
{line,[{location,"a.erl",10}]}.
{call,1,{f,4}}.
{line,[{location,"a.erl",10}]}.
{call,1,{f,6}}.
{call_last,1,{f,8},0}.
{function, execute2, 0, 12}.
{label,11}.
{line,[{location,"a.erl",12}]}.
{func_info,{atom,a},{atom,execute2},0}.
{label,12}.
{allocate,0,0}.
{line,[{location,"a.erl",13}]}.
{call,0,{f,2}}.
{line,[{location,"a.erl",14}]}.
{call,1,{f,4}}.
{line,[{location,"a.erl",15}]}.
{call,1,{f,6}}.
{call_last,1,{f,8},0}.
...
As you can see, both execute1 and execute2 result in identical code (the only thing different are line numbers and label numbers.

Extracting plain text output from binary file

I am working with Graphchi's pagerank example: https://github.com/GraphChi/graphchi-cpp/wiki/Example-Apps#pagerank-easy
The example app writes a binary file with vertex information that I would like to read/convert to a plan text file (to later call into R or some other language).
The documentation states that:
"GraphChi will write the values of the edges in a binary file, which is easy to handle in other programs. Name of the file containing vertex values is GRAPH-NAME.4B.vout. Here "4B" refers to the vertex-value being a 4-byte type (float)."
The 'easy to handle' part is what I'm struggling with - I have experience with high level languages but not C++ or dealing with binary files. I have found a few things through searching stackoverflow but no luck yet in reading this file. Ideally this would be done through bash or python.
thanks very much for your help on this.
Update: hexdump graph-name.4B.vout | head -5 gives:
0000000 999a 3e19 7468 3e7f 7d2a 3e93 d8e0 3ec4
0000010 cec6 3fe4 d551 3f08 eff2 3e54 999a 3e19
0000020 999a 3e19 3690 3e8c 0080 3f38 9ea3 3ef5
0000030 b7d6 3f66 999a 3e19 10e3 3ee1 400c 400d
0000040 a3df 3e7c 999a 3e19 979c 3e91 5230 3f18
Here is example code how you can use GraphCHi to write the output out as a string:
https://github.com/GraphChi/graphchi-cpp/wiki/Vertex-Aggregators
But the array is simple byte array. Here is example how to read it in python:
import struct
from array import array as binarray
import sys
inputfile = sys.argv[1]
data = open(inputfile).read()
a = binarray('c')
a.fromstring(data)
s = struct.Struct("f")
l = len(a)
print "%d bytes" %l
n = l / 4
for i in xrange(0, n):
x = s.unpack_from(a, i * 4)[0]
print ("%d %f" % (i, x))
I was having the same trouble. Luckily I work with a bunch of network engineers who helped me out! On Mac Linux, the following command works to print the 4B.vout data one line per node, with the integer values the same as is given in the summary file. If your file is called eg, filename.4B.vout, then some command line perl gets you:
cat filename.4B.vout | LANG= perl -0777 -e '$,=\"\n\"; print unpack(\"L*\",<>),\"\";'
Edited to add: this is for the assignments of connected component ID and community ID, written implicitly the 1st line is the ID of the node labeled 0, the 2nd line is the node labeled 1 etc. But I am copypasting here so I'm not sure how it would need to change for floats. It works great for the integer values per node.

'Segmentation Fault' while debugging expect script

I am debugging an expect program with the traditional debugging way by passing the -D 1 flag for the following script.
#!/usr/bin/expect
proc p3 {} {
set m 0
}
proc p2 {} {
set c 4
p3
set d 5
}
proc p1 {} {
set a 2
p2
set a 5
}
p1
With the debugger command w, I am trying to see the stack frame and got the following error.
dinesh#mypc:~/pgms/expect$ expect -D 1 stack.exp
1: proc p3 {} {
set m 0
}
dbg1.0> n
1: proc p2 {} {
set c 4
p3
set d 5
}
dbg1.1>
1: proc p1 {} {
set a 2
p2
set a 5
}
dbg1.2>
1: p1
dbg1.3> s
2: set a 2
dbg2.4>
2: p2
dbg2.5>
3: set c 4
dbg3.6> w
0: expect {-D} {1} {stack.exp}
Segmentation fault (core dumped)
dinesh#mypc:~/pgms/expect$
I am having the expect version 5.45.
Is there anything wrong in my way of command execution ?
In order to achieve the debugging trace, Expect pokes its fingers inside the implementation of Tcl. In particular, it has copies of the definitions of some of the internal structures used inside Tcl (e.g., the definition of the implementation of procedures and of stack frames). However, these structures change from time to time; we don't announce such internal implementation changes, as they shouldn't have any bearing on any other code, but that's obviously not the case.
Overall, this is a bug in Expect (and it might be that the fix is for a new C API function to be added to Tcl). In order to see about fixing this, we need to know not just the exact version of Expect but also the exact version of Tcl (use info patchlevel to get this).

Running R Code from Command Line (Windows)

I have some R code inside a file called analyse.r. I would like to be able to, from the command line (CMD), run the code in that file without having to pass through the R terminal and I would also like to be able to pass parameters and use those parameters in my code, something like the following pseudocode:
C:\>(execute r script) analyse.r C:\file.txt
and this would execute the script and pass "C:\file.txt" as a parameter to the script and then it could use it to do some further processing on it.
How do I accomplish this?
You want Rscript.exe.
You can control the output from within the script -- see sink() and its documentation.
You can access command-arguments via commandArgs().
You can control command-line arguments more finely via the getopt and optparse packages.
If everything else fails, consider reading the manuals or contributed documentation
Identify where R is install. For window 7 the path could be
1.C:\Program Files\R\R-3.2.2\bin\x64>
2.Call the R code
3.C:\Program Files\R\R-3.2.2\bin\x64>\Rscript Rcode.r
There are two ways to run a R script from command line (windows or linux shell.)
1) R CMD way
R CMD BATCH followed by R script name. The output from this can also be piped to other files as needed.
This way however is a bit old and using Rscript is getting more popular.
2) Rscript way
(This is supported in all platforms. The following example however is tested only for Linux)
This example involves passing path of csv file, the function name and the attribute(row or column) index of the csv file on which this function should work.
Contents of test.csv file
x1,x2
1,2
3,4
5,6
7,8
Compose an R file “a.R” whose contents are
#!/usr/bin/env Rscript
cols <- function(y){
cat("This function will print sum of the column whose index is passed from commandline\n")
cat("processing...column sums\n")
su<-sum(data[,y])
cat(su)
cat("\n")
}
rows <- function(y){
cat("This function will print sum of the row whose index is passed from commandline\n")
cat("processing...row sums\n")
su<-sum(data[y,])
cat(su)
cat("\n")
}
#calling a function based on its name from commandline … y is the row or column index
FUN <- function(run_func,y){
switch(run_func,
rows=rows(as.numeric(y)),
cols=cols(as.numeric(y)),
stop("Enter something that switches me!")
)
}
args <- commandArgs(TRUE)
cat("you passed the following at the command line\n")
cat(args);cat("\n")
filename<-args[1]
func_name<-args[2]
attr_index<-args[3]
data<-read.csv(filename,header=T)
cat("Matrix is:\n")
print(data)
cat("Dimensions of the matrix are\n")
cat(dim(data))
cat("\n")
FUN(func_name,attr_index)
Runing the following on the linux shell
Rscript a.R /home/impadmin/test.csv cols 1
gives
you passed the following at the command line
/home/impadmin/test.csv cols 1
Matrix is:
x1 x2
1 1 2
2 3 4
3 5 6
4 7 8
Dimensions of the matrix are
4 2
This function will print sum of the column whose index is passed from commandline
processing...column sums
16
Runing the following on the linux shell
Rscript a.R /home/impadmin/test.csv rows 2
gives
you passed the following at the command line
/home/impadmin/test.csv rows 2
Matrix is:
x1 x2
1 1 2
2 3 4
3 5 6
4 7 8
Dimensions of the matrix are
4 2
This function will print sum of the row whose index is passed from commandline
processing...row sums
7
We can also make the R script executable as follows (on linux)
chmod a+x a.R
and run the second example again as
./a.R /home/impadmin/test.csv rows 2
This should also work for windows command prompt..
save the following in a text file
f1 <- function(x,y){
print (x)
print (y)
}
args = commandArgs(trailingOnly=TRUE)
f1(args[1], args[2])
No run the following command in windows cmd
Rscript.exe path_to_file "hello" "world"
This will print the following
[1] "hello"
[1] "world"

Resources