What useful GDB scripts have you used/written? - debugging

People use gdb on and off for debugging. Of course there are lots of other debugging tools across the varied OSes, with and without a GUI and maybe other fancy IDE features.
I would like to know what useful gdb scripts you have written and liked.
I do not mean a dump of commands in a something.gdb file that you source to pull out a bunch of data, although if that made your day, go ahead and talk about it.
Let's think of conditional processing, control loops and functions written for more elegant and refined debugging, and maybe even for whitebox testing.
Things get interesting when you start debugging remote systems (say, over a serial/ethernet interface).
And what if the target is a multi-processor (and multithreaded) system?
Let me give a simple case as an example: a script that traversed serially over the entries of a large hash table, implemented on an embedded platform, to locate a bad entry. That helped me debug a broken hash table once.

This script, not written by me, pretty prints STL containers, such as vector, map, etc: http://www.yolinux.com/TUTORIALS/src/dbinit_stl_views-1.03.txt
Totally awesome.

When debugging an AOLserver SIGSEGV crash, I used the following script to examine the TCL-level call stack from GDB:
define tcl_stack_dump
    set $interp = *(Interp*)interp
    set $frame = $interp->framePtr
    while (0 != (CallFrame *)$frame->callerPtr)
        set $i = 0
        if 0 != $frame->objv
            while ($i < $frame->objc)
                if (0 != $frame->objv[$i] && 0 != $frame->objv[$i]->bytes)
                    printf " %s", (char *)$frame->objv[$i]->bytes
                end
                set $i = $i + 1
            end
            printf "\n"
        end
        set $frame = (CallFrame *)$frame->callerPtr
    end
end
document tcl_stack_dump
Print a list of TCL procs and arguments currently being called in an
interpreter. Most TCL C API functions beginning with Tcl[^_] will have a
Tcl_Interp parameter. Assumes the `interp' local C variable holds a
Tcl_Interp which is represented internally as an Interp struct.
See:
ptype Interp
ptype CallFrame
ptype Proc
ptype Command
ptype Namespace
ptype Tcl_Obj
end

1. When trying to get some 3rd party closed-source DLLs working with our project under Mono, it was giving meaningless errors. Consequently, I resorted to the scripts from the Mono project.
2. I also had a project that could dump its own information to stdout for use in GDB, so at a breakpoint, I could run the function, then cut-n-paste its output into GDB.
[Edit]
3. Most of my GCC/G++ use has been a while, but I also recall using a macro to take advantage of the fact that GDB knew the members of some opaque data I had (the library was compiled with debug). That was enormously helpful.
4. And I just found this, too. It dumps a list of objects (from a global "headMeterFix" SLL) that contain, among other things, dynamic arrays of another object type. One of the few times I've used nested loops in a macro:
define showFixes
    set $i= headMeterFix
    set $n = 0
    while ($i != 0)
        set $p = $i->resolved_list
        set $x = $i->resolved_cnt
        set $j = 0
        printf "%08x [%d] = {", $i, $x
        printf "%3d [%3d] %08x->%08x (D/R): %3d/%-3d - %3d/%-3d {", $n, $i, $x, $i->fix, $i->depend_cnt, dynArySizeDepList($i->depend_list), $i->resolved_cnt, dynArySizeDepList($i->resolved_list)
        while ($j < $x)
            printf " %08x", $p[$j]
            set $j=$j+1
        end
        printf " }\n"
        set $i = $i->next
        set $n = $n+1
    end
end

Related

Tcl syntax explanation

Tcl code:
for {local i 0 } { $i < $bsLen } { incr i } {
local topb [bs rhex $bsStream 1]
local botb [bs rhex $bsStream 1]
local hexStr [strcat $hexStr $topb $botb ]
}
What are some documents that can help to explain the above syntax?
Only for and incr are standard Tcl commands in that sample. If you know C or Java or C#, you'll probably be able to guess what those do without too much difficulty; the syntax is a little different but not very.
The other commands are these, but I know not who defines them:
local — appears to be setting a local variable. What's wrong with using set? I don't know…
bs — specifically bs rhex, and it appears to be getting a value (in hex?) from a stream (named in the bsStream variable). This one is totally guesswork.
strcat — I'd guess this is doing string concatenation of its arguments, as it has the name of a standard C function that does that (doing anything else would be weird and designed to trip its own programmer up).
That last line would be more conventionally written:
append hexStr $topb $botb
as the append command is optimised (in its memory management, which tends to dominate these sorts of things) for the building-a-string-piecemeal case. In particular, it doesn't demonstrate quadratic Shlemiel the painter misbehaviour.
There is no standard command or keyword named local in Tcl. With set, the loop reads:
for {set i 0} {$i < $bsLen} {incr i} {
    set topb [bs rhex $bsStream 1]
    set botb [bs rhex $bsStream 1]
    set hexStr [strcat $hexStr $topb $botb]
}
The general form is:
for {initialize} {condition_check} {increment} {body to execute}

Automated GOTO removal algorithm

I've heard that it's been proven theoretically possible to express any control flow in a Turing-complete language using only structured programming constructs, (conditionals, loops and loop-breaks, and subroutine calls,) without any arbitrary GOTO statements. Is there any way to use that theory to automate refactoring of code that contains GOTOs into code that does not?
Let's say I have an arbitrary single subroutine in a simple imperative language, such as C or Pascal. I also have a parser that can verify that this subroutine is valid, and produce an Abstract Syntax Tree from it. But the code contains GOTOs and Labels, which could jump forwards or backwards to any arbitrary point, including into or out of conditional or loop blocks, but not outside of the subroutine itself.
Is there an algorithm that could take this AST and rework it into new code which is semantically identical, but does not contain any Labels or GOTO statements?
In principle, it is always possible to do this, though the results might not be pretty.
One way to always eliminate gotos is to transform the program in the following way. Start off by numbering all the instructions in the original program. For example, given this program:
start:
while (x < 3) {
    if (x < 5) goto start;
    x++;
}
You could number the statements like this:
0 start:
1 while (x < 3) {
2 if (x < 5) goto start;
3 x++
}
To eliminate all gotos, you can simulate the flow of the control through this function by using a while loop, an explicit variable holding the program counter, and a bunch of if statements. For example, you might translate the above code like this:
int PC = 0;
while (PC <= 3) {
    if (PC == 0) {
        PC = 1; // Label has no effect
    } else if (PC == 1) {
        if (x < 3) PC = 2; // Loop condition holds: enter loop.
        else PC = 4; // Skip loop, which ends this function.
    } else if (PC == 2) {
        if (x < 5) PC = 0; // Simulate goto
        else PC = 3; // Simulate if-statement fall-through
    } else if (PC == 3) {
        x++;
        PC = 1; // Simulate jump back up to the top of the loop.
    }
}
This is a really, really bad way to do the translation, but it shows that in theory it is always possible to do this. Actually implementing this would be very messy - you'd probably number the basic blocks of the function, then generate code that puts the basic blocks into a loop, tracks which basic block is currently executing, then simulates the effect of running a basic block and the transition from that basic block to the appropriate next basic block.
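The basic-block variant can be sketched roughly as follows. This is only an illustration, written in shell because shell has no goto at all, so the dispatch loop is the only option. The routine sum_below, its C-like original in the comment, and the block numbering are all made up for this example and are not taken from the program above.

```shell
#!/bin/sh
# Hypothetical goto-based routine being translated (C-like pseudocode):
#
#       i = 0; sum = 0;          /* block 0 */
#   top:
#       if (i >= n) goto done;   /* block 1 */
#       sum += i; i++;           /* block 2 */
#       goto top;
#   done:
#       return sum;              /* block 3: exit */

sum_below() {
    n=$1
    pc=0                                  # number of the block to run next
    while [ "$pc" -ne 3 ]; do             # block 3 means "leave the routine"
        case $pc in
            0) i=0; sum=0
               pc=1 ;;                    # fall through to the label "top"
            1) if [ "$i" -ge "$n" ]; then
                   pc=3                   # simulate "goto done"
               else
                   pc=2                   # fall through into the loop body
               fi ;;
            2) sum=$((sum + i)); i=$((i + 1))
               pc=1 ;;                    # simulate "goto top"
        esac
    done
    echo "$sum"
}
```

Each case arm is one basic block, and the only thing a block does besides its own work is choose the next value of pc. Calling sum_below 5 adds up 0 through 4.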
Hope this helps!
I think you want to read Taming Control Flow by Erosa and Hendren, 1994. (Earlier link on Google scholar).
By the way, loop-breaks are also easy to eliminate. There is a simple mechanical procedure that involves creating a boolean state variable and restructuring nested conditionals to create straight-line control flow. It does not produce pretty code :)
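A minimal sketch of that procedure, in shell, might look like this. The function first_negative and the flag name done_flag are invented for the illustration; the point is only that the break turns into an assignment to a flag and the flag becomes part of the loop condition.

```shell
#!/bin/sh
# Original shape, with a break:
#     for v in "$@"; do
#         if [ "$v" -lt 0 ]; then result=$v; break; fi
#     done

first_negative() {
    result=""
    done_flag=0                      # boolean state variable replacing "break"
    while [ "$done_flag" -eq 0 ] && [ "$#" -gt 0 ]; do
        if [ "$1" -lt 0 ]; then
            result=$1
            done_flag=1              # was: break
        else
            shift                    # advance only when the loop continues
        fi
    done
    echo "$result"
}
```

Everything that used to follow the break inside the loop body ends up in the else branch, which is where the nesting (and the ugliness) comes from in larger loops.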
If your target language has tail-call optimization (and, ideally, inlining), you can mechanically remove both break and continue by turning the loop into a tail-recursive function. (If the index variable is modified by the loop body, you need to work harder at this. I'll just show the simplest case.) Here's the transformation of a simple loop:
for (Type Index = Start;
     Condition(Index);
     Index = Advance(Index)) {
    Body
}
becomes:
function loop(Index: Type):
    if (!Condition(Index))
        return                     // break
    Body
    return loop(Advance(Index))    // continue
loop(Start)
The return statements labeled "continue" and "break" are precisely the transformation of continue and break. Indeed, the first step in the procedure might have been to rewrite the loop into its equivalent form in the original language:
{
    Type Index = Start;
    while (true) {
        if (!Condition(Index))
            break;
        Body;
        Index = Advance(Index);
        continue;
    }
}
I use either or both of Polyhedron's spag and vast's 77to90 to begin the process of refactoring Fortran and then converting it to Matlab source. However, these tools always leave 1/4 to 1/2 of the gotos in the program.
I wrote a goto remover which accomplishes something similar to what you were describing: it takes Fortran code and removes all the remaining gotos from a program, replacing them with conditionals and do/cycle/exit constructs, which can then be converted into other languages like Matlab. You can read more about the process I use here:
http://engineering.dartmouth.edu/~d30574x/consulting/consulting_gotorefactor.html
This program could be adapted to work with other languages, but I have not gotten that far yet.

Return a value via a gdb user-defined command

I'm debugging with a core-file, so I have no active process in which to run anything.
I'm inspecting a bunch of data from the core file, and attempting to simplify the process using gdb user-defined commands.
However, I cannot find a way to make the user-defined commands return values which could be used in other commands.
For example:
(note the comment on the "return" line)
define dump_linked_list
set $node = global_list->head
set $count = 1
while $node != 0
printf "%p -->", $node
set $node = $node->next
set $count = $count + 1
end
return $count ## GDB doesn't understand this return
end
Ideally, my dump_linked_list command would return the number of nodes found in the list, so that it could be used in another defined command:
define higher_function
set $total_nodes = dump_linked_list
printf "Total Nodes is %d\n", $total_nodes
end
Is such a thing possible in gdb commands?
I feel it must be, but I've been searching documentation and cannot find a mention of it, or any examples.
I found out that gdb seems to pass arguments by name (the argument text is substituted for $arg0), which can be used to pass back a return value. This is a little more flexible than just using a single global variable.
(gdb) define foo
Type commands for definition of "foo".
End with a line saying just "end".
>set $arg0 = 1
>end
(gdb) set $retval = 0
(gdb) p $retval
$3 = 0
(gdb) foo $retval
(gdb) p $retval
$4 = 1
As far as I know, GDB does not have such functionality. You can set a variable with a name that you know and use it as a "return" value. For example, always set the variable retval like this:
set $retval = <whatever value>
Then all your newly defined commands can read it as the return value of the previously called command. I know this is only a workaround, but it is relatively simple and it works.
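Applied to the linked-list example from the question, the convention might look like the sketch below. It is a small shell function that writes the GDB commands to a file so they can be sourced later. The names write_gdb_helpers, count_nodes and $retval are assumptions made for this illustration; global_list, head and next come from the question. The GDB commands only read memory, so they should also be usable against a core file.

```shell
#!/bin/sh
# Writes a GDB command file that "returns" a value through $retval.
# count_nodes and $retval are illustrative names, not GDB built-ins.
write_gdb_helpers() {
    cat > "$1" <<'EOF'
define count_nodes
  set $retval = 0
  set $node = $arg0
  while $node != 0
    set $retval = $retval + 1
    set $node = $node->next
  end
end

define higher_function
  count_nodes global_list->head
  set $total_nodes = $retval
  printf "Total Nodes is %d\n", $total_nodes
end
EOF
}
```

After running write_gdb_helpers helpers.gdb, the file can be loaded inside a gdb session with source helpers.gdb, and higher_function then prints the count. Copying $retval into $total_nodes right after the call matters, because any later command that follows the same convention will overwrite $retval.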

Does awk support dynamic user-defined variables?

awk supports this:
awk '{print $(NF-1);}'
but not for user-defined variables:
awk '{a=123; b="a"; print $($b);}'
by the way, shell supports this:
a=123;
b="a";
eval echo \${$b};
How can I achieve my purpose in awk?
OK, since some of us like to eat spaghetti through their nose, here is some actual code that I wrote in the past :-)
First of all, getting self-modifying code in a language that does not support it is extremely non-trivial.
The idea behind allowing dynamic variables and function names in a language that does not support them is very simple. At some point in the program, you want a dynamic something to modify your own code and then resume execution
from where you left off. An eval(), that is.
This is all very trivial if the language supports eval() or some equivalent. However, awk does not have such a function. Therefore you, the programmer, have to provide an interface to such a thing.
To allow all this to happen, you have three main problems:
How to get our self so we can modify it
How to load the modified code, and resume from where we left off
Finding a way for the interpreter to accept our modified code
How to get our self so we can modify it
Here is some example code, suitable for direct execution.
This is the infrastructure that I inject for environments running gawk, as it requires PROCINFO.
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}'
Result:
foo
A[1]=[awk]
A[2]=[
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
print "foo";
dbg_argv(A);
dbg_printarray(A);
print "bar";
}]
bar
As you can see, as long as the OS does not play with our args and /proc/ is available, it is possible
to read ourselves. This may appear useless at first, but we need it for push/pop of our stack,
so that our execution state can be embedded within the code, letting us save/resume and survive OS shutdowns/reboots.
I have left out the OS detection function and the bootloader (written in awk), because if I publish that,
kids can build platform-independent polymorphic code, and it is easy to cause havoc with it.
how to load the modified code, and resume from where we left off
Now, normally you have push() and pop() for registers, so you can save your state, modify
your own code, and resume from where you left off. A call followed by reading your stack is a typical way to get the
memory address.
Unfortunately, in awk, under normal circumstances we cannot use pointers (without a lot of dirty work)
or registers (unless you can inject other stuff along the way).
However, you need a way to suspend and resume your code.
The idea is simple. Instead of letting awk be in control of your loops, while and if/else conditions,
recursion depth, and the functions you are in, the code itself should be.
Keep a stack, a list of variable names and a list of function names, and manage them yourself.
Just make sure that your code calls self_modify( bool ) constantly, so that even after a sudden failure,
as soon as the script is re-run, we can enter self_modify( bool ) and resume our state.
When you want to self-modify your code, you must provide custom-made
write_stack() and read_stack() functions that write out the state of the stack as a string, read the values
back from the string embedded in the code itself, and resume the execution state.
Here is a small piece of code that demonstrates the whole flow
echo ""| awk '
function push(d){stack[stack[0]+=1]=d;}
function pop(){if(stack[0])return stack[stack[0]--];return "";}
function dbg_printarray(ary , x , s,e, this , i ){
x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function _(s){return s}
function dbg_argv(A ,this,p){
A[0]=0;p="/proc/"PROCINFO["pid"]"/cmdline";push(RS);RS=sprintf("%c",0);
while((getline v <p)>0)A[A[0]+=1]=v;RS=pop();close(p);}
{
_(BEGIN_MODIFY"|");print "#foo";_("|"END_MODIFY)
dbg_argv(A);
sub( \
"BEGIN_MODIFY\x22\x5c\x7c[^\x5c\x7c]*\x5c\x7c\x22""END_MODIFY", \
"BEGIN_MODIFY\x22\x7c\x22);print \"#"PROCINFO["pid"]"\";_(\x22\x7c\x22""END_MODIFY" \
,A[2])
print "echo \x22\x22\x7c awk \x27"A[2]"";
print "function bar_"PROCINFO["pid"]"_(s){print \x22""doe\x22}";
print "\x27"
}'
Result:
Exactly same as our original code, except
_(BEGIN_MODIFY"|");print "65964";_("|"END_MODIFY)
and
function bar_56228_(s){print "doe"}
at the end of code
Now, this may seem useless, as we are only replacing the code print "foo"; with our pid.
But it becomes useful when there are multiple _() calls with separate MAGIC strings to identify BLOCKS,
and a custom-made multi-line string replacement routine instead of sub().
You must provide BLOCKS for the stack, the function list and the execution point, as a bare minimum.
And notice that the last line contains bar.
This itself is just a string, but when this code is executed repeatedly, notice that
function bar_56228_(s){print "doe"}
function bar_88128_(s){print "doe"}
...
and it keeps growing. While the example is intentionally made so that it does nothing useful,
if we provide a routine to call bar_pid_(s) instead of that print "foo" code,
suddenly it means we have eval() on our hands :-)
Now, isn't eval() useful :-)
Don't forget to provide a custom-made remove_block() function so that the code maintains
a reasonable size, instead of growing every time you execute it.
Finding a way for the interpreter to accept our modified code
Normally, calling a binary is trivial. However, when doing so from within awk, it becomes difficult.
You may say system() is the way.
There are two problems with that:
system() may not work in some environments
it blocks while you are executing code, thus you cannot perform recursive calls and keep the user happy at the same time.
If you must use system(), ensure that it does not block.
A normal call to system("sleep 20 && echo from-sh & ") will not work.
The solution is simple:
echo ""|awk '{print "foo";E="echo ep ; sleep 20 && echo foo & disown ; "; E | getline v;close(E);print "bar";}'
Now you have a async system() call that does not block :-)
Not at the moment. However, if you provide a wrapper, it is (somewhat hacky and dirty) possible.
The idea is to use the @ operator, introduced in recent versions of gawk.
The @ operator is used to call a function whose name is held in a variable.
So if you had
function foo(s){print "Called foo "s}
function bar(s){print "Called bar "s}
{
    var = "";
    if(today_i_feel_like_calling_foo){
        var = "foo";
    }else{
        var = "bar";
    }
    @var( "arg" ); # This calls function foo(), or function bar() with "arg"
}
Now, this is useful on its own.
Assuming we know the var names beforehand, we can write a wrapper to indirectly modify and obtain vars
function get(varname, this, call){call="get_"varname;return @call();}
function set(varname, arg, this, call){call="set_"varname; @call(arg);}
So now, for each var name you want to provide access to by name, you declare these two functions
function get_my_var(){return my_var;}
function set_my_var(arg){my_var = arg;}
And perhaps, somewhere in your BEGIN{} block,
BEGIN{ my_var = ""; }
to declare it for global access.
Then you can use
get("my_var");
set("my_var", "whatever");
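Put together as something that can be run from a shell, the wrapper might look like this sketch. It assumes gawk is installed, since indirect calls are a gawk extension and gawk spells them @name(args). The shell function name indirect_demo is made up for the example; get, set, get_my_var and set_my_var follow the text above.

```shell
#!/bin/sh
# Requires gawk: indirect function calls are not part of POSIX awk.
indirect_demo() {
    gawk '
    function get_my_var()    { return my_var }
    function set_my_var(arg) { my_var = arg }

    # Build the accessor name as a string, then call it indirectly with @.
    function get(varname,    call)      { call = "get_" varname; return @call() }
    function set(varname, arg,    call) { call = "set_" varname; @call(arg) }

    BEGIN {
        set("my_var", "whatever")
        print get("my_var")
    }'
}
```

The extra parameter call in get and set is just the usual awk idiom for a local variable.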
This may appear useless at first, however there are perfectly good use cases, such as
keeping a linked list of vars, by holding the var's name in another var's array, and such.
It works for arrays too, and to be honest, I use this for nesting and linking Arrays within
Arrays, so I can walk through multiple Arrays like using pointers.
You can also write configure scripts that refer to var names inside awk this way,
in effect having an interpreter-inside-an-interpreter type of thing, too...
Not the best way to do things, however, it gets the job done, and I do not have to worry about
null pointer exceptions, or GC and such :-)
The $ notation is not a mark for variables, as in shell, PHP, Perl etc. It is rather an operator, which receives an integer value n and returns the n-th column from the input. So, what you did in the first example is not the setting/getting of a variable dynamically but rather a call to an operator/function.
As stated by commenters, you can achieve the behavior you are looking for with arrays:
awk '{a=123; b="a"; v[b] = a; print v[b];}'
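To make the difference concrete, here is a small sketch that shows both behaviours side by side. The shell function lookup_demo, the sample input line and the array name vars are made up for the illustration; it should work with any POSIX awk.

```shell
#!/bin/sh
lookup_demo() {
    printf 'x y z\n' | awk '{
        n = 2
        print $n             # $ applied to the number 2: the second field
        print $(NF - 1)      # same operator, with a computed field index

        vars["a"] = 123      # the "variable" a lives in an array slot
        name = "a"
        print vars[name]     # looked up through the name held in another variable
    }'
}
```

The first two print statements select a field of the input line, while the last one does the name-based lookup that the question was after.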
I had a similar problem to solve, to load the settings from a '.ini' file and I've used arrays to set the variables dynamically.
It works with Awk or Gawk, Linux or Windows (GnuWin32)
gawk -v Settings_File="my_settings_file.ini" -f awk_script.awk <processing_file>
[my_settings_file.ini]
#comment
first_var=foo
second_var=bar
[awk_script.awk]
BEGIN{
FS="=";
while((getline < Settings_File)>0) {
if($0 !~ /^[#;]|^(\s*)$/) {
var_array[$1] = $2;
}
}
print var_array["first_var"];
print var_array["second_var"];
if (var_array["second_var"] == "bar") {
print "works!";
}
}
{
#more processing
}
END {
#finish processing
}

Calculating IDs for model runs

I'm running some array jobs on a PBS system (although hopefully no knowledge of PBS systems is needed to answer my question!). I've got 24 runs, but I want to split them up into 5 sub-jobs each, so I need to run my script 120 times.
After giving the PBS option of -t 1-120, I can get the current job-array ID using $PBS_ARRAYID. However, I want to create some output files. It would be best if these output files used the ID that it would have had if there were only 24 runs, together with a sub-run identifier (e.g. output-1a.txt, output-1b.txt ... output-1e.txt, output-2a.txt).
What I therefore need is a way of calculating the ID (in the range 1-24) together with the sub-run identifier (presumably in a set of if-statements), which can be used in a shell script. Unfortunately, neither my maths nor my Unix knowledge is quite good enough to figure this out. I assume that I'll need something to do with the quotient/remainder based on the current $PBS_ARRAYID relative to either 120 or 24, but that's as far as I've got...
You just need a little integer division and modular arithmetic. A quick simulation of this in Ruby would be:
p = Array.new;
(1..120).each {|i| p[i] = "Run #{(i-1)/5 + 1}-#{((i-1)%5 + 97).chr}" }
What this says is simply that the run should start at 1 and increment after each new section of five, and that the trailing sub-run should be the ASCII character represented by 97 plus the zero-based position of the sub-run (e.g., 97 == 'a'). The subtraction of 1 is needed because the IDs start at 1 rather than 0.
Here it is in Bash:
#!/bin/bash
chr() {
local tmp
[ ${1} -lt 256 ] || return 1
printf -v tmp '%03o' "$1"
printf \\"$tmp"
}
for ((i = 0; i < ${#ARP[*]}; i++))
do
    charcode=$((($i % 5)+97))
    character=$(chr "$charcode")
    echo "Filename: output-$((($i/5)+1))$character"
done
I just used ARP as the name of the array, but you can obviously substitute that. Good luck!
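Inside the job script itself there is no array to loop over, only the 1-based $PBS_ARRAYID, so one possible way to compute the name directly is sketched below. The function name run_label and the fallback value of 1 (used only so the snippet runs outside PBS) are assumptions made for this example; the figure of 5 sub-jobs per run comes from the question.

```shell
#!/bin/sh
# Map a 1-based array ID (1..120) to "<run><sub-run letter>", e.g. 1 -> 1a, 6 -> 2a.
run_label() {
    id=$1
    per_run=5                                   # sub-jobs per run
    run=$(( (id - 1) / per_run + 1 ))           # 1..24
    sub=$(( (id - 1) % per_run + 1 ))           # 1..5
    letter=$(printf 'abcde' | cut -c "$sub")    # 1 -> a, ..., 5 -> e
    printf '%s%s\n' "$run" "$letter"
}

# In the job script; the default of 1 only makes this runnable outside PBS.
outfile="output-$(run_label "${PBS_ARRAYID:-1}").txt"
echo "$outfile"
```

Subtracting 1 before dividing is what keeps IDs 1 to 5 together in run 1, and picking the letter out of a fixed string avoids the character-code conversion altogether.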
