Recommendations for "Dynamic/interactive" debugging of functions in R?

When debugging a function I usually use
library(debug)
mtrace(FunctionName)
FunctionName(...)
And that works quite well for me.
However, sometimes I am trying to debug a complex function that I don't know well. In such cases I find that, inside that function, there is another function I would like to "go into" (debug), so as to better understand how the entire process works.
So one way of doing it would be to do:
library(debug)
mtrace(FunctionName)
FunctionName(...)
# when finding a function I want to debug inside the function, run again:
mtrace(FunctionName.SubFunction)
The question is - is there a better/smarter way to do interactive debugging (as I have described) that I might be missing?
P.S.: I am aware that various questions have been asked on this subject on SO (see here). Yet I wasn't able to find a question/solution similar to what I am asking here.

Not entirely sure about the use case, but when you encounter a problem you can call traceback(). That will show the path of your function call through the stack until it hit the problem. If you were inclined to work your way down from the top, you could call debug() on each of the functions in that list before making your function call; then you would be walking through the entire process from the beginning.
Here's an example of how you could do this in a more systematic way, by creating a function to step through it:
walk.through <- function() {
    tb <- unlist(.Traceback)
    if (is.null(tb)) stop("no traceback to use for debugging")
    # pull the function names out of the traceback entries and stash them globally
    assign("debug.fun.list", matrix(unlist(strsplit(tb, "\\(")), nrow=2)[1,], envir=.GlobalEnv)
    # flag every function on the call path for debugging
    lapply(debug.fun.list, function(x) debug(get(x)))
    print(paste("Now debugging functions:", paste(debug.fun.list, collapse=",")))
}

unwalk.through <- function() {
    # clear the debug flag from each function recorded by walk.through()
    lapply(debug.fun.list, function(x) undebug(get(as.character(x))))
    print(paste("Now undebugging functions:", paste(debug.fun.list, collapse=",")))
    rm(list="debug.fun.list", envir=.GlobalEnv)
}
Here's a dummy example of using it:
foo <- function(x) { print(1); bar(2) }
bar <- function(x) { x + a.variable.which.does.not.exist }
foo(2)
# now step through the functions
walk.through()
foo(2)
# undebug those functions again...
unwalk.through()
foo(2)
IMO, that doesn't seem like the most sensible thing to do. It makes more sense to simply go into the function where the problem occurs (i.e. at the lowest level) and work your way backwards.
I've already outlined the logic behind this basic routine in "favorite debugging trick".

I like options(error=recover) as detailed previously on SO. Things then stop at the point of error and one can inspect.
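A minimal sketch of that workflow (the function f is made up for illustration):
f <- function(x) { y <- x * 2; stop("boom") }
options(error = recover)   # on error, offer a menu of call frames to browse
f(1)                       # choose a frame number to inspect its locals, 0 to exit
options(error = NULL)      # restore the default error behaviour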

(I'm the author of the 'debug' package where 'mtrace' lives)
If the definition of 'SubFunction' lives outside 'MyFunction', then you can just mtrace 'SubFunction' and don't need to mtrace 'MyFunction'. And functions run faster if they're not 'mtrace'd, so it's good to mtrace only as little as you need to. (But you probably know those things already!)
If 'SubFunction' is only defined inside 'MyFunction', one trick that might help is to use a conditional breakpoint in 'MyFunction'. You'll need to 'mtrace( MyFunction)', then run it, and when the debugging window appears, find out what line 'SubFunction' is defined on. Say it's line 17. Then the following should work:
D(n)> bp( 1, F) # don't bother showing the window for MyFunction again
D(n)> bp( 18, { mtrace( SubFunction); FALSE})
D(n)> go()
It should be clear what this does (or it will be if you try it).
The only downsides are: the need to do it again whenever you change the code of 'MyFunction', and the slowdown that might occur through 'MyFunction' itself being mtraced.
You could also experiment with adding a 'debug.sub' argument to 'MyFunction', that defaults to FALSE. In the code of 'MyFunction', then add this line immediately after the definition of 'SubFunction':
if( debug.sub) mtrace( SubFunction)
That avoids any need to mtrace 'MyFunction' itself, but does require you to be able to change its code.
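A minimal sketch of that pattern (both function names are hypothetical):
MyFunction <- function(x, debug.sub = FALSE) {
    SubFunction <- function(y) y + 1
    # immediately after the definition of SubFunction:
    if (debug.sub) mtrace(SubFunction)
    SubFunction(x)
}
MyFunction(1, debug.sub = TRUE)   # drops into the debugger inside SubFunction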

Related

ReasonML, side effecting call on x if option is Some(x)

I have a binding intervalId of type option(Js.Global.intervalId).
I would like a succinct way to make the side-effecting call to Js.Global.clearInterval if the option has a value (i.e. is Some(id) and not None).
Maybe the Belt.Option.map function is the answer, but I'm having problems putting it to use.
I'm very new to OCaml and ReasonML, but several languages I know have suitable functions. I'm paraphrasing some of them here to give the idea what I'm looking for:
In Scala I would say: intervalId.foreach(Js.Global.clearInterval)
In Swift I would say: intervalId.map(Js.Global.clearInterval)
Belt.Option.map(intervalId, Js.Global.clearInterval) should work fine, except it returns a value that you need to discard in some way to avoid a type error or warning.
The safest way to discard values you don't need is to assign it to the wildcard pattern and include a type annotation to ensure the value you discard is what you expect it to be:
let _: option(unit) = Belt.Option.map(intervalId, Js.Global.clearInterval)
You can also use the ignore function, which works particularly well at the end of a sequence of pipes, but beware that you might then accidentally partially apply a function and discard it without actually executing the function and invoking the side-effect.
intervalId
|> Belt.Option.map(_, Js.Global.clearInterval)
|> ignore
Out of curiosity, I'll leave here the obvious thing I had before the chosen answer:
switch (intervalId) {
| Some(id) => Js_global.clearInterval(id)
| None => ()
}

How to implement the behaviour of -time-passes in my own Jitter?

I am working on a JIT compiler ("Jitter") based on LLVM. I have a real issue with performance. I have been reading a lot about this, and I know it is a known problem in LLVM. However, I am wondering if there are other bottlenecks. Hence, I want to use in my Jitter the same mechanism offered by -time-passes, but saving the result to a specific file. That way I can do some simple math like:
real_execution_time = total_time - time_passes
I added the options to the command line, but it does not work:
// Disable branch fold for accurate line numbers.
llvm_argv[arrayIndex++] = "-disable-branch-fold";
llvm_argv[arrayIndex++] = "-stats";
llvm_argv[arrayIndex++] = "-time-passes";
llvm_argv[arrayIndex++] = "-info-output-file";
llvm_argv[arrayIndex++] = "pepe.txt";
cl::ParseCommandLineOptions(arrayIndex, const_cast<char**>(llvm_argv));
Any solution?
OK, I found the solution. I am publishing it because it may be useful for someone else.
Before any exit(code) in your program, you must include a call to
llvm::llvm_shutdown();
This call flushes the information to the file.
My problem was:
1 - Other threads called exit without making that call.
2 - There is a handy struct, llvm::llvm_shutdown_obj, whose destructor calls the method above. I had declared a variable in the main function as follows:
llvm::llvm_shutdown_obj X();
You would expect the compiler to call the destructor when main returns, but it never happened. The reason is that this line does not declare a variable at all: it declares a function named X returning an llvm_shutdown_obj (C++'s "most vexing parse"). Dropping the parentheses, llvm::llvm_shutdown_obj X;, fixes it.
No object => no destructor => no flush to the file

Regarding Functional Programming Theory

Is there a consensus preference between these two programming approaches? Could you please explain why, in terms of pros and cons, you favour your chosen approach?
(i) A program has three functions that need to be enacted on some input. It runs the first, gets a returned variable, runs the second with that variable, does the same for the third, and finally prints the third's returned variable.
func1() { return f1 }
func2() { return f2 }
func3() { return f3 }
main() {
    fin = # of inputs
    i = 0
    while i < fin {
        first = func1(in[i])
        sec   = func2(first)
        third = func3(sec)
        print(third)
        i++
    }
}
(ii) A program steps through a series of instructions, initially pushing the first domino from the main function.
func1() { func2(newfrom1) }
func2() { func3(newfrom2) }
func3() { print(newfrom3) }
main() {
    fin = # of inputs
    i = 0
    while i < fin {
        func1(in[i])
        i++
    }
}
The only difference I see is that version 1 uses variables to store intermediate results.
So from a performance point of view there should not be any difference, since a compiler would keep these intermediate results in registers in either version. But this can be checked by profiling.
But to me version 1 is more readable, and thus better.
The first approach is more reusable - what if you want to do whatever it is that func1 does to something else later on, but you don't then want to do func2 and func3 on it? If func1 was written to call those for the first scenario then you have to go and change everything.
My preference is to try to identify 'operations' that make sense for a single function to do, write a function to do that, then for more complex things write another function which calls several of the smaller ones to achieve its ends. One then often finds some of those smaller functions find use elsewhere at a later date.
Yes this leaves me with more function calls, and possibly more temporary storage being used, but I let the compiler worry about that - if it proves to be a performance issue I'll deal with it then. Usually performance is hurt by other things though.
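A minimal R sketch of that style (all names are hypothetical): small single-purpose functions composed by a higher-level one, with intermediate results held in named variables as in version (i):
clean   <- function(x) x[!is.na(x)]                      # one small reusable operation
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))  # another
summarise_input <- function(x) {
    cleaned <- clean(x)
    scaled  <- scale01(cleaned)
    mean(scaled)
}
summarise_input(c(1, 5, NA, 9))   # 0.5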

emacs debugger: how can I step-out, step-over?

I don't know why I'm having so much trouble grokking the documentation for the elisp debugger.
I see it has a command to "step-into" (d). But for the life of me, I cannot see a step-out or step-over.
Can anyone help?
If I have this in the Backtrace buffer:
Debugger entered--returning value: 5047
line-beginning-position()
* c-parse-state()
* byte-code("...")
* c-guess-basic-syntax()
c-show-syntactic-information(nil)
call-interactively(c-show-syntactic-information)
...where do I put the cursor, and what key do I type, to step out of the c-parse-state() fn? By that I mean: run until that fn returns, and then stop in the debugger again.
When debugging, I press ? and I see:
o edebug-step-out
f edebug-forward-sexp
h edebug-goto-here
I believe o (step-out) and f (like step-over) are what you're looking for, though I also find h extremely useful.
'c' and 'j' work somewhat like step-out and step-over: when a flagged frame (indicated by "*") is encountered (the docs say "exited", but this doesn't seem to be how the debugger behaves), the debugger is re-entered. When the top frame is flagged they work like step-over; when it isn't, they work like step-out.
In your example backtrace, typing either will step out of line-beginning-position into c-parse-state. The frame flag should clear, so typing either a second time should step out of c-parse-state.
Hm. I, for one, prefer debug to edebug, but to each her own...
As to debug, I use d, c, e, and q.
If you do use debug, one thing to keep in mind, which can save time and effort, is that when you see a macro call (it starts with #) you can just hit c to expand the macro -- there is normally no sense in digging into the macro-expansion code (unless you wrote the macro and you are trying to debug it).
In particular, for dolist, there are two levels of macroexpansion to skip over using c: one for dolist and one for block.
HTH.

Debugging generic functions in R

How do you debug a generic function (using debug, or mtrace in the debug package)?
As an example, I want to debug cenreg in the NADA package, specifically the method that takes a formula input.
You can retrieve the method details like this:
library(NADA)
getMethod("cenreg", c("formula", "missing", "missing"))
function (obs, censored, groups, ...)
{
.local <- function (obs, censored, groups, dist, conf.int = 0.95,
...)
{
dist = ifelse(missing(dist), "lognormal", dist)
...
}
The problem is that cenreg itself looks like this:
body(cenreg)
# standardGeneric("cenreg")
I don't know how to step through the underlying method, rather than the generic wrapper.
My first two suggestions are pretty basic: (1) wrap your function call in try() (that frequently provides more information with S4 classes) and (2) call traceback() after the error is thrown (that can sometimes give hints to where the problem is really occurring).
Calling debug() won't help in this scenario, so you need to use trace or browser. From the debug help page:
"In order to debug S4 methods (see Methods), you need to use trace, typically
calling browser, e.g., as "
trace("plot", browser, exit=browser, signature = c("track", "missing"))
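Adapting that pattern to the cenreg example from the question might look like this (a sketch; untrace() clears it again):
library(NADA)
trace("cenreg", browser, exit = browser,
      signature = c("formula", "missing", "missing"))
# cenreg(...) now enters browser() on entry to and exit from that method
untrace("cenreg", signature = c("formula", "missing", "missing"))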
S4 classes can be hard to work with; one example of this is the comment in the debug package documentation (regarding the usage of mtrace() with S4 classes):
"I have no plans to write S4 methods, and hope not to have to
debug other people’s!"
A similar question was asked recently on R-Help. The recommendation from Duncan Murdoch:
"You can insert a call to browser() if you want to modify the source. If
you'd rather not do that, you can use trace() to set a breakpoint in it.
The new setBreakpoint() function in R 2.10.0 will also work, if you
install the package from source with the R_KEEP_PKG_SOURCE=yes
environment variable set. It allows you to set a breakpoint at a
particular line number in the source code."
I've never done this before myself (and it requires R 2.10.0), but you might try installing from source with R_KEEP_PKG_SOURCE=yes.
Incidentally, you can use the CRAN mirror of NADA on GitHub to browse the source.
For a long time this was a standard annoyance point for S4 method debugging. As pointed out by Charles Plessy, I worked with Michael Lawrence to add a number of features to R that are intended to make this easier.
debug, debugonce, undebug, and isdebugged all now take a signature argument suitable for specifying S4 methods. Furthermore, debugging S4 methods this way bypasses the weird implementation detail that you previously had to deal with by hand: browsering into the method via trace, stepping through to the .local definition, debugging that, and then continuing.
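A sketch of that signature interface on the cenreg example (assuming a version of R recent enough to have it):
library(NADA)
debugonce(cenreg, signature = c("formula", "missing", "missing"))
# the next call that dispatches to this method enters the debugger directly
isdebugged(cenreg, signature = c("formula", "missing", "missing"))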
In addition, I added debugcall, to which you give an actual, full call that you want to invoke. Doing so sets debugging on the first closure that will be invoked when evaluating that call and that is not an S3 or S4 standard generic. So if you are calling a non-generic, that will just be the top-level function being called; but if it is a standard S3 or S4 generic, the first method that will be hit is debugged instead of the generic. (A "standard S3 generic" is defined as a function whose first top-level call in the body, ignoring curly braces, is a call to UseMethod.)
Note that we went back and forth on the design of this, but in the end settled on debugcall not actually executing the call being debugged; instead it returns the call expression, which you can pass to eval if desired, as illustrated in ?debugcall.
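For example (a sketch following ?debugcall; the lm fit is made up):
fit <- lm(dist ~ speed, data = cars)
cl <- debugcall(summary(fit))   # flags summary.lm, the method actually dispatched to
eval(cl)                        # evaluate the returned call under the debugger
undebugcall(summary(fit))       # clear the flag again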
