Listing functions with debug flag set in R

I am trying to find a global counterpart to isdebugged() in R. My scenario is that I have functions that make calls to other functions, all of which I've written, and I am turning debug() on and off for different functions during my debugging. However, I may lose track of which functions are set to be debugged. When I forget and start a loop, I may get a lot more output (nuisance, but not terrible) or I may get no output when some is desired (bad).
My current approach is to use a function similar to the one below, and I can call it with listDebugged(ls()) or list the items in a loaded library (examples below). This could suffice, but it requires that I call it with the list of every function in the workspace or in the packages that are loaded. I can wrap another function that obtains these. It seems like there should be an easier way to just directly "ask" the debug function or to query some obscure part of the environment where it is stashing the list of functions with the debug flag set.
So, a two part question:
Is there a simpler call that exists to query the functions with the debug flag set?
If not, then is there any trickery that I've overlooked? For instance, if a function in one package masks another, I suspect I may return a misleading result.
I realize there is another method I could try: wrapping debug and undebug in functions that also maintain a hidden list of debugged function names. I'm not yet convinced that's a safe thing to do.
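A rough sketch of what I mean (hypothetical helper names; it only knows about functions toggled through these wrappers):
.debugged <- new.env()
myDebug <- function(fun) {
  name <- deparse(substitute(fun))
  debug(fun)
  assign(name, TRUE, envir = .debugged)
}
myUndebug <- function(fun) {
  name <- deparse(substitute(fun))
  undebug(fun)
  if (exists(name, envir = .debugged)) rm(list = name, envir = .debugged)
}
listMyDebugged <- function() ls(.debugged)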
UPDATE (8/5/11): I searched SO and didn't find earlier questions. However, SO's "related questions" list has since surfaced an earlier question that is similar, though the function in the answer to that question is both more verbose and slower than the one offered by @cbeleites. The older question also doesn't provide any code, while I did. :)
The code:
listDebugged <- function(items){
  isFunction <- vector(length = length(items))
  isDebugged <- vector(length = length(items))
  for(ix in seq_along(items)){
    isFunction[ix] <- is.function(eval(parse(text = items[ix])))
  }
  for(ix in which(isFunction == 1)){
    isDebugged[ix] <- isdebugged(eval(parse(text = items[ix])))
  }
  names(isDebugged) <- items
  return(isDebugged)
}
# Example usage
listDebugged(ls())
library(MASS)
debug(write.matrix)
listDebugged(ls("package:MASS"))

Here's my take on a listDebugged function:
ls.deb <- function(items = search()){
  .ls.deb <- function(i){
    f <- ls(i)
    f <- mget(f, as.environment(i), mode = "function",
              ## return a function that is not debugged
              ifnotfound = list(function(x) function() NULL))
    if (length(f) == 0)
      return(NULL)

    f <- f[sapply(f, isdebugged)]
    f <- names(f)

    ## now check whether the debugged function is masked by a not debugged one
    masked <- !sapply(f, function(f) isdebugged(get(f)))

    ## generate pretty output format:
    ## "package::function" and "(package::function)" for masked debugged functions
    if (length(f) > 0) {
      if (grepl('^package:', i)) {
        i <- gsub('^package:', '', i)
        f <- paste(i, f, sep = "::")
      }
      f[masked] <- paste("(", f[masked], ")", sep = "")
      f
    } else {
      NULL
    }
  }

  functions <- lapply(items, .ls.deb)
  unlist(functions)
}
- I chose a different name, as the output contains only the debugged functions (otherwise I easily get thousands of functions).
- The output has the form package::function (or rather namespace::function, but packages will have namespaces pretty soon anyway).
- If the debugged function is masked, the output is "(package::function)".
- The default is to look through the whole search path.
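For example, with the setup from the question (MASS attached and write.matrix debugged), a call would look roughly like this:
library(MASS)
debug(write.matrix)
ls.deb()
## should print something like: "MASS::write.matrix"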

This is a simple one-liner using lsf.str:
which(sapply(lsf.str(), isdebugged))
You can change environments within the function; see ?lsf.str for more arguments.
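For instance, here is a sketch of restricting the check to a single attached package (assuming MASS is on the search path):
which(sapply(lsf.str(envir = as.environment("package:MASS")), isdebugged))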

Since the original question, I've been looking more and more at Mark Bravington's debug package. If using that package, then check.for.traces() is the appropriate command to list those functions that are being debugged via mtrace.
The debug package is worth a look if one is spending much time with the R debugger and various trace options.
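A rough sketch of that workflow, assuming the debug package is installed (mtrace() is its replacement for debug()):
library(debug)
f <- function(x) x + 1
mtrace(f)           # start tracing f (the debug package's analogue of debug())
check.for.traces()  # list the functions currently traced via mtrace
mtrace(f, FALSE)    # stop tracing f; mtrace.off() clears all traces at once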

@cbeleites I like your answer, but it didn't work for me. I got the following to work, but it is less functional than yours above (no recursive checks, no pretty printing):
require(plyr)
debug.ls <- function(items = search()){
  .debug.ls <- function(package){
    f <- ls(package)
    active <- f[which(aaply(f, 1, function(x){
      tryCatch(isdebugged(x), error = function(e){FALSE}, finally = FALSE)
    }))]
    if(length(active) == 0){
      return(NULL)
    }
    active
  }
  functions <- lapply(items, .debug.ls)
  unlist(functions)
}

I constantly get caught in the debugging browser because I failed to undebug functions. So I have created two functions and added them to my .Rprofile. The helper functions are pretty straightforward.
require(logging)
# Returns a vector of functions on which the debug flag is set
debuggedFuns <- function() {
  envs <- search()
  debug_vars <- sapply(envs, function(each_env) {
    funs <- names(Filter(is.function, sapply(ls(each_env), get, each_env)))
    debug_funs <- Filter(isdebugged, funs)
    debug_funs
  })
  return(as.vector(unlist(debug_vars)))
}
# Removes the debug flag from all the functions returned by `debuggedFuns`
unDebugAll <- function(verbose = TRUE) {
  toUnDebug <- debuggedFuns()
  if (length(toUnDebug) == 0) {
    if (verbose) loginfo('no Functions to `undebug`')
    return(invisible())
  } else {
    if (verbose) loginfo('undebugging [%s]', paste0(toUnDebug, collapse = ', '))
    for (each_fn in toUnDebug) {
      undebug(each_fn)
    }
    return(invisible())
  }
}
I have tested them out and they work pretty well. Hope this helps!

Related

Why doesn't this data.table function modify the argument? [duplicate]

I'm writing a function that, among other things, coerces the input into a data.table.
library(data.table)
df <- data.frame(id = 1:10)
f <- function(df){setDT(df)}
f(df)
df[, temp := 1]
However, the last command outputs the following warning:
Warning message:
In `[.data.table`(df, , `:=`(temp, 1)) :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) copied the entire DT1 and DT2 (R's list() used to copy named objects); please upgrade to R>v3.0.2 if that is biting. If this message doesn't help, please report to datatable-help so the root cause can be fixed.
I'm using v1.9.3 of data.table and R 3.1.1. Does this mean df is copied at some point? How can I avoid this warning?
Edit:
The code of setDT actually uses NSE. So this seems to work:
df1 <- data.frame(id = 1:10)
f <- function(df){eval(substitute(setDT(df)),parent.frame())}
f(df1)
df1[, temp := 1]
It seems I can do other stuff with df within the function f, like:
df1 <- data.frame(id = 1:10)
f <- function(df){
  eval(substitute(setDT(df)), parent.frame())
  df[, temp := 1]
}
f(df1)
Is this the right way to do it?
Great question! The warning message should say: ... and fixed by taking a shallow copy of the whole table .... Will fix this.
setDT does two things:
1. sets the class to data.table from data.frame/list;
2. uses alloc.col to over-allocate columns (so that := can be used directly).
The 2nd step requires a shallow copy if the input is not already a data.table, and this is why we assign the value back to the symbol in its environment (setDT's parent frame). But the parent frame for setDT here is your function f(). Therefore the setDT(df) within your function goes through smoothly, but the df that resides in the global environment only has its class changed, not the over-allocation (the shallow copy severed the link).
In the next step, := detects that and shallow copies once again in order to over-allocate.
The idea so far is to use setDT to convert to a data.table before providing it to a function. But I'd like these cases to be resolved (will take a look).
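A small sketch of that pattern, using the objects from the question:
library(data.table)
df1 <- data.frame(id = 1:10)
setDT(df1)                          # convert (and over-allocate) in the calling environment first
f <- function(dt) dt[, temp := 1]   # := now adds the column by reference, without the warning
f(df1)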
Thanks a bunch!

Change row names in data frame in R

I've got the following code to iterate through a directory and its subdirectories, pick out certain files, read a value in them, and populate a new data frame with those values. It works, with a few issues...
Here's the code:
wd = setwd("/Users/TK/Downloads/DataCSV")
Groups <- list.dirs(path = wd, full.names = TRUE, recursive = FALSE)
Subj <- list.dirs(path = Groups, full.names = TRUE, recursive = FALSE)
section_area_vector <- numeric()
for(i in Subj) {
  setwd(i)
  section_area <- list.files(path = i, pattern = "section_area",
                             full.names = FALSE, recursive = TRUE)
  read_area <- sapply(section_area, function(x) read.csv(x)[1, 2])
  total_area_subj <- sum(read_area)
  section_area_vector <- rbind(section_area_vector, total_area_subj)
}
section_area_data <- as.data.frame(section_area_vector)
colnames(section_area_data)[colnames(section_area_data) == "V1"] <- "Area"
The output looks like this table:
How do I get the row names to appear as subj.1, subj.2, subj.3?
Also, I seem to have to run the code twice: the first time it doesn't work (basically a null result), but the second time it works and yields the table. Any ideas why this might be?
Also, is this the best way to write this task, or is there something more elegant? I know "for loops" are frowned upon as they are slow (eventually there will be lots of data to work with). I tried using sapply functions but got lost in the syntax. Would love some suggestions if this code can be improved.
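For the row-name part, a minimal sketch, assuming the rows of section_area_data end up in the same order as the Subj directories:
rownames(section_area_data) <- paste0("subj.", seq_len(nrow(section_area_data)))
## or keep the subject directory names themselves as row names:
rownames(section_area_data) <- basename(Subj)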

Using the Haxe While Loop to Remove All of a Value from an Array

I want to remove all occurrences of a possibly duplicated value from an array. At the moment I'm using the remove(x:T):Bool function in a while loop, but I'm wondering about the expression part.
I've started by using:
function removeAll(array:Array<String>, element:String):Void
while (array.remove(element)) {}
but I'm wondering if any of these lines would be more efficient:
while (array.remove(element)) continue;
while (array.remove(element)) true;
while (array.remove(element)) 0;
or if it makes any kind of difference.
I'm guessing that using continue is less efficient because it actually has to do something, that true and 0 are slightly more efficient but still do something, and that {} would probably be the most efficient.
Does anyone have any background information on this?
While others suggested filter, it creates a new list/array instance, which may cause other code of yours to lose its reference.
If you call array.remove in a loop, it scans the elements at the front of the array every time, which is not very performant.
IMO a better approach is to use a reverse while loop:
var i = array.length;
while(--i >= 0)
if(array[i] == element) array.splice(i, 1);
It doesn't make any difference. In fact, there's not even any difference in the generated code for the {}, 0 and false cases: they all end up generating {}, at least on the JS target.
However, you could run into issues if you have a large array with many duplicates: in that case, remove() would be called many times, and it has to iterate over the array each time (until it finds a match, that is). In that case, it's probably more efficient to use filter():
function removeAll(array:Array<String>, element:String):Array<String>
return array.filter(function(e) return e != element);
Personally, I also find this to be a bit more elegant than your while-loop with an empty body. But again, it depends on the use case: this does create a new array, and thus causes an allocation. Usually, that's not worth worrying about, but if you for instance do it in the update loop of a game, you might want to avoid it.
In terms of the expression part of the while loop, it seems that it's just compiled to empty braces ({}), so it doesn't really matter what you do.
In terms of performance, a much better solution is Method 2 from the following:
class Test
{
    static function main()
    {
        var thing:Array<String> = new Array<String>();
        for (index in 0...1000)
        {
            thing.push("0");
            thing.push("1");
        }
        var copy1 = thing.copy();
        var copy2 = thing.copy();
        trace("epoch");
        // Method 1.
        while (copy1.remove("0")) {}
        trace("check");
        // Method 2.
        copy2 = [
            for (item in Lambda.filter(copy2, function(v)
                {return v != "0";}))
                item
        ];
        trace("check");
    }
}
which can be seen [here](https://try.haxe.org/#D0468 "Try Haxe example"). For 200,000 one-character elements in an Array<String>, Method 2 takes 0.017s while Method 1 takes 44.544s.
For large arrays it will be faster to use a temporary array and then assign that back after populating (method3 in the try link), or, if you don't want to use a temp, you can assign back and splice (method4 in the try link):
https://try.haxe.org/#5f80c
Both are more verbose code-wise since I set up vars, but on my Mac they seem faster at runtime. A summary of my method3 approach:
while( i < l ) { if( ( s = copy[ i++ ] ) != '0' ) arr[ j++ ] = s; }
copy = arr;
Am I missing something obvious against these approaches?

In Haskell, is it correct to measure performance using timestamps obtained at the beginning and at the end of a function's execution?

I want to measure the performance of a Haskell function. This function is executed concurrently.
Is it correct to measure its performance using the timestamps that the getCurrentTime function returns? Does laziness affect the measurement?
I want to save these times to a log. I have looked at some logging libraries, but the time they return is not as precise as the timestamp that getCurrentTime returns. I use the XES format for my log.
The code I use is something like this (I did not compile it):
import Control.Concurrent (forkIO)
import Control.Monad (when)
import Data.Time.Clock

measuredFunction :: Int -> IO (Int, UTCTime, UTCTime)
measuredFunction x = do
  time'  <- getCurrentTime
  let x' = x   -- placeholder: performs some IO action that returns x'
  time'' <- getCurrentTime
  return (x', time', time'')

runTest :: Int -> Int -> IO ()
runTest init end = do
  when (init <= end) (do
    forkIO (do
      (x', time', time'') <- measuredFunction 1
      -- saves time' and time'' in a log
      return ())
    runTest (init + 1) end)
It depends on the function. Some values have all their information immediately, whereas others can have expensive stuff going on "beyond the top layer". Here's a contrived example:
example :: (Int, Int)
example = (1+1, head [ x | x <- [1..], x == 10^6 ])
If you load this up in ghci, you will see "(2," printed, and then, after some delay, the rest of the value ("1000000)") is printed. If you get a value like this, then the function will "return" before the expensive sub-value has been computed. But you can use deepseq to ensure that a value is computed all the way and doesn't have any pending sub-computations left.
Benchmarking is subtle, and there are a lot of ways to do it wrong (especially in Haskell). Fortunately we have a very good benchmarking library called criterion (tutorial), which I definitely recommend you use if you are trying to get reliable results.

Purpose of an 'Identity Function'?

I came across this subject when I was reading through PrototypeJS's docs: its Identity Function. I did some further searching and reading on it, and I think I understand its mathematical basis (e.g. multiplication by 1 is an identity function, or did I misinterpret this?), but not why you would write a JS (or PHP or C or whatever) function that basically takes X as a parameter and then just does something like return X.
Is there a deeper insight connected to this? Why does Prototype supply this function? What can I use it for?
Thanks :)
Using the Identity function makes the library code slightly easier to read. Take the Enumerable#any method:
any: function(iterator, context) {
  iterator = iterator || Prototype.K;
  var result = false;
  this.each(function(value, index) {
    if (result = !!iterator.call(context, value, index))
      throw $break;
  });
  return result;
},
It allows you to check if any of the elements of an array are true in a boolean context. Like so:
$A([true, false, true]).any() == true
but it also allows you to process each of the elements before checking for true:
$A([1,2,3,4]).any(function(e) { return e > 2; }) == true
Now, without the identity function, you would have to write two versions of the any function: one for when you preprocess each element and one for when you don't.
any_no_process: function(iterator, context) {
  var result = false;
  this.each(function(value, index) {
    if (value)
      throw $break;
  });
  return result;
},
any_process: function(iterator, context) {
  return this.map(iterator).any();
},
I do not know about that library, but normally you optimize formulas or code or whatever by factoring the common part out; for example, if (add) (a + b) + x else a + b should be rewritten into a + b + (add ? x : 0). You are tempted to do the same with
if (!initialized) initialize(callback_with_very_long_name) else callback_with_very_long_name
It looks pretty similar. You can easily factor out a common factor or term, but how do we factor out a function application? If you understand mathematics or Haskell, you should see that
a ? x + v : v
looks very much like
a ? f value : value
You add x in one case but not in the other; you apply a function in one case but not in the other. You optimize the former into (a ? x : 0) + v because 0 is the additive identity (adding it changes nothing) and v is the common part, which appears regardless of whether x is added. In the case of applying (or not applying) a function, the callback is the common part, and we want to factor it out. What is the identity element here, the thing we can apply to the callback so that nothing changes? The identity function!
(a ? f : identity) value
is what we are looking for. Our original example looks like the following then
(initialized ? identity : initialize) (callback_with_very_long_name)
Note that it now fits on one line.
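The same factoring works in R (the language of the main question above) using base::identity; the names here are hypothetical:
## (initialized ? identity : initialize)(callback) translated to R
(if (initialized) identity else initialize)(callback_with_very_long_name)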
