What is the best way to read the code of an already defined function (especially from the System` context)? - wolfram-mathematica

Occasionally we like to look into how certain System` functions are defined (when they're written in Mathematica). This question is about the best way to do that.
Points to keep in mind:
Of course ReadProtected needs to be removed first.
Builtins usually need to be used at least once before they get loaded into the kernel. Is a single simple invocation usually sufficient for this when they have extended functionality (e.g. through options)?
Information (??) gives the definition in a hard-to-read format (no indentation, and all private context names prepended). What is the best way to get rid of the context names, and get formatted code?
One idea for getting rid of certain contexts is Block[{$ContextPath = Append[$ContextPath, "SomeContext`Private`"]}, Information[symbol]]. Code can be auto-formatted using Workbench. Some issues remain, e.g. Information doesn't quote strings, which prevents the code from being copied into Workbench.
Generally, I'm interested in how people do this, what methods they use to make the code of builtins as easy to read as possible.
Use case: For example, I recently dug into the code of RunThrough when I found out that it simply doesn't work on Windows XP (it turns out that it fails to quote the names of temp files when the path to them contains spaces).
Update: It appears that there used to be a function for printing definitions without context prepended, Developer`ContextFreeForm, but it's not working any more in newer versions.

Regarding the pretty-printing: the following is a very schematic piece of code which builds on the answer of @Mr.Wizard to show that a few simple rules can go a long way towards improving the readability of the code:
Internal`InheritedBlock[{RunThrough},
  Unprotect[RunThrough];
  ClearAttributes[RunThrough, ReadProtected];
  Block[{$ContextPath = Append[$ContextPath, "System`Dump`"]},
    With[{boxes = ToBoxes@DownValues[RunThrough]},
      CellPrint[Cell[BoxData[#], "Input"]] &[
        boxes /.
          f_[left___, "\[RuleDelayed]", right___] :>
            f[left, "\[RuleDelayed]", "\n", right] //.
          {
            RowBox[{left___, ";", next : Except["\n"], right___}] :>
              RowBox[{left, ";", "\n", "\t", next, right}],
            RowBox[{sc : ("Block" | "Module" | "With"), "[",
               RowBox[{vars_, ",", body_}], "]"}] :>
              RowBox[{sc, "[", RowBox[{vars, ",", "\n\t", body}], "]"}]
          }]]]]
This is for sure not a general solution (in particular it won't work well on deeply nested functional code without many separate statements), but I am sure it can be improved and generalized without too much trouble to cover many cases of interest.

Good question, because I don't think I have seen this discussed yet.
I do essentially the same thing you outlined. You can get a somewhat different print-out with Definition, and more information with FullDefinition:
Unprotect[RunThrough];
ClearAttributes[RunThrough, ReadProtected]
Block[{$ContextPath = Append[$ContextPath, "System`Dump`"]},
  Print @ FullDefinition @ RunThrough
]

Related

Using the output of functions in mathematica for further computation

Mathematica has a bevy of useful functions (Solve, NDSolve, etc.). These functions return their output in a very strange form, e.g. {{v -> 2.05334*10^-7}}. The major issue is that there does not appear to be any way to use the output of these functions in the program; that is to say, all of these appear to be terminal functions whose output is for human viewing only.
I have tried multiple methods (Part, /., etc.) to get the output of these functions into variables so the program can use them in further steps, but nothing works. The documentation says it can be done, but nothing it lists actually works. For example, if I try to use /. to move variables, it continues to treat the variable I assigned to as empty and does symbolic math with it instead of seeing the value. If I try to access the variable with e.g. [[1]], it says the variable is not that deep.
The only method I have found is to put the later steps in separate blocks and copy-paste the output to continue evaluation. Is there any way to get the output of these functions into variables programmatically?
Solve etc. produce a list of replacement rules. So you need to apply these rules to the pattern to be replaced. For instance
solutions = x /. Solve[x^2 == 3, x]
gives you all the solutions in a list.
Here is a quick way to get variable names for the solutions:
x1 = solutions[[1]]
x2 = solutions[[2]]
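The rule-replacement idiom above can be sketched end to end as follows. This is only a minimal illustration; the names sol, vals, x1, x2 and product are chosen for this example and do not come from the question.

```mathematica
ClearAll[x, x1, x2];

(* Solve returns a list of rule lists, one inner list per solution *)
sol = Solve[x^2 == 3, x];   (* {{x -> -Sqrt[3]}, {x -> Sqrt[3]}} *)

(* Applying the rules to x yields plain values that later code can use *)
vals = x /. sol;

(* Destructure into separate variables and keep computing with them *)
{x1, x2} = vals;
product = x1*x2             (* -3 *)
```

The same pattern applies to NDSolve and FindRoot, whose results are also rules; only the expression to the left of /. changes.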

Unwanted evaluation in assignments in Mathematica: why it happens and how to debug it during the package-loading?

I am developing a (large) package which does not load properly anymore.
This happened after I changed a single line of code.
When I attempt to load the package (with Needs), the package starts loading and then one of the SetDelayed definitions “comes alive” (i.e. is somehow evaluated), gets trapped in an error-trapping routine loaded a few lines before, and the package loading aborts.
The error-trapping routine with Abort is doing its job, except that it should not have been called in the first place, during the package-loading phase.
The error message reveals that the wrong argument is in fact a pattern expression which I use on the lhs of a SetDelayed definition a few lines later.
Something like this:
……Some code lines
Changed line of code
g[x_?NotGoodQ]:=(Message[g::nogood, x];Abort[])
……..some other code lines
g/: cccQ[g[x0_]]:=True
When I attempt to load the package, I get:
g::nogood: Argument x0_ is not good
As you see the passed argument is a pattern and it can only come from the code line above.
I tried to find the reason for this behavior, but I have been unsuccessful so far.
So I decided to use the powerful Workbench debugging tools.
I would like to see step by step (or with breakpoints) what happens when I load the package.
I am not yet too familiar with WB, but it seems that, using Debug as…, the package is first loaded and then eventually debugged with breakpoints, etc.
My problem is that the package does not even load completely! And any breakpoint set before loading the package does not seem to be effective.
So…2 questions:
can anybody please explain why these code lines "come alive" during package loading? (there are no obvious syntax errors or code fragments left in the package as far as I can see)
can anybody please explain how (if at all) it is possible to examine/debug package code while it is being loaded in WB?
Thank you for any help.
Edit
In light of Leonid's answer and using his EvenQ example:
We can avoid using HoldPattern simply by defining upvalues for g BEFORE downvalues for g
notGoodQ[x_] := EvenQ[x];
Clear[g];
g /: cccQ[g[x0_]] := True
g[x_?notGoodQ] := (Message[g::nogood, x]; Abort[])
Now
?g
Global`g
cccQ[g[x0_]]^:=True
g[x_?notGoodQ]:=(Message[g::nogood,x];Abort[])
In[6]:= cccQ[g[1]]
Out[6]= True
while
In[7]:= cccQ[g[2]]
During evaluation of In[7]:= g::nogood: -- Message text not found -- (2)
Out[7]= $Aborted
So...general rule:
When writing a function g, first define upvalues for g, then define downvalues for g; otherwise use HoldPattern
Can you subscribe to this rule?
Leonid says that using HoldPattern might indicate improvable design. Besides the solution indicated above, how could one improve the design of the little code above or, better, in general when dealing with upvalues?
Thank you for your help
Leaving aside the WB (which is not really needed to answer your question) - the problem seems to have a straightforward answer based only on how expressions are evaluated during assignments. Here is an example:
In[1505]:=
notGoodQ[x_]:=True;
Clear[g];
g[x_?notGoodQ]:=(Message[g::nogood,x];Abort[])
In[1509]:= g/:cccQ[g[x0_]]:=True
During evaluation of In[1509]:= g::nogood: -- Message text not found -- (x0_)
Out[1509]= $Aborted
To make it work, I deliberately made a definition for notGoodQ to always return True. Now, why was g[x0_] evaluated during the assignment through TagSetDelayed? The answer is that, while TagSetDelayed (as well as SetDelayed) in an assignment h/:f[h[elem1,...,elemn]]:=... does not apply any rules that f may have, it will evaluate h[elem1,...,elemn], as well as f. Here is an example:
In[1513]:=
ClearAll[h,f];
h[___]:=Print["Evaluated"];
In[1515]:= h/:f[h[1,2]]:=3
During evaluation of In[1515]:= Evaluated
During evaluation of In[1515]:= TagSetDelayed::tagnf: Tag h not found in f[Null]. >>
Out[1515]= $Failed
The fact that TagSetDelayed is HoldAll does not mean that it does not evaluate its arguments; it only means that the arguments arrive at it unevaluated, and whether or not they are then evaluated depends on the semantics of TagSetDelayed (which I briefly described above). The same holds for SetDelayed, so the commonly used statement that it "does not evaluate its arguments" is not literally correct. A more accurate statement is that it receives the arguments unevaluated and evaluates them in a special way: it does not evaluate the r.h.s., while for the l.h.s. it evaluates the head and the elements but does not apply rules for the head. To avoid that, you may wrap things in HoldPattern, like this:
Clear[g,notGoodQ];
notGoodQ[x_]:=EvenQ[x];
g[x_?notGoodQ]:=(Message[g::nogood,x];Abort[])
g/:cccQ[HoldPattern[g[x0_]]]:=True;
This goes through. Here is some usage:
In[1527]:= cccQ[g[1]]
Out[1527]= True
In[1528]:= cccQ[g[2]]
During evaluation of In[1528]:= g::nogood: -- Message text not found -- (2)
Out[1528]= $Aborted
Note however that the need for HoldPattern inside your left-hand side when making a definition is often a sign that the expression inside your head may also evaluate during the function call, which may break your code. Here is an example of what I mean:
In[1532]:=
ClearAll[f,h];
f[x_]:=x^2;
f/:h[HoldPattern[f[y_]]]:=y^4;
This code attempts to catch cases like h[f[something]], but it will obviously fail since f[something] will evaluate before the evaluation comes to h:
In[1535]:= h[f[5]]
Out[1535]= h[25]
For me, the need for HoldPattern on the l.h.s. is a sign that I need to reconsider my design.
EDIT
Regarding debugging during loading in WB, one thing you can do (IIRC, can not check right now) is to use good old print statements, the output of which will appear in the WB's console. Personally, I rarely feel a need for debugger for this purpose (debugging package when loading)
EDIT 2
In response to the edit in the question:
Regarding the order of definitions: yes, you can do this, and it solves this particular problem. But, generally, this isn't robust, and I would not consider it a good general method. It is hard to give definite advice for the case at hand, since it is a bit out of its context, but it seems to me that the use of UpValues here is unjustified. If this is done for error handling, there are other ways to do it without using UpValues.
Generally, UpValues are used most commonly to overload some function in a safe way, without adding any rule to the function being overloaded. One piece of advice is to avoid associating UpValues with heads which also have DownValues and may evaluate: by doing this you start playing a game with the evaluator, and will eventually lose. The safest is to attach UpValues to inert symbols (heads, containers), which often represent a "type" of objects on which you want to overload a given function.
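As a minimal sketch of the "inert head" advice, the snippet below attaches an UpValue to a container symbol that has no DownValues of its own. The names size and circle are hypothetical and were picked only for this illustration.

```mathematica
ClearAll[size, circle];

(* circle has no DownValues, so circle[2] stays inert and never evaluates *)
circle /: size[circle[r_]] := Pi r^2;

size[circle[2]]     (* 4 Pi *)
DownValues[size]    (* {} : the rule is stored with circle, not with size *)
UpValues[circle]    (* lists the single rule attached to circle *)
```

Because circle[2] cannot rewrite itself before size sees it, no HoldPattern is needed in the definition, and the order in which the definitions are made does not matter.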
Regarding my comment on the presence of HoldPattern indicating a bad design. There certainly are legitimate uses for HoldPattern, such as this (somewhat artificial) one:
In[25]:=
Clear[ff,a,b,c];
ff[HoldPattern[Plus[x__]]]:={x};
ff[a+b+c]
Out[27]= {a,b,c}
Here it is justified because in many cases Plus remains unevaluated, and is useful in its unevaluated form, since one can deduce that it represents a sum. We need HoldPattern here because of the way Plus is defined on a single argument, and because a pattern happens to be a single argument (even though it generally describes multiple arguments) during the definition. So, we use HoldPattern here to prevent the pattern from being treated as a normal argument, but this is mostly different from the intended use cases for Plus. Whenever this is the case (we are sure that the definition will work all right for the intended use cases), HoldPattern is fine. Note, by the way, that this example is also fragile:
In[28]:= ff[Plus[a]]
Out[28]= ff[a]
The reason why it is still mostly OK is that normally we don't use Plus on a single argument.
But there is a second group of cases, where the structure of the arguments usually supplied is the same as the structure of the patterns used for the definition. In this case, pattern evaluation during the assignment indicates that the same evaluation will happen with actual arguments during the function calls. Your usage falls into this category. My comment about a design flaw was for such cases: you can prevent the pattern from evaluating, but you will have to prevent the arguments from evaluating as well to make this work. And pattern-matching against an incompletely evaluated expression is fragile. Also, the function should never assume some extra conditions (beyond what it can type-check) for the arguments.

Treetop grammar infinite loop

I have had some ideas for a new programming language floating around in my head, so I thought I'd take a shot at implementing it. A friend suggested I try using Treetop (the Ruby gem) to create a parser. Treetop's documentation is sparse, and I've never done this sort of thing before.
My parser is acting like it has an infinite loop in it, but with no stack traces, it is proving difficult to track down. Can somebody point me in the direction of an entry-level parsing/AST guide? I really need something that lists rules, common usage, etc. for using tools like Treetop. My parser grammar is on GitHub, in case someone wishes to help me improve it.
class {
initialize = lambda (name) {
receiver.name = name
}
greet = lambda {
IO.puts("Hello, #{receiver.name}!")
}
}.new(:World).greet()
I asked treetop to compile your language into an .rb file. That gave me something to dig into:
$ tt -o /tmp/rip.rb /tmp/rip.treetop
Then I used this little stub to recreate the loop:
require 'treetop'
load '/tmp/rip.rb'
RipParser.new.parse('')
This hangs. Now, isn't that interesting! An empty string reproduces the behavior just as well as the dozen-or-so-line example in your question.
To find out where it's hanging, I used an Emacs keyboard macro to edit rip.rb, adding a debug statement to the entry of each method. For example:
def _nt_root
p [__LINE__, '_nt_root'] #DEBUG
start_index = index
Now we can see the scope of the loop:
[16, "root"]
[21, "_nt_root"]
[57, "_nt_statement"]
...
[3293, "_nt_eol"]
[3335, "_nt_semicolon"]
[3204, "_nt_comment"]
[57, "_nt_statement"]
[57, "_nt_statement"]
[57, "_nt_statement"]
...
Further debugging from there reveals that an integer is allowed to be an empty string:
rule integer
digit*
end
This indirectly allows a statement to be an empty string, and the top-level rule statement* to forever consume empty statements. Changing * to + fixes the loop, but reveals another problem:
/tmp/rip.rb:777:in `_nt_object': stack level too deep (SystemStackError)
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
from /tmp/rip.rb:825:in `_nt_literal_object'
from /tmp/rip.rb:787:in `_nt_object'
from /tmp/rip.rb:757:in `_nt_compound_object'
from /tmp/rip.rb:1726:in `_nt_range'
from /tmp/rip.rb:1671:in `_nt_special_literals'
... 3283 levels...
Range is left-recursing, indirectly, via special_literals, literal_object, object, and compound_object. Treetop, when faced with left recursion, eats stack until it pukes. I don't have a quick fix for that problem, but at least you've got a stack trace to go from now.
Also, this is not your immediate problem, but the definition of digit is odd: it can be either one digit or multiple. This causes digit* or digit+ to allow the (presumably) illegal integer 1________2.
I really enjoyed Language Implementation Patterns by Parr; since Parr created the ANTLR parser generator, it's the tool he uses throughout the book, but it should be simple enough to learn from it all the same.
What I really liked about it was the way each example grew upon the previous one; he doesn't start out with a gigantic AST-capable parser, instead he slowly introduces problems that need more and more 'backend smarts' to do the job, so the book scales well along with the language that needs parsing.
What I wish it covered in a little more depth is the types of languages that one can write and give advice on Do's and Do Not Do's when designing languages. I've seen some languages that are a huge pain to parse and I'd have liked to know more about the design decisions that could have been made differently.

How to define /@-like operator

I would like to define a new operator of the form x /==> y, where
the operator /==> is treated as e.g. the /@ operator of Map, and
is translated to MyFunction[x, y]. There is one important aspect: I
want the resulting operator to behave in the frontend like any two-bit
operator does, that is, the two characters (a Divide and a
DoubleLongRightArrow) should be connected together, no syntax
coloration should appear, and they are to be selected together when
clicked, so precedence must be set. Also, I'd rather avoid using the
Notation` package. As a result, I'd like to see something like this:
In[11]:= FullForm[x/\[DoubleLongRightArrow]y]
Out[11]//FullForm= MyFunction[x,y]
Does anyone have an idea how to achieve this?
The Notation Package is perhaps the closest to doing this kind of thing, but according to the response to my own question of a similar nature, what you want is unfortunately not practical.
Don't let this stop you from trying however, as you will probably learn new things in the process. The Notation Package and the functions that underpin it are far from useless.
You may also find the replies to this question informative.
There are a number of functions that are useful for manual implementation of syntax changes. Rather than try to write my own help file for these, I will direct you to the official pages on these functions. After reading them, please ask any focused questions you have, or for help with implementing specific ideas. I or others here should be able to either answer your question, show you how to do something, or explain why it is not readily possible.
The index page on Textual Input and Output.
MakeBoxes, and MakeExpression, and an example of their use.
PreRead
More drastically, one might use CellEvaluationFunction which can be used to do unusual things.
There are more, and I will try to extend this list later. (others are welcome to edit this post)
Thanks to Mr.Wizard's links, I've found the only example in the documentation on how to parse new operators (the gplus example in Low-Level Input). Following this example, here is my version for the new operator PerArrow. Please comment on or critique the code below:
In[1]:= PerArrow /: MakeBoxes[PerArrow[x_, y_], StandardForm] :=
RowBox[{MakeBoxes[x, StandardForm],
RowBox[{AdjustmentBox["/", BoxMargins -> -.2],
AdjustmentBox["\[DoubleLongRightArrow]", BoxMargins -> -.1]}],
MakeBoxes[y, StandardForm]}];
MakeExpression[
RowBox[{x_, "/", RowBox[{"\[DoubleLongRightArrow]", y_}]}],
StandardForm] :=
MakeExpression[RowBox[{"PerArrow", "[", x, ",", y, "]"}],
StandardForm];
In[3]:= PerArrow[x, y]
Out[3]= x /\[DoubleLongRightArrow] y
In[4]:= x /\[DoubleLongRightArrow]y
Out[4]= x /\[DoubleLongRightArrow] y
In[5]:= FullForm[x /\[DoubleLongRightArrow]y]
Out[5]//FullForm= \!\(\*
TagBox[
StyleBox[
RowBox[{"PerArrow", "[",
RowBox[{"x", ",", "y"}], "]"}],
ShowSpecialCharacters->False,
ShowStringCharacters->True,
NumberMarks->True],
FullForm]\)
For sake of clarity, here is a screenshot as well:
Since the operator is not fully integrated, further concerns are:
the operator is selected weird when clicked (DoubleLongRightArrow with y instead of with /).
accordingly, the parsing part requires the DoubleLongRightArrow to be RowBox-ed with y, otherwise it yields syntax error
syntax coloration (at In[4] and In[5])
it prints weird if inputted directly (notice the large gaps at In[4] and In[5])
Now, I can live with these, though it would be nice to have some means to iron out all the minor issues. I guess all these boil down to basically some even lower-level syntax handler, which does not know how to group the new operator. Any idea on how to tackle these? I understand that Cell has a multitude of options which might come in handy (like CellEvaluationFunction, ShowAutoStyles and InputAutoReplacements), though I'm again clueless here.

Sprintf equivalent in Mathematica?

I don't know why Wikipedia lists Mathematica as a programming language with printf. I just couldn't find the equivalent in Mathematica.
My specific task is to process a list of data files with padded numbers, which I used to do in bash with
fn=$(printf "filename_%05d" $n)
The closest function I found in Mathematica is PaddedForm. And after some trial and error, I got it with
"filename_" <> PaddedForm[ Round@#, 4, NumberPadding -> {"0", ""} ]&
It is very odd that I have to use the number 4 to get the result similar to what I get from "%05d". I don't understand this behavior at all. Can someone explain it to me?
And is it the best way to achieve what I used to in bash?
I wouldn't use PaddedForm for this. In fact, I'm not sure that PaddedForm is good for much of anything. Instead, I'd use good old ToString, Characters and PadLeft, like so:
toFixedWidth[n_Integer, width_Integer] :=
StringJoin[PadLeft[Characters[ToString[n]], width, "0"]]
Then you can use StringForm and ToString to make your file name:
toNumberedFileName[n_Integer] :=
  ToString@StringForm["filename_``", toFixedWidth[n, 5]]
Mathematica is not well-suited to this kind of string munging.
EDIT to add: Mathematica proper doesn't have the required functionality, but the java.lang.String class has the static method format() which takes printf-style arguments. You can call out to it using Mathematica's JLink functionality pretty easily. The performance won't be very good, but for many use cases you just won't care that much:
Needs["JLink`"];
LoadJavaClass["java.lang.String"];
LoadJavaClass["java.util.Locale"];
sprintf[fmt_, args___] :=
  String`format[Locale`ENGLISH, fmt,
    MakeJavaObject /@
      Replace[{args},
        {x_?NumericQ :> N@x,
         x : (_Real | _Integer | True |
              False | _String | _?JavaObjectQ) :> x,
         x_ :> MakeJavaExpr[x]},
        {1}]]
You need to do a little more work, because JLink is a bit dumb about Java functions with a variable number of arguments. The format() method takes a format string and an array of Java Objects, and Mathematica won't do the conversion automatically, which is what the MakeJavaObject is there for.
I've run into the same problem quite a bit, and decided to code my own function. I didn't do it in Java but instead just used string operations in Mathematica. It turned out quite lengthy, since I actually also needed %f functionality, but it works, and now I have it as a package that I can use at any time. Here's a link to the GitHub project:
https://github.com/vlsd/MathPrintF
It comes with installation instructions (really just copying the directory somewhere in the $Path).
Hope this will be helpful to at least some.
You could also define a function which passes all arguments to StringForm[] and use IntegerString or the padding functions as previously mentioned:
Sprintf[args__] := StringForm[args] // ToString;
file = Sprintf["filename_``", IntegerString[n, 10, 5]];
IntegerString does exactly what you need. In this case it would be
IntegerString[x,10,5]
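A minimal sketch of the IntegerString approach applied to the file-name task from the question; fileName is a hypothetical helper name introduced here for illustration.

```mathematica
ClearAll[fileName];

(* Zero-pad n to 5 decimal digits, similar to printf "%05d" in bash *)
fileName[n_Integer?NonNegative] := "filename_" <> IntegerString[n, 10, 5];

fileName[42]              (* "filename_00042" *)
fileName /@ Range[3]      (* one name per index *)
```

One thing to keep in mind: when the number has more digits than the requested length, IntegerString returns only the least significant digits, so the length should be chosen large enough for the largest index.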
I agree with Pillsy.
Here's how I would do it.
Note the handy cat function, which I think of as kind of like sprintf (minus the placeholders like StringForm provides) in that it works like Print (you can print any concatenation of expressions without converting to String) but generates a string instead of sending to stdout.
cat = StringJoin@@(ToString/@{##})&;
pad[x_, n_] := If[StringLength@cat[x]>=n, cat[x],
  cat@@PadLeft[Characters@cat[x],n,"0"]]
cat["filename_", pad[#, 5]]&
This is very much like Pillsy's answer but I think cat makes it a little cleaner.
Also, I think it's safer to have that conditional in the pad function -- better to have the padding wrong than the number wrong.
