It is common in Java, when using modern IDEs, to inline variable values and perform heavy refactorings that can, for example, transform this source code:
boolean test = true;
//...
if(test) {
//...
}
into this:
if(true) {
//...
}
Obviously, this code can be simplified, but Eclipse won't perform that simplification for me.
So, is there any tool (for Eclipse or, even better, Maven) that can detect and possibly simplify such code? (It would obviously be even better if the tool could also detect other suspect constructs, such as empty for loops.)
What you want is a Program Transformation system (PTS).
These are tools that read source code, build compiler data structures (almost always including at least an AST), carry out customized analysis and modification of the compiler data structures, and then regenerate source text (for the modified program) from those modified data structures.
Many PTSes let you express changes to code directly in source-to-source form as rules, written in terms of the language's syntax, metavariables, etc. The point of such a rule language is to let you express complex code transformations more easily.
Our DMS Software Reengineering Toolkit is such a PTS. You can easily simplify code with boolean expressions containing boolean constants with the following simple rules:
default domain Java~v7;
simplify_not_true(): primary -> primary
" ! true" -> "false";
simplify_not_false(): primary -> primary
" ! false" -> "true";
simplify_not_not(x: primary): primary -> primary
" ! ! \x " -> "\x";
simplify_and_right_true(x: term): conjunction -> conjunction ;
" \x && true " -> "\x";
simplify_and_left_true(x: term): conjunction -> conjunction ;
" true && \x " -> "\x";
simplify_and_left_false(x: term): conjunction -> conjunction ;
" false && \x " -> "false";
simplify_and_right_false(x: term): conjunction -> conjunction ;
" \x && false " -> "false"
if no_side_effects_or_exceptions(x); -- note additional semantic check here
simplify_or_right_false(x: term): disjunction -> disjunction ;
" \x || false " -> "\x";
simplify_or_left_false(x: term): disjunction -> disjunction ;
" false || \x " -> "\x";
simplify_or_right_true(x: term): disjunction -> disjunction ;
" \x || true " -> "true"
if no_side_effects_or_exceptions(x);
simplify_or_left_true(x: term): disjunction -> disjunction ;
" true || \x " -> "true";
(The grammar names "term", "primary", "conjunction", "disjunction" are directly from the BNF used to drive Java source code parsing.)
Together, these rules take boolean expressions involving known boolean constants and simplify them, sometimes all the way down to "true" or "false".
To eliminate if-conditionals whose expressions are boolean constants one would write these:
simplify_if_true(b: block): statement -> statement
" if (true) \b" -> " \b ";
simplify_if_false(b: block): statement -> statement
" if (false) \b" -> ";" -- null statement
Together with boolean simplification, these two rules would get rid of conditionals for obviously true or obviously false conditionals.
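To make the effect concrete, here is a plain-Java sketch of the kind of before/after pair these rules would produce (the class and method names are mine, purely for illustration):

```java
public class SimplifyDemo {
    // Before: the shape of code the rules above would rewrite.
    static int before() {
        int n = 0;
        if (true && (n == 0)) {   // simplify_and_left_true: "true && x" -> "x"
            n = 1;
        }
        if (false) {              // simplify_if_false: whole statement becomes ";"
            n = 99;
        }
        return n;
    }

    // After: what remains once the rules have been applied.
    static int after() {
        int n = 0;
        if (n == 0) {
            n = 1;
        }
        return n;
    }

    public static void main(String[] args) {
        // The transformation must preserve behavior, so both versions agree.
        System.out.println(before() == after());
    }
}
```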
Doing what you want is a bit more complicated, because you want to propagate information from one place in the program to another place that may be "far away". For that you need what amounts to a data flow analysis, showing which values can reach which places from their assignments:
default domain Java~v7;
rule propagate_constant_variables(i:IDENTIFIER): term -> term
" \i " -> construct_reaching_constant()
if constant_reaches(i);
This rule depends on a built-in analysis providing data flow facts and on a custom interface function "constant_reaches" that inspects this data.
(DMS has this for C, C++, Java and COBOL, and support for doing it for other languages; to my knowledge, none of the other PTSes mentioned in the Wikipedia article have these flow facts available.) It also depends on a custom constructor "construct_reaching_constant" to build a primitive tree node containing a reaching constant. These would be coded in DMS's underlying metaprogramming language and require a few tens of lines of code. The same goes for the special condition "no_side_effects_or_exceptions" discussed earlier; this one can be a lot more complex, as the question of side effects may require an analysis of the full program.
There are tools such as Clang that can transform C++ code to some extent, but Clang does not have rewrite rules as PTSes do; it is really a compiler with additional hooks.
I'm trying to compile a personal language of mine to Erlang. I want to create a function with pattern matching on its clauses.
This is my data:
Data =
[ {a, <a_body> }
, {b, <b_body> }
, {c, <c_body> }
].
This is what I want:
foo(a) -> <a_body>;
foo(b) -> <b_body>;
foo(c) -> <c_body>;
foo(_) -> undefined. %% <- this
This is what I do at the moment:
MkCaseClause =
fun({Pattern,Body}) ->
cerl:c_clause([cerl:c_atom(Pattern)], deep_literal(Body))
end,
WildCardClause = cerl:c_clause([ ??? ], cerl:c_atom(undefined)),
CaseClauses = [MkCaseClause(S) || S <- Data] ++ [WildCardClause],
So please help me define WildCardClause. I noticed that if I call my compiled function with neither a nor b nor c, it results in ** exception error: no true branch found when evaluating an if expression in function ....
When I print my Core Erlang code I get this:
'myfuncname'/1 =
fun (Key) ->
case Key of
<'a'> when 'true' -> ...
<'b'> when 'true' -> ...
<'c'> when 'true' -> ...
end
So okay, case is translated to if when the Core Erlang is compiled. So I need to provide a true clause, as in an if expression, to get a pure wildcard. I don't know how to do that, since matching true in an if expression and in a case expression have different semantics: in a case, true is not a wildcard.
And what if I want to match expressions with wildcards inside, like {sometag,_,_,Thing} -> {ok, Thing}?
Thank you
I've found a way to do this:
...
WildCardVar = cerl:c_var('_Any'),
WildCardClause = cerl:c_clause([WildCardVar], cerl:c_atom(undefined)),
...
It should work for inner wildcards too, but one has to be careful to give a different variable name to each _ wildcard, since multiple occurrences of _ do not have to match each other, whereas repeated occurrences of a named variable do:
f(X,_, _ ) %% matches f(a,b,c)
f(X,_X,_X) %% doesn't
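One way to honor that rule in the Core Erlang generation above is to number the generated wildcard variables. A rough, untested sketch using the same cerl API as in the question (the name `fresh_wildcard` and the `_W` prefix are mine):

```erlang
%% Each wildcard gets its own fresh name ('_W1', '_W2', ...), so two
%% wildcards in the same pattern never accidentally alias each other.
fresh_wildcard(N) ->
    cerl:c_var(list_to_atom("_W" ++ integer_to_list(N))).

%% A pattern like {sometag, _, _, Thing} could then be built as:
%%   cerl:c_tuple([cerl:c_atom(sometag),
%%                 fresh_wildcard(1),
%%                 fresh_wildcard(2),
%%                 cerl:c_var('Thing')])
```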
Example and background (note the usage of Hold, ReleaseHold):
The following code represents a static factory method to create a scenegraph object (from an XML file). The (output-)field is an instance of CScenegraph (an OO-System class).
new[imp_]:= Module[{
ret,
type = "TG",
record ={{0,0,0},"Root TG"}
},
ret = MathNew[
"CScenegraph",
2,
MathNew["CTransformationgroup",1,{type,record},0,0,0,0,Null]];
ret#setTree[ret];
ret#getRoot[]#setColref[ret];
csp = loadClass["CSphere"];
spheres = Cases[imp, XMLElement["sphere", _, __], Infinity];
codesp = Cases[spheres, XMLElement["sphere",
{"point" -> point_, "radius" -> rad_, "hue" -> hue_}, {}] -> Hold[csp#new[ToExpression[point], ToExpression[rad], ToExpression[hue]]]];
ret#addAschild[ret#getRoot[],ReleaseHold[codesp]];
ret
];
My question is about the following:
spheres = Cases[imp, XMLElement["sphere", _, __], Infinity];
codesp = Cases[spheres, XMLElement["sphere",
{"point" -> point_, "radius" -> rad_, "hue" -> hue_}, {}] -> Hold[csp#new[ToExpression[point], ToExpression[rad], ToExpression[hue]]]];
ret#addAschild[ret#getRoot[],ReleaseHold[codesp]];
where
addAschild
adds (a list of) geometries to a (root) transformationgroup and has the signature
addAsChild[parent MathObject, child MathObject], or
addAsChild[parent MathObject, Children List{MathObject, ...}]
and the XML element representing a sphere looks as follows:
<sphere point='{0., 1., 3.}'
radius='1'
hue='0.55' />
If I do NOT use Hold[], ReleaseHold[], I end up with object data like
{"GE", {"SP", {CScenegraph`point, CScenegraph`rad}}, {CScenegraph`hue}}
while I would have expected
{"GE", {"SP", {{4., 3., -4.}, 3.}}, {0.45}}
(The above code with Hold[], ReleaseHold[] yields the correct data.)
Questions
1. Why is Hold necessary in this case? (In fact, is it? Is there a way to code this without Hold[], ReleaseHold[]?) (I got it right by trial and error! I don't really understand why.)
2. As a learning point: What is the prototypical example / case for the usage of Hold / ReleaseHold?
EDIT:
Summary of Leonid's answer. Change this code
codesp = Cases[spheres, XMLElement["sphere",
{"point" -> point_, "radius" -> rad_, "hue" -> hue_}, {}] -> Hold[csp#new[ToExpression[point], ToExpression[rad], ToExpression[hue]]]];
ret#addAschild[ret#getRoot[],ReleaseHold[codesp]];
to:
codesp = Cases[spheres, XMLElement["sphere",
{"point" -> point_, "radius" -> rad_, "hue" -> hue_}, {}] :> csp#new[ToExpression[point], ToExpression[rad], ToExpression[hue]]];
ret#addAschild[ret#getRoot[],codesp];
The short answer for the first question is that you probably should have used RuleDelayed rather than Rule, and then you don't need Hold-ReleaseHold.
It is hard to be sure what is going on, since your code sample is not self-contained. One thing that is certain is that OO-System performs non-trivial manipulations of contexts, since it uses contexts as an encapsulation mechanism (which makes sense). Normally, Rule and RuleDelayed inject the matched expressions into the r.h.s., so it is not clear how this could happen. Here is one possible scenario (you may execute this in a notebook):
BeginPackage["Test`"]
f[{a_Symbol, b_Symbol}] := {c, d};
fn[input_] := Cases[input, XMLElement[{"a" -> a_, "b" -> b_}, {}, {}] -> f[{a, b}]];
fn1[input_] := Cases[input, XMLElement[{"a" -> a_, "b" -> b_}, {}, {}] :> f[{a, b}]];
EndPackage[];
$ContextPath = DeleteCases[$ContextPath, "Test`"]
Now,
In[71]:= Test`fn[{XMLElement[{"a"->1,"b"->2},{},{}],{"a"->3,"b"->4},{"a"->5,"b"->6}}]
Out[71]= {{Test`c,Test`d}}
What happened is that, since we used Rule in XMLElement[...] -> rhs, the r.h.s. evaluated before the substitution took place; in this case, the function f evaluated. Now,
In[78]:= Test`fn1[{XMLElement[{"a" -> 1, "b" -> 2}, {}, {}],
{"a" ->3, "b" -> 4}, {"a" -> 5, "b" -> 6}}]
Out[78]= {Test`f[{1, 2}]}
The result is different here because the implementation of fn1 used the idiom XMLElement[...] :> rhs, involving RuleDelayed this time. Therefore, f[{a,b}] was not evaluated until a and b had been substituted by the matching numbers from the l.h.s. And since f does not have a rule for an argument that is a list of two numbers, it is returned unevaluated.
The reason your method with Hold-ReleaseHold worked is that it prevented the r.h.s. (the function f in my example, the call to new in your original one) from evaluating until the values of the pattern variables had been substituted into it. As a side note, you may find it useful to add better error-checking to your constructor (if OO-System allows that), so that problems like this are better diagnosed at run-time.
So, the bottom line: use RuleDelayed, not Rule.
To answer the second question, the combination ReleaseHold-Hold is generally useful when you want to manipulate the held code before you allow it to evaluate. For example:
In[82]:=
{a,b,c}={1,2,3};
ReleaseHold[Replace[Hold[{a,b,c}],s_Symbol:>Print[s^2],{2}]]
During evaluation of In[82]:= 1
During evaluation of In[82]:= 4
During evaluation of In[82]:= 9
Out[83]= {Null,Null,Null}
One can probably come up with more sensible examples. This is especially useful for things like code generation; one less trivial example can be found here. The specific case at hand, as I already mentioned, does not really fall into the category of cases where Hold-ReleaseHold is beneficial; here it is just a workaround, which is not really necessary once you use delayed rules.
How would a good Erlang programmer write this code?
loop(expr0) ->
case expr1 of
true ->
A = case expr2 of
true -> ...;
false -> ...
end;
false->
A = case expr3 of
true -> ...;
false -> ...
end
end,
loop(expr4(A)).
Generally speaking, you want to make your code more readable. It's usually a good idea to extract bits of code into functions, to avoid long or deeply nested functions, and to give them self-explanatory names that clarify the purpose of each piece of code:
loop(expr0) ->
case expr1 of
true ->
A = do_something(expr2);
false->
A = do_something_else(expr3)
end,
loop(expr4(A)).
do_something(E) ->
case E of
true -> ...;
false -> ...
end.
do_something_else(E) ->
case E of
true -> ...;
false -> ...
end.
Now, a casual reader knows that your function does something if expr1 is true and something else if expr1 is false. Good naming conventions help a lot here. You could also achieve that with comments, but code never goes stale the way comments do, and is thus easier to maintain. I also find short functions much easier to read than really loooong functions, even when those long functions have comments inlined.
Once you've stated clearly what your function does, you may want to shorten the code. Short code is easier to read and maintain, but don't shorten too much using "clever" constructions, or you'll obscure it, which is the opposite of what you want. You can start by using pattern matching in function heads:
loop(expr0) ->
case expr1 of
true ->
A = do_something(expr2);
false->
A = do_something_else(expr3)
end,
loop(expr4(A)).
do_something(true) -> ...;
do_something(false) -> ....
do_something_else(true) -> ...;
do_something_else(false) -> ....
Then, you can avoid repeating A in the main function (as an aside, variables leaking out of the scope of nested statements is a feature I've always disliked):
loop(expr0) ->
A = case expr1 of
true -> do_something(expr2);
false-> do_something_else(expr3)
end,
loop(expr4(A)).
do_something(true) -> ...;
do_something(false) -> ....
do_something_else(true) -> ...;
do_something_else(false) -> ....
And I think that's it for this piece of code. With more context you could also introduce some abstractions to reduce duplication, but take care when abstracting: if you overdo it you'll obscure the code again, losing the maintenance benefit you expected to get by removing similar code.
The code, as it is currently written, is hard to make simpler. The problem is that the exprX entries are unknown, so there is no way to know whether simplifying would be beneficial. If you post a fuller example, we will have a much better chance of attempting an optimization.
The concrete problem is that we don't know how expr2 and expr3 depend on expr1, for instance. And we don't know what the purpose of expr0 is, nor anything about expr4 other than that it uses the returned A.
Why do you need expr0 in the loop function?
I encountered the following construct in various places throughout an OCaml project whose code I'm reading.
match something with
true -> foo
| false -> bar
At first glance, it works like a usual if statement. At second glance, it... works like a usual if statement! At third glance, I decided to ask on SO. Does this construct have a special meaning, or a subtle difference from an if statement that matters in peculiar cases?
Yep, it's an if statement.
match expressions are often more common in OCaml code than if, so it may be used for uniformity.
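For example, these two definitions behave identically (a minimal illustration of the equivalence, not code from the project in question):

```ocaml
(* matching on a bool and an if expression produce the same branching *)
let sign_if x = if x >= 0 then 1 else -1

let sign_match x =
  match x >= 0 with
  | true -> 1
  | false -> -1

let () = assert (sign_if (-5) = sign_match (-5))
```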
I don't agree with the previous answer: it DOES do the work of an if statement, but it's more flexible than that.
"Pattern matching is a switch statement but 10 times more powerful," as someone put it.
Take a look at this tutorial explaining ways to use pattern matching (link here).
Also, in OCaml, pattern matching is the way to destructure composite data into simpler parts, for example lists, tuples and much more:
> let imply v =
match v with
| true, x -> x
| false, _ -> true;;
> let head = function
| [] -> 42
| h :: _ -> h;;
> let rec sum = function
| [] -> 0
| h :: l -> h + sum l;;