How to detect if(true) and other refactoring issues?

It is common in Java, when using "modern" IDEs, to inline variable values and perform heavy refactoring that can, for example, transform this source code
boolean test = true;
//...
if(test) {
    //...
}
into this code:
if(true) {
    //...
}
Obviously, this code can be simplified, but Eclipse won't perform that simplification for me.
So, is there any way (using Eclipse or, even better, Maven) to detect and possibly simplify that code? (It would obviously be much better if such a tool could also detect other wrong constructs like empty for loops, ...)

What you want is a Program Transformation system (PTS).
These are tools that read source code, build compiler data structures (almost always including at least an AST), carry out customized analysis and modification of the compiler data structures, and then regenerate source text (for the modified program) from those modified data structures.
Many PTSes let you express changes to code directly in source-to-source form as rules, written in terms of the language syntax, metavariables, etc. The point of such a rule language is to let you express complex code transformations more easily.
Our DMS Software Reengineering Toolkit is such a PTS. You can easily simplify code with boolean expressions containing boolean constants with the following simple rules:
default domain Java~v7;
simplify_not_true(): primary -> primary
   " ! true" -> "false";
simplify_not_false(): primary -> primary
   " ! false" -> "true";
simplify_not_not(x: primary): primary -> primary
   " ! ! \x " -> "\x";
simplify_and_right_true(x: term): conjunction -> conjunction
   " \x && true " -> "\x";
simplify_and_left_true(x: term): conjunction -> conjunction
   " true && \x " -> "\x";
simplify_and_left_false(x: term): conjunction -> conjunction
   " false && \x " -> "false";
simplify_and_right_false(x: term): conjunction -> conjunction
   " \x && false " -> "false"
   if no_side_effects_or_exceptions(x); -- note additional semantic check here
simplify_or_right_false(x: term): disjunction -> disjunction
   " \x || false " -> "\x";
simplify_or_left_false(x: term): disjunction -> disjunction
   " false || \x " -> "\x";
simplify_or_right_true(x: term): disjunction -> disjunction
   " \x || true " -> "true"
   if no_side_effects_or_exceptions(x);
simplify_or_left_true(x: term): disjunction -> disjunction
   " true || \x " -> "true";
(The grammar names "term", "primary", "conjunction", "disjunction" are directly from the BNF used to drive Java source code parsing.)
Together, these rules simplify boolean expressions involving known boolean constants, sometimes reducing them all the way to "true" or "false".
To eliminate if-conditionals whose expressions are boolean constants, one would write these:
simplify_if_true(b: block): statement -> statement
   " if (true) \b" -> " \b ";
simplify_if_false(b: block): statement -> statement
   " if (false) \b" -> ";"; -- null statement
Together with the boolean simplification rules, these two rules would eliminate conditionals whose conditions are obviously true or obviously false.
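The Python sketch below illustrates what the boolean and if rules do, using the same bottom-up rewriting idea over a small hypothetical AST. It is not DMS code: the classes Var, Call, Not, And, Or, If and the functions pure, simplify and simplify_if are invented for the example, and pure is a deliberately conservative stand-in for no_side_effects_or_exceptions that treats every method call as possibly having side effects.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical mini-AST for Java-like boolean expressions and if statements.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Call:
    name: str  # a method call: may have side effects or throw

@dataclass(frozen=True)
class Not:
    arg: "Expr"

@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class If:
    cond: "Expr"
    body: Tuple[str, ...]  # statements are kept opaque in this sketch

Expr = Union[bool, Var, Call, Not, And, Or]

def pure(e) -> bool:
    """Conservative stand-in for no_side_effects_or_exceptions."""
    if isinstance(e, (bool, Var)):
        return True
    if isinstance(e, Not):
        return pure(e.arg)
    if isinstance(e, (And, Or)):
        return pure(e.left) and pure(e.right)
    return False  # any Call might have side effects

def simplify(e):
    """Apply the boolean rewrite rules bottom-up (children first)."""
    if isinstance(e, Not):
        a = simplify(e.arg)
        if isinstance(a, bool):
            return not a                      # !true -> false, !false -> true
        return a.arg if isinstance(a, Not) else Not(a)  # !!x -> x
    if isinstance(e, And):
        l, r = simplify(e.left), simplify(e.right)
        if r is True:
            return l                          # x && true -> x
        if l is True:
            return r                          # true && x -> x
        if l is False or (r is False and pure(l)):
            return False                      # false && x; x && false if x is pure
        return And(l, r)
    if isinstance(e, Or):
        l, r = simplify(e.left), simplify(e.right)
        if r is False:
            return l                          # x || false -> x
        if l is False:
            return r                          # false || x -> x
        if l is True or (r is True and pure(l)):
            return True                       # true || x; x || true if x is pure
        return Or(l, r)
    return e                                  # constants, variables, calls

def simplify_if(s: If) -> list:
    """if (true) b -> b;  if (false) b -> (no statements)."""
    c = simplify(s.cond)
    if c is True:
        return list(s.body)
    if c is False:
        return []
    return [If(c, s.body)]
```

With these definitions, simplify_if(If(Not(False), ("doSomething();",))) returns ["doSomething();"], while And(Call("f"), False) is left unchanged because dropping the call could change behavior.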
Doing what you want is a bit more complicated, because you wish to propagate information from one place in the program to another place possibly "far away". For that you need what amounts to a data flow analysis, showing where values can reach from their assignments:
default domain Java~v7;
rule propagate_constant_variables(i:IDENTIFIER): term -> term
   " \i " -> construct_reaching_constant()
   if constant_reaches(i);
This rule depends on a built-in analysis providing data flow facts and a custom interface function "constant_reaches" that inspects this data.
(DMS has this for C, C++, Java and COBOL, and support for doing it for other languages; to my knowledge, none of the other PTSes mentioned in the Wikipedia article have these flow facts available.) It also depends on a custom constructor "construct_reaching_constant" to build a primitive tree node containing a reaching constant. These would be coded in DMS's underlying metaprogramming language and require a few tens of lines of code. The same applies to the special condition discussed earlier, "no_side_effects_or_exceptions"; this can be a lot more complex, as answering the question about side effects may require an analysis of the full program.
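To make the data flow part more concrete, here is a much simpler Python sketch of forward constant propagation over straight-line code with if statements. The tuple statement format and the names assigned_names and propagate_constants are hypothetical, and the sketch ignores loops, aliasing and method calls, all of which a real reaching-definitions analysis like the one described above must handle.

```python
def assigned_names(stmts):
    """Every variable assigned in stmts, including inside nested if bodies."""
    names = set()
    for s in stmts:
        if s[0] == "assign":
            names.add(s[1])
        elif s[0] == "if":
            names |= assigned_names(s[2])
    return names

def propagate_constants(stmts, known=None):
    """Replace reads of variables holding a known bool constant by that constant.

    Hypothetical statement forms: ("assign", name, value), ("if", cond, body)
    and ("call", name); value and cond may be a bool or a variable name (str).
    """
    known = dict(known or {})  # variable -> constant that reaches this point
    out = []
    for s in stmts:
        if s[0] == "assign":
            _, name, value = s
            value = known.get(value, value) if isinstance(value, str) else value
            if isinstance(value, bool):
                known[name] = value
            else:
                known.pop(name, None)  # an unknown value kills the old fact
            out.append(("assign", name, value))
        elif s[0] == "if":
            _, cond, body = s
            cond = known.get(cond, cond) if isinstance(cond, str) else cond
            out.append(("if", cond, propagate_constants(body, known)))
            # The body may or may not run, so its assignments kill facts.
            for name in assigned_names(body):
                known.pop(name, None)
        else:
            out.append(s)
    return out
```

For the code in the question, [("assign", "test", True), ("if", "test", [...])] becomes [("assign", "test", True), ("if", True, [...])], which is exactly the shape the if-simplification rules match.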
There are tools such as Clang that can transform C++ code to some extent, but Clang does not have rewrite rules as PTSes do; it is really a compiler with additional hooks.

Related

Switch statements in Prolog

In Prolog predicates, I often write repetitive conditional statements like this one, but I wish they could be written more concisely:
output(Lang, Type, Output) :-
    (Lang = javascript ->
        Output = ["function", Type];
     Lang = ruby ->
        Output = ["def", Type];
     Lang = java ->
        Output = [Type]).
Would it be possible to replace this series of conditional statements with a more concise switch-statement?

In Prolog it is quite easy to define your own control structures, using meta-predicates (predicates that take goals or predicates as arguments).
For example, you could implement a switch construct like
switch(X, [
    a : writeln(case1),
    b : writeln(case2),
    c : writeln(case3)
])
by defining
switch(X, [Val:Goal|Cases]) :-
    (   X=Val ->
        call(Goal)
    ;
        switch(X, Cases)
    ).
If necessary, this can then be made more efficient by compile-time transformation as supported by many Prolog systems (inline/2 in ECLiPSe, or goal expansion in several other systems).
And via operator declarations you can tweak the syntax to pretty much anything you like.
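For comparison, the same first-match dispatch can be sketched in Python, with zero-argument functions standing in for goals. This is a hypothetical analogue rather than a faithful port: in Prolog X=Val is unification, so with an unbound X the Prolog version would bind X to the first case value, whereas this sketch only compares values, and where the Prolog version fails this one returns None.

```python
def switch(x, cases):
    """Run the goal of the first (value, goal) pair whose value equals x."""
    for val, goal in cases:
        if x == val:
            return goal()
    return None  # the Prolog switch/2 would simply fail here

result = switch("b", [
    ("a", lambda: "case1"),
    ("b", lambda: "case2"),
    ("c", lambda: "case3"),
])
```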

Multiple clauses seem to be made for exactly this use case, and they are quite concise:
output(javascript, Type, ["function", Type]).
output(ruby, Type, ["def", Type]).
output(java, Type, [Type]).

Slightly shorter:
output(Lang, Type, Output) :-
    (Lang, Output) = (javascript, ["function", Type]) ;
    (Lang, Output) = (ruby, ["def", Type]) ;
    (Lang, Output) = (java, [Type]).
Idiomatic:
output(Lang, Type, Output) :-
    memberchk(Lang-Output, [
        javascript - ["function", Type],
        ruby - ["def", Type],
        java - [Type]
    ]).
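The memberchk/2 version is essentially a lookup in an association list. A Python analogue, for readers coming from other languages (raising KeyError for an unknown language is an assumption of this sketch; memberchk/2 would simply fail):

```python
def output(lang, type_):
    """First matching key wins, like memberchk/2 on a Key-Value list."""
    table = [
        ("javascript", ["function", type_]),
        ("ruby", ["def", type_]),
        ("java", [type_]),
    ]
    for key, out in table:
        if key == lang:
            return out
    raise KeyError(lang)
```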

Haskell debugging an arbitrary lambda expression

I have a set of lambda expressions which I'm passing to other lambdas. All lambdas rely only on their arguments; they don't call any outside functions. Of course, sometimes it gets quite confusing and I'll pass a function with the incorrect number of arguments to another, causing an error in GHCi.
I want to make a debug function which will take an arbitrary lambda expression (with an unknown number of arguments) and return a string based on the structure and function of the lambda.
For example, say I have the following lambda expressions:
i = \x -> x
k = \x y -> x
s = \x y z -> x z (y z)
debug (s k) should return "\a b -> b"
debug (s s k) should return "\a b -> a b a" (if I simplified that correctly)
debug s should return "\a b c -> a c (b c)"
What would be a good way of doing this?

I think the way to do this would be to define a small lambda calculus DSL in Haskell (or use an existing implementation). This way, instead of using the native Haskell formulation, you would write something like
k = Lam "x" (Lam "y" (Var "x"))
s = Lam "x" (Lam "y" (Lam "z" (App (App (Var "x") (Var "z"))
                               (App (Var "y") (Var "z")))))
and similarly for i. You would then write (or use) an evaluation function so that you could write
debug e = eval e
debug (App s k)
which would give you the final form in your own syntax. Additionally, you would need a sort of interpreter to convert your DSL syntax to Haskell, so that you can actually use the functions in your code.
Implementing this does seem like quite a lot of (tricky) work, and it's probably not exactly what you had in mind (especially if you need the evaluation for typed syntax), but I'm sure it would be a great learning experience. A good reference would be chapter 6 of "Write you a Haskell". Using an existing implementation would be a lot easier (but less fun :)).
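To show the moving parts of the DSL approach without much machinery, here is a sketch in Python (used only for illustration; the same structure maps directly onto Haskell data types). It converts named terms to de Bruijn indices, normalizes them with normal-order beta reduction, and prints the result with binders renamed a, b, c, ... by nesting depth. The helper names (to_debruijn, shift, subst, step, normalize, pretty, debug) and the max_steps limit are assumptions of this sketch; a term without a normal form raises RuntimeError instead of looping, and pretty supports at most 26 nested binders.

```python
LETTERS = "abcdefghijklmnopqrstuvwxyz"

# Named terms, mirroring the Haskell DSL: ("var", n), ("lam", n, body), ("app", f, a)
def Var(name): return ("var", name)
def Lam(name, body): return ("lam", name, body)
def App(f, a): return ("app", f, a)

def to_debruijn(t, env=()):
    """Named term -> de Bruijn term: ("var", i), ("lam", body), ("app", f, a)."""
    if t[0] == "var":
        return ("var", env.index(t[1]))  # ValueError for free variables
    if t[0] == "lam":
        return ("lam", to_debruijn(t[2], (t[1],) + env))
    return ("app", to_debruijn(t[1], env), to_debruijn(t[2], env))

def shift(t, d, cutoff=0):
    """Add d to every variable index >= cutoff (i.e. every free variable)."""
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Capture-avoiding substitution of s for variable j in t."""
    if t[0] == "var":
        return s if t[1] == j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def step(t):
    """One normal-order reduction step, or None if t is already normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":  # leftmost-outermost redex: beta-reduce it
            return shift(subst(f[1], 0, shift(a, 1)), -1)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[1])
        return None if r is None else ("lam", r)
    return None

def normalize(t, max_steps=10_000):
    for _ in range(max_steps):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form within max_steps")

def pretty(t, depth=0):
    if t[0] == "lam":
        names = []
        while t[0] == "lam":
            names.append(LETTERS[depth])
            depth, t = depth + 1, t[1]
        return "\\" + " ".join(names) + " -> " + pretty(t, depth)
    if t[0] == "var":
        return LETTERS[depth - 1 - t[1]]
    f, a = pretty(t[1], depth), pretty(t[2], depth)
    if t[1][0] == "lam":
        f = "(" + f + ")"
    if t[2][0] != "var":
        a = "(" + a + ")"
    return f + " " + a

def debug(term):
    return pretty(normalize(to_debruijn(term)))

i = Lam("x", Var("x"))
k = Lam("x", Lam("y", Var("x")))
s = Lam("x", Lam("y", Lam("z", App(App(Var("x"), Var("z")), App(Var("y"), Var("z"))))))
print(debug(App(s, k)))          # \a b -> b
print(debug(App(App(s, s), k)))  # \a b -> a b a
print(debug(s))                  # \a b c -> a c (b c)
```

Working with de Bruijn indices sidesteps variable capture entirely, which is the usual source of bugs in hand-written lambda evaluators.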
If this is merely for debugging purposes, you might benefit from looking at the Core language GHC compiles to. See chapter 25 of Real World Haskell; the GHC flag to use is -ddump-simpl. But this would mean looking at generated code rather than generating a representation inside your program. I'm also not sure to what extent you would be able to identify specific functions in the Core code easily (I have no experience with this, so YMMV).
It would of course be pretty cool if calling show on functions gave the kind of output you describe, but there are probably very good reasons why functions are not an instance of Show (I couldn't tell you what they are).

You can actually achieve that by utilising pretty-printing from Template Haskell, which comes with GHC out of the box.
First, the formatting function must be defined in a separate module (that's a Template Haskell restriction):
module LambdaPrint where
import Control.Monad
import Language.Haskell.TH.Ppr
import Language.Haskell.TH.Syntax
showDef :: Name -> Q Exp
showDef = liftM (LitE . StringL . pprint) . reify
Then use it:
{-# LANGUAGE TemplateHaskell #-}
import LambdaPrint
y :: a -> a
y = \a -> a
$(return []) --workaround for GHC 7.8+
test = $(showDef 'y)
The result is more or less readable, not counting fully qualified names:
*Main> test
"Main.y :: forall a_0 . a_0 -> a_0"
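A few words about what's going on. showDef is a macro function which reifies the information GHC has about some name from the environment and pretty-prints it as a string literal expression. To use it, you need to quote the name of the lambda (using ') and splice the result (which is a quoted string expression) into some expression (using $(...)). As the output above shows, you only get the type signature, not the lambda's body: for values, reify does not currently return the right-hand side of the definition.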

When generalizing monad, performance drops nearly 50%

I have code that does some parsing of files according to specified rules. The whole parsing takes place in a monad that is a stack of ReaderT/STTrans/ErrorT.
type RunningRule s a = ReaderT (STRef s LocalVarMap) (STT s (ErrorT String Identity)) a
Because it would be handy to run some IO in the code (e.g. to query external databases), I thought I would generalize the parsing so that it could run in either the Identity or the IO base monad, depending on the functionality I need. This changed the signature to:
type RunningRule s m a = ReaderT (STRef s LocalVarMap) (STT s (ErrorT String m)) a
After changing the appropriate type signatures (and using some extensions to get around the types), I ran it again in the Identity monad and it was ~50% slower. Although essentially nothing changed, it is much slower. Is this normal behaviour? Is there a simple way to make this faster? (e.g. combining the ErrorT and ReaderT (and possibly STT) stack into one monad transformer?)
Here is a sample of the code: based on parsed input (given in a C-like language), it constructs a parser. The code looks like this:
compileRule :: forall m. (Monad m, Functor m) =>
       [Data -> m (Either String Data)]  -- For tying the knot
    -> ParsedRule                        -- This is the rule we are compiling
    -> Data -> m (Either String Data)    -- The real parsing
compileRule compiled (ParsedRule name parsedlines) =
    \input -> runRunningRule input $ do
        sequence_ compiledlines
  where
    compiledlines = map compile parsedlines
    compile (Expression expr) = compileEx expr >> return ()
    compile (Assignment var expr) =
        ...
    compileEx (Function "check" expr) = do
        value <- code
        case value of
            True -> return ()
            False -> fail "Check failed"
      where
        code = compileEx expr

This is not so unusual, no. You should try using SPECIALIZE pragmas to specialize to Identity, and maybe IO too. Use -ddump-simpl and watch for warnings about rule left-hand sides being too complicated. When specialization doesn't happen as it should, GHC ends up passing around typeclass dictionaries at runtime. This is inherently somewhat inefficient, but more importantly it prevents GHC from inlining class methods to enable further simplification.

General-purpose language to specify value constraints

I am looking for a general-purpose way of defining textual expressions which allow a value to be validated.
For example, I have a value which should only be set to 1, 2, 3, 10, 11, or 12.
Its constraint might be defined as: (value >= 1 && value <= 3) || (value >= 10 && value <= 12)
Or another value which can be 1, 3, 5, 7, 9 etc... would have a constraint like value % 2 == 1 or IsOdd(value).
(To help the user correct invalid values, I'd like to show the constraint - so something descriptive like IsOdd is preferable.)
These constraints would be evaluated both on client-side (after user input) and server-side.
Therefore a multi-platform solution would be ideal (specifically Win C#/Linux C++).
Is there an existing language/project which allows evaluation or parsing of similar simple expressions?
If not, where might I start creating my own?
I realise this question is somewhat vague as I am not entirely sure what I am after. Searching turned up no results, so even some terms as a starting point would be helpful. I can then update/tag the question accordingly.

You may want to investigate dependently typed languages like Idris or Agda.
The type system of such languages allows value constraints to be encoded in types. Programs that cannot guarantee the constraints will simply not compile. The usual example is matrix multiplication, where the dimensions must match. But that is, so to speak, the "hello world" of dependently typed languages; the type system can do much more for you.

If you end up creating your own language, I'd try to stay implementation-independent as long as possible. Look for the formal expression grammar of a suitable programming language (e.g. C) and add special keywords/functions as required. Once you have a formal definition of your language, implement a parser using your favourite parser generator.
That way, even if your parser is not portable to a certain platform you at least have a formal standard from where to start a separate parser implementation.
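As a sketch of that workflow, here is a small hand-written recursive-descent parser in Python for a hypothetical grammar covering the examples in the question (comparisons, &&, ||, % and named functions like IsOdd). The grammar, the token set and the names tokenize, Parser and compile_constraint are assumptions for illustration; a parser generator would replace the Parser class, while the grammar comment is the part worth keeping implementation-independent.

```python
import operator
import re

# Hypothetical grammar (EBNF):
#   or_expr  := and_expr ("||" and_expr)*
#   and_expr := cmp_expr ("&&" cmp_expr)*
#   cmp_expr := mod_expr (("<=" | ">=" | "==" | "!=" | "<" | ">") mod_expr)?
#   mod_expr := atom ("%" atom)*
#   atom     := NUMBER | "value" | FUNCTION "(" or_expr ")" | "(" or_expr ")"
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|<=|>=|==|!=|&&|\|\||[()<>%])")
CMP = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    """Each grammar rule returns a Python function of the value to check."""

    def __init__(self, text, functions):
        self.tokens, self.i, self.functions = tokenize(text), 0, functions

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.i += 1
        return tok

    def parse(self):
        node = self.or_expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    def or_expr(self):
        parts = [self.and_expr()]
        while self.peek() == "||":
            self.take()
            parts.append(self.and_expr())
        return parts[0] if len(parts) == 1 else (lambda v: any(p(v) for p in parts))

    def and_expr(self):
        parts = [self.cmp_expr()]
        while self.peek() == "&&":
            self.take()
            parts.append(self.cmp_expr())
        return parts[0] if len(parts) == 1 else (lambda v: all(p(v) for p in parts))

    def cmp_expr(self):
        left = self.mod_expr()
        if self.peek() in CMP:
            op = CMP[self.take()]
            right = self.mod_expr()
            return lambda v: op(left(v), right(v))
        return left

    def mod_expr(self):
        node = self.atom()
        while self.peek() == "%":
            self.take()
            right = self.atom()
            node = lambda v, l=node, r=right: l(v) % r(v)  # bind current operands
        return node

    def atom(self):
        tok = self.take()
        if tok.isdigit():
            n = int(tok)
            return lambda v: n
        if tok == "value":
            return lambda v: v
        if tok == "(":
            node = self.or_expr()
            self.take(")")
            return node
        if tok in self.functions:
            fn = self.functions[tok]
            self.take("(")
            arg = self.or_expr()
            self.take(")")
            return lambda v: fn(arg(v))
        raise SyntaxError(f"unexpected token {tok!r}")

def compile_constraint(text, functions=None):
    return Parser(text, functions or {}).parse()
```

For example, compile_constraint("(value >= 1 && value <= 3) || (value >= 10 && value <= 12)") accepts exactly 1, 2, 3, 10, 11 and 12 out of range(15), and IsOdd can be supplied through the functions dictionary.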

You may also want to look at creating a Domain Specific Language (DSL) in Ruby. (Here's a good article on what that means and what it would look like: http://jroller.com/rolsen/entry/building_a_dsl_in_ruby)
This would definitely give you the portability you're looking for, including maybe using IronRuby in your C# environment, and you'd be able to leverage the existing logic and mathematical operations of Ruby. You could then have constraint definition files that looked like this:
constrain 'wakeup_time' do
  6 <= value && value <= 10
end
constrain 'something_else' do
  check(value % 2 == 1, MustBeOdd)
end
# constrain is a method that takes one argument and a code block
# check is a function you've defined that takes two arguments
# MustBeOdd is the name of an exception type you've created in your standard set
But really, the great thing about a DSL is that you have a lot of control over what the constraint files look like.
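The same pattern works outside Ruby; for example, Python decorators can play the role of Ruby blocks. The sketch below is a hypothetical Python analogue of the constraint file above: constrain, check and MustBeOdd mirror the Ruby names, while the CONSTRAINTS registry and the validate function are assumptions of this sketch.

```python
CONSTRAINTS = {}  # constraint name -> function of the value

class MustBeOdd(ValueError):
    """Example error type from a 'standard set' of constraint failures."""

def constrain(name):
    """Register the decorated function as the constraint called `name`."""
    def register(block):
        CONSTRAINTS[name] = block
        return block
    return register

def check(condition, error):
    """Raise `error` when `condition` is false, otherwise return True."""
    if not condition:
        raise error()
    return True

@constrain("wakeup_time")
def _wakeup_time(value):
    return 6 <= value <= 10

@constrain("something_else")
def _something_else(value):
    return check(value % 2 == 1, MustBeOdd)

def validate(name, value):
    return CONSTRAINTS[name](value)
```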

There are a number of ways to verify a list of values across multiple languages. My preferred method is to make a list of the permitted values, load them into a dictionary/hashmap/list/vector (depending on the language and your preference), and write a simple isIn() or isValid() function that checks whether the supplied value is valid based on its presence in the data structure. The beauty of this is that the code is trivial and can be implemented in just about any language very easily. For odd-only or even-only numeric validity, again, a small library of isOdd() functions in different languages will suffice: if a number isn't odd it must by definition be even (this includes 0, which is even, so no special case is needed).
I normally cart around a set of C++ and C# functions to evaluate isOdd() for reasons similar to the ones you mention, and the code is as follows:
C++
bool isOdd( int integer ){ return integer % 2 != 0; }
You can also declare the function inline (and, with MSVC, __fastcall) depending on need or preference; I tend to use inline and __fastcall unless there is a reason not to (in my experience this gave a large performance boost on Xeon processors).
C#
Conveniently, the same line works in C#; just add static in front so it can be called without an instance (C# methods always live inside a class):
static bool isOdd( int integer ){ return integer % 2 != 0; }
Hope this helps; let me know if you need any further info :)
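A Python sketch of the same two helpers (the names ALLOWED, is_valid and is_odd are illustrative). Python's % operator returns a non-negative remainder for a positive divisor, so n % 2 is 0 or 1 even for negative n.

```python
ALLOWED = frozenset({1, 2, 3, 10, 11, 12})  # example permitted values from the question

def is_valid(value, allowed=ALLOWED):
    """True if value is one of the explicitly permitted values."""
    return value in allowed

def is_odd(n):
    """True for odd integers; 0 and negative even numbers give False."""
    return n % 2 != 0
```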

Not sure if it's what you're looking for, but judging from your starting conditions (Win C#/Linux C++), you may not need it to be totally language-agnostic. You can implement such a parser yourself in C++ with all the desired features and then use it in both the C++ and C# projects, which also avoids adding external libraries.
On application design level, it would be (relatively) simple - you create a library which is buildable cross-platform and use it in both projects. The interface may be something simple like:
bool VerifyConstraint_int(int value, const char* constraint);
bool VerifyConstraint_double(double value, const char* constraint);
// etc
Such an interface will be usable both from Linux C++ (by static or dynamic linking) and from Windows C# (using P/Invoke). You can have the same codebase compiling on both platforms.
The parser (again, judging from what you've described in the question) may be pretty simple: a tree holding elements of types Variable and Expression, which can be evaluated for a given Variable value.
Example class definitions:
#include <boost/variant.hpp>
typedef boost::variant<int, double, bool> VARIANT; // e.g.; choose the value types you need
class Entity {
public:
    virtual ~Entity() {}
    virtual VARIANT Evaluate() = 0;
};
class BinaryOperation: public Entity {
public:
    enum Operation {PLUS,MINUS,EQUALS,AND,OR,GREATER_OR_EQUALS,LESS_OR_EQUALS};
    virtual VARIANT Evaluate() override; // Evaluates left and right operands and combines them according to op
private:
    Entity& left;
    Entity& right;
    Operation op;
};
class Variable: public Entity {
private:
    VARIANT value;
public:
    virtual VARIANT Evaluate() override {return value;}
};
Or, you can just write validation code in C++ and use it both in C# and C++ applications :)
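The same tree design, sketched in Python for comparison. This is not a translation of a finished C++ implementation: the Constant node and the choice to pass the value being validated into evaluate() (instead of storing it in Variable) are assumptions that let one parsed tree be reused for many values.

```python
import operator
from abc import ABC, abstractmethod

class Entity(ABC):
    """Base node: evaluate() computes this subtree for a given input value."""

    @abstractmethod
    def evaluate(self, value):
        ...

class Constant(Entity):
    def __init__(self, constant):
        self.constant = constant

    def evaluate(self, value):
        return self.constant

class Variable(Entity):
    def evaluate(self, value):
        return value  # the value being validated

class BinaryOperation(Entity):
    OPS = {
        "PLUS": operator.add, "MINUS": operator.sub, "EQUALS": operator.eq,
        "AND": lambda a, b: a and b, "OR": lambda a, b: a or b,
        "GREATER_OR_EQUALS": operator.ge, "LESS_OR_EQUALS": operator.le,
    }

    def __init__(self, op, left, right):
        self.op, self.left, self.right = self.OPS[op], left, right

    def evaluate(self, value):
        # Evaluates both operands, then combines them (no short-circuiting).
        return self.op(self.left.evaluate(value), self.right.evaluate(value))

# value >= 1 && value <= 3
v = Variable()
low_range = BinaryOperation(
    "AND",
    BinaryOperation("GREATER_OR_EQUALS", v, Constant(1)),
    BinaryOperation("LESS_OR_EQUALS", v, Constant(3)),
)
```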

My personal choice would be Lua. The downside to any DSL is the learning curve of a new language and how to glue the code with the scripts but I've found Lua has lots of support from the user base and several good books to help you learn.
If you are after somewhat generic code into which a non-programmer can inject rules for allowable input, it's going to take some upfront work regardless of the route you take. I highly suggest not rolling your own, because you'll likely find people wanting more features that an existing DSL already has.

If you are using Java, then you can use the Object Graph Navigation Library (OGNL).
It enables you to write Java applications that can parse, compile and evaluate OGNL expressions.
OGNL's expression syntax is close to basic Java/C/C++/C# expression syntax.
You can compile an expression that uses some variables, and then evaluate that expression for some given variables.

An easy way to validate expressions is to use Python's built-in eval function. It can evaluate expressions just like the one you wrote, and Python's syntax for simple expressions is easy to learn and English-like. Your example expression translates to:
(value >= 1 and value <= 3) or (value >= 10 and value <= 12)
Evaluating code provided by users can pose a security risk, though, because certain functions could be used to act on the host machine (such as the open function, to open a file). But eval takes extra arguments (globals and locals dictionaries) that restrict which names are available, which lets you build a restricted evaluation environment. Be aware that this is not a true sandbox: an expression can still reach dangerous objects through attribute access on literals (for example via __class__ and __subclasses__()), so fully untrusted input needs stronger isolation. (The examples below use Python 2 syntax.)
# Import math functions, and we'll use a few of them to create
# a list of safe functions from the math module to be used by eval.
from math import *
# A user-defined method won't be reachable in the evaluation, as long
# as we provide the list of allowed functions and vars to eval.
def dangerous_function(filename):
    print open(filename).read()
# We're building the list of safe functions to use by eval:
safe_list = ['math','acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e', 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# Let's test the eval method with your example:
exp = "(value >= 1 and value <= 3) or (value >= 10 and value <= 12)"
safe_dict['value'] = 2
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation: True
# Test with a forbidden function, such as 'abs'
exp = raw_input("type an expression: ")
-> type an expression: (abs(-2) >= 1 and abs(-2) <= 3) or (abs(-2) >= 10 and abs(-2) <= 12)
print "expression evaluation: ", eval(exp, {"__builtins__":None},safe_dict)
-> expression evaluation:
-> Traceback (most recent call last):
-> File "<stdin>", line 1, in <module>
-> File "<string>", line 1, in <module>
-> NameError: name 'abs' is not defined
# Let's test it again, without any extra parameters to the eval method
# that would prevent its execution
print "expression evaluation: ", eval(exp)
-> expression evaluation: True
# Works fine without the safe dict! So the restrictions were active
# in the previous example.
# is odd?
def isodd(x): return bool(x & 1)
safe_dict['isodd'] = isodd
print "expression evaluation: ", eval("isodd(7)", {"__builtins__":None},safe_dict)
-> expression evaluation: True
print "expression evaluation: ", eval("isodd(42)", {"__builtins__":None},safe_dict)
-> expression evaluation: False
# A bit more complex this time: let's ask the user for a function
user_func = raw_input("type a function: y = ")
-> type a function: y = exp(x)
# Let's test it:
for x in range(1,10):
    # add x in the safe dict
    safe_dict['x']=x
    print "x = ", x , ", y = ", eval(user_func,{"__builtins__":None},safe_dict)
-> x = 1 , y = 2.71828182846
-> x = 2 , y = 7.38905609893
-> x = 3 , y = 20.0855369232
-> x = 4 , y = 54.5981500331
-> x = 5 , y = 148.413159103
-> x = 6 , y = 403.428793493
-> x = 7 , y = 1096.63315843
-> x = 8 , y = 2980.95798704
-> x = 9 , y = 8103.08392758
So you can control which functions are available to eval and evaluate expressions in a restricted environment.
This is what we used in a previous project I worked on. We used Python expressions in custom Eclipse IDE plug-ins, using Jython to run in the JVM. You could do the same with IronPython to run in the CLR.
The examples above are partly inspired by, and partly copied from, the Lybniz project's explanation of how to run a safe Python eval environment. Read it for more details!
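A stricter alternative to eval with a restricted namespace (eval still accepts arbitrary Python syntax, such as attribute access) is to parse the expression with the standard ast module and walk only a whitelist of node types. This is a sketch assuming Python 3.8 or later; check and its functions parameter are hypothetical names, and the whitelist covers only what the constraint examples in the question need.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
CMP_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
           ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}
BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Mod: operator.mod}

def check(expr, value, functions=None):
    """Evaluate a constraint such as "value >= 1 and value <= 3" without eval().

    Accepts only numbers, the name `value`, and/or/not, comparisons,
    + - * %, unary minus and calls to explicitly supplied functions.
    """
    functions = functions or {}

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float, bool):
            return node.value
        if isinstance(node, ast.Name) and node.id == "value":
            return value
        if isinstance(node, ast.BoolOp):
            results = (ev(v) for v in node.values)  # lazy, so all/any short-circuit
            return all(results) if isinstance(node.op, ast.And) else any(results)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Compare):  # handles chains like 1 <= value <= 3
            left = ev(node.left)
            for op, comparator in zip(node.ops, node.comparators):
                if type(op) not in CMP_OPS:
                    raise ValueError(f"operator not allowed: {type(op).__name__}")
                right = ev(comparator)
                if not CMP_OPS[type(op)](left, right):
                    return False
                left = right
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in functions and not node.keywords):
            return functions[node.func.id](*[ev(a) for a in node.args])
        raise ValueError(f"syntax not allowed: {type(node).__name__}")

    return bool(ev(ast.parse(expr, mode="eval")))
```

Because anything outside the whitelist raises ValueError, inputs such as ().__class__ or open('x') are rejected before anything is executed.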

You might want to look at regular expressions (regex). They are proven and have been around for a long time, and there is a regex library for all the major programming and scripting languages.
Libraries:
C++: what regex library should I use?
C# Regex Class
Usage
Regex Email validation
Regex to validate date format dd/mm/yyyy
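For a small fixed set of values such as 1, 2, 3, 10, 11 or 12, a regular expression over the textual input can do the job; here is a Python sketch using the standard re module (the pattern and the name is_allowed are illustrative). Comparisons such as value <= 1000 become unwieldy as regexes, so this fits best when the allowed values form a small, fixed set.

```python
import re

# Matches exactly the strings "1", "2", "3", "10", "11" or "12".
ALLOWED_PATTERN = re.compile(r"[1-3]|1[0-2]")

def is_allowed(text):
    """fullmatch requires the whole (stripped) input to match the pattern."""
    return ALLOWED_PATTERN.fullmatch(text.strip()) is not None
```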

how to simplify/improve an Erlang code?

How would a good Erlang programmer write this code?
loop(expr0) ->
    case expr1 of
        true ->
            A = case expr2 of
                    true -> ...;
                    false -> ...
                end;
        false ->
            A = case expr3 of
                    true -> ...;
                    false -> ...
                end
    end,
    loop(expr4(A)).

Generally speaking, you want to make your code more readable. It's usually a good idea to extract bits of code into functions to avoid long or deeply nested functions, and to provide self-explanatory names that clarify the purpose of a piece of code:
loop(expr0) ->
    case expr1 of
        true ->
            A = do_something(expr2);
        false ->
            A = do_something_else(expr3)
    end,
    loop(expr4(A)).
do_something(E) ->
    case E of
        true -> ...;
        false -> ...
    end.
do_something_else(E) ->
    case E of
        true -> ...;
        false -> ...
    end.
Now, a casual reader knows that your function does something if expr1 is true and something else if expr1 is false. Good naming conventions help a lot here. You could also do that with comments, but comments can go out of date while code cannot, which makes well-named functions easier to maintain. I also find short functions much easier to read than really loooong functions, even if those long functions have inline comments.
Once you've stated clearly what your function does, you may want to shorten the code. Short code is easier to read and maintain, but don't shorten too much using "clever" constructions, or you'll obscure it, which is the opposite of what you want. You can start by using pattern matching in function heads:
loop(expr0) ->
    case expr1 of
        true ->
            A = do_something(expr2);
        false ->
            A = do_something_else(expr3)
    end,
    loop(expr4(A)).
do_something(true) -> ...;
do_something(false) -> ....
do_something_else(true) -> ...;
do_something_else(false) -> ....
Then, you can avoid repeating A in the main function (as an aside, variables escaping the scope of nested statements is a feature I have always disliked):
loop(expr0) ->
    A = case expr1 of
            true -> do_something(expr2);
            false -> do_something_else(expr3)
        end,
    loop(expr4(A)).
do_something(true) -> ...;
do_something(false) -> ....
do_something_else(true) -> ...;
do_something_else(false) -> ....
And I think that's it for this piece of code. With more context you could also introduce some abstractions to reduce duplication, but take care when abstracting: if you overdo it, you'll obscure the code again, losing the maintainability you expected to gain by removing similar code.

The code, as it is currently written, is hard to make simpler. The problem is that the ExprX entries are unknown, so there is no way to simplify the code without knowing that it is beneficial to do so. If you have a more complete example, it will be much easier to attempt an optimization.
The concrete problem is that we don't know how Expr2 and Expr3 depend on Expr1, for instance. We also don't know what the purpose of Expr0 is, nor how Expr4 depends on anything other than the returned A.

Why do you need expr0 in the loop function?
