Is there any formal definition for "refactoring"?

Does anyone know a way to define refactoring more formally?
UPDATE.
A refactoring is a pair R = (pre, T), where pre is the precondition that the program must satisfy and T is the program transformation.

It's an interesting question and one I hadn't considered. I did a little googling and came up with this paper (PDF) on refactoring in AOP that attempts to apply some mathematical modeling to aspects, showing that functional aspects have the same flexibility as traditional aspects but with reduced complexity. I didn't read the whole paper, but you might find something there.
Another interesting idea would be to think of refactorings along the same lines as compiler optimizations. Essentially, the compiler refactors your code on the fly, although with different goals than code-level refactoring. You'd have to somehow quantify code complexity and readability in a reasonable way to demonstrate how a particular refactoring affects it. Coming up with the model would probably be the hard part.
I also found this paper that establishes an algebra of OO programming and derives some basic laws, then uses those basic rules to derive a more complicated refactoring.
Interesting stuff. Hope this helps.

Refactoring is a series of correctness-preserving transformations, but a refactoring may result in more general code than the original.
So we can't just assert that a refactoring transformation T on program P leaves the same properties R before and after refactoring; rather, the properties R' of the refactored program P' should be at least equivalent to R:
given program P implies R,
refactoring transformation T(P) produces P',
where (P' implies R') and (R' is equivalent to or subsumes R).
We can also assert that the inputs and outputs remain the same or equivalent.
But to follow your example, perhaps we want to define a refactoring transformation T as a 4-tuple (P, I, O, R), where P is the original program, I is the inputs and/or preconditions, O is the outputs and/or postconditions, and R is the transformed program. Then assert (using temporal logic?) that
P:I -> O
holds before transformation
T(P) -> R
defines the transformation, and
R:I -> O
holds after transformation
My symbolic math is rusty, but that's the general direction.
This would make a good master's thesis, BTW.
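To restate that direction in symbols (a hedged sketch of my own, combining the (pre, T) pair from the question with the tuple above; {I} Q {O} means "program Q, started with inputs/preconditions I, yields outputs/postconditions O"):

\[
\mathit{pre}(P) \;\wedge\; \{I\}\,P\,\{O\} \;\Longrightarrow\; \{I\}\,T(P)\,\{O\}
\]

and, writing Props(Q) for the set of properties program Q satisfies, the "at least equivalent" clause becomes

\[
\mathrm{Props}(T(P)) \;\supseteq\; \mathrm{Props}(P).
\]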

It could be interesting to note that most refactorings come in pairs:
Add Parameter - Remove Parameter
Extract Class/Method - Inline Class/Method
Pull Up Field/Method - Pull Down Field/Method
Change Bidirectional Association to Unidirectional - Change Unidirectional Association to Bidirectional
...
Applying both refactorings of a pair is a null transformation.
For a refactoring pair (R, R'):
R'(R(code)) = code
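A minimal Haskell sketch of this algebra (my own illustration; the type and names are hypothetical), combining the (pre, T) definition from the question with the inverse-pair law above:

    -- A refactoring as a precondition plus a transformation, per the
    -- (pre, T) definition in the question; `code` is left abstract.
    data Refactoring code = Refactoring
      { pre       :: code -> Bool   -- when the refactoring may be applied
      , transform :: code -> code   -- the transformation T itself
      }

    -- r and r' form an inverse pair on c when, wherever r applies,
    -- applying r and then r' is the identity: r'(r(code)) = code.
    inversePairOn :: Eq code => Refactoring code -> Refactoring code -> code -> Bool
    inversePairOn r r' c =
      not (pre r c) || transform r' (transform r c) == c

With Extract Method as r and Inline Method as r', inversePairOn states exactly the law above.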

Well, not directly, but in terms of money I can say yes. I can't come up with an equation for that :)
Code that is well written and free of complexity (which can be due to refactoring) can save time and effort, and hence money.


Is it possible to convert all higher-order logic and dependent types in Lean/Isabelle/Coq into first-order logic?

More specifically: given an arbitrary Lean proof/theorem, is it possible to express it solely using first-order logic? If so, is it practical, i.e. will the generated FOL not be enormously large?
I have seen https://www.cl.cam.ac.uk/~lp15/papers/Automation/translations.pdf, but since I am not an expert, I am still not sure whether all of Lean's proof code can be converted.
Other mathematical proof languages are also OK.
The short answer is: yes, it is not impractically large, and this is done in particular when translating proofs to SMT solvers for sledgehammer-like tools. There is a fair amount of blowup, but it is a linear factor on the order of 2-5. You probably lose more from not having specific support for all the built-in rules and, in the case of DTT, from writing down all the definitional-equality (defeq) proofs which are normally implicit.

Efficient representation of functions

The function type A -> B is in some sense not very good. Though functions are first-class values, one often cannot operate on them freely due to efficiency problems: you can't apply too many transformations (A -> B) -> (C -> D) before, at some point, you have to compute a value.
Obviously this is due to the non-strict nature of -> .
There are well-known tricks for dealing with functions of type Double -> Double. One can represent them as vectors with respect to a certain basis, which can consist of trig functions, polynomials, etc.
Are there any general tricks to get round the inefficiency of the A -> B type?
Or alternatives to -> ?
Your concern seems to be that, given h = f . g, f . g is typically less efficient than h. Given a composition of functions known at compile time, there are two tricks performed by the compiler which can render f . g more efficient than you would suspect: inlining and fusion. Inlining avoids the extra indirection of a second function call and opens up many new opportunities for optimizations. Stream fusion (or build/foldr fusion) can (to give a basic example) turn compositions such as map f . map g into map (f . g), thereby reducing the number of traversals of a structure by a constant factor. Fusion operates not only on lists but on other structures too, and it is one reason for the efficient performance of Haskell libraries such as Vector.
Short cut fusion: http://www.haskell.org/haskellwiki/Correctness_of_short_cut_fusion
Stream fusion: What is Haskell's Stream Fusion
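For instance, the map f . map g = map (f . g) law above is expressed in GHC as a rewrite rule; this is a sketch of how such a rule is written (GHC's base library ships an equivalent one, so you rarely write it yourself):

    module MapFusion where

    -- Tell the simplifier to collapse two list traversals into one.
    -- Rules fire only when compiling with optimization (-O).
    {-# RULES
    "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
      #-}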
I cannot confirm this. As a productive user and implementor of AFRP, I perform transformations on fully polymorphic functions a lot, deeply nested and in long-running applications. Note that Haskell compilers do not use the traditional stack-based function-calling paradigm; they use graph-reduction algorithms. We don't have the same problems as, say, C.
One of the most general tricks is memoization - storing the value of a function after you have computed it. Links: Haskellwiki, SO example, MemoCombinators. As you mentioned, the other trick applies when you have a nice type of function (polynomial, vector, Taylor series, etc.) - then it can be represented as a list, an expression, etc.
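A minimal memoization sketch in Haskell (the classic list-backed Fibonacci; the names are my own):

    -- memoFib is a top-level constant applicative form, so the list
    -- `map fib' [0..]` is built once and shared across all calls.
    memoFib :: Int -> Integer
    memoFib = (map fib' [0 ..] !!)
      where
        fib' 0 = 0
        fib' 1 = 1
        fib' n = memoFib (n - 1) + memoFib (n - 2)

Each recursive call indexes into the shared list instead of recomputing, turning the exponential naive recursion into a polynomial one.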
FWIW: in Felix, which is a whole-program analyser relying heavily on inlining for performance, function arguments have three kinds: eager, lazy, or "let the compiler decide".
Eager arguments are evaluated and assigned to a variable before the body of the function is evaluated.
Lazy arguments are evaluated by replacing the parameter with the argument expression wherever it occurs.
The default is "let the compiler decide". For a large amount of "ordinary" code (whatever that means) it doesn't make any difference whether you use eager or lazy evaluation.
Generally, in Felix lazy evaluation is faster: note carefully that this does NOT mean closures; it means inlining. However, sometimes the compiler will choose eager evaluation: it reduces code bloat, and too much inlining is counterproductive. I make no claim that the algorithm is any good... however, Felix can sometimes outperform C and Ocaml (GHC didn't get into the finals).
As a simple example: type classes. Felix has typeclasses, sort of like Haskell, with no or very little performance overhead - certainly no dictionaries!
In my view, Haskell would be a lot better if you just chucked out the archaic concept of separate compilation: whole-program analysers can do so much more, and text is much faster to work with than object code (given the complete freedom to cache compilation results). It's crazy for a lazy language to use a compilation model designed for eager evaluation!
The other thing a Haskell variant might try is to drop the idea that all functions are lazy, and instead adopt the idea that the evaluation strategy is irrelevant unless otherwise specified. That may allow a lot more optimisation opportunities.

Logic for software verification

I'm looking at the requirements for automated software verification, i.e. a program that takes in code (ordinary procedural code written in languages like C and Java), generates a bunch of theorems saying that each loop must eventually halt, no assertion will be violated, there will never be a dereference of a null pointer, etc., and then passes these to a theorem prover to prove they are actually true (or else finds a counterexample indicating a bug in the code).
The question is what kind of logic to use. The two major positions seem to be:
1. First-order logic is just fine.
2. First-order logic isn't expressive enough; you need higher-order logic.
Problem is, there seems to be a lot of support for both positions. So which one is right? If it's the second one, are there any available examples of things you want to do, that verifiers based on first-order logic have trouble with?
You can do everything you need in FOL, but it's a lot of extra work - a LOT! Most existing systems were developed by academics or people without a lot of time, so they are tempted to take shortcuts to save time and effort, and thus are attracted to HOLs, functional languages, etc. However, if you want to build a system that is to be used by hundreds of thousands of people rather than merely hundreds, we believe that FOL is the way to go, because it is far more accessible to a wider audience. There's just no substitute for doing the work; we've been at this for 25 years now! Please take a look at our project (http://www.manmademinions.com)
Regards, Aaron.
In my practical experience, it seems to be "1. First-order logic is just fine". For examples of complete specifications for various functions written entirely in a specification language based on first-order logic, see for instance ACSL by Example or this case study.
First-order logic has automated provers (not proof assistants) that have been refined over the years to handle well the properties that come from program verification. Notable automated provers for these uses include Simplify, Z3, and Alt-Ergo. If these provers fail and there is no obvious lemma/assertion you can add to help them, you still have the recourse of starting up a proof assistant for the difficult proof obligations. If you use HOL, on the other hand, you cannot use Simplify, Z3 or Alt-Ergo at all, and while I have heard of automated provers for higher-order logic, I have never heard them praised for their efficiency on properties that come from programs.
We've found that FOL is fine for most verification conditions, but higher-order logic is invaluable for a small number - for example, for proving properties about summation of the elements in a collection. So our theorem prover (used in Perfect Developer and Escher C Verifier) is basically first-order, but with the ability to do some higher-order reasoning as well.
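To illustrate that last point (an example of my own, not from the answer): the postcondition of a particular summing routine is first-order once the sum operator is fixed,

\[
s \;=\; \sum_{j=0}^{n-1} a[j],
\]

but the generic lemmas one wants about summation, such as

\[
\forall f,\, n.\;\; \sum_{j=0}^{n} f(j) \;=\; \Bigl(\sum_{j=0}^{n-1} f(j)\Bigr) + f(n),
\]

quantify over the function f, which is precisely the step beyond first-order logic; a first-order prover must instead be given the summation operator axiomatized instance by instance.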

How to calculate indefinite integral programmatically

I remember solving a lot of indefinite integration problems. There are certain standard methods for solving them, but nevertheless there are problems which take a combination of approaches to arrive at a solution.
But how can we achieve the solution programmatically?
For instance, look at the online integrator app of Mathematica. How do we approach writing such a program, one which accepts a function as an argument and returns the indefinite integral of that function?
PS. The input function can be assumed to be continuous (i.e. not, for instance, sin(x)/x).
You have Risch's algorithm, which is subtly undecidable (since you must decide whether two expressions are equal, a problem akin to the ubiquitous halting problem) and really long to implement.
If you're into complicated stuff, solving an ordinary differential equation is actually not harder (and computing an indefinite integral is equivalent to solving y' = f(x)). There exists a differential Galois theory which mimics Galois theory for polynomial equations (but with Lie groups of symmetries of solutions instead of finite groups of permutations of roots). Risch's algorithm is based on it.
The algorithm you are looking for is the Risch algorithm:
http://en.wikipedia.org/wiki/Risch_algorithm
I believe it is a bit tricky to use. This book:
http://www.amazon.com/Algorithms-Computer-Algebra-Keith-Geddes/dp/0792392590
has a description of it - a 100-page description.
You keep a set of basic forms whose integrals you know (polynomials, elementary trigonometric functions, etc.) and match them against the form of the input. This is doable if you don't need much generality: it's very easy, for example, to write a program that integrates polynomials.
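A minimal sketch of that easy case in Haskell (the representation and names are my own):

    -- A polynomial as its coefficient list, constant term first:
    -- [a0, a1, a2] stands for a0 + a1*x + a2*x^2.
    type Poly = [Rational]

    -- Term-by-term rule: a_n * x^n integrates to (a_n / (n+1)) * x^(n+1).
    -- The leading 0 is the constant of integration.
    integratePoly :: Poly -> Poly
    integratePoly as = 0 : zipWith (/) as (map fromInteger [1 ..])

For example, integratePoly [0, 0, 3] yields [0, 0, 0, 1]: the integral of 3*x^2 is x^3.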
If you want to do it in the most general case possible, you'll have to do much of the work that computer algebra systems do. It is a lifetime's work for some people, e.g. if you look at Risch's "algorithm" posted in other answers, or symbolic integration, you can see that there are entire multi-volume books ("Manuel Bronstein, Symbolic Integration Volume I: Springer") that have been written on the topic, and very few existing computer algebra systems implement it in maximum generality.
If you really want to code it yourself, you can look at the source code of Sage or the several projects listed among its components. Of course, it's easier to use one of these programs, or, if you're writing something bigger, use one of these as libraries.
These expert systems usually have a huge collection of techniques and simply try one after another.
I'm not sure about WolframMath, but in Maple there's a command that enables displaying all intermediate steps. If you do so, you get as output all the tried techniques.
Edit:
Transforming the input should not be the really tricky part - you need to write a lexer and a parser that transform the textual input into an internal representation.
Good luck. Mathematica is a very complex piece of software, and symbolic manipulation is something it does best. If you are interested in the topic, take a look at these books:
http://www.amazon.com/Computer-Algebra-Symbolic-Computation-Elementary/dp/1568811586/ref=sr_1_3?ie=UTF8&s=books&qid=1279039619&sr=8-3-spell
Also, going to the source wouldn't hurt either. This book actually explains the inner workings of Mathematica:
http://www.amazon.com/Mathematica-Book-Fourth-Stephen-Wolfram/dp/0521643147/ref=sr_1_7?ie=UTF8&s=books&qid=1279039687&sr=1-7

Can any algorithmic puzzle be implemented in a purely functional way?

I've been contemplating programming language designs, and from the definition of Declarative Programming on Wikipedia:
This is in contrast from imperative programming, which requires a detailed description of the algorithm to be run.
and further down:
... Any style of programming that is not imperative. ...
It then goes on to express that functional languages, because they are not imperative, are declarative by their very nature.
However, this makes me wonder, are purely functional programming languages able to solve any algorithmic problem, or are the constraints based upon what functions are available in that language?
I'm mostly interested in general thoughts on the subject, although if specific examples can illustrate the point, I certainly welcome them.
According to the Church-Turing thesis,
"the three computational processes (recursion, λ-calculus, and Turing machine) were shown to be equivalent",
where the Turing machine can be read as "procedural" and the lambda calculus as "functional".
Yes, Haskell, Erlang, etc. are Turing complete languages. In principle, you don't need mutable state to solve a problem, since you can always create a new object instead of mutating the old one. Of course, Brainfuck is also Turing complete. In other words, just because an algorithm can be expressed in a functional language doesn't mean it's not horribly awkward.
OK, so Church and Turing proved it is possible, but how do we actually do something?
Rewriting imperative code in pure functional style is an exercise I frequently assign to undergraduate students:
Each mutable variable becomes a function parameter
Loops are rewritten using recursion
Each goto is expressed as a function call with arguments
Sometimes what comes out is a mess, but often the results are surprisingly elegant. The only real trick is not to pass arguments that never change, but instead to let-bind them in the outer environment.
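A hedged sketch of that recipe in Haskell (the imperative pseudocode and names are my own):

    -- Imperative original:
    --   total = 0; i = 0;
    --   while (i < length xs) { total = total + xs[i]; i = i + 1; }
    --   return total;
    --
    -- Each mutable variable (total, and the cursor into xs) becomes a
    -- parameter of a local helper; the loop becomes a tail call.
    sumImperative :: [Int] -> Int
    sumImperative = go 0
      where
        go total []       = total
        go total (x : xs) = go (total + x) xs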
The big difference with functional-style programming is that it avoids mutable state. Where imperative programming will typically update variables, functional programming will define new, read-only values.
The main place where this hits performance is in algorithms that use updatable arrays. An imperative implementation can update an array element in O(1) time, while the best a purely functional implementation can achieve is O(log N) (using a balanced tree).
Note that functional languages generally have some way to use updatable arrays with O(1) access time (e.g., Haskell provides this with its state transformer monad). However, this is arguably an imperative programming method... nothing wrong with that; you want to use the best tools for a particular job, after all.
The functional style of O(log N) incremental array update is not all bad, though, as functional-style algorithms seem to lend themselves well to parallelization.
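For example, a hedged sketch of that ST-based escape hatch - O(1) in-place updates behind a pure interface (the function name is my own; requires the array package that ships with GHC):

    import Data.Array.ST (newArray, readArray, runSTUArray, writeArray)
    import Data.Array.Unboxed (UArray)

    -- Count occurrences of values assumed to lie in [0, n), using
    -- in-place O(1) updates; runSTUArray seals the mutation, so the
    -- result is an ordinary pure value.
    histogram :: Int -> [Int] -> UArray Int Int
    histogram n xs = runSTUArray $ do
      arr <- newArray (0, n - 1) 0
      mapM_ (\x -> readArray arr x >>= writeArray arr x . (+ 1)) xs
      return arr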
Too long to be posted as a comment on #SteveB's answer.
Functional programming and imperative programming have equal capability: whatever one can do, the other can do. They are said to be Turing complete. The functions that a Turing machine can compute are exactly the ones that recursive function theory and λ-calculus express.
But the Church-Turing Thesis, as such, is irrelevant. It asserts that any computation can be carried out by a Turing machine. This relates an informal idea - computation - to a formal one - the Turing machine. Nobody has yet found anything we would recognise as computation that a Turing machine can't do. Will someone find such a thing in future? Who can tell.
Using state monads, you can program in an imperative style in Haskell.
So the assertion that Haskell is declarative by its very nature needs to be taken with a grain of salt. On the positive side, it is then equivalent to imperative programming languages in a practical sense as well, one which doesn't completely ignore efficiency.
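A small illustration of that imperative style (my own example, assuming the mtl package):

    import Control.Monad.State (State, execState, get, put)

    -- Sum 1..n written imperatively: two "mutable variables"
    -- (acc, i) are threaded through the State monad.
    sumTo :: Int -> Int
    sumTo n = fst (execState loop (0, 1))
      where
        loop :: State (Int, Int) ()
        loop = do
          (acc, i) <- get
          if i > n
            then return ()
            else do
              put (acc + i, i + 1)
              loop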
While I completely agree with the answer that invokes the Church-Turing thesis, it raises an interesting question. If I have a parallel computation problem (which is not algorithmic in the strict mathematical sense), such as a multiple producer/consumer queue or some network protocol between several machines, can this be adequately modeled by a Turing machine? It can be simulated, but if we simulate it, we lose the purpose of the parallelism in the problem (because then we could find a simpler algorithm on the Turing machine). So if we are not to lose the parallelism inherent in the problem (and thus the reason we are interested in it), can we really remove the notion of state?
I remember reading somewhere that there are problems which are provably harder when solved in a purely functional manner, but I can't seem to find the reference.
As noted above, the primary problem is array updates. While the compiler may use a mutable array under the hood in some conditions, it must be guaranteed that only one reference to the array exists in the entire program.
Not only is this a hard mathematical fact; it is also a problem in practice if you don't use impure constructs.
On a more subjective note, stating that all Turing complete languages are equivalent is only true in a narrow mathematical sense. Paul Graham explores the issue in Beating the Averages in the section "The Blub Paradox."
Formal results such as Turing-completeness may be provably correct, but they are not necessarily useful. The travelling salesman problem may be NP-complete, and yet salesmen travel all the time. It seems they don't feel the need to follow an "optimal" path, so the theorem is irrelevant to them.
NOTE: I am not trying to bash functional programming, since I really like it. It is just important to remember that it is not a panacea.
