Process and tools used in refactoring software

I am looking for refactoring software for languages like C++, Java, or C# that provides information on how it does its refactoring.
What methods do such tools use to detect portions of code that need refactoring, and how do they preserve program integrity when applying changes? And, if possible, which tools do they use?
Thanks

If you want to understand how refactoring tools work, you need, as a foundation, to learn essentially how compilers work: parsing, symbol-table construction, various kinds of control and dataflow analysis, program analysis, program transformation. Refactoring engines build on top of this.
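As a rough illustration of why this foundation matters, here is a minimal sketch of a "rename local variable" refactoring, written in Python against its standard ast module (Python 3.9+ is assumed for ast.unparse). The names RenameLocal and rename_local are invented for this example. The scope handling is deliberately naive: it ignores global/nonlocal declarations and closures, which is exactly the gap a real symbol table fills.

```python
import ast


class RenameLocal(ast.NodeTransformer):
    """Rename one name inside one function, leaving other scopes untouched."""

    def __init__(self, function, old, new):
        self.function, self.old, self.new = function, old, new
        self.inside = False  # True only while visiting the target function

    def visit_FunctionDef(self, node):
        was_inside = self.inside
        self.inside = node.name == self.function
        self.generic_visit(node)
        self.inside = was_inside
        return node

    def visit_Name(self, node):
        if self.inside and node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        if self.inside and node.arg == self.old:
            node.arg = self.new
        return node


def rename_local(source, function, old, new):
    """Parse, check for a name collision, transform, and print the code back."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function:
            # Poor man's symbol table: every name the function mentions.
            used = {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}
            used |= {n.arg for n in ast.walk(node) if isinstance(n, ast.arg)}
            if new in used:
                raise ValueError(f"{new!r} is already used in {function}()")
    return ast.unparse(RenameLocal(function, old, new).visit(tree))


SOURCE = """
total = 10

def area(w, h):
    total = w * h
    return total

def other():
    return total
"""

if __name__ == "__main__":
    print(rename_local(SOURCE, "area", "total", "product"))
```

A textual search-and-replace would also have changed the module-level total and the one in other(); working on the parsed tree with scope information is what keeps the program's meaning intact.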
Details on how tools handle specific refactorings tend to be found in the software engineering research literature. Check out http://scholar.google.com, and use the search term "refactoring"; you'll get buried under papers that address different kinds of refactorings, and different approaches to doing them.
The question about "finding (single) refactoring tools for list of languages?" is pretty hard to answer. Most refactoring tools are difficult to build (see compiler technology discussion above), so you tend not to see "one" tool that does them all, but rather one tool per language/IDE. Language specific tools are relatively easy to find: google "refactoring tool language".
One insight, however, is that the machinery to do such refactoring tasks has a lot of basic technology foundations in common in the abstract; see my first paragraph above.
It is unfortunate that most refactoring tools are built by constructing all this machinery for just the one specific refactoring tool, which helps explain why they are hard to build, therefore expensive to build, and therefore rare. They are also built using traditional compiler techniques, e.g., traditional parsers (with limitations that cause people to continually complain) and procedural programming (after all, that's the way we've done it since the days of the pyramids, right?).
Tools called program transformation engines try to capture this shared commonality, thereby amortizing the cost of building the baseline across many tasks. They also try to make complex code transformations easier to implement by providing non-procedural means to express what needs to be done.
You can see an example of program transformations applied to Java; the message from that example is that the same transformation engine can be used to "refactor" code in other languages, which I think is what you were originally trying to ask about. (Full disclosure: I'm behind the tool in the example).
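To give a feel for what "non-procedural" means here, the following is a toy sketch in Python, not the notation of any real transformation engine. Terms are nested tuples, names starting with "?" are pattern variables, and the functions match, substitute, and rewrite are hypothetical helpers written for this illustration. The point is that the rules are plain data, and a generic engine applies them.

```python
def match(pattern, term, bindings):
    """Return bindings extended so that pattern equals term, or None on failure."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term):
            return None
        for sub_pattern, sub_term in zip(pattern, term):
            bindings = match(sub_pattern, sub_term, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == term else None


def substitute(template, bindings):
    """Build a new term from a template by filling in the bound variables."""
    if isinstance(template, str) and template.startswith("?"):
        return bindings[template]
    if isinstance(template, tuple):
        return tuple(substitute(part, bindings) for part in template)
    return template


def rewrite(term, rules):
    """Rewrite sub-terms first, then the term itself, until no rule applies."""
    if isinstance(term, tuple):
        term = tuple(rewrite(part, rules) for part in term)
    for pattern, replacement in rules:
        bindings = match(pattern, term, {})
        if bindings is not None:
            return rewrite(substitute(replacement, bindings), rules)
    return term


# The "what", stated declaratively: each rule is (pattern, replacement).
RULES = [
    (("+", "?x", 0), "?x"),   # x + 0  ->  x
    (("*", "?x", 1), "?x"),   # x * 1  ->  x
    (("*", "?x", 0), 0),      # x * 0  ->  0
]

if __name__ == "__main__":
    print(rewrite(("+", ("*", "a", 1), ("*", "b", 0)), RULES))
```

Adding a new simplification means adding one line to RULES rather than writing another tree walker. This sketch assumes the rules always shrink the term; a rule set that can grow terms could make rewrite loop forever.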

Related

Does a Tool for Automatically Visualizing a Project's Source Code's Control Flow In-Line Exist?

I would like to be able to use a tool that lets you visualize a program's control flow(s) in the context of its source code. To clarify, such a tool should basically show what happens in a program by spitting out a human-readable abstract syntax tree in the form of a multidigraph with nodes containing snippets of source-code translation units. The resulting graph initial node would, I presume, contain the block of code starting with a program's entry point (that'd be main for a C or C++ program.) New nodes would be created when a node needs to reference another block of code, whether that might be in the current file or in another one, and arrows would connect the nodes. Does such a tool exist, or would it have to be created from scratch?
You aren't going to get a tool that does this for arbitrary languages off the shelf. There are too many languages, each with its own syntax and semantics. You somehow need a tool per language. You might find such tools for very commonly used languages, e.g., Understand for Software.
I think that the only way to do this is to build metatools that enable the construction of language-specific tools relatively easily. Such a tool has to have the common machinery needed by all such language processing tools: strong parsers (so writing grammars for languages is relatively straightforward), AST construction machinery, symbol table support, routines to build control and data flow graphs. By providing such machinery, one can build language front ends for modest costs.
There's a class of tools that does this: program transformation tools. Most of them have parsing engines, but not the rest of the mechanisms I have suggested above.
I believe this enough to have invested 20 years of my life in building such metatools. Our DMS Software Reengineering Toolkit shows its strength in being able to parse some 50+ languages, including the stunningly hard C++14 (both MS and GNU variants). It shows symbol table support and control flow graph construction for COBOL, Java, C, C++. (We can't do everything at once; pedaling as fast as practical).
[DMS builds these graphs as data structures rather than "showing" them; the examples on that page are drawn with the additional help of DOT].
One of the few other tools that tries to do this is Clang/LLVM; this covers a wide variety of popular languages. Clang doesn't have any specific support for parsing that I know about; you get to code it all yourself. I think you get control flow graphs only after you convert the language to LLVM. I don't think it has any specific support for drawing control flow graphs, either.
An older tool with a good reputation for multi-language support in this space is CoCo/R; I don't know a lot about it. I know it parses, and has some support for ASTs; I don't know what it does about control flow analysis.
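For a single language, the core of what the question asks for can be sketched in a few lines. The example below is a hypothetical illustration in Python using the standard ast module. It builds a call graph (a much coarser thing than a full control flow graph), starts at an entry point assumed to be called main, and emits DOT text that Graphviz can draw. The function names call_graph and reachable_dot are invented for this sketch, and it only resolves direct calls to top-level functions by name.

```python
import ast


def call_graph(source):
    """Map each top-level function name to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {}
    for name, node in functions.items():
        callees = set()
        for sub in ast.walk(node):
            # Only direct calls such as helper(); methods and aliases are ignored.
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                if sub.func.id in functions:
                    callees.add(sub.func.id)
        graph[name] = callees
    return graph


def reachable_dot(graph, entry="main"):
    """Emit DOT text for the part of the graph reachable from the entry point."""
    seen, stack, edges = set(), [entry], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for callee in sorted(graph.get(current, ())):
            edges.append((current, callee))
            stack.append(callee)
    lines = ["digraph calls {"]
    lines += [f'    "{caller}" -> "{callee}";' for caller, callee in edges]
    lines.append("}")
    return "\n".join(lines)


SOURCE = """
def helper():
    return 1

def unused():
    return helper()

def main():
    return helper() + 2
"""

if __name__ == "__main__":
    print(reachable_dot(call_graph(SOURCE)))
```

Putting source snippets into the nodes would be a matter of adding DOT label attributes. Doing the same for C or C++ needs a real front end (preprocessor, overload resolution, function pointers), which is where the language-specific cost comes from.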

Is there any scripting language that's fast, easy to embed, and well-suited for high-level game-programming?

First off, I'm aware that there are many questions related to this, but none of them seemed to help my specific situation. In particular, lua and python don't fit my needs as well as I could hope. It may be that no language with my requirements exists, but before coming to that conclusion it'd be nice to hear a few more opinions. :)
As you may have guessed, I need such a language for a game engine I'm trying to create. The purpose of this game engine is to provide a user with the basic tools for building a game, while still giving her the freedom of creating many different types of games.
For this reason, the scripting language should be able to handle game concepts intuitively. Among other things, it should be easy to define a variety of types, sub-type them with slightly different properties, query and modify objects dynamically, and so on.
Furthermore, it should be possible for the game developer to handle every situation they come across in the scripting language. While basic components like the renderer and networking would be implemented in C++, game-specific mechanisms such as rotating a few hundred objects around a planet will be handled in the scripting language. This means that the scripting language has to be insanely fast, 1/10 C speed is probably the minimum.
Then there's the problem of debugging. Information about the function, stack trace and variable states that the error occurred in should be accessible.
Last but not least, this is a project done by a single person. Even if I wanted to, I simply don't have the resources to spend weeks on just the glue code. Integrating the language with my project shouldn't be much harder than integrating lua.
Examining the two suggested languages, lua and python: lua is fast (luajit) and easy to integrate, but its standard debugging facilities seem to be lacking. What's even worse, lua by default has no type system at all. Of course you can implement that on your own, but the syntax will always be weird and unintuitive.
Python, on the other hand, is very comfortable to use and has a basic class system. However, it's not that easy to integrate, its paradigm doesn't really involve type-checking, and it's definitely not fast enough for more complex games. I'd again like to point out that everything would be done in python. I'm well aware that python would likely be fast enough for 90% of the code.
There's also Scala, which I haven't seen suggested so far. Scala seems to actually fulfill most of the requirements, but embedding the Java VM with C doesn't seem very easy, and it generally seems like java expects you to build your application around java rather than the other way around. I'm also not sure if Scala's functional paradigm would be good for intuitive game-development.
EDIT: Please note that this question isn't about finding a solution at any cost. If there isn't any language better than lua, I will simply compromise and use that (I actually already have the thing linked into my program). I just want to make sure I'm not missing something that'd be more suitable before doing so, seeing as lua is far from the perfect solution for me.
You might consider mono. I only know of one success story for this approach, but it is a big one: C++ engine with mono scripting is the approach taken in Unity.
Try the Ring programming language
http://ring-lang.net
It's a general-purpose multi-paradigm scripting language that can be embedded in C/C++ projects, extended using C/C++ code and/or used as a standalone language. The supported programming paradigms are Imperative, Procedural, Object-Oriented, Functional, Meta programming, Declarative programming using nested structures, and Natural programming.
The language is simple, tries to be natural, encourages organization, and comes with a transparent implementation. It comes with a compact syntax and a group of features that enable the programmer to create natural interfaces and declarative domain-specific languages in a fraction of the time. It is very small and fast, and comes with a smart garbage collector that puts the memory under the programmer's control. It supports many programming paradigms and comes with useful and practical libraries. The language is designed for productivity and developing high quality solutions that can scale.
The compiler + The Virtual Machine are 15,000 lines of C code
Embedding Ring Interpreter in C/C++ Programs
https://en.wikibooks.org/wiki/Ring/Lessons/Embedding_Ring_Interpreter_in_C/C%2B%2B_Programs
For embeddability, you might look into Tcl, or if you're into Scheme, check out SIOD or Guile. I would suggest Lua or Python in general, of course, but your question precludes them.
Since no one seems to know a combination better than lua/luajit, I think I will leave it at that. Thanks for everyone's input on this. I personally find lua to be very lacking as a high-level language for game-programming, but it's probably the best choice out there. So to whoever finds this question and has the same requirements (fast, easy to use, easy to embed), you'll either have to use lua/luajit or make your own. :)

How to draw a tiger with just 3 lines?

Background:
An art teacher once gave me a design problem to draw a tiger using only 3 lines. The idea being that I study a tiger and learn the 3 lines to draw for people to still be able to tell it is a tiger.
The solution for this problem is to start with a full drawing of a tiger and remove elements until you get to the three parts that are most recognizable as a tiger.
I love this problem as it can be applied in multiple disciplines like software development, especially in removing complexity.
At work I deal with maintaining a large software system that has been hacked to death and is to the point of becoming unmaintainable. It is my job to remove the burdensome complexity that was caused by past developers.
Question:
Is there a set process for removing complexity in software systems - a kind of reduction process template to be applied to the problem?
Check out the book Refactoring by Martin Fowler, and his http://www.refactoring.com/ website.
Robert C. Martin's Clean Code is another good resource for reducing code complexity.
Unfortunately, the analogy with the tiger drawing may not work very well. With only three lines, a viewer can imagine the rest. In a software system, all the detail has to actually be there. You generally can't take much away without removing something essential.
Check out the book Anti-Patterns for a well-written book on the whole subject of moving from bad (or maladaptive) design to better. It provides ways to recover from a whole host of problems typically found in software systems. I would then add support to Kristopher's recommendation of Refactoring as an important second step.
Check out the book Working Effectively with Legacy Code.
The topics covered include:
Understanding the mechanics of software change: adding features, fixing bugs, improving design, optimizing performance
Getting legacy code into a test harness
Writing tests that protect you against introducing new problems
Techniques that can be used with any language or platform—with examples in Java, C++, C, and C#
Accurately identifying where code changes need to be made
Coping with legacy systems that aren't object-oriented
Handling applications that don't seem to have any structure
This book also includes a catalog of twenty-four dependency-breaking techniques that help you work with program elements in isolation and make safer changes.
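The "test harness" idea is easy to show in miniature. Below is a hedged sketch in Python of a characterization test: record what the existing code does today, then require any rewritten version to reproduce it exactly. Everything here (legacy_price, the pricing rules, the sample cases) is a made-up stand-in and is not taken from the book.

```python
def legacy_price(quantity, unit_price, customer_type):
    """Hypothetical stand-in for a tangled routine nobody fully understands."""
    total = quantity * unit_price
    if customer_type == "gold":
        total = total - total * 0.1
    if quantity > 100:
        total = total - 5
    return round(total, 2)


def _discount(total, customer_type):
    return total * 0.1 if customer_type == "gold" else 0


def refactored_price(quantity, unit_price, customer_type):
    """Candidate cleanup: same arithmetic, discount logic extracted."""
    total = quantity * unit_price
    total -= _discount(total, customer_type)
    if quantity > 100:
        total -= 5
    return round(total, 2)


def broken_price(quantity, unit_price, customer_type):
    """A 'cleanup' that silently changes a boundary (>= instead of >)."""
    total = quantity * unit_price
    total -= _discount(total, customer_type)
    if quantity >= 100:
        total -= 5
    return round(total, 2)


# Inputs chosen to hit each branch and the boundary at quantity == 100.
CASES = [
    (1, 9.99, "regular"),
    (10, 2.5, "gold"),
    (100, 1.0, "regular"),
    (101, 1.0, "gold"),
]

# The "golden master": whatever the legacy code does today, right or wrong.
GOLDEN = {case: legacy_price(*case) for case in CASES}


def mismatches(candidate):
    """Return the cases where the candidate disagrees with recorded behavior."""
    return [case for case, expected in GOLDEN.items() if candidate(*case) != expected]


if __name__ == "__main__":
    print("refactored:", mismatches(refactored_price))
    print("broken:", mismatches(broken_price))
```

The tests do not say the legacy behavior is correct, only that it has not changed; that is what makes it safe to start simplifying.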
While intellectually stimulating, the concept of detail removal doesn't carry over very well (at least as-is) to software programs. The reason is that the drawing is re-evaluated by a human, with the ability to accept fuzzy input, whereas the program is re-evaluated by a CPU, which is very poor at "filling in the blanks". Another, more subtle reason is that programs convey a spatiotemporal narrative, whereas the drawing is essentially spatial.
Consequently, with software there is much less room for approximation and for outright removal of particular sections of the code. Nevertheless, refactoring is the operational keyword and is sometimes applicable even for the most awkward legacy pieces. This discipline is, however, part art, part science and doesn't have very many "quick tricks" that I know of.
Edit: One isn't however completely helpless against legacy code. See for example the excellent book references provided in Alex Baranosky and Kristopher Johnson's answers. These books provide many useful techniques, but on the whole I remain strong in my assertion that refactoring non-trivial legacy code is an iterative process that requires both art and science (and patience and ruthlessness and gentleness ;-) ).
This is a loaded question :-)
First, how do we measure "complexity"? Without a metric decided a priori, it may be hard to justify any "reduction" project.
Second, is the choice entirely yours? To take an example, assume that, in some code base, the hammer of "inheritance" is used to solve every other problem. While using inheritance is perfectly right for some cases, it may not be right for all cases. What do you do in such cases?
Third, can it be proved that behavior/functionality of the program did not change due to refactoring? (This gets more complex when the code is part of a shipping product.)
Fourth, you can start with simpler things like: (a) avoid global variables, (b) avoid macros, (c) use const pointers and const references as much as possible, (d) use const qualified methods wherever it is the logical thing to do. I know these are not refactoring techniques, but I think they might help you proceed towards your goal.
Finally, in my humble opinion, any such refactoring project is more of a people issue than a technology issue. All programmers want to write good code, but the perception of good vs. bad is very subjective and varies across members of the same team. I would suggest establishing a "design convention" for the project (something like C++ Coding Standards). If you can achieve that, you are mostly done. The remaining part is to modify the parts of the code that do not follow the design convention. (I know, this is very easy to say but much more difficult to do. Good wishes to you.)
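On the first point, one widely used candidate metric is cyclomatic complexity: one plus the number of decision points in a routine. The following is only an approximate sketch in Python using the standard ast module; the choice of which node types count as decisions is an assumption of this example, and dedicated tools differ in the details.

```python
import ast

# Node types treated as one decision point each (an assumption of this sketch).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.IfExp, ast.comprehension)


def cyclomatic_complexity(func_node):
    """Approximate McCabe complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(func_node):
        if isinstance(node, BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
    return score


def complexity_report(source):
    """Return {function name: approximate complexity} for every function."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SOURCE = """
def simple(x):
    return x + 1

def tangled(x, y):
    if x > 0 and y > 0:
        for i in range(x):
            if i % 2:
                y += i
    elif x < 0:
        y = -y
    return y
"""

if __name__ == "__main__":
    for name, score in sorted(complexity_report(SOURCE).items()):
        print(name, score)
```

Running something like this before and after a cleanup gives a number to point at, which helps with the "justify the project" problem, even though no single metric captures everything people mean by complexity.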

When is a new language the right tool for the job?

For a long time I've been trying different languages to find the feature-set I want and I've not been able to find it. I have languages that fit decently for various projects of mine, but I've come up with an intersection of these languages that will allow me to do 99.9% of my projects in a single language. I want the following:
Built on top of .NET or has a .NET implementation
Has few dependencies on the .NET runtime both at compile-time and runtime (this is important since one of the major use cases is in embedded development where the .NET runtime is completely custom)
Has a compiler that is 100% .NET code with no unmanaged dependencies
Supports arbitrary expression nesting (see below)
Supports custom operator definitions
Supports type inference
Optimizes tail calls
Has explicit immutable/mutable definitions (nicety -- I've come to love this but can live without it)
Supports real macros for strong metaprogramming (absolute must-have)
The primary two languages I've been working with are Boo and Nemerle, but I've also played around with F#.
Main complaints against Nemerle: The compiler has horrid error reporting, the implementation is buggy as hell (compiler and libraries), the macros can only be applied inside a function or as attributes, and it's fairly heavy dependency-wise (although not enough that it's a dealbreaker).
Main complaints against Boo: No arbitrary expression nesting (dealbreaker), macros are difficult to write, no custom operator definition (potential dealbreaker).
Main complaints against F#: Ugly syntax, hard to understand metaprogramming, non-free license (epic dealbreaker).
So the more I think about it, the more I think about developing my own language.
Pros:
Get the exact syntax I want
Get a turnaround time that will be a good deal faster; difficult to quantify, but I wouldn't be surprised to see 1.5x developer productivity, especially due to the test infrastructures this can enable for certain projects
I can easily add custom functionality to the compiler to play nicely with my runtime
I get something that is designed and works exactly the way I want -- as much as this sounds like NIH, this will make my life easier
Cons:
Unless it can get popularity, I will be stuck with the burden of maintenance. I know I can at least get the Nemerle people over, since I think everyone wants something more professional, but it takes a village.
Due to the first con, I'm wary of using it in a professional setting. That said, I'm already using Nemerle and using my own custom modified compiler since they're not maintaining it well at all.
If it doesn't gain popularity, finding developers will be much more difficult, to an extent that Paul Graham might not even condone.
So based on all of this, what's the general consensus -- is this a good idea or a bad idea? And perhaps more helpfully, have I missed any big pros or cons?
Edit: Forgot to add the nesting example -- here's a case in Nemerle:
    def foo =
      if(bar == 5)
        match(baz) { | "foo" => 1 | _ => 0 }
      else bar;
Edit #2: Figured it wouldn't hurt to give an example of the type of code that will be converted to this language if it's to exist (S. Lott's answer alone may be enough to scare me away from doing it). The code makes heavy use of custom syntax (opcode, :=, quoteblock, etc), expression nesting, etc. You can check a good example out here: here.
Sadly, there are no metrics or stories about failed languages. Just successful languages. Clearly, the failures outnumber the successes.
What do I base this on? Two common experiences.
Once or twice a year, I have to endure a pitch for a product/language/tool/framework that will Absolutely Change Everything. My answer has been constant for the last 20 or so years. Show me someone who needs support and my company will support them. And that's that. Never hear from them again. Let's say I've heard 25 of these.
Once or twice each year, I have to work with a customer who has orphaned technology. At some point in the past, some clever programmer built a tool/framework/library/package that was used internally for several projects. Then that programmer left. No one else can figure that darn thing out, and they want us to replace/rewrite it. Sadly, we can't figure it out either, and our proposal is to rewrite from scratch. And they complain that their genius built the set of apps in a period of weeks, so it can't take us months to rewrite them in Java/Python/VB/C#. Let's say I've written 25 or so of these kinds of proposals.
That's just me, one consultant.
Indeed, one particularly sad situation was a company whose entire IT software portfolio was written by one clever guy with a private language and tools. He hadn't left, but he'd realized that his language and toolset had fallen way behind the times -- the state of the art had moved on, and he hadn't.
And the move was -- of course -- in an unexpected direction. His language and tools were okay, but the world had started to adopt relational databases, and he had absolutely no way to upgrade his junk to move away from flat files. It was something he had not foreseen. Indeed, it was something he could not possibly foresee. [You won't fall into this trap, will you?]
So, we talked. He rewrote a lot of the applications in Plain-Old VAX Fortran (yes, this is a long time ago.) And he rewrote it to use plain old relational SQL stuff (Ingres, at the time.)
After a year of coding, they were having performance problems. They called me back to review all the great stuff they'd done in replacing the home-built language. Sadly, they'd done the worst possible relational database design. Worst possible. They'd taken their file copies, merges, sorts, and what-not, and implemented each low-level file system operation using SQL, duplicating database rows left, right and center.
He was so mired in his private vision of the perfect language, that he couldn't adapt to a relatively common, pervasive new technology.
I say go for it.
It would be an awesome experience regardless of whether it makes it to production or not.
If you make it compile down to IL, then you do not have to worry about not being able to re-use your compiled assemblies with C#.
If you believe that you have valid complaints about the languages you listed above, it is likely that many will think like you. Of course, for every 1000 interested people there might be 1 willing to help you maintain it - but that is always the risk.
But here are a few things to be cautioned about:
Get your language specification IN STONE before development. Make sure any and all language features are figured out beforehand - even things that you may only want in the future. In my opinion, C# is slowly falling into the "oh-just-one-more-language-extension" trap that will lead to its eventual doom.
Be sure to make it optimized. I don't know what you already know; but if you don't know, then learn ;) Nobody will want a language that has nice syntax but runs as slow as IE's javascript implementation.
Good luck :D
When I first started my career in the early 90s, there seemed to be this craze of everyone developing their own in-house languages. My first 3 jobs were with companies that had done this. One company had even developed their own operating system!
From experience, I'd say this is a bad idea for the following reasons:
1) You will spend time debugging the language itself in addition to the code base on top of it
2) Any developers you hire will need to go through the learning curve of the language
3) It will be hard to attract and keep developers since working in a proprietary language is a dead-end for someone's career
The main reason I left those three jobs was because they had proprietary languages and you'll notice that not many companies take this route any more :).
An additional argument I'd make is that most languages have entire teams whose full time job it is to develop the language. Maybe you'd be an exception, but I'd be very surprised if you'd be able to match that level of development by only working on the language part-time.
"Main complaints against Nemerle: The compiler has horrid error reporting, the implementation is buggy as hell (compiler and libraries), the macros can only be applied inside a function or as attributes, and it's fairly heavy dependency-wise (although not enough that it's a dealbreaker)."
I see your post was written more than two years ago.
I advise you to try the Nemerle language today.
The compiler is stable. There are no blocker bugs today.
The VS integration has a lot of improvements, and there is also SharpDevelop integration.
If you give it a chance, you won't be disappointed.
NEVER EVER develop your own language.
Developing your own language is a fool's trap, and worse, it will limit you to what your imagination can provide, as well as demanding that you work out both your development environment and the actual programme you're writing.
The cases in which this doesn't apply are pretty much if you're Larry Wall, the AWK guys, or part of a substantial group of people dedicated to testing the boundaries of programming. If you're in any of those categories, you don't need my advice, but I strongly doubt that you're targeting a niche where there is no suitable programming language for the task AND the characteristics of the people doing the task.
If you are as clever as you seem to be (a likely possibility), my advice is to go ahead and do the design of the language first, iterate a couple of times over it, ask some smart fellows you trust in smart programming language related communities about the concrete design you came up with and then take the decision.
You might realize in the process of creating the design that just a quick hack on Nemerle would give it all you need, for example. Many things can happen just when thinking hard about a problem, and the final solution might not be what you actually had in mind when beginning the project.
Worst case scenario, you're stuck with actually implementing the design, but by then you will have it proofread and mature, and you'll know with a high degree of certainty that it was a good path to take.
A related piece of advice: start small. Define just the features you absolutely need and then build on them to get the rest.
Writing your own language is not an easy project, especially one to be used in any kind of "professional setting".
It is a huge amount of work, and I doubt you could write your own language and still write any big projects that use it - you will spend so long adding features that you need, fixing bugs, and doing general language-design stuff.
I would strongly recommend choosing a language that is closest to what you want, and extending it to do what you need. It'll never be exactly what you want, but compared to the time you'll spend writing your own language, I would say that's a small compromise.
Scala has a .NET compiler. I don't know the status of this though. It's kind of a second class citizen in the Scala world (which is more focused on the JVM). But it might be a good tradeoff to adopt the .NET compiler instead of creating a new language from scratch.
Scala is kind of weak in the meta-programming department ATM. It's possible that the need for metaprogramming is somewhat reduced by other language features. In any case I don't think anyone would be sad if you were to implement metaprogramming features for it. Also there is a compiler plug-in infrastructure on the way.
I think most languages will never fit all of the bill.
You might want to combine your 2 favourite languages (in my case C# and Scheme) and use them together.
From a professional point of view, this is probably not a good idea though.
It would be interesting to hear some of the things you feel you can't do in existing languages. What kind of projects are you working on that can't be done in C#?
I'm just curious!

Good references / tips for designing rule systems?

I often need to implement some sort of rule system that is user-editable -- the requirements are generally different enough that the same system isn't directly applicable, so I frequently run into the same problem--how do I design a rule system that
1) is maintainable
2) properly balances expressiveness with ease of use
3) is easily extended (if/when I get (2) wrong).
I think Rule systems / DSLs are extremely valuable, but I don't feel comfortable with my ability to design them properly.
What references / tips do you have to offer that may help make this easier?
Because of the nature of the problems I run into, existing languages are generally not applicable. (For example, you would not require that general computer users learn python in order to write an email filter.) Similarly, rule languages, such as JESS, are only a partial solution, since some (simpler) user interface needs to be built on-top of the rule language so non-programmers can make use of it. This interface invariably involves removing some features, or making those features more difficult to use, and that process poses the same problems described above.
Edit: To clarify, the question is about designing a rule engine, I'm not looking for a pre-built rule engine. If you suggest a rule engine, please explain how it addresses the question about making good design decisions.
We had an in-house demo of this tool by its vendor:
http://www.rulearts.com/rulexpress.php
As a company, we have a lot of experience with rule engines (e.g. Cleverpath Aion), but mostly developer-oriented tools. This tool (rulexpress) is very business-people oriented. It's not a rule engine. But it can output all the data in xml (so basically any format you like), and this is something we would then consider as input for a real rule engine, e.g. Windows Workflow Foundation (not one of the bigger/better rule engines, but still).
The tool in itself looked pretty good, some stuff I had never seen in any developer-oriented tool.
There are also some tools for rule management built around WF, if that's your rule engine of choice, check out InRule.
Edited after original question was clarified:
Although I have dabbled in this a long time ago (writing a little language in javacc), I would consider this a bad time investment now. My comment above is in the same spirit: take a simple rule engine, a simple (commercial) UI that makes it easy for business users to maintain, and only invest time in tying the two together.
We have had luck with this: http://msdn.microsoft.com/en-us/library/bb472424.aspx
A Ruby implementation to consider is Ruleby (http://ruleby.org/wiki/Ruleby)
One thing I've found is that being able to define rules as expression trees makes implementation so much simpler. As you correctly mentioned, the requirements from project to project are so different that you just about have to reimplement every time. Expression trees coupled with something like the visitor pattern make for a very (no pun intended) expressive framework that is easily extensible. And you can easily put a very dynamic GUI on top of expression trees which meets that aspect of your requirement.
Hopefully this doesn't sound like I'm saying that everything looks like a nail with my hammer because that's not the case ... it's just that in my experience, this has come in handy more than once :-)
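To make that concrete, here is a minimal sketch in Python of rules as an expression tree with two visitors, one that evaluates a rule and one that renders it as text (the sort of thing a GUI could display). The node classes, the email-filter fields, and the sample rule are all hypothetical and only illustrate the shape of the approach.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str            # looks up a value in the record being tested


@dataclass
class Const:
    value: object


@dataclass
class Contains:
    haystack: object
    needle: object


@dataclass
class And:
    left: object
    right: object


@dataclass
class Not:
    operand: object


class Visitor:
    """Dispatch to visit_<ClassName>; unknown node types raise AttributeError."""

    def visit(self, node):
        return getattr(self, "visit_" + type(node).__name__)(node)


class Evaluator(Visitor):
    """Evaluate a rule tree against one record (a plain dict)."""

    def __init__(self, record):
        self.record = record

    def visit_Field(self, node):
        return self.record[node.name]

    def visit_Const(self, node):
        return node.value

    def visit_Contains(self, node):
        return self.visit(node.needle) in self.visit(node.haystack)

    def visit_And(self, node):
        return self.visit(node.left) and self.visit(node.right)

    def visit_Not(self, node):
        return not self.visit(node.operand)


class Describer(Visitor):
    """Render the same tree as readable text without evaluating it."""

    def visit_Field(self, node):
        return node.name

    def visit_Const(self, node):
        return repr(node.value)

    def visit_Contains(self, node):
        return f"{self.visit(node.haystack)} contains {self.visit(node.needle)}"

    def visit_And(self, node):
        return f"({self.visit(node.left)} and {self.visit(node.right)})"

    def visit_Not(self, node):
        return f"not {self.visit(node.operand)}"


# Example rule: subject mentions "invoice" and sender is not from example.com.
RULE = And(
    Contains(Field("subject"), Const("invoice")),
    Not(Contains(Field("sender"), Const("@example.com"))),
)

if __name__ == "__main__":
    mail = {"subject": "Your invoice", "sender": "billing@shop.test"}
    print(Describer().visit(RULE))
    print(Evaluator(mail).visit(RULE))
```

Extending the rule language means adding a node class plus one method per visitor, and because the tree contains no loops or arbitrary code, a user-built rule always terminates.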
First of all, normally it is not advised to let end-users define the rules. That's because they do not have a development background and could simply write "code" that goes into an infinite loop or does other weird things.
So the system either has to protect against that kind of behavior (thus making it more complex), accept that possibility, or disallow end-users from doing this.
If you are working with .NET then it is hideously easy to create your own DSL by extending the Boo compiler (i.e. with Rhino.DSL you can have a simple DSL with one class).
