Is Automatic Refactoring Possible in Dynamic Languages?

Perhaps I am limited by my experience with dynamic languages (Ruby on NetBeans and Groovy on Eclipse), but it seems to me that the nature of dynamic languages makes it impossible to refactor automatically (renaming methods and classes, pushing up, pulling down, etc.).
Is it possible to refactor AUTOMATICALLY in any dynamic language (with any IDE/tool)? I am especially interested in Ruby, Python and Groovy, and how the refactoring compares to the 100% automatic refactoring available in all Java IDEs.

Given that automatic refactoring was invented in a dynamic language (Smalltalk), I would have to say "Yes".
In particular, John Brant, Don Roberts and Ralph Johnson developed the Refactoring Browser which is one of the core tools in, for instance, Squeak.
My Google-fu is weak today, but you could try to find this paper: Don Roberts, John Brant, and Ralph Johnson, "A Refactoring Tool for Smalltalk", The Theory and Practice of Object Systems, 3(4), 1997.

Smalltalk does not declare any types. The Refactoring Browser has successfully performed correct refactorings in commercial code since 1995 and is incorporated in nearly all current Smalltalk IDEs. - Don Roberts

Automatic Refactoring was invented in Smalltalk, a highly dynamic language.
And it has worked like a charm ever since.
You can try it yourself in a free Smalltalk implementation (for instance http://pharo-project.org).
In a dynamic language you can also script refactorings yourself or query the
system. A simple example to get the number of test classes:
TestCase allSubclasses size
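A rough Ruby analogue of that kind of query (a sketch; the class names are invented, and ObjectSpace is specific to MRI):
class TestCase; end
class LoginTest < TestCase; end
class SignupTest < TestCase; end

subclasses = ObjectSpace.each_object(Class).select { |c| c < TestCase }
puts subclasses.size  # => 2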

I have wondered the same thing. I'm not a compiler/interpreter writer, but I think the answer is that it is impossible to get it perfect. However, you can get it correct in most cases.
First, I'm going to swap the name "dynamic" language for "interpreted" language, which is what I think of with Ruby, JavaScript, etc. Interpreted languages tend to take advantage of run-time capabilities.
For instance, most scripting languages allow the following
-- pseudo-code but you get the idea
eval("echo(a)");
I just "ran" a string! You would have to refactor that string also. And will a be a variable or does this language allow you to print the character a without quotes if there is no variable a?
I want to believe this kind of coding is the exception and that you will get good refactoring almost all of the time. Unfortunately, it seems that when I look through libraries for scripting languages, they run into such exceptions routinely and may even base their architecture on them.
Or to up the ante a bit:
def functionThatAssumesInputWillCreateX(input)
    eval(input)   # assumes the eval'd string defines x
    echo(x)
end

def functionWithUnknownParms( ... )
    eval(argv[1])
end
At least when you refactor Java and change a variable from int to String, you still get compile errors in all the places that were expecting the int:
String wasInt;
int out = 3 + wasInt; // compile-time error: incompatible types
With interpreted languages you will probably not see this until run-time.
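For example, this Ruby version of the same mistake (a minimal sketch) only fails when the line is executed:
was_int = "3"      # this variable used to hold an Integer
out = 3 + was_int  # TypeError at run time: String can't be coerced into Integer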

Ditto the points about the Refactoring Browser: it is highly effective in Smalltalk. However, I imagine there are certain types of refactoring that would be impossible without type information (whether obtained via explicit type annotations in the language or through some form of type inference in a dynamic language is irrelevant). One example: when renaming a method in Smalltalk, the browser will rename all implementors and senders of that method, which is most often just fine but sometimes undesirable. If you had type information on variables, you could scope the rename to just the implementors in the current class hierarchy and to the senders where the message is being sent to a variable declared to be of a type in that hierarchy (although I could imagine scenarios where, even with type declarations, that would break down and produce undesirable results).


How to Work with Ruby Duck Typing

I am learning Ruby and I'm having a major conceptual problem concerning typing. Allow me to detail why I don't understand this paradigm.
Say I am method chaining for concise code, as you do in Ruby. I have to know precisely what the return type of each method call in the chain is; otherwise I can't know what methods are available on the next link. Do I have to check the method documentation every time? I'm running into this constantly while working through tutorial exercises. It seems I'm stuck in a cycle of reference, infer, run, fail, fix, repeat to get code running, rather than knowing precisely what I'm working with as I code. This flies in the face of Ruby's promise of intuitiveness.
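For instance, a trivial chain like this only works if you know each intermediate type:
result = "hello world"
  .split(" ")          # String#split => Array
  .map(&:capitalize)   # Array#map    => Array
  .join(" ")           # Array#join   => String
puts result            # => "Hello World"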
Say I am using a third-party library; once again I need to know what types I am allowed to pass as parameters, otherwise I get a failure. I can look at the code, but there may or may not be any comments or declarations of what type a method is expecting. I understand that you code based on what methods are available on an object, not on its type. But then I have to be sure that whatever I pass as a parameter has all the methods the library expects, so I still have to do type checking. Do I have to hope and pray everything is documented properly on an interface so I know whether I'm expected to give a string, a hash, a class, etc.?
If I look at the source of a method I can get a list of the methods it calls and infer the expected type, but I have to perform that analysis myself.
Ruby and duck typing: design by contract impossible?
The discussions in the preceding Stack Overflow question don't really answer anything beyond "there are processes you have to follow", and those processes don't seem to be standard: everyone has a different opinion on which process to follow, and the language has zero enforcement. Method validation? Test-driven design? A documented API? Strict method naming conventions? What's the standard, and who dictates it? What do I follow? Would these guidelines solve this concern: https://stackoverflow.com/questions/616037/ruby-coding-style-guidelines? Are there editors that help?
Conceptually I don't get the advantage either. You need to know what methods are required by anything you call, so you are still thinking in types whenever you write code. You just aren't informing the language, or anyone else, explicitly, unless you decide to document it. And then you are stuck doing all type checking at runtime instead of while coding. I've done PHP and Python programming and I don't understand it there either.
What am I missing or not understanding? Please help me understand this paradigm.
This is not a Ruby-specific problem; it's the same for all dynamically typed languages.
Usually there are no guidelines for how to document this either (and most of the time it's not really possible). See, for instance, map in the Ruby documentation:
map { |item| block } → new_ary
map → Enumerator
What are item, block and new_ary here, and how are they related? There's no way to tell unless you know the implementation or can somehow infer it from the name of the function. Specifying the type is also hard, since new_ary depends on what block returns, which in turn depends on the type of item, which could be different for each element in the Array.
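For example, the element type of new_ary depends entirely on what the block returns:
[1, 2, 3].map { |n| n * 2 }      # => [2, 4, 6]
[1, 2, 3].map { |n| n.to_s }     # => ["1", "2", "3"]
[1, "a", :b].map { |x| x.class } # => [Integer, String, Symbol]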
A lot of the time you also stumble across documentation that says an argument is of type Object, which again tells you nothing, since everything is an Object.
OCaml has a solution for this: it supports structural typing, so a function that needs an object with a property foo that's a String will be inferred to have the type { foo : String } instead of a concrete type. But OCaml is still statically typed.
Worth noting is that this can be a problem in statically typed languages too. Scala has very generic methods on collections, which leads to type signatures like ++[B >: A, That](that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Array[T], B, That]): That for appending two collections.
So most of the time, you will just have to learn this by heart in dynamically typed languages, and perhaps help improve the documentation of libraries you are using.
And this is why I prefer static typing ;)
Edit: One thing that might make sense is to do what Scala also does. It doesn't actually show you that type signature for ++ by default; instead it shows ++[B](that: GenTraversableOnce[B]): Array[B], which is not as generic but probably covers most of the use cases. So for Ruby's map it could have a monomorphic type signature like Array<a> -> (a -> b) -> Array<b>. It's only correct for the cases where the list contains values of only one type and the block returns elements of only one other type, but it's much easier to understand and gives a good overview of what the function does.
Yes, you seem to misunderstand the concept. It's not a replacement for static type checking; it's just different. For example, if you convert objects to JSON (for rendering them to the client), you don't care about the actual type of the object as long as it has a #to_json method. In Java, you'd have to create an IJsonable interface. In Ruby no overhead is needed.
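A minimal Ruby sketch of that point (Invoice is a made-up class; requiring the standard json library is what gives Hash a #to_json):
require "json"

class Invoice
  def initialize(total)
    @total = total
  end

  def to_json(*args)
    { total: @total }.to_json(*args)
  end
end

puts Invoice.new(42).to_json  # => {"total":42}
Any object that defines #to_json can be rendered; no interface declaration is involved.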
As for knowing what to pass where and what returns what: memorize it, or consult the docs each time. We all do that.
Just the other day, I saw a Rails programmer with 6+ years of experience complaining on Twitter that he couldn't memorize the order of parameters to alias_method: does the new name go first or last?
This flies in the face of Ruby's promise of intuitiveness.
Not really. Maybe it's just a badly written library. In core Ruby everything is quite intuitive, I dare say.
Statically typed languages, with their powerful IDEs, have a small advantage here, because they can show you documentation right there, very quickly. That is still accessing documentation, though; only quicker.
Consider that the design choices of strongly typed languages (C++, Java, C#, et al.) enforce strict declarations of the types passed to methods and the types returned by them. This is because these languages were designed to validate that arguments are correct (and since these languages are compiled, this work can be done at compile time). But some questions can only be answered at run time; C++, for example, has RTTI (Run-Time Type Information) to examine and enforce type guarantees. As the developer, you are guided by syntax, semantics and the compiler to produce code that follows these type constraints.
Ruby gives you the flexibility to take dynamic argument types and return dynamic types. This freedom enables you to write more generic code (read Stepanov on the STL and generic programming) and gives you a rich set of introspection methods (is_a?, instance_of?, respond_to?, kind_of?, et al.) which you can use dynamically. Ruby enables you to write generic methods, but you can also explicitly enforce design by contract and handle a failure of that contract however you choose.
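A sketch of enforcing such a contract by hand (the names are invented):
def render(printable)
  unless printable.respond_to?(:print_me)
    raise ArgumentError, "expected an object that responds to #print_me"
  end
  printable.print_me
end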
Yes, you will need to use care when chaining methods together, but learning Ruby is not just a matter of a few new keywords. Ruby supports multiple paradigms; you can write procedural, object-oriented, generic, and functional programs. The cycle you are in right now will improve quickly as you learn more about Ruby.
Perhaps your concern stems from a bias towards strongly typed languages (C++, Java, C#, et al.). Duck typing is a different approach, and you have to think differently. Duck typing means that if an object looks like a duck and behaves like a duck, then it is a duck. Everything (almost) is an Object in Ruby, so everything is polymorphic.
Consider templates (C++ has them, C# has generics, Java has them too, C has macros). You build an algorithm, and then have the compiler generate instances for your chosen types. You aren't doing design by contract with generics, but when you recognize their power, you write less code and produce more.
Some of your other concerns,
third-party libraries (gems) are not as hard to use as you fear
Documented API? See RDoc and http://www.ruby-doc.org/
RDoc documentation is (usually) provided for libraries
coding guidelines - look at the source for a couple of simple gems for starters
naming conventions - snake case and camel case are both popular
Suggestion: approach an online tutorial with an open mind, do the tutorial (http://rubymonk.com/learning/books/ is good), and you will come away with more focused questions.

Is Ruby a pure object-oriented programming language even though it doesn't support multiple inheritance? Please explain

Is Ruby a pure object-oriented programming language even though it doesn't support multiple inheritance? If so, how? Please explain.
I know that Ruby, to an extent, substitutes for the lack of multiple inheritance by allowing one to include multiple modules in a class.
Also, I'm not sure of all the prerequisites of a pure OOP language. From this article, they mention:
a Ruby class can have only one method with a given name (if you define
a method with the same name twice, the latter method definition
prevails).
So does this mean that Ruby doesn't support method overloading? If so, can it still qualify as a pure OOP language? If so, kindly explain the reasoning behind this as well.
Thank you.
There are several different families of object-oriented languages. If you're thinking of multiple inheritance and method overloading, you're probably coming from a C++ mindset where these things are taken for granted. These conventions came from earlier languages that C++ is heavily influenced by.
Ruby doesn't concern itself with the type of objects, but rather the methods they're capable of responding to. This is called duck typing and is what separates Smalltalk-inspired languages like Ruby from the more formal Simula or ALGOL influenced languages like C++.
Using modules it's possible to "mix in" methods from various sources and have a sort of multiple inheritance, but strictly speaking it's not possible for a class to have more than one immediate parent class. In practice this is generally not a big deal, as inheritance is not the only way of adding methods.
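A minimal sketch of that mixing-in (the module and class names are invented):
module Walkable
  def walk
    "walking"
  end
end

module Swimmable
  def swim
    "swimming"
  end
end

class Duck
  include Walkable
  include Swimmable
end

duck = Duck.new
duck.walk       # => "walking"
duck.swim       # => "swimming"
Duck.ancestors  # => [Duck, Swimmable, Walkable, Object, Kernel, BasicObject]
Note that Duck still has exactly one superclass (Object); the included modules are inserted into the ancestor chain instead.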
Method overloading is largely irrelevant in Ruby because of duck typing. In C++ you might have separate overloads for handling string, int or float types, but in Ruby you'd have one method that calls to_f on whatever comes in and manipulates the result accordingly. In this sense, Ruby methods are a lot more like C++ templates.
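For example (a trivial sketch), one Ruby method covers what might be three separate overloads in C++:
def half(value)
  value.to_f / 2
end

half(3)    # => 1.5
half(3.0)  # => 1.5
half("3")  # => 1.5 (String#to_f)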
If multiple inheritance were a required "symptom" of an OOP language, then Java, C#, and many more would not be OOP languages either.
What makes a language object-oriented in the first place is, naturally, objects. Everything is an object in Ruby. The whole language is built on the concept of objects and data: different objects can "communicate" with each other, you can encapsulate data, and so on.
Take a look at this resource: Definitions for ObjectOriented.
In the first place, the problem of multiple inheritance makes sense only for an object-oriented language. The very fact that one can ask about multiple inheritance in Ruby is itself evidence that Ruby is an object-oriented language.

Code "internationalization"

I have worked on different projects in different countries and noticed that sometimes the code becomes internationalized, like
SetLargeurEtHauteur() (for SetWidthAndHeight, fr)
Dim _ListaDeObiecte as List(Of Object) (for _ObjectList, ro)
internal void SohranenieUserov() (for SaveUsers, ru)
etc.
It seems that in countries with a Latin alphabet this mix is more pronounced, because there is no need for transliteration.
More than that, the programming "jargon" is often inspired by the language of the project specification. There are cases where terms in the "project language" have a meaning that is not "translatable" into English.
There are also projects on which only, say, a French team works, and which use French words (say, Personne, Vehicule, Projet, etc.).
In those cases I personally add to the specification a "dictionary" that explains all the business object names, and only those objects are used in the other (French) language.
Say:
Collectif - the set of all Personnes;
All the actions (Get, Set, Update, Modify, Load, etc.) are in English.
Now those "strong" names can be used in code:
AddPersonneToCollectif.
What is your approach to "internationalization"?
PS.
I was amused that Visual Studio compiles and runs .NET projects with buttons named à la "btnAddÉlève" or "кпкСтоп"...
My personal approach, which is shared by many but not all in the programming community, is that source code should be in English and, if possible, all the development tools should be in English too.
The most important reason for this is being able to share your problems and solutions with the world (like we are doing now in StackOverflow, no less) without having to translate class names, error messages, paths and other artifacts every time.
It also helps consistency, because most libraries are written in English, and having element names that mix two languages doesn't really help anyone, besides being a constant source of internal conflict over whether a verb like Add should be translated.
English code also makes it easier to add foreign people to a project without worrying about comprehension and misunderstandings (especially between closely related languages, like Spanish and Portuguese, which have lots of false cognates).
A good link on this subject: http://www.codinghorror.com/blog/2009/03/the-ugly-american-programmer.html
(In case anyone wonders, I'm south-american and English is not my primary language)
Even if everyone on the team is a good English speaker (which is not a given), they may not necessarily know the English equivalent of all the business terminology.
I think it's a project-specific decision what to allow, but I would generally tolerate and in some cases encourage business terms (e.g. entity names) in the local language, but not technical terms (i.e. not Largeur/Hauteur instead of Width/Height).
For example in the financial world in France, everyone knows what is meant by OPCVM and FCP - if you attempt English translations you might end up with more misunderstandings than you do by allowing mixed languages.
I have the same issue with Norwegian currently. I guess it depends on your position in the project, the available time and the role of the software.
In my case, I have decided to keep all terms in an existing protocol and library I am working with in Norwegian, as I can reasonably expect that generations of administrative workers have gotten used to these, and since the library depends on the protocol. In a library wrapper for an international project, I have translated each method name literally, and added an English language documentation of the method.
Comments and documentation on the code are in English.
If designing software from scratch, I would try to find English terms for all method names and even business terms (if reasonable; I can hardly think of an example where no term can be found, though), to keep it "portable".
If you're writing code that may be used internationally one day, write it in English. If in doubt, write it in English, even the comments if you can (although I suppose you can add a few comments in the language of your workplace).
It's not specific to coding, unfortunately. English isn't my native language, but I've been able to read a number of technical papers and participate in international conferences with people from all over the world. These collaborations simply wouldn't work if everyone published in their respective native languages.
It may be sad if you feel like defending your language at all costs, but you have to be realistic about it. I suppose English has the advantage of being relatively simple to learn to a basic level: no genders for nouns, little conjugation, no cases.
Generally code referring to language concepts should be in the same mother language as the programming language (i.e. English - for, while, string are all English words).
It's OK (but not great) to have variables and domain concepts in a local language, but you definitely don't want to be translating List, Object, Decimal, etc. into terms which cause programmers more work in reconciling two languages. Even still, I would strongly lobby to restrict very common domain concepts like Collection, Membership, Person, User and possibly less common domain concepts like Invoice, Receipt to English where this is possible.
It would be like coding half your classes in VB and half in C#: your brain has to make a cognitive shift. While this is good for hybrid apps (JavaScript on the web and C# on the backend), because it helps you keep clear what's running where, it isn't good for general programming.
In addition, using English for everything makes the domain and language words work together better.
There are always exceptions. There are certain cases where you would use a native word anyway - where the word describes the domain best. For instance, in our (English) code base, we had references to Mexican Spanish terms for certain concepts which were only relevant for people running our software in Mexico. Typically, Japanese terms were spelled out phonetically/Romaji, though - it was difficult for non-Japanese to be able to pronounce the pictograms ;-).
I think I'd call a code base like that "abysmal" rather than "internationalized", but the general rule I've always heard is that if you ever think someone who doesn't speak your language might touch the code, do it in English.
I think a good design guideline is to write code using English names and English comments only (if your team is capable of doing this, of course; but for an international team English seems the natural choice, since it's the most popular language, especially in the IT world).
A good justification for such a guideline is that the keywords in most programming languages are taken from English, so writing your code using English names gives it a more consistent look, and thanks to this you end up with code that is easier to read.
Another reason is that many compilers can handle only ASCII characters in the names of classes, methods, etc., so you will probably end up with some strange names if you use a language whose alphabet contains non-ASCII characters.
A third reason that comes to mind is sharing your code on sites like SO. Today I opened a post with a piece of code in which the classes had Spanish names. It was hard for me to guess the purpose of the class (even if that's sometimes not strictly necessary, it helps when reading code to understand all the words used :)).
To sum up, I think that internationalization of code is not a good idea. Imagine if the keywords of programming languages (e.g. class, try, while) could also be localized; you can probably imagine how hard life would be then...
To keep things consistent, I would make the code the same (human) language as the (programming) language. That is, if the programming language uses English keywords (like for, switch, public, etc) then keep the rest of the code in English. If you are using a compiler that recognizes (say) the Swahili translations of keywords, then keep the rest of the code in Swahili.
Many APIs have standardized naming schemes that are followed regardless of (human) language, and the accompanying documentation is translated as needed (instead of the source code).
No matter what (human) language you choose, pick one and stick with it. I'd much rather try to wade through source code in German than code that was a mix of German and English.

Duck type as syntactic sugar for reflection: Good or bad idea?

I've been thinking lately, would it be a good form of syntactic sugar in languages like Java and C#, to include a "duck" type as a method parameter type? This would look as follows:
void myFunction(duck foo) {
    foo.doStuff();
}
This could be syntactic sugar for invoking doStuff() via reflection, or it could be implemented differently. foo could be of any type. If foo does not have a doStuff() method, this would throw a runtime exception. The point is that you would have the benefits of the more rigid pre-specified interface paradigm (performance, error checking) when you want them, i.e. most of the time, while keeping a simple, clean-looking backdoor to duck typing that would let you make changes not foreseen in the initial design without massive refactoring. Furthermore, it would likely be 100% backwards compatible and mesh cleanly with existing language constructs. I think this might help cut down on the overengineered, just-in-case programming style that leads to confusing, cluttered APIs. Do you believe something like this would be a good or bad thing in static OO languages like C# and Java?
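For comparison, every method parameter in a duck-typed language like Ruby already behaves like the proposed duck (a trivial sketch):
def my_function(foo)
  foo.do_stuff  # raises NoMethodError at run time if foo has no #do_stuff
end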
The dynamic keyword supports these exact semantics and will be in C# 4.0.
It is not just for reflection, though. It is an implementation of dynamic dispatch which uses reflection only if no other mechanism is available.
This question also has lots of good information.
Actually, that's very much like the "dynamic" type that will be part of C#4.
My short answer is "Yes, Duck Typing is something that would be a useful tool in OO languages. The C# team agrees, and is putting it into C# 4.0"
Long answer:
Duck typing has its pros and cons... there are a lot of people who swear it off completely, but I'll tell you what: there are some very compelling reasons to use duck typing for problems that can't be solved well any other way.
See my post on an interface that took a Dictionary instead of an IDictionary where Duck Typing would have saved the day:
http://www.houseofbilz.com/archive/2008/09/22/why-you-really-need-to-think-about-your-interfaces.aspx
C# has had SOME duck typing built in since 1.0! Take the foreach keyword, for instance. It is a common misconception that your type needs to implement IEnumerable in order to be used in a foreach loop. It turns out, in fact, that this is NOT true: you only need to implement GetEnumerator(); implementing the interface is not necessary (it IS good form, though). This is duck typing.

Standards Document

I am writing a coding standards document for a team of about 15 developers with a project load of between 10 and 15 projects a year. Amongst other sections (which I may post here as I get to them) I am writing a section on code formatting. So to start with, I think it is wise that, for whatever reason, we establish some basic, consistent code formatting/naming standards.
I've looked at roughly 10 projects written over the last 3 years by this team and, unsurprisingly, I'm finding a pretty wide range of styles. Contractors come and go, and at times the team size even doubles.
I am looking for a few suggestions for code formatting and naming standards that have really paid off ... but that can also really be justified. I think consistency and shared-patterns go a long way to making the code more maintainable ... but, are there other things I ought to consider when defining said standards?
How do you line up braces? Do you follow the same brace guidelines when dealing with classes, methods, try/catch blocks, switch statements, if/else blocks, etc.?
Do you line up fields in a column? Do you notate/prefix private variables with an underscore? Do you follow any naming conventions to make it easier to find particular items in a file? How do you order the members of your class?
What about suggestions for namespaces, packaging or source code folder/organization standards? I tend to start with something like:
<com|org|...>.<company>.<app>.<layer>.<function>.ClassName
I'm curious to see if there are other, more accepted, practices than what I am accustomed to -- before I venture off dictating these standards. Links to standards already published online would be great too -- even though I've done a bit of that already.
First, find an automated code formatter that works with your language. The reason: whatever the document says, people will inevitably break the rules. It's much easier to run code through a formatter than to nit-pick in a code review.
If you're using a language with an existing standard (e.g. Java, C#), it's easiest to use it, or at least start with it as a first draft. Sun put a lot of thought into their formatting rules; you might as well take advantage of it.
In any case, remember that a good deal of research has shown that varying things like brace position and whitespace use has no measurable effect on productivity, understandability, or the prevalence of bugs. Just having any standard is the key.
Coming from the automotive industry, here are a few style standards used for concrete reasons:
Always use braces in control structures, and place them on separate lines. This eliminates problems with people adding code and mistakenly including it in, or excluding it from, a control structure.
if(...)
{
}
All switches/selects have a default case. The default case logs an error if it's not a valid path.
For the same reason as above, any if...else if... control structure MUST end with a final else that also logs an error if it's not a valid path. A single if statement does not require this.
In the occasional case where a loop or control structure is intentionally empty, a semicolon is always placed inside it to indicate that this is intentional:
while(stillwaiting())
{
;
}
Naming standards use very different styles for typedefs, defined constants, module-level global variables, etc. Variable names include their type. You can look at a name and have a good idea of what module it pertains to, its scope, and its type. This makes it easy to detect type-related errors.
There are others, but these are the ones off the top of my head.
-Adam
I'm going to second Jason's suggestion.
I just completed a standards document for a team of 10-12 that work mostly in perl. The document says to use "perltidy-like indentation for complex data structures." We also provided everyone with example perltidy settings that would clean up their code to meet this standard. It was very clear and very much industry-standard for the language so we had great buyoff on it by the team.
When setting out to write this document, I asked around for examples of great code in our repository and googled a bit for other standards documents written by smarter architects than I, from which to construct a template. It was tough being concise and pragmatic without crossing into micro-manager territory, but it was very much worth it; having any standard is indeed key.
Hope it works out!
It obviously varies depending on the language and technologies. From the look of your example namespace, I am going to guess Java, in which case http://java.sun.com/docs/codeconv/ is a really good place to start. You might also want to look at something like Maven's standard directory structure, which will make all your projects look similar.
