How to Work with Ruby Duck Typing - ruby

I am learning Ruby and I'm having a major conceptual problem concerning typing. Allow me to detail why I don't understand with paradigm.
Say I am method chaining for concise code as you do in Ruby. I have to precisely know what the return type of each method call in the chain, otherwise I can't know what methods are available on the next link. Do I have to check the method documentation every time?? I'm running into this constantly running tutorial exercises. It seems I'm stuck with a process of reference, infer, run, fail, fix, repeat to get code running rather then knowing precisely what I'm working with during coding. This flies in the face of Ruby's promise of intuitiveness.
Say I am using a third party library, once again I need to know what types are allow to pass on the parameters otherwise I get a failure. I can look at the code but there may or may not be any comments or declaration of what type the method is expecting. I understand you code based on methods are available on an object, not the type. But then I have to be sure whatever I pass as a parameter has all the methods the library is expect, so I still have to do type checking. Do I have to hope and pray everything is documented properly on an interface so I know if I'm expected to give a string, a hash, a class, etc.
If I look at the source of a method I can get a list of methods being called and infer the type expected, but I have to perform analysis.
Ruby and duck typing: design by contract impossible?
The discussions in the preceding stackoverflow question don't really answer anything other than "there are processes you have to follow" and those processes don't seem to be standard, everyone has a different opinion on what process to follow, and the language has zero enforcement. Method Validation? Test-Driven Design? Documented API? Strict Method Naming Conventions? What's the standard and who dictates it? What do I follow? Would these guidelines solve this concern https://stackoverflow.com/questions/616037/ruby-coding-style-guidelines? Is there editors that help?
Conceptually I don't get the advantage either. You need to know what methods are needed for any method called, so regardless you are typing when you code anything. You just aren't informing the language or anyone else explicitly, unless you decide to document it. Then you are stuck doing all type checking at runtime instead of during coding. I've done PHP and Python programming and I don't understand it there either.
What am I missing or not understanding? Please help me understand this paradigm.

This is not a Ruby specific problem, it's the same for all dynamically typed languages.
Usually there are no guidelines for how to document this either (and most of the time not really possible). See for instance map in the ruby documentation
map { |item| block } → new_ary
map → Enumerator
What is item, block and new_ary here and how are they related? There's no way to tell unless you know the implementation or can infer it from the name of the function somehow. Specifying the type is also hard since new_ary depends on what block returns, which in turn depends on the type of item, which could be different for each element in the Array.
A lot of times you also stumble across documentation that says that an argument is of type Object, Which again tells you nothing since everything is an Object.
OCaml has a solution for this, it supports structural typing so a function that needs an object with a property foo that's a String will be inferred to be { foo : String } instead of a concrete type. But OCaml is still statically typed.
Worth noting is that this can be a problem in statically typed lanugages too. Scala has very generic methods on collections which leads to type signatures like ++[B >: A, That](that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Array[T], B, That]): That for appending two collections.
So most of the time, you will just have to learn this by heart in dynamically typed languages, and perhaps help improve the documentation of libraries you are using.
And this is why I prefer static typing ;)
Edit One thing that might make sense is to do what Scala also does. It doesn't actually show you that type signature for ++ by default, instead it shows ++[B](that: GenTraversableOnce[B]): Array[B] which is not as generic, but probably covers most of the use cases. So for Ruby's map it could have a monomorphic type signature like Array<a> -> (a -> b) -> Array<b>. It's only correct for the cases where the list only contains values of one type and the block only returns elements of one other type, but it's much easier to understand and gives a good overview of what the function does.

Yes, you seem to misunderstand the concept. It's not a replacement for static type checking. It's just different. For example, if you convert objects to json (for rendering them to client), you don't care about actual type of the object, as long as it has #to_json method. In Java, you'd have to create IJsonable interface. In ruby no overhead is needed.
As for knowing what to pass where and what returns what: memorize this or consult docs each time. We all do that.
Just another day, I've seen rails programmer with 6+ years of experience complain on twitter that he can't memorize order of parameters to alias_method: does new name go first or last?
This flies in the face of Ruby's promise of intuitiveness.
Not really. Maybe it's just badly written library. In core ruby everything is quite intuitive, I dare say.
Statically typed languages with their powerful IDEs have a small advantage here, because they can show you documentation right here, very quickly. This is still accessing documentation, though. Only quicker.

Consider that the design choices of strongly typed languages (C++,Java,C#,et al) enforce strict declarations of type passed to methods, and type returned by methods. This is because these languages were designed to validate that arguments are correct (and since these languages are compiled, this work can be done at compile time). But some questions can only be answered at run time, and C++ for example has the RTTI (Run Time Type Interpreter) to examine and enforce type guarantees. But as the developer, you are guided by syntax, semantics and the compiler to produce code that follows these type constraints.
Ruby gives you flexibility to take dynamic argument types, and return dynamic types. This freedom enables you to write more generic code (read Stepanov on the STL and generic programming), and gives you a rich set of introspection methods (is_a?, instance_of?, respond_to?, kind_of?, is_array?, et al) which you can use dynamically. Ruby enables you to write generic methods, but you can also explicity enforce design by contract, and process failure of contract by means chosen.
Yes, you will need to use care when chaining methods together, but learning Ruby is not just a few new keywords. Ruby supports multiple paradigms; you can write procedural, object oriend, generic, and functional programs. The cycle you are in right now will improve quickly as you learn about Ruby.
Perhaps your concern stems from a bias towards strongly typed languages (C++, Java, C#, et al). Duck typing is a different approach. You think differently. Duck typing means that if an object looks like a , behaves like a , then it is a . Everything (almost) is an Object in Ruby, so everything is polymorphic.
Consider templates (C++ has them, C# has them, Java is getting them, C has macros). You build an algorithm, and then have the compiler generate instances for your chosen types. You aren't doing design by contract with generics, but when you recognize their power, you write less code, and produce more.
Some of your other concerns,
third party libraries (gems) are not as hard to use as you fear
Documented API? See Rdoc and http://www.ruby-doc.org/
Rdoc documentation is (usually) provided for libraries
coding guidelines - look at the source for a couple of simple gems for starters
naming conventions - snake case and camel case are both popular
Suggestion - approach an online tutorial with an open mind, do the tutorial (http://rubymonk.com/learning/books/ is good), and you will have more focused questions.

Related

Static type inferring in Ruby, what can I do?

I have a school project which aims to statically type some Ruby code. So, my input is simply a .rb file, and I should be able to type every variable that is assigned in the program.
Right now, what I'm planning to do is :
get the file's AST with the Parser library
put each kind of nodes in container objects
implement the visitor pattern to recursively go through the program
try to infer something from there (I was thinking of somehow creating a table of possible input and output types from the core's methods)
I only accept some very basic Ruby as input ( = no call to external library, just the core of ruby + defined-in-the-file methods)
My question is : what do you think of my approach? Is there any gem/existing programs that could help me?
Your approach is technically correct, but it sounds very strange how you put it. This:
Right now, what I'm planning to do is :
get the file's AST with the Parser library
put each kind of nodes in container objects
implement the visitor pattern to recursively go through the program
try to infer something from there (I was thinking of somehow creating a table of possible input and output types from the core's methods)
sounds a little bit like you wanted to go to Mars like this:
Right now, what I'm planning to do is :
get a pencil
get a piece of paper
get a desk
sit down at the desk and use my pen and paper to design a space launch system and Mars lander
In other words, you list three completely trivial points that are maybe an hour of work for an experienced programmer, and then a fourth, that is multiple years of work and worth a PhD.
The most advanced work I know regarding static type inference for Ruby was Diamondback Ruby (DRuby) (not to be confused with the Distributed Ruby standard library aka dRb / dRuby). However, Diamondback Ruby is now abandoned, since the authors gave up on static type inference for Ruby.
One of the principal researchers behind Diamondback Ruby is now working on a new project called RDL. The main differences between Diamondback Ruby and RDL are:
RDL performs dynamic checking, not static checking
RDL relies on explicit annotations, not implicit inference
Steep is another similar project. It, too, relies on dynamic checking and annotations, and in addition does not actually strive for type-correctness.
Ruby Type Inference for IDEA is a complete re-think of how JetBrains plans to approach type inference for Ruby in their IDEA / RubyMine IDE. This does use type inference, but it uses dynamic type inference, not static.
So, as you can see, static type inference for Ruby is so hard that nobody is even trying it, and the guys who did try gave up on it and are now doing dynamic type checking with explicit type annotations instead.
Ruby Type Checking Roundup on Robert Mosolgo's blog is a good overview about the current state-of-the-art in Ruby typing.

Is there a way to perform compile time type-check in Ruby?

I know Ruby is dynamically and strongly typed, but AFAIK, current syntax doesn't allow checking the type of arguments at compile time due to lack of explicit type notation (or contract) for each argument.
If I want to perform compile-time type check, what (practically matured) options do I have?
Update
What I mean type-check is something like typical statically typed language. Such as C.
For example, C function denotes type of each argument and compiler checks passing-in argument is correct or not.
void func1(struct AAA aaa)
{
struct BBB bbb;
func1(bbb); // Wrong type. Compile time error.
}
As an another example, Objective-C does that by putting explicit type information.
- (id)method1:(AAA*)aaa
{
BBB* bbb = [[AAA alloc] init]; // Though we actually use correctly typed object...
[self method1:bbb]; // Compile time warning or error due to type contract mismatch.
}
I want something like that.
Update 2
Also, I mean compile-time = before running the script. I don't have better word to describe it…
There was a project for developing a type system, a type inferencer, a type checker and a syntax for type annotations for (a subset of) Ruby, called Diamondback Ruby. It was abandoned 4 years ago, you can find its source on GitHub.
But, basically, that language would no longer be Ruby. If static types are so important to you, you should probably just use a statically typed language such as Haskell, Scala, ML, Agda, Coq, ATS etc. That's what they're here for, after all.
RDL is a library for static type checking of Ruby/Rails programs. It has type annotations included for the standard library and (I think) for Rails. It lets you add types to methods/variables/etc. like so:
file.rb:
require 'rdl'
type '(Fixnum) -> Fixnum', typecheck: :now
def id(x)
"forty-two"
end
And then running file.rb will perform static type checking:
$ ruby file.rb
.../lib/rdl/typecheck.rb:32:in `error': (RDL::Typecheck::StaticTypeError)
.../file.rb:5:5: error: got type `String' where return type `Fixnum' expected
.../file.rb:5: "forty-two"
.../file.rb:5: ^~~~~~~~~~~
It seems to be pretty well documented!
While you can't check this in a static time-sense, you can use conditionals in your methods to run only after checking the object.
Here the #is_a? and #kind_of? come in handy...
def method(variable)
if variable.is_a? String
...
else
...
end
end
You would have the choice of returning specified error values or raise an exception. Hopefully this is close to what you are looking for.
You are asking for a "compile-time" type check, but in Ruby, there is no "compile" phase. Static analysis of Ruby code is almost impossible, since any method, even from the built-in classes, can be redefined at runtime. Classes can also be dynamically created and instantiated at runtime. How would you do type-checking for a class which doesn't even exist when the program starts?
Surely, your real goal is not just to "type-check your code". Your goal is to "write code that works", right? Type-checking is just a tool which can help you "write code that works". However, while type-checking is helpful, it has its limits. It can catch some simple bugs, but not most bugs, and not the most difficult bugs.
When you choose to use Ruby, you are giving up the benefits of type-checking. However, Ruby may allow you to get things done with much less code than other languages you are used to. Writing programs using less code, means that generally there are less bugs for you to fix. If you use Ruby skillfully, I believe the tradeoff is worth it.
Although you can't type-check your code in Ruby, there is great value in using assertions which check method arguments. In some cases, those assertions might check the type of an argument. More frequently, they will check other properties of the arguments. Then you need some tests which exercise the code. You will find that with a relatively small number of tests, you will catch more bugs than your C/C++ compiler can do.
It seems you want static types. There is not an effective way to do this in Ruby due to the language's dynamic nature.
A naive approach I can think of is to make a "contract" like this:
def up(name)
# name(string)
name.upcase
end
So the first line of each method will be a comment declaring what type each argument must have.
Then implement a tool that will statically scan & analyze the source and catch such errors by scanning the call sites of the above method and check the type of the passed argument whenever possible.
For example this would be easy to check:
x = "George"
up(x)
but how would you check this one:
x = rand(2).zero? "George" : 5
up(x)
In other words, most of the time the types are impossible to be deduced before runtime.
However if you do not care about the "type checking" happening statically, you could also do:
def up(name)
raise "TypeError etc." unless name.is_a? String
# ...
end
In any way, I don't think you will benefit from the above. I would recommend to make use of duck typing instead.
You might be interested in the idea of a "Pluggable type system". It means adding a static type system to a dynamic language, but the programmer decides what should be typed and what is left untyped. The typechecker stands aside the core language and it is usually implemented as a library. It can either do static checking or check types at runtime in a special "checked" mode that should be used during development and to execute tests.
The type checker for Ruby I found is called Rtc (Ruby Type Checker). Github, academic paper. The motivation is to make the requirements on a type of a parameter of a function or method explicit, move the requirements out of the tests into type annotations and turn the type annotation into an "executable documentation". Source.

Does Ruby have a Metaobject protocol and if not, is it possible to implement one?

Pardon my ignorance, but What is a Metaobject protocol, and does Ruby have one? If not, is it possible to implement one for Ruby? What features might a Metaobject protocol possess if Ruby was to have one?
What is a Metaobject protocol?
The best description I've come across is from the Class::MOP documentation:
A meta object protocol is an API to an object system.
To be more specific, it abstracts the components of an object system (classes, object, methods, object attributes, etc.). These abstractions can then be used to inspect and manipulate the object system which they describe.
It can be said that there are two MOPs for any object system; the implicit MOP and the explicit MOP. The implicit MOP handles things like method dispatch or inheritance, which happen automatically as part of how the object system works. The explicit MOP typically handles the introspection/reflection features of the object system.
All object systems have implicit MOPs. Without one, they would not work. Explicit MOPs are much less common, and depending on the language can vary from restrictive (Reflection in Java or C#) to wide open (CLOS is a perfect example).
Does Ruby have one?
According to this thread on Reopening builtin classes, redefining builtin functions? Perlmonks article I think the answer is no (at least in the strictest sense of what a MOP is).
Clearly there is some wriggle room here so it might be worth posting a question in the Perl side of SO because the Class::MOP / Moose author does answer questions there.
If you look closer to the definition, youll see that Ruby does have a MOP. Is it like the one in CLOS? No, CLOS is a meta-circular MOP which is great (I'd even say genius), but it's not the one true way, take a look at Smalltalk. To implement a (let's say basic) MOP all you need is to provide functions that allow your runtime to:
Create or delete a new class
Create a new property or method
Cause a class to inherit from a different class ("change the class structure")
Generate or change the code defining the methods of a class.
And Ruby provides a way to do all that.
On a side note: The author of Class::MOP is right (IMHO) when it claims that some of the things you can do with a meta circular MOP can be hard to do in Ruby (DISCLAIMER: I have zero, zilch, nada Perl knowledge, so I'm thinking Smalltalk like MOP vs CLOS like MOP here) but most of them are very specific (I'm thinking about metaclass instantation here) and there are ways to make things work without them. I think it all depends on your point view, meta circular MOPs are cooler but more on the academic side and non meta circular MOPs are more practical and easier to implement.

Is Automatic Refactoring Possible in Dynamic Languages?

Perhaps I am limited by my experience with dynamic languages (Ruby on Netbeans and Groovy on Eclipse), but it seems to me that the nature of dynamic languages makes it impossible to refactor (renaming methods, classes, pushing-up, pulling-down, etc.) automatically.
Is it possible to refactor AUTOMATICALLY in any dynamic language (with any IDE/tool)? I am especially interested in Ruby, Python and Groovy, and how the refactoring compares to the 100% automatic refactoring available in all Java IDEs.
Given that automatic refactoring was invented in a dynamic language (Smalltalk), I would have to say "Yes".
In particular, John Brant, Don Roberts and Ralph Johnson developed the Refactoring Browser which is one of the core tools in, for instance, Squeak.
My Google-fu is weak today, but you could try find this paper: Don Roberts, John Brant, and Ralph Johnson, A Refactoring Tool for Smalltalk, "The Theory and Practice of Object Systems", (3) 4, 1997.
Smalltalk does not declare any types. The Refactoring Browser has successfully performed correct refactorings in commercial code since 1995 and is incorporated in nearly all current Smalltalk IDE's. - Don Roberts
Automatic Refactoring was invented in Smalltalk, a highly dynamic language.
And it works like a charm ever since.
You can try yourself in a free Smalltalk version (for instance http://pharo-project.org)
In a dynamic language you can also script refactorings yourself or query the
system. Simple example to get the number of Test classes:
TestCase allSubclasses size
I have wondered the same thing. I'm not a compiler/interpreter writer, but I think the answer will be that it is impossible to get it perfect. However, you can get it correct in most cases.
First, I'm going to change the name "dynamic" language to "interpreted" language which is what I think of with Ruby, Javascript, etc. Interpreted languages tend to take advantage of run-time capabilities.
For instance, most scripting languages allow the following
-- pseudo-code but you get the idea
eval("echo(a)");
I just "ran" a string! You would have to refactor that string also. And will a be a variable or does this language allow you to print the character a without quotes if there is no variable a?
I want to believe this kind of coding is probably the exception and that you will get good refactoring almost all of the time. Unfortunately it seems that when I look through libraries for scripting languages, they get into such exceptions normally and maybe even base their architecture on them.
Or to up the ante a bit:
def functionThatAssumesInputWillCreateX(input)
eval(input)
echo(x)
def functionWithUnknownParms( ... )
eval(argv[1]);
At least when you refactor Java, and change a variable from int to string, you get errors in all the places that were expecting the int still:
String wasInt;
out = 3 + wasInt;
With interpreted languages you will probably not see this until run-time.
Ditto the points about the Refactoring Browser...it is highly effective in Smalltalk. However, I imagine there are certain types of refactoring that would be impossible without type information (whether obtain by explicit type annotation in the language or through some form of type inferencing in a dynamic language is irrelevant). One example: when renaming a method in Smalltalk, it will rename all implementors and senders of that method, which most often is just fine, but is sometimes undesirable. If you had type information on variables, you could scope the rename to just the implementors in the current class hierarchy and all senders when the message is being sent to a variable declared to be of a type in that hierarchy (however, I could imagine scenarios where even with type declaration, that would break down and produce undesirable results).

Why isn't DRY considered a good thing for type declarations?

It seems like people who would never dare cut and paste code have no problem specifying the type of something over and over and over. Why isn't it emphasized as a good practice that type information should be declared once and only once so as to cause as little ripple effect as possible throughout the source code if the type of something is modified? For example, using pseudocode that borrows from C# and D:
MyClass<MyGenericArg> foo = new MyClass<MyGenericArg>(ctorArg);
void fun(MyClass<MyGenericArg> arg) {
gun(arg);
}
void gun(MyClass<MyGenericArg> arg) {
// do stuff.
}
Vs.
var foo = new MyClass<MyGenericArg>(ctorArg);
void fun(T)(T arg) {
gun(arg);
}
void gun(T)(T arg) {
// do stuff.
}
It seems like the second one is a lot less brittle if you change the name of MyClass, or change the type of MyGenericArg, or otherwise decide to change the type of foo.
I don't think you're going to find a lot of disagreement with your argument that the latter example is "better" for the programmer. A lot of language design features are there because they're better for the compiler implementer!
See Scala for one reification of your idea.
Other languages (such as the ML family) take type inference much further, and create a whole style of programming where the type is enormously important, much more so than in the C-like languages. (See The Little MLer for a gentle introduction.)
It isn't considered a bad thing at all. In fact, C# maintainers are already moving a bit towards reducing the tiring boilerplate with the var keyword, where
MyContainer<MyType> cont = new MyContainer<MyType>();
is exactly equivalent to
var cont = new MyContainer<MyType>();
Although you will see many people who will argue against var usage, which kind of shows that many people is not familiar with strong typed languages with type inference; type inference is mistaken for dynamic/soft typing.
Repetition may lead to more readable code, and sometimes may be required in the general case. I've always seen the focus of DRY being more about duplicating logic than repeating literal text. Technically, you can eliminate 'var' and 'void' from your bottom code as well. Not to mention you indicate scope with indentation, why repeat yourself with braces?
Repetition can also have practical benefits: parsing by a program is easier by keeping the 'void', for example.
(However, I still strongly agree with you on prefering "var name = new Type()" over "Type name = new Type()".)
It's a bad thing. This very topic was mentioned in Google's Go language Techtalk.
Albert Einstein said, "Everything should be made as simple as possible, but not one bit simpler."
Your complaint makes no sense in the case of a dynamically typed language, so you must intend this to refer to statically typed languages. In that case, your replacement example implicitly uses Generics (aka Template Classes), which means that any time that fun or gun is used, a new definition based upon the type of the argument. That could result in dozens of extra methods, regardless of the intent of the programmer. In particular, you're throwing away the benefit of compiler-checked type-safety for a runtime error.
If your goal was to simply pass through the argument without checking its type, then the correct type would be Object not T.
Type declarations are intended to make the programmer's life simpler, by catching errors at compile-time, instead of failing at runtime. If you have an overly complex type definition, then you probably don't understand your data. In your example, I would have suggested adding fun and gun to MyClass, instead of defining them separately. If fun and gun don't apply to all possible template types, then they should be defined in an explicit subclass, not as separate functions that take a templated class argument.
Generics exist as a way to wrap behavior around more specific objects. List, Queue, Stack, these are fine reasons for Generics, but at the end of the day, the only thing you should be doing with a bare Generic is creating an instance of it, and calling methods on it. If you really feel the need to do more than that with a Generic, then you probably need to embed your Generic class as an instance object in a wrapper class, one that defines the behaviors you need. You do this for the same reason that you embed primitives into a class: because by themselves, numbers and strings do not convey semantic information about their contents.
Example:
What semantic information does List convey? Just that you're working with multiple triples of integers. On the other hand, List, where a color has 3 integers (red, blue, green) with bounded values (0-255) conveys the intent that you're working with multiple Colors, but provides no hint as to whether the List is ordered, allows duplicates, or any other information about the Colors. Finally a Palette can add those semantics for you: a Palette has a name, contains multiple Colors, but no duplicates, and order isn't important.
This has gotten a bit far afield from the original question, but what it means to me is that DRY (Don't Repeat Yourself) means specifying information once, but that specification should be as precise as is necessary.

Resources