Is there a meaningful difference between pass_by_reference vs pass_by_object_sharing in ruby? - ruby

Context: I argue that saying pass_by_reference when it's really pass_by_sharing is misleading.
Here is the excerpt from the book "Effective Ruby" that I'm arguing against:
"Most objects are passed around as references and not as actual values. When these types of objects are inserted into a container the collection class is actually storing a reference to the object and not the object itself. (The notable exception to the rule is the Fixnum class whose objects are always passed by value and not by reference.)
The same is true when objects are passed as method arguments. The method will receive a reference to the object and not a new copy. This is great for efficiency but has a startling implication."

The 'call by value' and 'call by object sharing' terminology matches Ruby's behavior, and it is consistent with other object-oriented languages that have the same semantics.
'Call by value' and 'call by object sharing' basically mean the same thing in object-oriented languages, so which one is used doesn't really matter. Someone just thought it would clarify the confusion in the terminology to add more terminology.
If 'call by reference' were implemented in Ruby, it would be something like:
def f(byref x)
  x = "CHANGED"
end

x = ""
f(x)
# x is "CHANGED"
Here, the value of x is changed, the value being which object x refers to.
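In real Ruby, the same shape of code behaves differently: assignment to the parameter only rebinds the local variable inside the method, so the caller's variable is untouched. A minimal sketch:
def f(x)
  x = "CHANGED"   # rebinds the local parameter; the caller never sees this
end

x = ""
f(x)
x  # => "" (still the original object)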
Using the term 'call by reference' just creates confusion, because it means different things to different people. It's unnecessary in languages like Ruby because you don't have a choice. In languages with different calling mechanisms, like C++ and C#, it makes more sense to teach these terms because they have a real effect on programs and we can come up with non-hypothetical examples of them.
When explaining parameters in Ruby, you don't need to use any of these terms, though. They're meaningless to people who don't already know the language. Just describe the behavior itself without that terminology and avoid the baggage.
I would say that if you insist on using these terms, then use 'call by value', because it's usually considered more correct. The 'Programming Ruby' book calls it 'call by value', as do plenty of Ruby programmers. Using the term with a different meaning than its technical one isn't helpful.

You are right. Ruby is pass-by-value only. The semantics of passing and assigning in Ruby are exactly identical to those in Java. And Java is universally described (on Stack Overflow and the rest of the Internet) as pass-by-value only. Terms that describe languages, such as pass-by-value and pass-by-reference, must be used consistently across languages to be meaningful.
The thing that is often misunderstood by people who say Java, Ruby, etc. "pass objects by reference" is that "objects" are not values in these languages, and thus cannot be "passed". The value of every variable and result of every expression is a "reference", which is a pointer to an object. The expression for creating an object returns an object pointer; when you access an attribute through the dot notation, the left side takes an object pointer; when you assign one variable to another, you copy the pointer resulting in two pointers to the same object. You always deal with pointers to objects, never objects themselves.
This is made explicit in Java, as the only types in Java are primitive types and reference types -- there are no "object types". So every value in Java that is not a primitive is a reference (a pointer to an object). Ruby is dynamically typed, so variables don't have explicit types. But you can imagine a dynamically-typed language as a statically-typed language with exactly one type; and for languages like Python and Ruby, if this type were described, it would be a pointer-to-object type.
The issue ultimately boils down to a problem of definitions. People argue over things because there is no precise definition, or they each have slightly different definitions. Rather than argue over vaguely-defined things like what the "value" of a variable is, or whether named values are "variables" or "names", etc., we need a definition of pass-by-value and pass-by-reference that is based purely on the semantics of the language construct. #fgb's answer provides a clear semantic test for pass-by-reference. In "true pass-by-reference", e.g. with & in C++ and PHP, or with ref or out in C#, simple assignment (i.e. =) to a parameter variable has the same effect as simple assignment to the passed variable in the original scope. In pass-by-value, simple assignment (i.e. =) to a parameter variable has no effect in the original scope. This is what we see in Java, Python, Ruby, and many other languages.
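Applied to Ruby, that test looks like this (a minimal sketch): simple assignment to the parameter has no effect in the caller, while mutating the shared object is visible to the caller.
def reassign(s)
  s = "new object"    # rebinds the parameter only
end

def mutate(s)
  s << " changed"     # mutates the object both variables point to
end

str = "original"
reassign(str)
str  # => "original" (so Ruby is not pass-by-reference)
mutate(str)
str  # => "original changed" (the reference was passed by value)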
I dislike people coming up with new names like "pass by object sharing" when they don't understand that the semantics are already covered by an existing term, pass-by-value. Adding a new term only adds to the confusion rather than reducing it, because it does not resolve the definitions of the existing terms; it only adds a new term that also needs to be defined.

Related

Java: Are Instance methods Prohibited for Domain Objects in Functional Programming?

Since Functional Programming treats Data and Behavior separately, and behavior is not supposed to mutate the state of an instance, does FP recommend not having instance methods at all for Domain Objects? Or should I always declare all the fields final?
I am asking more in the context of Object oriented languages like Java.
Since Functional Programming treats Data and Behavior separately,
I've heard that said a lot, but it is not necessarily true. Yes, syntactically they are different, but encapsulation is a thing in FP too. You don't really want your data structures exposed, for the same reason you don't want that in OOP: you want to be able to evolve them later. You want to add features, or optimize them. Once you give direct access to the data, you've essentially lost control of that data.
For example, in Haskell there are modules, which are actually data + behavior in a single unit. Normally the "constructors" of the data (i.e. direct access to the "fields") are not exposed to outside functions. (There are exceptions, as always.)
does FP recommend not having instance methods at all for Domain Objects
FP is a paradigm which says that software should be built using a (mathematical) composition of (mathematical) functions. That is essentially it. Now, if you squint enough, you could call a method a function with just one additional parameter, this, provided everything is immutable.
So I would say no, "FP" does not explicitly define syntax and it can be compatible with objects under certain conditions.
I am asking more in the context of Object oriented languages like Java.
This is where it gets a bit hazy. Java is not well suited to functional programming. Keep in mind that it may have borrowed certain syntax from traditional FP languages, but that doesn't make it suitable for FP.
For example, immutability, pure functions, and function composition are all things you need in order to do FP, and Java has none of those built in. I mean, you can write code to "pretend", but you would be swimming against the tide.
does FP recommend not having instance methods at all for Domain Objects?
In the Domain Driven Design book, Eric Evans discusses modeling your domain with Entities, and Value Objects.
The Value Object pattern calls for immutable private data; once you initialize the value object, it does not change. In the language of Command Query Separation, we would say that the interface of a Value Object supports queries, but not commands.
So an instance method on a value object is very similar to a closure, with the immutable private state of the object playing the role of the captured variables.
Your fields should be final, but functional code and instance methods are not mutually exclusive.
Take a look at the BigDecimal class for example:
BigDecimal x = new BigDecimal(1);
BigDecimal y = new BigDecimal(2);
BigDecimal z = x.add(y);
x and y are immutable and the add method leaves them unchanged and creates a new BigDecimal.

How to Work with Ruby Duck Typing

I am learning Ruby and I'm having a major conceptual problem concerning typing. Allow me to detail why I don't understand this paradigm.
Say I am method chaining for concise code, as you do in Ruby. I have to know precisely what the return type of each method call in the chain is, otherwise I can't know what methods are available on the next link. Do I have to check the method documentation every time? I'm running into this constantly while running tutorial exercises. It seems I'm stuck with a process of reference, infer, run, fail, fix, repeat to get code running, rather than knowing precisely what I'm working with while coding. This flies in the face of Ruby's promise of intuitiveness.
Say I am using a third-party library; once again I need to know what types I'm allowed to pass as the parameters, otherwise I get a failure. I can look at the code, but there may or may not be any comments or declaration of what type the method is expecting. I understand that you code based on what methods are available on an object, not its type. But then I have to be sure that whatever I pass as a parameter has all the methods the library expects, so I still have to do type checking. Do I have to hope and pray everything is documented properly on an interface so I know whether I'm expected to give a string, a hash, a class, etc.?
If I look at the source of a method I can get a list of the methods being called and infer the type expected, but I have to perform that analysis myself.
Ruby and duck typing: design by contract impossible?
The discussions in the preceding Stack Overflow question don't really answer anything other than "there are processes you have to follow", and those processes don't seem to be standard; everyone has a different opinion on what process to follow, and the language has zero enforcement. Method validation? Test-driven design? Documented APIs? Strict method naming conventions? What's the standard, and who dictates it? What do I follow? Would these guidelines solve this concern: https://stackoverflow.com/questions/616037/ruby-coding-style-guidelines? Are there editors that help?
Conceptually I don't get the advantage either. You need to know what methods are required by any method you call, so you are effectively thinking about types whenever you write code. You just aren't informing the language or anyone else explicitly, unless you decide to document it. Then you are stuck doing all type checking at runtime instead of during coding. I've done PHP and Python programming and I don't understand it there either.
What am I missing or not understanding? Please help me understand this paradigm.
This is not a Ruby specific problem, it's the same for all dynamically typed languages.
Usually there are no guidelines for how to document this either (and most of the time it's not really possible). See, for instance, map in the Ruby documentation:
map { |item| block } → new_ary
map → Enumerator
What are item, block and new_ary here, and how are they related? There's no way to tell unless you know the implementation or can infer it from the name of the function somehow. Specifying the type is also hard since new_ary depends on what the block returns, which in turn depends on the type of item, which could be different for each element in the Array.
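To make the three names concrete, here is a quick illustration of how they relate in practice:
new_ary = [1, 2, 3].map { |item| item * 2 }   # the block receives each item in turn
# new_ary => [2, 4, 6]
enum = [1, 2, 3].map                          # with no block, an Enumerator is returned
# enum => #<Enumerator: [1, 2, 3]:map>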
A lot of times you also stumble across documentation that says that an argument is of type Object, which again tells you nothing since everything is an Object.
OCaml has a solution for this: it supports structural typing, so a function that needs an object with a property foo that's a String will be inferred to have the type { foo : String } instead of a concrete type. But OCaml is still statically typed.
Worth noting is that this can be a problem in statically typed languages too. Scala has very generic methods on collections, which leads to type signatures like ++[B >: A, That](that: GenTraversableOnce[B])(implicit bf: CanBuildFrom[Array[T], B, That]): That for appending two collections.
So most of the time, you will just have to learn this by heart in dynamically typed languages, and perhaps help improve the documentation of libraries you are using.
And this is why I prefer static typing ;)
Edit One thing that might make sense is to do what Scala also does. It doesn't actually show you that type signature for ++ by default, instead it shows ++[B](that: GenTraversableOnce[B]): Array[B] which is not as generic, but probably covers most of the use cases. So for Ruby's map it could have a monomorphic type signature like Array<a> -> (a -> b) -> Array<b>. It's only correct for the cases where the list only contains values of one type and the block only returns elements of one other type, but it's much easier to understand and gives a good overview of what the function does.
Yes, you seem to misunderstand the concept. It's not a replacement for static type checking. It's just different. For example, if you convert objects to JSON (for rendering them to the client), you don't care about the actual type of the object, as long as it has a #to_json method. In Java, you'd have to create an IJsonable interface. In Ruby, no overhead is needed.
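For instance (a minimal sketch; the Point class and render method are hypothetical names):
class Point
  def initialize(x, y)
    @x, @y = x, y
  end

  # Anything that responds to #to_json will do; no interface declaration needed.
  def to_json(*)
    %({"x": #{@x}, "y": #{@y}})
  end
end

def render(object)
  object.to_json
end

render(Point.new(1, 2))  # => {"x": 1, "y": 2}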
As for knowing what to pass where and what returns what: memorize this or consult docs each time. We all do that.
Just the other day, I saw a Rails programmer with 6+ years of experience complain on Twitter that he can't memorize the order of the parameters to alias_method: does the new name go first or last?
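(For the record, the new name comes first; the Greeter class below is just a made-up example:)
class Greeter
  def hello
    "hi"
  end
  alias_method :greet, :hello   # alias_method(new_name, old_name)
end

Greeter.new.greet  # => "hi"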
This flies in the face of Ruby's promise of intuitiveness.
Not really. Maybe it's just a badly written library. In core Ruby everything is quite intuitive, I dare say.
Statically typed languages with their powerful IDEs have a small advantage here, because they can show you documentation right here, very quickly. This is still accessing documentation, though. Only quicker.
Consider that the design choices of statically typed languages (C++, Java, C#, et al.) enforce strict declarations of the types passed to methods and the types returned by methods. This is because these languages were designed to validate that arguments are correct (and since these languages are compiled, this work can be done at compile time). But some questions can only be answered at run time, and C++ for example has RTTI (Run-Time Type Information) to examine and enforce type guarantees. But as the developer, you are guided by syntax, semantics and the compiler to produce code that follows these type constraints.
Ruby gives you the flexibility to take dynamic argument types and return dynamic types. This freedom enables you to write more generic code (read Stepanov on the STL and generic programming), and gives you a rich set of introspection methods (is_a?, instance_of?, respond_to?, kind_of?, et al.) which you can use dynamically. Ruby enables you to write generic methods, but you can also explicitly enforce design by contract, and handle failures of the contract however you choose.
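A small sketch of that kind of runtime contract check (the method name here is hypothetical):
def shout(value)
  unless value.respond_to?(:upcase)
    raise ArgumentError, "expected an argument that responds to #upcase"
  end
  value.upcase
end

shout("hello")   # => "HELLO"
shout(:hello)    # => :HELLO (a Symbol quacks like a String here)
shout(42)        # raises ArgumentError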
Yes, you will need to use care when chaining methods together, but learning Ruby is not just a matter of a few new keywords. Ruby supports multiple paradigms; you can write procedural, object-oriented, generic, and functional programs. The cycle you are in right now will improve quickly as you learn more about Ruby.
Perhaps your concern stems from a bias towards statically typed languages (C++, Java, C#, et al.). Duck typing is a different approach; you think differently. Duck typing means that if an object looks like a duck and behaves like a duck, then it is a duck. Everything (almost) is an Object in Ruby, so everything is polymorphic.
Consider templates (C++ has them, C# and Java have generics, C has macros). You build an algorithm, and then have the compiler generate instances for your chosen types. You aren't doing design by contract with generics, but when you recognize their power, you write less code and produce more.
Some of your other concerns,
third party libraries (gems) are not as hard to use as you fear
Documented API? See Rdoc and http://www.ruby-doc.org/
Rdoc documentation is (usually) provided for libraries
coding guidelines - look at the source for a couple of simple gems for starters
naming conventions - snake_case for methods and variables, CamelCase for classes and modules
Suggestion - approach an online tutorial with an open mind, do the tutorial (http://rubymonk.com/learning/books/ is good), and you will have more focused questions.

What is special about boolean?

In Ruby, there is a convention to have a method name end with a question mark to indicate that its return value is boolean. Why is boolean considered so special? Is there anything convenient about knowing that a method's return value is specifically boolean? After all, in Ruby, you can insert all kinds of value-returning (getter) methods into a conditional without caring whether the value is boolean or not.
I think it is a waste to use the question mark just for indicating a boolean value. There should be more useful uses. I have plenty of use cases where I want to have a pair of getter and setter methods, where the setter method should return self so that I can use it in a method chain. And naming them something like get_foo and set_foo looks cumbersome. Rather than following the convention, I am tempted to name a pair of getter and setter methods like this:
def foo?; @foo; end
def foo(v); @foo = v; self; end
where the value of @foo is not (necessarily) boolean. (Besides potential criticism that breaking the convention will confuse other programmers), is there something wrong with doing that?
There is nothing special at all; it's just a convention. A question can be answered with "yes" or "no", but also with other things, like someone's name.
Returning a boolean from a question-mark method makes that behavior explicit.
If you make the answer be "yes" or "no", it's easy for the reader of your code to identify the behavior of your method without even looking at the implementation. On the other hand, if you make it return any other type, it is more difficult for the reader to understand your code without reading your class and method definition.
With a boolean there are only two possible answers. If the return value is not boolean it can be anything, which would not help at all. You would still need to look at the method implementation. You should always look further to understand some piece of code, but using this convention makes it simpler.
There is a convention to use question mark in method names to indicate that a method is a predicate. AFAIK, this predicate is not required (by the convention) to return a boolean value, thanks to simple rules for truthy/falsey values.
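Ruby's own core library relies on this: some predicates return a useful truthy value rather than the literal true. For example:
1.nonzero?   # => 1    (truthy, but not the literal true)
0.nonzero?   # => nil  (falsey)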
Besides potential criticism that breaking the convention will confuse other programmers, is there something wrong with doing that?
Confusing and surprising fellow programmers is bad. Ruby couldn't care less. It's just a convention. And conventions exist for a reason.
You can put anything in a flow control construct, but semantically booleans are appropriate. "If" in real human language typically takes a boolean, and the same is true of the construct in many programming languages. Ruby likes to make things convenient and assigns a "truthiness" value to everything in the language, which affects how it behaves in a boolean context.
In other words, booleans are the only things that are almost exclusively used for flow control, so the convention is to make them look "right" for flow-control constructs. It's their native environment.
(Besides potential criticism that breaking the convention will confuse other programmers), is there something wrong with doing that?
In the same sense that there is nothing wrong with naming all your variables after 1920s comedians, no, there's nothing wrong with that. But also in the same sense as naming all your variables after 1920s comedians, it isn't a very good idea. Nowhere in any language that I know of -- human or computer -- does the question mark mean "get." So the semantics of your code are off with that convention.
This question and the answers boil down to "POLS" AKA "Principle of Least Surprise".
A method name can be a random choice of letters and numbers separated by underscores, with '!', '?' and '=' sprinkled through them, if we choose to do so. They could be randomly created by the code at run time, and, as long as the rest of the code used the same arrangement of characters, the program would run and Ruby would be happy.
We humans, the programmers, determine the name of the methods used, to represent something, a characteristic or an action. Trying to use randomly named methods would lead to madness, or at least a very hard to maintain program. So, instead, we try to use sensible names for things. Sometimes they're verbs or adjectives, sometimes they're more descriptive because the method does several things.
As part of that naming, sometimes we want to provide additional hints about the behavior of the method. By convention in Ruby, we use "!" to warn the coder that the method changes something or is destructive. "=" indicates the method takes a parameter and assigns it to the receiver/object. It's a setter method and in many other languages it'd be idiomatic to use "set_flag..." or "set_value..." as the name. It's just a convention in that language, and followed by developers in the language.
We use "?" in Ruby to ask a question about an object, whether it is, or isn't, true about that object. We could say "is_true?" or "true?" and indicate we are testing whether something is true about it. If it's true, or false, it's a Boolean response so we return a true/false value.

Is it possible to do functional programming in a language without functions?

In this comment, it was stated that Ruby doesn't have functions, only methods. If Ruby doesn't have functions, is it not possible to do functional programming in it? Or am I confused about the term "function"?
I mean "functional programming" in the sense of functions as first class objects, not as in prohibiting mutable state.
Blocks and Procs are first-class functions. You can pass them around to methods and functions. That's how Ruby is able to support FP-ish things like map and reduce.
More generally, a method can be viewed as a function with extra associated state (its self), but methods are rarely passed around in Ruby — though they can be — so in practice they're not as important to FP-ish idioms as blocks and Procs.
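A quick sketch of what that looks like in practice:
double = ->(x) { x * 2 }   # a lambda (a Proc) is a first-class value
[1, 2, 3].map(&double)     # => [2, 4, 6]
[1, 2, 3].reduce(:+)       # => 6

m = 4.method(:+)           # a bound Method object can be passed around too
[1, 2, 3].map(&m)          # => [5, 6, 7]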
Yes. Method vs function is quite a fine distinction.
It's easy to view each particular method implementation as a function; just have it take as an extra parameter the object on which the method was invoked (if your language doesn't pass it explicitly; not terribly familiar with Ruby). That doesn't quite give you virtual method calls (i.e. where the particular implementation called is determined by the object at runtime). But it's also very easy to imagine a virtual method call as calling a function that just inspects its first parameter (self, this, whatever it's called) and uses it to determine which method implementation to call. With those conventions established, object.method(param1, param2) differs from method(object, param1, param2) only in a trivial syntactic way.
Personally, I view the above as "the truth", and that object-oriented languages just offer syntactic sugar and optimised execution for this because it's such a core part of writing/executing OO programs. That sort of system is also exactly how you do OO when you have functions but not true classes/methods.
It's also trivially easy to implement functions with methods, if you think methods aren't functions. Just have an object with a single method and no attributes! This is also how you do functional programming in languages like Java that insist upon everything being an object and don't let you pass methods/functions as first-class values.
All you need to do functional programming is things you can pass around as first-class values, which can be used to execute code determined by the creator of the "thing" (rather than determined by the code that's using the "thing"), on demand by code that has access to the "thing". I can't think of a programming language that doesn't have this capability.
A function (or more precisely a procedure, since we are not talking about referential transparency here) is isomorphic to an object with only a single method.
That's how first-class procedures are faked in Java, for example: with so-called SAM interfaces (Single Abstract Method). It's also how first-class procedures are "faked" in Ruby: anything which responds to call (and maybe to_proc) is a first-class procedure. There is a convenience class called Proc which provides additional features for "procedures" such as currying, and there is literal syntax (-> (x, y) { x + y }) for procedures which creates instances of the Proc class, but those two aren't strictly necessary:
def (i_am_a_first_class_procedure = Object.new).call(x)
  p x
end

i_am_a_first_class_procedure.(42)
# 42
Scala is similar, except the method is called apply, not call. In Python, it's a "magic" method called __call__.
Note: I am ignoring closures here. Closures are procedures with state, and objects of course can have state, too, so there's not a real problem in representing them, but expressing the lexical capture of free variables in terms of an object with instance variables gets rather hairy.
Naturally yes, as long as the language is Turing complete.
Added later:
Actually, Ruby supports several cases of typical functional programming. Matz himself said something about "providing toys for functional programming kids" when he was adding the #curry method to the Proc class in Ruby 1.9. But you can do functional programming with methods as well.
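For example, with Proc#curry (available since Ruby 1.9):
add = ->(x, y) { x + y }
add_one = add.curry[1]   # partially apply the first argument
add_one[2]               # => 3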

Why isn't DRY considered a good thing for type declarations?

It seems like people who would never dare cut and paste code have no problem specifying the type of something over and over and over. Why isn't it emphasized as a good practice that type information should be declared once and only once so as to cause as little ripple effect as possible throughout the source code if the type of something is modified? For example, using pseudocode that borrows from C# and D:
MyClass<MyGenericArg> foo = new MyClass<MyGenericArg>(ctorArg);

void fun(MyClass<MyGenericArg> arg) {
    gun(arg);
}

void gun(MyClass<MyGenericArg> arg) {
    // do stuff.
}
Vs.
var foo = new MyClass<MyGenericArg>(ctorArg);

void fun(T)(T arg) {
    gun(arg);
}

void gun(T)(T arg) {
    // do stuff.
}
It seems like the second one is a lot less brittle if you change the name of MyClass, or change the type of MyGenericArg, or otherwise decide to change the type of foo.
I don't think you're going to find a lot of disagreement with your argument that the latter example is "better" for the programmer. A lot of language design features are there because they're better for the compiler implementer!
See Scala for one reification of your idea.
Other languages (such as the ML family) take type inference much further, and create a whole style of programming where the type is enormously important, much more so than in the C-like languages. (See The Little MLer for a gentle introduction.)
It isn't considered a bad thing at all. In fact, C# maintainers are already moving a bit towards reducing the tiring boilerplate with the var keyword, where
MyContainer<MyType> cont = new MyContainer<MyType>();
is exactly equivalent to
var cont = new MyContainer<MyType>();
Although you will see many people argue against var usage, which kind of shows that many people are not familiar with strongly typed languages that have type inference; type inference gets mistaken for dynamic/soft typing.
Repetition may lead to more readable code, and sometimes may be required in the general case. I've always seen the focus of DRY as being more about duplicating logic than repeating literal text. Technically, you could eliminate 'var' and 'void' from your bottom code as well. Not to mention that you already indicate scope with indentation, so why repeat yourself with braces?
Repetition can also have practical benefits: parsing by a program is easier by keeping the 'void', for example.
(However, I still strongly agree with you on preferring "var name = new Type()" over "Type name = new Type()".)
It's a bad thing. This very topic was mentioned in Google's Go language Techtalk.
Albert Einstein said, "Everything should be made as simple as possible, but not one bit simpler."
Your complaint makes no sense in the case of a dynamically typed language, so you must intend this to refer to statically typed languages. In that case, your replacement example implicitly uses generics (a.k.a. template classes), which means that any time fun or gun is used, a new definition is generated based upon the type of the argument. That could result in dozens of extra methods, regardless of the intent of the programmer. In particular, you're throwing away the benefit of compiler-checked type safety and replacing it with a runtime error.
If your goal was to simply pass through the argument without checking its type, then the correct type would be Object not T.
Type declarations are intended to make the programmer's life simpler, by catching errors at compile-time, instead of failing at runtime. If you have an overly complex type definition, then you probably don't understand your data. In your example, I would have suggested adding fun and gun to MyClass, instead of defining them separately. If fun and gun don't apply to all possible template types, then they should be defined in an explicit subclass, not as separate functions that take a templated class argument.
Generics exist as a way to wrap behavior around more specific objects. List, Queue, Stack, these are fine reasons for Generics, but at the end of the day, the only thing you should be doing with a bare Generic is creating an instance of it, and calling methods on it. If you really feel the need to do more than that with a Generic, then you probably need to embed your Generic class as an instance object in a wrapper class, one that defines the behaviors you need. You do this for the same reason that you embed primitives into a class: because by themselves, numbers and strings do not convey semantic information about their contents.
Example:
What semantic information does List<Tuple<Integer, Integer, Integer>> convey? Just that you're working with multiple triples of integers. On the other hand, List<Color>, where a Color has 3 integers (red, blue, green) with bounded values (0-255), conveys the intent that you're working with multiple Colors, but provides no hint as to whether the List is ordered, allows duplicates, or any other information about the Colors. Finally, a Palette can add those semantics for you: a Palette has a name, contains multiple Colors, has no duplicates, and order isn't important.
This has gotten a bit far afield from the original question, but what it means to me is that DRY (Don't Repeat Yourself) means specifying information once, but that specification should be as precise as is necessary.
