Unexpected behavior from Ruby 'super' keyword - Nokogiri inheritance - ruby

The rules of Ruby's super keyword is that if it is called without arguments, all of the original arguments are forwarded. If it is called with explicit arguments, the explicit arguments are exclusively passed in.
In this example, arguments should never be forwarded, since I am calling super with exact arguments.
Example:
#doc = Nokogiri::HTML::DocumentFragment.parse("<body></body>")
class Cat < Nokogiri::XML::Node
def initialize(arg1, arg2)
super("cat", arg2) # Pass arg2 to super
# Do something with arg1 later
end
end
When calling: Cat.new("dog", #doc) I expect to get back a <cat></cat> tag, and I expect the first argument to be ignored. Instead I am getting a <dog></dog> tag.
Is there a reason this case would defy expected behavior?

If you look at the source to nokogiri, it's actually the new method that sets the node's name, not the initialize method. Nothing mysterious is happening with regards to invoking super, it's just that the initialize method doesn't do anything with those arguments.
I assume this is because the new method is the one that is supposed to be allocating storage and so on for the object, which in nokogiri's case means creating the underlying libxml node, which is the thing that contain's the node's name.

Related

Why does `gsub` call `to_hash`?

I am writing a DSL. I don't want users to have to quote the arguments to pass strings, therefore I overwrite method_missing to convert an unknown method to a string. In the following example, create is the DSL method, and I wanted user to type arg1 and arg2 without the quotes.
def method_missing(m, *arg)
m.to_s
end
def create(*args)
arg1.gsub(#do something here)
end
create arg1 arg2
However, this raises and error when I use gsub on the 'string':
'gsub': can't convert String to Hash (String#to_hash gives String) (TypeError)
I guess the method_missing overwriting is messed it up since it looks like gsub is calling String#to_hash, which is not a method in String, thus it is routed to method_missing.
I am wondering why gsub calls String#to_hash, or whether there is any other way to let users of the DSL not have to type quotes, without overwriting method_missing.
String#gsub does different things depending on the argument count and types, and if a block was given:
gsub(pattern, replacement) → new_str
gsub(pattern, hash) → new_str
gsub(pattern) {|match| block } → new_str
gsub(pattern) → enumerator
The second one is documented as:
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
But how to distinguish it from the first? Both take two arguments! That's a little bit complicated but in your case Ruby (well, the reference implementation called CRuby or MRI to be exact) starts with checking if the second argument has the internal type T_HASH (it doesn't as it's most likely T_STRING due to #to_s), then it checks if #to_hash can be called. Either because it responds to it or #method_missing can instead. You have defined it so Ruby calls it. However it doesn't return a T_HASH and that is the cause of the exception you've posted.
A possible solution is defining main.method_missing and not Object#method_missing (as String inherits from Object):
def self.method_missing(m, *arg)
m.to_s
end
However I recommend sticking to quotes or writing your own small parser for this kind of file if it shouldn't adhere to Ruby's syntax. Using *_missing may be the cause of confusing or unhelpful error messages. Or even none (I guess create arg1 arg2 should've been create arg1, arg2).
gsub probably uses method_missing itself somewhere, so it seems that defining it globally there is causing internal issues with the method call. If you're going to use method_missing make sure you always define it in a module or a class:
module CoolDSL
def self.method_missing(m, *arg)
m.to_s
end
def self.create(*args)
args[0].gsub(/1/, "2")
end
def self.do_thing
create arg1 arg2
end
end
CoolDSL.do_thing
Naturally, that's not exactly useful as a DSL, so you'll want to learn the power of instance_eval and yield. I like this guide.

Ruby as a "pure" object oriented language --- inconsistency with Ruby puts?

I've often read that Ruby is a pure object oriented language since commands are typically given as messages passed to the object.
For example:
In Ruby one writes: "A".ord to get the ascii code for A and 0x41.chr to emit the character given its ascii code.
This is in contrast to Python's: ord("A") and chr(0x41)
So far so good --- Ruby's syntax is message passing.
But the apparent inconsistency appears when considering the string output command:
Now one has: puts str or puts(str) instead of str.puts
Given the pure object orientation expectation for Ruby's syntax, I would have expected the output command to be a message passed to the string object, i.e. calling a method from the string class, hence str.puts
Any explanations? Am I missing something?
Thanks
I would have expected the output command to be a message passed to the string object, i.e. calling a method from the string class, hence str.puts
This is incorrect expectation, let's start with that. Why would you tell a string to puts itself? What would it print itself to? It knows nothing (and should know nothing) of files, I/O streams, sockets and other places you can print things to.
When you say puts str, it's actually seen as self.puts str (implicit receiver). That is, the message is sent to the current object.
Now, all objects include Kernel module. Therefore, all objects have Kernel#puts in their lists of methods. Any object can puts (including current object, self).
As the doc says,
puts str
is translated to
$stdout.puts str
That is, by default, the implementation is delegated to standard output (print to console). If you want to print to a file or a socket, you have to invoke puts on an instance of file or socket classes. This is totally OO.
Ruby isn't entirely OO (for example, methods are not objects), but in this case, it is. puts is Kernel#puts, which is shorthand for $stdout.puts. That is, you're calling the puts method of the $stdout stream and passing a string as the parameter to be output to the stream. So, when you call
puts "foo"
You're really calling:
$stdout.puts("foo")
Which is entirely consistent with OO.
puts is a method on an output streams e.g.
$stdout.puts("this", "is", "a", "test")
Printing something to somewhere at least involves two things: what is written and where it is written to. Depending on what you focus on, there can be different implementations, even in OOP. Besides that, Ruby has a way to make a method look more like a function (i.e., not being particularly tied to a receiver as in OOP) for methods that are used all over the place. So there are at least three logical options that could be thought of for such methods like printing.
An OOP method defined on the object to be printed
An OOP method defined on the object where it should be printed
A function-style method
For the second option, IO#write is one example; The receiver is the destination of writing.
The puts without an explicit receiver is actually Kernel#puts, and takes neither of the two as the arguments; it is an example of the third option; you are correct to point out that this is not so OOP, but Matz especially provided the Kernel module to be able to do things like this: a function-style method.
The first option is what you are expecting; it is nothing wrong. It happens that there is no well known method of this type, but it was proposed in the Ruby core by one of the developers, but unfortunately, it did not make it. Actually, I felt the same thing as you, and have something similar in my personal library called Object#intercept. A simplified version is this:
class Object
def intercept
tap{|x| p x}
end
end
:foo.intercept # => :foo
You can replace p with puts if you want.

In Ruby, how do sub, gsub (and other text methods) in shell one-liners work without referring to an object?

I saw this piece of code somewhere on the web:
ruby -pe 'gsub /^\s*|\s*$/, ""'
Evidently this piece of code removes leading and trailing whitespace from each line from STDIN.
I understand the regex and replacement, no problem, but what I don't get is how the method gsub is receiving an object to act upon. I understand that the -p flag wraps this whole thing in a while gets; print; ... ; end block, but how does gsub receive the string to act upon? At the very least, shouldn't it be a $_.gsub(..) instead? How does the current input line get "magically" passed to gsub?
Does the code in these Perl-like one-liners get interpreted in a somewhat different manner? I'm looking for a general idea of the differences from traditional, script-based Ruby code. Haven't found a comprehensive set of resources on this, I'm afraid.
It turns out that this is an instance method defined on Kernel, which magically gets turned on only when you use the -p or -n flag.
ruby -pe 'puts method(:gsub);'
#<Method: Object(Kernel)#gsub>
See the documentation here.
Other magical methods I found are chop, print, and sub.
The magical methods are all sent to $_ implicitly.
Easy:
class Object
def gsub(*args, &block)
$_.gsub(*args, &block)
end
end
Since every object is an instance of Object (well, almost every object), every object has a gsub method now. So, you can call
some_object.gsub('foo', 'bar')
on any object, and it will just work. And since it doesn't matter what object you call it on, because it doesn't actually do anything with that object, you might just as well call it on self:
self.gsub('foo', 'bar')
Of course, since self is the implicit receiver, this is the same as
gsub('foo', 'bar')
For methods such as this, which don't actually depend on the receiver, and are only added to the Object class for convenience reasons, it is a common convention to make them private so that you cannot accidentally call them with an explicit receiver and then somehow get confused into thinking that this method does something to the receiver.
Also, it is common to put such methods (which are actually intended to be used more like procedures than methods, i.e. completely independent of their receiver) into the Kernel mixin, which is mixed into Object instead of directly into the Object class to distinguish them from methods that are available to every object but actually do depend on its internal state, such as Object#class, Object#to_s etc.
module Kernel
private
def gsub(*args, &block)
$_.gsub(*args, &block)
end
end
Other methods that are defined in this way, which you may have come across already are require, load, puts, print, p, gets, loop, raise, rand, throw, catch, lambda, proc, eval, Array, Integer, Float etc.

What does the end.method do in Ruby?

I've seen code like this:
def some_method
# ...
end.another_method
What does the end.another_method part do?
I believe that your example is wrong, as what you are doing here is defining a method and calling a method on the result of a method definition (not a method call), which is always (usually?) nil.
There's a similar form which fmendez is referring to, but end is the end of a block, not a method definition in that case.
So, for example:
array.map do |element|
element * element
end.sum
would, hypothetically, return a sum of squares of elements of given array.
But, if you are doing method chaining like this, it is more common to use bracket style blocks instead of do..end, so the above example would read:
array.map{ |element|
element * element
}.sum
Blocks in Ruby are method arguments, not unlike any other method arguments (apart from the dedicated syntax), so putting dot after end is not different than putting the dot after ) in
'hello'.concat(' world!').capitalize
Which is also an example of method chaining.
In Ruby, the . is the message sending operator. (In other languages, it would be called a method calling operator instead.) So, when you say
foo.bar
it means "evaluate foo and send the message bar to the result of evaluating foo".
In this particular case, you are sending the message another_method to the result of evaluating
def some_method; end
The Ruby Language Specification says that the value of a method definition expression is undefined and should be ignored; and on most Ruby implementations, method definition expressions simply evaluate to nil, which isn't terribly useful.
However, on some implementations, method definition expressions do evaluate to something more useful than nil. On Rubinius, for example, they evaluate to the CompiledMethod object for the method being defined. And CompiledMethod has a rich API, so sending messages to a CompiledMethod object definitely makes sense.
It has also been proposed that method definition expressions should return a Symbol corresponding to the name of the method being defined or a Method object.
Put simply: the dot in this particular case means the exact same thing it always means in Ruby: send a message, call a method, invoke a member function, whatever you want to call it.

When does Ruby know that a method exists?

One question that ran through my mind was how does the Ruby interpreter know that a method exists on a object if the definition is yet to be interpreted? Like, wouldn't it matter whether you define the method first than use it, rather than use it then define it?
It doesn't know, and it doesn't care - until execution. When a method call statement is executed, the interpreter looks to see if the class (object, not code!) has the named function. If it does not, it looks up the ancestor tree. If it does not find any, it calls the method_missing method. If that is not defined, you get your error.
If your function call does not get executed, you will not get any errors.
The interpreter doesn't know about undefined methods ahead of time, for example:
o = Object.new
o.foo # => Raises NoMethodError.
class Object
def foo
puts "Foo!"
end
end
o.foo # => prints "Foo!", since the method is defined.
However, Ruby has a neat feature called method_missing which let's the receiver of a method call take the method name and arguments as separate arguments and handle accordingly as long as no defined method already handles the call.
def o.method_missing(sym, *args)
puts "OK: #{sym}(#{args.inspect})"
# Do something depending on the value of 'sym' and args...
end
o.bar(1, 2, 3) #=> OK: bar(1, 2, 3)
"Method missing" is used by things like active record find methods and other places where it could make sense to have "dynamically defined" functions.
The problem is, the interpreter tried to find it when you use it, and since it won't be there, it may fail.
In ( some ) compiled languages, it doesn't matter, because while compiling, the compiler may say "I'll look for this on a second pass" but I don't think this is the case with Ruby.

Resources