Why does `gsub` call `to_hash`? - ruby

I am writing a DSL. I don't want users to have to quote string arguments, so I override method_missing to convert an unknown method name to a string. In the following example, create is the DSL method, and I want users to be able to type arg1 and arg2 without quotes.
def method_missing(m, *arg)
  m.to_s
end
def create(*args)
  args[0].gsub(/1/, "2") # do something here
end
create arg1 arg2
However, this raises an error when I call gsub on the 'string':
'gsub': can't convert String to Hash (String#to_hash gives String) (TypeError)
I guess overriding method_missing messed things up: it looks like gsub is calling String#to_hash, which is not a method on String, so the call is routed to method_missing.
I am wondering why gsub calls String#to_hash, and whether there is any other way to let users of the DSL omit the quotes without overriding method_missing.

String#gsub does different things depending on the number and types of its arguments, and on whether a block was given:
gsub(pattern, replacement) → new_str
gsub(pattern, hash) → new_str
gsub(pattern) {|match| block } → new_str
gsub(pattern) → enumerator
The second one is documented as:
If the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.
But how does Ruby distinguish it from the first form? Both take two arguments! It's a little complicated, but in your case Ruby (to be exact, the reference implementation, known as CRuby or MRI) starts by checking whether the second argument has the internal type T_HASH. It doesn't; it is most likely a T_STRING because of #to_s. Ruby then checks whether #to_hash can be called on it, either because the object responds to it or because #method_missing can handle it instead. You have defined #method_missing, so Ruby calls it. However, it doesn't return a T_HASH, and that is the cause of the exception you've posted.
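To make the dispatch described above concrete, here is a minimal sketch of the four call forms and of the implicit #to_hash conversion. The Replacements class is hypothetical and exists only for illustration; it is not part of the question's DSL.

```ruby
# The four documented call forms of String#gsub.
with_string = "hello".gsub(/l/, "L")                            # pattern, replacement
with_hash   = "hello".gsub(/[el]/, { "e" => "3", "l" => "1" })  # pattern, hash
with_block  = "hello".gsub(/l/) { |match| match.upcase }        # pattern and block
enumerator  = "hello".gsub(/l/)                                 # pattern only

# Hypothetical class: not a Hash, but it offers the implicit conversion
# method #to_hash, so gsub treats an instance like the hash form.
class Replacements
  def to_hash
    { "cat" => "dog" }
  end
end

converted = "cat sat".gsub(/cat/, Replacements.new)

puts with_string, with_hash, with_block, converted
p enumerator.to_a
```

A global method_missing makes every String appear to offer #to_hash as well, which is why the string replacement in the question goes down the hash path and fails.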
A possible solution is to define main.method_missing rather than Object#method_missing (which String picks up too, since String inherits from Object):
def self.method_missing(m, *arg)
  m.to_s
end
However, I recommend sticking to quotes, or writing your own small parser for this kind of file if it doesn't need to adhere to Ruby's syntax. Using *_missing can lead to confusing or unhelpful error messages, or even to none at all (I guess create arg1 arg2 should've been create arg1, arg2).

gsub's argument handling ends up reaching method_missing somewhere, so defining it globally interferes with the method call. If you're going to use method_missing, make sure you always define it in a module or a class:
module CoolDSL
  def self.method_missing(m, *arg)
    m.to_s
  end
  def self.create(*args)
    args[0].gsub(/1/, "2")
  end
  def self.do_thing
    create arg1, arg2
  end
end
CoolDSL.do_thing
Naturally, that's not exactly useful as a DSL, so you'll want to learn the power of instance_eval and yield. I like this guide.
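As a rough sketch of the instance_eval approach mentioned above (an illustration under assumptions, not the linked guide's code): method_missing is defined only on a dedicated class, and the user's block is evaluated against an instance of it. The run method and the "value" replacement are made up for this example.

```ruby
class CoolDSL
  # Evaluate the block with self set to a fresh CoolDSL instance, so bare
  # words inside the block are resolved (and missed) on that instance only.
  def self.run(&block)
    new.instance_eval(&block)
  end

  def create(*args)
    args.map { |arg| arg.gsub(/arg/, "value") }
  end

  private

  # Only bare words (no arguments) become strings; anything else falls
  # through to the default NoMethodError.
  def method_missing(name, *args)
    return super unless args.empty?

    name.to_s
  end
end

result = CoolDSL.run { create arg1, arg2 }
p result
```

Because String never sees this method_missing, the gsub call inside create behaves normally.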

Related

Ruby koan 280 - where is the reference to to_str?

I've had a look around and can't find this question:
For ruby koan 280 it's telling me the following underscore section should be false:
def test_to_str_allows_objects_to_be_treated_as_strings
  assert_equal __, File.exist?(CanBeTreatedAsString.new) # test passes, if __ is changed to false
end
OK, fine. But how does this test that to_str allows objects to be treated as Strings? Here is the CanBeTreatedAsString class, which DOES include a to_str method:
class CanBeTreatedAsString
  def to_s
    "string-like"
  end
  def to_str
    to_s
  end
end
...but how is that relevant to the assert_equal code above? Is it that .exist? expects a String?
This page:
http://www.ruby-doc.org/core-2.2.0/File.html#method-c-exist-3F
says the parameter can be an IO object. Are some methods specific about the parameter types they receive? And if so, how do I tell?
File.exist? takes a string or an IO. Part of how it does that is by calling to_str on the object. A string returns itself from to_str. Beyond that, to_str is only supposed to be implemented on objects that can be used as a string.
Due to Ruby's duck typing conventions, there isn't an easy way to tell. However, usually, if a method accepts a string, then it will call String.try_convert (which uses to_str) to allow duck typing. In a similar fashion, many objects that expect an int call Integer.try_convert (which calls to_int) to convert the argument.
Here's more information on the various conversion protocols: http://pivotallabs.com/messages-not-types-exploring-rubys-conversion-protocols/
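A small sketch of the difference between the implicit conversion (to_str) and the explicit one (to_s), reusing the koan's class. OnlyHasToS is a hypothetical counterpart added here for contrast.

```ruby
class CanBeTreatedAsString
  def to_s
    "string-like"
  end

  def to_str
    to_s
  end
end

# Hypothetical counterpart that only offers the explicit conversion, to_s.
class OnlyHasToS
  def to_s
    "string-like"
  end
end

# String.try_convert uses to_str and returns nil when it is missing.
implicit      = String.try_convert(CanBeTreatedAsString.new)
explicit_only = String.try_convert(OnlyHasToS.new)

# String#+ also relies on the implicit conversion.
joined = "prefix-" + CanBeTreatedAsString.new

# File.exist? raises TypeError for an object it cannot treat as a path.
rejected =
  begin
    File.exist?(OnlyHasToS.new)
    false
  rescue TypeError
    true
  end

p implicit, explicit_only, joined, rejected
```

With CanBeTreatedAsString, File.exist? does not raise; it simply looks for a file named "string-like", which is why the koan expects false.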

Ruby difference between send and instance_eval?

I know send takes a string or symbol plus arguments, while instance_eval takes a string or a block, and that the difference can become apparent depending on the receiver.
My question is what the 'under the hood' difference is for the example below?
1234.send 'to_s' # '1234'
1234.instance_eval 'to_s' # '1234'
From the fine manual:
send(symbol [, args...]) → obj
send(string [, args...]) → obj
Invokes the method identified by symbol, passing it any arguments specified. [...] When the method is identified by a string, the string is converted to a symbol.
and for instance_eval:
instance_eval(string [, filename [, lineno]] ) → obj
instance_eval {| | block } → obj
Evaluates a string containing Ruby source code, or the given block, within the context of the receiver (obj). In order to set the context, the variable self is set to obj while the code is executing, giving the code access to obj’s instance variables.
So send executes a method whereas instance_eval executes an arbitrary block of code (as a string or block) with self set to the object that you're calling instance_eval on.
In your case, there isn't much difference as the string you're handing to instance_eval is just a single method. The main difference is that anyone reading your code (including you in six months) will be wondering why you're using instance_eval to call a single method.
You might also be interested in Object#public_send and BasicObject#__send__
Whatever you can do with send, you can also do with instance_eval; the former is a proper subset of the latter. Namely, the argument to send has to be a single method (and its arguments), whereas the argument to instance_eval is arbitrary code. So whenever you have send, you can rewrite it with instance_eval, but not vice versa.
Performance-wise, however, send is much faster than instance_eval, since no additional parsing is required to execute send, whereas instance_eval needs to parse the whole argument.
In your example, the result will be the same, but the first one will run faster.
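The contrast drawn in the answers above can be sketched as follows. The Account class and its members are hypothetical, chosen only to show where the two differ.

```ruby
class Account
  def initialize
    @balance = 100
  end

  private

  def secret
    "hidden"
  end
end

account = Account.new

# send invokes exactly one method by name (symbol or string),
# and it ignores method visibility.
via_send        = account.send(:secret)
via_string_send = account.send("secret")

# instance_eval runs arbitrary code with self set to the receiver,
# so it can read instance variables and combine several calls.
via_block       = account.instance_eval { @balance + 1 }
via_string_eval = account.instance_eval("secret + '!'")

# public_send respects visibility, unlike send.
public_send_blocked =
  begin
    account.public_send(:secret)
    false
  rescue NoMethodError
    true
  end

p via_send, via_string_send, via_block, via_string_eval, public_send_blocked
```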

In Ruby, how do sub, gsub (and other text methods) in shell one-liners work without referring to an object?

I saw this piece of code somewhere on the web:
ruby -pe 'gsub /^\s*|\s*$/, ""'
Evidently this piece of code removes leading and trailing whitespace from each line from STDIN.
I understand the regex and replacement, no problem, but what I don't get is how the method gsub is receiving an object to act upon. I understand that the -p flag wraps this whole thing in a while gets; print; ... ; end block, but how does gsub receive the string to act upon? At the very least, shouldn't it be a $_.gsub(..) instead? How does the current input line get "magically" passed to gsub?
Does the code in these Perl-like one-liners get interpreted in a somewhat different manner? I'm looking for a general idea of the differences from traditional, script-based Ruby code. Haven't found a comprehensive set of resources on this, I'm afraid.
It turns out that this is an instance method defined on Kernel, which magically gets turned on only when you use the -p or -n flag.
ruby -pe 'puts method(:gsub);'
#<Method: Object(Kernel)#gsub>
See the documentation here.
Other magical methods I found are chop, print, and sub.
The magical methods are all sent to $_ implicitly.
Easy:
class Object
  def gsub(*args, &block)
    $_.gsub(*args, &block)
  end
end
Since every object is an instance of Object (well, almost every object), every object has a gsub method now. So, you can call
some_object.gsub('foo', 'bar')
on any object, and it will just work. And since it doesn't matter what object you call it on, because it doesn't actually do anything with that object, you might just as well call it on self:
self.gsub('foo', 'bar')
Of course, since self is the implicit receiver, this is the same as
gsub('foo', 'bar')
For methods such as this, which don't actually depend on the receiver, and are only added to the Object class for convenience reasons, it is a common convention to make them private so that you cannot accidentally call them with an explicit receiver and then somehow get confused into thinking that this method does something to the receiver.
Also, it is common to put such methods (which are actually intended to be used more like procedures than methods, i.e. completely independent of their receiver) into the Kernel mixin (which is mixed into Object) instead of directly into the Object class, to distinguish them from methods that are available to every object but actually do depend on its internal state, such as Object#class, Object#to_s etc.
module Kernel
  private
  def gsub(*args, &block)
    $_.gsub(*args, &block)
  end
end
Other methods that are defined in this way, which you may have come across already are require, load, puts, print, p, gets, loop, raise, rand, throw, catch, lambda, proc, eval, Array, Integer, Float etc.
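Here is a minimal sketch of that convention using a made-up helper. The shout method is hypothetical and not part of Ruby; it only illustrates how a private Kernel method behaves.

```ruby
module Kernel
  private

  # Hypothetical procedure-like helper: it does not use its receiver at all.
  def shout(text)
    "#{text.upcase}!"
  end
end

# Works anywhere with the implicit receiver, just like puts or require.
implicit_call = shout("hi")

# Being private, it cannot be called on some other explicit receiver.
explicit_call_blocked =
  begin
    Object.new.shout("hi")
    false
  rescue NoMethodError
    true
  end

# The built-in procedure-like methods follow the same pattern.
puts_is_private = Kernel.private_instance_methods.include?(:puts)

p implicit_call, explicit_call_blocked, puts_is_private
```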

Are "begin" and "end" reserved words or not?

I'm kind of confused about reserved words in Ruby.
"The Ruby Programming Language", co-authored by Matz, says that begin and end are reserved words of the language. They're certainly used syntactically to mark out blocks.
However, range objects in the language have methods named begin and end, as in
(1..10).end
=> 10
Now, testing this out, I find that, indeed, I can define methods named "begin" and "end" on objects, though if I try to name a variable "begin" it fails. (Here's a sample of using it as a method name; it actually works:)
class Foo
  def begin
    puts "hi"
  end
end
Foo.new.begin
So, I suppose I'm asking, what actually is the status of reserved words like this? I would have imagined that they couldn't be used for method names (and yet it seems to work) or that at the very least it would be terrible style (but it is actually used in the core language for the Range class).
I'm pretty confused as to when they're allowed to be used and for what. Is there even documentation on this?
Yes, they are reserved words. Yes, they can be used for method names. No, you can't call them without an explicit receiver. It's probably not a good idea anyway.
class Foo
  def if(foo)
    puts foo
  end
end
Foo.new.if("foo") # outputs foo, returns nil
Update: Here's a quote from "The Ruby Programming Language", by Matz (the creator of Ruby) himself:
In most languages, these words would be called “reserved words” and
they would be never allowed as identifiers. The Ruby parser is
flexible and does not complain if you prefix these keywords with @,
@@, or $ prefixes and use them as instance, class, or global variable
names. Also, you can use these keywords as method names, with the
caveat that the method must always be explicitly invoked through an
object.
When they are given in a form that is unambiguously a method call, you can use them. If there is a period in front of it, as in .begin, or parentheses after it, as in begin(), then it is unambiguously a method call. When you try to use it as a variable begin, it is ambiguous (in principle).
Actually, as Perry notes, begin() might be tricky. I checked in irb with Ruby 1.9.3, and the following strange thing happens:
irb(main):001:0> def begin(foo)
irb(main):002:1> puts 'a'
irb(main):003:1> end
=> nil
irb(main):004:0> begin(3)
irb(main):005:1>
irb(main):006:1* end
=> 3
The method is not defined, and what looks like a method call might just be a begin ... end block returning the last evaluated value, 3. But what happens on the lines around def begin(foo) remains a mystery.
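The explicit-receiver rule discussed above can be sketched like this. The Span class is hypothetical; it mirrors how Range exposes begin and end.

```ruby
class Span
  def initialize(first, last)
    @first = first
    @last = last
  end

  # Keywords are accepted as method names right after def.
  def begin
    @first
  end

  def end
    @last
  end

  def length
    # A bare begin or end here would be parsed as the keyword,
    # so the calls need an explicit receiver.
    self.end - self.begin
  end
end

span = Span.new(3, 10)
span_length = span.length
range_end = (1..10).end

p span.begin, span.end, span_length, range_end
```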

Unexpected behavior from Ruby 'super' keyword - Nokogiri inheritance

The rule for Ruby's super keyword is that if it is called without arguments, all of the original arguments are forwarded. If it is called with explicit arguments, only those explicit arguments are passed.
In this example, arguments should never be forwarded, since I am calling super with exact arguments.
Example:
@doc = Nokogiri::HTML::DocumentFragment.parse("<body></body>")
class Cat < Nokogiri::XML::Node
  def initialize(arg1, arg2)
    super("cat", arg2) # Pass arg2 to super
    # Do something with arg1 later
  end
end
When calling Cat.new("dog", @doc), I expect to get back a <cat></cat> tag, and I expect the first argument to be ignored. Instead, I am getting a <dog></dog> tag.
Is there a reason this case would defy expected behavior?
If you look at the Nokogiri source, it's actually the new method that sets the node's name, not the initialize method. Nothing mysterious is happening with regard to invoking super; it's just that the initialize method doesn't do anything with those arguments.
I assume this is because the new method is the one that is supposed to allocate storage and so on for the object, which in Nokogiri's case means creating the underlying libxml node, which is the thing that contains the node's name.
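A pure-Ruby analogy of that behavior, as a sketch only: this is not Nokogiri's actual source, and the Node class below is hypothetical. It shows how a custom new can consume the name before initialize ever runs, so that changing the arguments passed to super has no effect on it.

```ruby
# Hypothetical stand-in for a class whose .new does the real work.
class Node
  attr_reader :name

  def self.new(name, document, *rest)
    node = allocate
    # The name is fixed here, from the arguments given to .new.
    node.instance_variable_set(:@name, name)
    node.send(:initialize, name, document, *rest)
    node
  end

  # Receives the arguments but does nothing with them.
  def initialize(name, document)
  end
end

class Cat < Node
  def initialize(arg1, arg2)
    super("cat", arg2) # only reaches Node#initialize, which ignores it
  end
end

cat = Cat.new("dog", :some_document)
p cat.name
```

In a design like this, the subclass would have to override new (or change the name after construction) to get a different node name.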

Resources