Ruby Koans - Regex and .sub: Don't understand reason behind answer - ruby

For clarification, here's the exact question in the about_regular_expressions.rb file that I'm having trouble with:
def test_sub_is_like_find_and_replace
assert_equal __, "one two-three".sub(/(t\w*)/) { $1[0, 1] }
end
I know what the answer is to this, but I don't understand what's happening to get that answer. I'm pretty new to Ruby and to regex, and in particular I'm confused about the code between the braces and how that's coming into play.

The code inside the braces is a block that sub uses to replace the match:
In the block form [...] The value returned by the block will be substituted for the match on each call.
The block receives the match as an argument but the usual regex variables ($1, $2, ...) are also available.
In this specific case, the $1 inside the block is "two", and $1[0, 1] extracts a substring of length 1 starting at index 0 (which is "t" in this case). So, the block returns "t" and sub replaces the "two" in the original string with just "t".
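As a minimal sketch of the above, using only core String#sub (the variable names result and same are just for illustration):

```ruby
# /(t\w*)/ first matches "two" in "one two-three"; the block's return
# value replaces that match.
result = "one two-three".sub(/(t\w*)/) { $1[0, 1] }

# The block also receives the whole match as its parameter. Here the
# whole match and capture group 1 are the same text, so this is equivalent.
same = "one two-three".sub(/(t\w*)/) { |match| match[0, 1] }

puts result
puts same
```

Both calls replace only the first match, because sub (unlike gsub) stops after one substitution.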

Ruby: How does implicit return value work in blocks?

EDIT: My original question turned out not to be related to my actual problem, but as I learned something in the process, I decided to rephrase my initial statement.
I wanted to replace each space with a _ within a pair of brackets in a string. Here is my example input:
This is my string (nice, isn't it?). It can have various types [of input].
Desired output:
This is my string (nice,_isn't_it?). It can have various types [of_input].
I have the following code:
my_string = my_string.gsub(/\([^\(\)]+\)|\[[^\[\]]+\]/) { |bracketed|
bracketed.gsub(/ /, '_')
}
Why is bracketed.gsub(/ /, '_') equivalent to bracketed = bracketed.gsub(/ /, '_')? How is that different from gsub!? I don't fully understand how Ruby decides what to return here.
Why is bracketed.gsub(/ /, '_') equivalent to bracketed = bracketed.gsub(/ /, '_')? How is that different from gsub!?
gsub returns a new string.
gsub! changes the existing string.
So bracketed = bracketed.gsub(/ /, '_') and bracketed.gsub!(/ /, '_') are pretty much equivalent.
(The differences are minor: gsub! modifies the string object in place and returns nil if the pattern did not match, while the assignment points bracketed at a new string. Either way, bracketed ends up holding the substituted text.)
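A small sketch of that difference, with made-up strings and variable names; dup is used only so the example also works when string literals are frozen:

```ruby
original = "a b c".dup
copy = original.gsub(/ /, "_")    # returns a new string
# original is untouched, copy holds the substituted text

mutable = "a b c".dup
ret = mutable.gsub!(/ /, "_")     # changes mutable in place and returns it

no_match = "abc".dup.gsub!(/ /, "_")
# no_match is nil because nothing was substituted

p original, copy, mutable, ret, no_match
```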
However, you're asking the wrong question... Let's look again at your original code, which could be written as:
my_string.gsub!(/\([^)]+\)|\[[^\]]+\]/) do |bracketed|
bracketed.gsub(/ /, '_')
end
From the documentation for String#gsub:
In the block form, the current match string is passed in as a parameter [...] The value returned by the block will be substituted for the match on each call.
In Ruby, the last expression evaluated in a method or block is its implicit return value. All you are doing here is passing a value back up to the outer gsub method; it doesn't matter whether or not you mutate or reassign the bracketed variable.
Perhaps this example will make things a little clearer:
"hello (world)".gsub!(/\([^)]+\)|\[[^\]]+\]/) do |bracketed|
bracketed = "something different"
"TEST!!!"
end
# => "hello TEST!!!" (the parentheses are part of the match, so they are replaced too)
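Applied to the input from the question, a sketch could look like this (the name converted is illustrative, and the non-mutating gsub is used so the original string is left alone):

```ruby
my_string = "This is my string (nice, isn't it?). It can have various types [of input]."

converted = my_string.gsub(/\([^)]+\)|\[[^\]]+\]/) do |bracketed|
  # The last expression in the block is what the outer gsub substitutes.
  bracketed.gsub(/ /, "_")
end

puts converted
```

The inner gsub never needs to be assigned to anything; its return value is the block's return value.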

Take an array and a letter as arguments and return a new array with words that contain that letter

I can run a search and find the element I want and can return those words with that letter. But when I start to put arguments in, it doesn't work. I tried select with include? and it throws an error saying, private method. This is my code, which returns what I am expecting:
my_array = ["wants", "need", 3, "the", "wait", "only", "share", 2]
def finding_method(source)
words_found = source.grep(/t/) #I just pick random letter
print words_found
end
puts finding_method(my_array)
# => ["wants", "the", "wait"]
I need to add the second argument, but it breaks:
def finding_method(source, x)
words_found = source.grep(/x/)
print words_found
end
puts finding_method(my_array, "t")
This doesn't work: it returns an empty array, because none of the words contains a literal 'x'. I don't know how to pass the argument into the pattern. Maybe I'm using the wrong method for what I'm after. Any help would be great.
Regular expressions support string interpolation just like strings.
/x/
looks for the character x.
/#{x}/
will first interpolate the value of the variable and produce /t/, which does what you want. Mostly.
Note that if you are trying to search for any text that might have any meaning in regular expression syntax (like . or *), you should escape it:
/#{Regexp.quote(x)}/
That's the correct answer for any situation where you are including literal strings in regular expression that you haven't built yourself specifically for the purpose of being a regular expression, i.e. 99% of cases where you're interpolating variables into regexps.
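Putting this together, a possible version of the method from the question might look like the sketch below. The parameter name letter is an assumption, and returning the array instead of printing it is a design choice made here so the caller decides what to do with the result:

```ruby
def finding_method(source, letter)
  # Regexp.quote escapes characters such as . or * so they match literally.
  # grep simply skips the non-string elements, since they never match.
  source.grep(/#{Regexp.quote(letter)}/)
end

my_array = ["wants", "need", 3, "the", "wait", "only", "share", 2]
with_t = finding_method(my_array, "t")

# Without the quoting, "." would match any character and return both words.
with_dot = finding_method(["a.b", "axb"], ".")

p with_t
p with_dot
```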

What is between { }?

There is a piece of code:
def test_sub_is_like_find_and_replace
assert_equal "one t-three", "one two-three".sub(/(t\w*)/) { $1[0, 1] }
end
I found it really hard to understand what is between { } braces. Could anyone explain it please?
The {...} is a block. Ruby will pass the matched value to the block, and substitute the return value of the block back into the string. The String#sub documentation explains this more fully:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
Edit: Per Michael's comment, if you're confused about $1[0, 1], this is just taking the first capture ($1) and taking a substring of it (the first character, specifically). $1 is a global variable set to the contents of the first capture after a regex (in true Perl fashion), and since it's a string, the #[] operator is used to take a substring of it starting at index 0, with a length of 1.
The sub method takes either two arguments, the first being the pattern to match and the second being the replacement, or one argument, the pattern, plus a block that computes the replacement.
The block method is useful if you can't define your replacement as a simple string.
For example:
"foo".sub(/(\w)/) { $1.upcase }
# => "Foo"
"foo".sub(/(\w+)/) { $1.upcase }
# => "FOO"
The gsub method works the same way, but applies more than once:
"foo".gsub(/(\w)/) { $1.upcase }
# => "FOO"
In all cases, $1 refers to the text captured by the parenthesized group, (\w) or (\w+).
Your code, illustrated
r = "one two-three".sub(/(t\w*)/) do
$1 # => "two"
$1[0, 1] # => "t"
end
r # => "one t-three"
sub takes a regular expression as its first argument. $1 is a special global variable that holds the text captured by the first group of the most recent match, here (t\w*).
The braces delimit a block of code; sub replaces the match with the string returned by the block. In this case
puts $1
#=> "two"
puts $1[0, 1]
#=> "t"

How to change case of letters in string using RegEx in Ruby

Say I have a string : "hEY "
I want to convert it to "Hey "
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
That is the idea I have, but it seems like the upcase method does nothing when I use it within the gsub method. Why is that?
EDIT: I came up with this method:
string.gsub!(/([a-z])([A-Z]+ )/) { |str| str.downcase!.capitalize! }
Is there a way to do this within the regex though? I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work?
@sawa has the simple answer, and you've edited your question with another mechanism. However, to answer two of your questions:
Is there a way to do this within the regex though?
No, Ruby's regex does not support a case-changing feature as some other regex flavors do. You can "prove" this to yourself by reviewing the official Ruby regex docs for 1.9 and 2.0 and searching for the word "case":
https://github.com/ruby/ruby/blob/ruby_1_9_3/doc/re.rdoc
https://github.com/ruby/ruby/blob/ruby_2_0_0/doc/re.rdoc
I don't really understand the '\1' '\2' thing. Is that backreferencing? How does that work?
Your use of \1 is a kind of backreference. Backreferences can also appear in the search pattern itself. For example, the regular expression /f(.)\1/ will find the letter f, followed by any character, followed by that same character (e.g. "foo" or "f!!").
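A quick way to check the in-pattern form, assuming Ruby 2.4 or later for match? (the variable names are illustrative):

```ruby
pattern = /f(.)\1/

matches_foo  = pattern.match?("foo")   # "o" is followed by another "o"
matches_bang = pattern.match?("f!!")   # "!" is followed by another "!"
matches_far  = pattern.match?("far")   # "a" is not followed by "a"

p matches_foo, matches_bang, matches_far
```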
In this case, within a replacement string passed to a method like String#gsub, the backreference does refer to the previous capture. From the docs:
"If replacement is a String it will be substituted for the matched text. It may contain back-references to the pattern’s capture groups of the form \d, where d is a group number, or \k<n>, where n is a group name. If it is a double-quoted string, both back-references must be preceded by an additional backslash."
In practice, this means:
"hello world".gsub( /([aeiou])/, '_\1_' ) #=> "h_e_ll_o_ w_o_rld"
"hello world".gsub( /([aeiou])/, "_\1_" ) #=> "h_\u0001_ll_\u0001_ w_\u0001_rld"
"hello world".gsub( /([aeiou])/, "_\\1_" ) #=> "h_e_ll_o_ w_o_rld"
Now, you have to understand when code runs. In your original code…
string.gsub!(/([a-z])([A-Z]+ )/, '\1'.upcase)
…what you are doing is calling upcase on the string '\1' (which has no effect) and then calling the gsub! method, passing in a regex and a string as parameters.
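The order of evaluation can be made visible by splitting the call in two. This is only a sketch; replacement and result are names introduced here, and the non-mutating gsub is used so the input stays intact:

```ruby
# The argument is evaluated first. '\1' is a backslash and a digit,
# neither of which has an uppercase form, so upcase changes nothing.
replacement = '\1'.upcase
unchanged = (replacement == '\1')

string = "hEY "
# gsub only ever sees '\1', so the whole match "hEY " is replaced by
# capture group 1, which is "h".
result = string.gsub(/([a-z])([A-Z]+ )/, replacement)

p unchanged
p result
```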
Finally, another way to achieve this same goal is with the block form like so:
# Take your pick of which you prefer:
string.gsub!(/([a-z])([A-Z]+ )/){ $1.upcase << $2.downcase }
string.gsub!(/([a-z])([A-Z]+ )/){ [$1.upcase,$2.downcase].join }
string.gsub!(/([a-z])([A-Z]+ )/){ "#{$1.upcase}#{$2.downcase}" }
In the block form of gsub the captured patterns are set to the global variables $1, $2, etc. and you can use those to construct the replacement string.
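For instance, with a made-up input containing two matches, the block runs once per match and $1 and $2 are reset each time:

```ruby
string = "hEY tHERE "

# First call:  $1 is "h", $2 is "EY "
# Second call: $1 is "t", $2 is "HERE "
result = string.gsub(/([a-z])([A-Z]+ )/) { "#{$1.upcase}#{$2.downcase}" }

p result
```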
I don't know why you are trying to do it in a complicated way, but the usual way is:
"hEY".capitalize # => "Hey"
If you insist on using a regex and upcase, then you would also need downcase:
"hEY".downcase.sub(/\w/){$&.upcase} # => "Hey"
If you really want to just swap the case of every letter in the string, you can avoid the complexity of regex entirely because There's A Method For That™.
"hEY".swapcase # => "Hey"
"HellO thERe".swapcase # => "hELLo THerE"
There's also swapcase! to do it destructively.

What Does This Ruby/RegEx Code Do?

I'm going through Beginning Ruby From Novice To Professional 2nd Edition and am currently on page 49 where we are learning about RegEx basics. Each RegEx snippet in the book has a code trailing it that hasn't been explained.
{ |x| puts x }
In context:
"This is a test".scan(/[a-m]/) { |x| puts x }
Could someone please clue me in?
A method such as scan is an iterator; in this case, each time the passed regex is matched, scan does something programmer-specified. In Ruby, the "something" is expressed as a block, represented by { code } or do code end (with different precedences), which is passed as a special parameter to the method. A block may start with a list of parameters (and local variables), which is the |x| part; scan invokes the block with the string it matched, which is bound to x inside the block. (This syntax comes from Smalltalk.)
So, in this case, scan will invoke its block parameter every time /[a-m]/ matches, which means on every character in the string between a and m.
It prints all letters in the string between a and m: http://ideone.com/lKaoI
|x| puts x is an anonymous function (a "block" in Ruby, as far as I can tell, or a lambda in other languages) that prints its argument.
More information on that can be found in:
Wikipedia - Ruby - Blocks and iterators
Understanding Ruby Blocks, Procs and Lambdas
The output is
h
i
i
a
e
Each character of the string "This is a test" is checked against the regular expression [a-m], which means "exactly one character in the range a..m", and is printed on its own line (via puts) if it matches. The first character T does not match, the second one h does match, etc. The last one that does is the e in "test".
In the context of your book's examples, it's included after each expression because it just means "Print out every match."
It is a code block, which runs for each match of the regular expression.
{ } creates the code block.
|x| creates the argument for the code block
puts prints out a string, and x is the string it prints.
The regular expression matches any single character in the character class [a-m]. Therefore, there are five different matches, and it prints out:
h
i
i
a
e
The { |x| puts x } defines a new block that takes a single argument named x. When the block is called, it passes its argument x to puts.
Another way to write the same thing would be:
"This is a test".scan(/[a-m]/) do |x|
puts x
end
The block gets called by the scan method each time the regular expression matches something in the string, so each match will get printed.
There is more information about blocks here:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html
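As a small sketch of what the block receives, the matches can be collected into an array instead of printed (letters and returned are illustrative names):

```ruby
letters = []
"This is a test".scan(/[a-m]/) { |x| letters << x }
# letters now holds each match in the order scan found it

# Without a block, scan returns the matches as an array directly.
returned = "This is a test".scan(/[a-m]/)

p letters
p returned
```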
