What Does This Ruby/RegEx Code Do?

I'm going through Beginning Ruby From Novice To Professional 2nd Edition and am currently on page 49, where we are learning about RegEx basics. Each RegEx snippet in the book is followed by a piece of code that hasn't been explained:
{ |x| puts x }
In context:
"This is a test".scan(/[a-m]/) { |x| puts x }
Could someone please clue me in?

A method such as scan is an iterator; in this case, each time the passed regex is matched, scan does something programmer-specified. In Ruby, the "something" is expressed as a block, represented by { code } or do code end (with different precedences), which is passed as a special parameter to the method. A block may start with a list of parameters (and local variables), which is the |x| part; scan invokes the block with the string it matched, which is bound to x inside the block. (This syntax comes from Smalltalk.)
So, in this case, scan will invoke its block parameter every time /[a-m]/ matches, which means on every character in the string between a and m.
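To see what the block changes, here is a small comparison of the two forms of scan (the variable names are arbitrary). Without a block, scan collects every match into an array; with a block, it yields each match to the block instead and returns the original string.

```ruby
text = "This is a test"

# Without a block: scan returns an array of all matches.
matches = text.scan(/[a-m]/)
p matches # prints ["h", "i", "i", "a", "e"]

# With a block: each match is passed to the block (printed here),
# and scan returns the receiver string itself.
result = text.scan(/[a-m]/) { |x| puts x }
p result.equal?(text) # prints true
```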

It prints all letters in the string between a and m: http://ideone.com/lKaoI
|x| puts x is an anonymous function (a "block" in Ruby, as far as I can tell, or a lambda in other languages) that prints its argument.
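Since a block behaves like an anonymous function, one way to make that concrete is to build the function as a lambda first and hand it to scan with &, which turns it back into a block. This is just an illustrative variation; printer and collector are arbitrary names.

```ruby
# A lambda with the same body as the block { |x| puts x }
printer = ->(x) { puts x }

# & passes the lambda to scan as its block
"This is a test".scan(/[a-m]/, &printer)

# Collecting instead of printing shows exactly what the function receives
seen = []
collector = ->(x) { seen << x }
"This is a test".scan(/[a-m]/, &collector)
p seen # prints ["h", "i", "i", "a", "e"]
```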
More information on that can be found in:
Wikipedia - Ruby - Blocks and iterators
Understanding Ruby Blocks, Procs and Lambdas

The output is
h
i
i
a
e
Each character of the string "This is a test" is checked against the regular expression [a-m], which means "exactly one character in the range a..m", and is printed on its own line (via puts) if it matches. The first character, T, does not match; the second one, h, does match, and so on. The last one that does is the e in "test".

In the context of your book's examples, it's included after each expression because it just means "Print out every match."
It is a code block, which runs for each match of the regular expression.
{ } creates the code block.
|x| declares the block's parameter.
puts prints out a string, and x is the string it prints.
The regular expression matches any single character in the character class [a-m]. Therefore, there are five matches (i matches twice), and it prints out:
h
i
i
a
e

The { |x| puts x } defines a new block that takes a single argument named x. When the block is called, it passes its argument x to puts.
Another way to write the same thing would be:
"This is a test".scan(/[a-m]/) do |x|
  puts x
end
The block gets called by the scan function each time the regular expression matches something in the string, so each match will get printed.
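To show what "scan calls the block" means mechanically, here is a rough sketch of a method that behaves like scan's block form, using yield. The name each_match is hypothetical (it is not a Ruby built-in), and it is a simplification, not how String#scan is actually implemented.

```ruby
# Hypothetical helper, not part of Ruby: a simplified stand-in for
# the block form of String#scan, written with yield.
def each_match(str, regex)
  pos = 0
  while (m = regex.match(str, pos))
    yield m[0] # hand the matched text to the caller's block
    # Move past the match; on an empty match step one character to avoid looping forever
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  str # like scan with a block, return the original string
end

found = []
ret = each_match("This is a test", /[a-m]/) { |x| found << x }
p found # prints ["h", "i", "i", "a", "e"]
```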
There is more information about blocks here:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html

Related

Extracting unique words

I need to take a file name and an integer N, and return the first N unique words in the file given. Let us say that input.txt has this content:
I like pancakes in my breakfast. Also, I like pancakes in my dinner.
The output of running this with N = 13 could be
I
like
pancakes
in
my
breakfast.
Also,
dinner.
I know how to open the file and read it line by line, but beyond that, I don't know how to take the unique words out of the lines.

Let's first create a test file.
str = <<END
We like pancakes for breakfast,
but we know others like waffles.
END
FName = 'temp'
File.write(FName, str)
#=> 65 (characters written)
We need to return an array containing the first nbr_unique unique words from the file named fname, so let's write a method that will do that.
def unique_words(fname, nbr_unique)
  <code needed here>
end
You need to add unique words to an array that will be returned by this method, so let's begin by creating an empty array and then return that array at the end of the method.
def unique_words(fname, nbr_unique)
  arr = []
  <code needed here>
  arr
end
You know how to read a file line by line, so let's do that, using the class method IO::foreach[1].
def unique_words(fname, nbr_unique)
  arr = []
  File.foreach(fname) do |line|
    <code needed here to process line>
  end
  arr
end
The block variable line equals "We like pancakes for breakfast,\n" after the first line is read. First, the newline character needs to be removed. Examine the methods of the class String to see if one can be used to do that.
The second line contains the word "we". I assume "We" and "we" are not to be regarded as distinct words. This is usually handled by converting all characters of a string to either all lowercase or all uppercase. You can do this to each line or to each word (after words have been extracted from a line). Again, look for a suitable method in the class String for doing this.
Next you need to extract words from each line. Once again, look for a String method for doing that.
Next we need to determine if, say, "like" (or "LIKE") is to be added to the array arr. Look at the instance methods for the class Array for a suitable method. If it is added we need to see if arr now contains nbr_unique words. If it does we don't need to read any more lines of the file, so we need to break out of foreach's block (perhaps use the keyword break).
There's one more thing we need to take care of. The first line contains "breakfast,", the second, "waffles.". We obviously don't want the words returned to contain punctuation. There are two ways to do that. The first is to remove the punctuation, the second is to accept only letters.
Given a string that contains punctuation (a line or a word) we can create a second string that equals the original string with the punctuation removed. One way to do that is to use the method String#tr. Suppose the string is "breakfast,". Then
"breakfast,".tr(".,?!;:'", "") #=> "breakfast"
To only accept letters we could use any of the following regular expressions, each of which removes every character that is not a letter (all return "breakfast"):
"breakfast,".gsub(/[^a-zA-Z]+/, "")
"breakfast,".gsub(/[^a-z]+/i, "")
"breakfast,".gsub(/[^[:alpha:]]+/, "")
"breakfast,".gsub(/\P{L}+/, "")
The first two work with ASCII letters only. The third (a POSIX bracket expression) and the fourth (the \P{} construct, the negation of \p{}) also handle Unicode letters (search for them within the Regexp documentation).
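A quick way to see the ASCII-versus-Unicode difference is to try the negated classes on a word with an accented letter. The word "café," is just an example input; this assumes a UTF-8 source file, which is the default in Ruby 2.0 and later.

```ruby
word = "café,"

ascii_only = word.gsub(/[^a-zA-Z]+/, "")    # é is not in a-z, so it is removed too
posix      = word.gsub(/[^[:alpha:]]+/, "") # [:alpha:] includes Unicode letters
property   = word.gsub(/\P{L}+/, "")        # \P{L} matches any non-letter

puts ascii_only # prints caf
puts posix      # prints café
puts property   # prints café
```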
Note that it is more efficient to remove punctuation from a line before words are extracted.
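Putting the steps above together, one possible array-based version looks like this. It is a sketch, not the only solution: it strips the newline with chomp, lowercases with downcase, removes punctuation with the tr call shown earlier, splits into words, and uses include? plus two breaks to stop early. It writes the same test file as at the start of this answer.

```ruby
def unique_words(fname, nbr_unique)
  arr = []
  File.foreach(fname) do |line|
    # remove newline, normalize case, drop punctuation, split into words
    words = line.chomp.downcase.tr(".,?!;:'", "").split
    words.each do |word|
      arr << word unless arr.include?(word)
      break if arr.size == nbr_unique # leaves the inner each only
    end
    break if arr.size == nbr_unique   # also stop reading the file
  end
  arr
end

File.write("temp", "We like pancakes for breakfast,\nbut we know others like waffles.\n")
p unique_words("temp", 5)   # prints ["we", "like", "pancakes", "for", "breakfast"]
p unique_words("temp", 100) # prints all 9 unique words
```

A break inside the inner each only exits that loop, which is why the size check is repeated in the outer block.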
Extra credit: use Enumerator#with_object
Whenever you see an object (here arr) initialized to be empty, manipulated, and then returned at the end of a method, you should consider using the method Enumerator#with_object or (more commonly) Enumerable#each_with_object. Both of these return the object referred to in the method name.
The method IO::foreach returns an enumerator (an instance of the class Enumerator) when it does not have a block (see doc). We therefore could write
def unique_words(fname, nbr_unique)
  File.foreach(fname).with_object([]) do |line, arr|
    <code needed here to process line>
  end
end
We have eliminated two lines (arr = [] and arr), but have also confined arr's scope to the block. This is not a big deal but is the Ruby way.
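Filled in, the with_object version might look like the sketch below. Instead of the two breaks, it uses return, which exits the whole method from inside the nested blocks; when the file runs out first, with_object itself returns arr. chomp is dropped here because split already discards the trailing newline.

```ruby
def unique_words(fname, nbr_unique)
  File.foreach(fname).with_object([]) do |line, arr|
    line.downcase.tr(".,?!;:'", "").split.each do |word|
      next if arr.include?(word)
      arr << word
      # return leaves the method immediately, from inside both blocks
      return arr if arr.size == nbr_unique
    end
  end # with_object returns arr after the last line
end

File.write("temp", "We like pancakes for breakfast,\nbut we know others like waffles.\n")
p unique_words("temp", 7) # prints ["we", "like", "pancakes", "for", "breakfast", "but", "know"]
```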
More extra credit: use methods of the class Set
Suppose we wrote the following.
require 'set'
def unique_words(fname, nbr_unique)
File.foreach(fname).with_object(Set.new) do |line, set|
<code need here to process line>
end.to_a
end
When we extract the word "we" from the second line, we need to check whether it should be added to the set. Since sets have unique elements, we can simply try to add it. Here the attempt fails, because set already contains that word from the first line of the file. A handy method for this is Set#add?:
set.add?("we")
#=> nil
Here the method returns nil, meaning the set already contains that word. It also tells us that we don't need to check if the set now contains nbr_unique words. Had we been able to add the word to the set, set (with the added word) would be returned.
The block returns the value of set (a set). The method Set#to_a converts that set to an array, which is returned by the method.
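A possible completion of the Set version follows. It is a sketch that removes punctuation with the "accept only letters" idea (keeping letters and whitespace) and relies on Set#add? returning nil for duplicates. Ruby's Set keeps insertion order, so to_a returns the words in the order they were first seen.

```ruby
require 'set'

def unique_words(fname, nbr_unique)
  File.foreach(fname).with_object(Set.new) do |line, set|
    # keep only letters and whitespace, then split into words
    line.downcase.gsub(/[^[:alpha:]\s]/, "").split.each do |word|
      # add? returns nil if word was already present, so skip the size check
      next unless set.add?(word)
      return set.to_a if set.size == nbr_unique
    end
  end.to_a
end

File.write("temp", "We like pancakes for breakfast,\nbut we know others like waffles.\n")
p unique_words("temp", 3)   # prints ["we", "like", "pancakes"]
p unique_words("temp", 100) # prints all 9 unique words
```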
[1] Notice that I've invoked the class method IO::foreach by writing File.foreach(fname)... in the code. This is permissible because File is a subclass of IO (File.superclass #=> IO). I could instead have written IO.foreach(fname)..., but it is more common to use File as the receiver.

Take an array and a letter as arguments and return a new array with words that contain that letter

I can run a search and find the element I want and can return those words with that letter. But when I start to put arguments in, it doesn't work. I tried select with include? and it throws an error saying, private method. This is my code, which returns what I am expecting:
my_array = ["wants", "need", 3, "the", "wait", "only", "share", 2]
def finding_method(source)
  words_found = source.grep(/t/) # I just picked a random letter
  print words_found
end
puts finding_method(my_array)
# => ["wants", "the", "wait"]
I need to add the second argument, but it breaks:
def finding_method(source, x)
  words_found = source.grep(/x/)
  print words_found
end
puts finding_method(my_array, "t")
This doesn't work, (it returns an empty array because there isn't an 'x' in the array) so I don't know how to pass an argument. Maybe I'm using the wrong method to do what I'm after. I have to define 'x', but I'm not sure how to do that. Any help would be great.

Regular expressions support string interpolation just like strings.
/x/
looks for the character x.
/#{x}/
will first interpolate the value of the variable and produce /t/, which does what you want. Mostly.
Note that if you are trying to search for any text that might have any meaning in regular expression syntax (like . or *), you should escape it:
/#{Regexp.quote(x)}/
That's the correct answer for any situation where you are including literal strings in regular expression that you haven't built yourself specifically for the purpose of being a regular expression, i.e. 99% of cases where you're interpolating variables into regexps.
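Applied to the question's method, a sketch might look like this. It renames the parameter to letter (just for readability) and returns the array instead of printing it, so the caller decides what to do with it. grep simply skips the integers, since a Regexp never matches a non-string.

```ruby
def finding_method(source, letter)
  # Regexp.quote escapes characters like . or * so they are matched literally
  source.grep(/#{Regexp.quote(letter)}/)
end

my_array = ["wants", "need", 3, "the", "wait", "only", "share", 2]
p finding_method(my_array, "t") # prints ["wants", "the", "wait"]

# Why quoting matters: unescaped, "." matches any character
dot = "."
p ["a.b", "abc"].grep(/#{dot}/)       # prints ["a.b", "abc"]
p finding_method(["a.b", "abc"], dot) # prints ["a.b"]

# select with include? also works once the non-strings are filtered out
p my_array.select { |e| e.is_a?(String) && e.include?("t") } # prints ["wants", "the", "wait"]
```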

How does "each" function work in Ruby (and therefore Rails)?

In the book I'm reading to learn Rails (RailsSpace), the author creates two functions (below) to turn all-caps city names like LOS ANGELES into Los Angeles. There's something I don't get about the first function, however.
Namely, where does "word" come from? I understand that "word" is a local/block variable that disappears after the function has completed, but what is being passed into/assigned to "word"? In other words, what is being split?
I would have expected there to have been some kind of argument taking an array or hash passed into this function...and then the "each" function run over that..
def capitalize_each
  space = " "
  split(space).each { |word| word.capitalize! }.join(space)
end
# Capitalize each word in place.
def capitalize_each!
  replace capitalize_each
end

Let's break this up.
split(space)
turns the string into a list of would-be words. (Because the separator is a single space, split uses awk-style splitting: leading whitespace is ignored and runs of whitespace count as one separator, so the list contains no empty strings. As a side effect, join puts back only one space between words.) I assume this is an instance method in String; otherwise, split wouldn't be defined.
.each { |word| word.capitalize! }
.each takes each thing in the list (returned by split) and runs the following block on it, passing the thing as an argument to the block. The |word| says that this block is going to call the argument "word". So effectively, what this does is capitalize each word in the string (and any lone bit of punctuation too, but again, that's not important: capitalization doesn't change characters that have no concept of case).
.join(space)
glues the words back together, reinserting the space that was used to separate them before. The string it returns is the return value of the function as well.
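Running the chain one step at a time on "LOS ANGELES" shows the intermediate values. This is a plain script rather than a String method, purely so each step can be printed; it also shows that each returns the same array it was called on.

```ruby
s = "LOS ANGELES"

words = s.split(" ")                           # ["LOS", "ANGELES"]
same  = words.each { |word| word.capitalize! } # capitalize! changes each element in place
p same.equal?(words) # prints true: each returns the array it was called on
p words              # prints ["Los", "Angeles"]
p same.join(" ")     # prints "Los Angeles"
p s                  # prints "LOS ANGELES": split made new strings, so s is unchanged
```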

At first I thought the method was incomplete because there is no self at the beginning, but even without it, split is called on the string the method is invoked on (self is the implicit receiver), and space is simply the separator. This is how the method could look with an explicit self and a default separator:
class String
  def capitalize_each(separator = ' ')
    self.split(separator).each { |word| word.capitalize! }.join(separator)
  end
end
puts "LOS ANGELES".capitalize_each #=> Los Angeles
puts "LOS_ANGELES".capitalize_each('_') #=> Los_Angeles

The string is being split by spaces, i.e. into words.
So the 'each' iterator goes through all the words one by one, and each time the current word is bound to the block variable 'word'. It then calls capitalize! on that word. Finally it all gets joined back together With Spaces. So The End Result is Capitalized.

These methods are meant to be defined in the String class, so what is being split is whatever string you are calling the capitalize_each method on.
Some example usage (and a slightly better implementation):
class String
  def capitalize_each
    split(/\s+/).each { |word| word.capitalize! }.join " "
  end
  def capitalize_each!
    replace capitalize_each
  end
end
puts "hi, i'm a sentence".capitalize_each #=> Hi, I'm A Sentence

Think of |word| word.capitalize! as a function which you're passing into the each method. The function has one argument (word) and simply evaluates .capitalize! on it.
Now what the each method is doing is taking each item in split(space) and evaluating your function on it. So:
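"abcd".chars.each { |x| print x }
will evaluate, in order, print "a", print "b", print "c", print "d". (String itself has no each method in Ruby 1.9 and later, so chars is used to get the individual characters.)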
http://www.ruby-doc.org/core/classes/Array.html#M000231
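To demystify this behavior a bit, it helps to understand exactly what it means to "take each item in __". Basically, any object that defines an each method which yields its items can be iterated this way; the Enumerable module builds methods such as map and select on top of that each.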

If you're referring to how it gets into your block in the first place, it's yielded into the block. #split returns an Array, and its #each method does something along the lines of:
for object in stored_objects
  yield object
end
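To make that concrete, here is a small hypothetical class (WordList is an invented name, not from the book) whose each yields its stored objects exactly like the loop above. Including Enumerable then provides map and friends for free, since they are all built on each.

```ruby
# Hypothetical class, for illustration only
class WordList
  include Enumerable # map, select, to_a, ... are all built on each

  def initialize(*words)
    @stored_objects = words
  end

  def each
    return to_enum(:each) unless block_given?
    for object in @stored_objects
      yield object # hand each element to the caller's block
    end
    self
  end
end

list = WordList.new("los", "angeles")
list.each { |word| puts word }        # prints los, then angeles
p list.map { |word| word.capitalize } # prints ["Los", "Angeles"]
```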

This works, but if you want to turn one array into another array, it's idiomatically better to use map instead of each, like this:
words.map{|word|word.capitalize}
(Without the trailing !, capitalize makes a new string instead of modifying the old string, and map collects those new strings into a new array. In contrast, each returns the old array.)
Or, following gunn's lead:
class String
  def capitalize_each
    self.split(/\s/).map { |word| word.capitalize }.join(' ')
  end
end
"foo bar baz".capitalize_each #=> "Foo Bar Baz"
By default, split splits on runs of whitespace, but passing the regular expression /\s/ makes it split on each individual whitespace character, even when several appear in a row.
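The difference is easiest to see on a string with two spaces in a row ("foo  bar" is just a sample input):

```ruby
s = "foo  bar" # two spaces between the words

p s.split(" ")  # prints ["foo", "bar"]: a single-space pattern splits on runs of whitespace
p s.split(/\s/) # prints ["foo", "", "bar"]: one split per whitespace character

p s.split(" ").map { |w| w.capitalize }.join(" ")  # prints "Foo Bar" (one space left)
p s.split(/\s/).map { |w| w.capitalize }.join(" ") # prints "Foo  Bar" (both spaces kept)
```

So the /\s/ version preserves the original spacing, while split(" ") normalizes it to single spaces.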

Ruby Koans - Regex and .sub: Don't understand reason behind answer

For clarification, here's the exact question in the about_regular_expressions.rb file that I'm having trouble with:
def test_sub_is_like_find_and_replace
  assert_equal __, "one two-three".sub(/(t\w*)/) { $1[0, 1] }
end
I know what the answer is to this, but I don't understand what's happening to get that answer. I'm pretty new to Ruby and to regex, and in particular I'm confused about the code between the braces and how that's coming into play.

The code inside the braces is a block that sub uses to replace the match:
In the block form [...] The value returned by the block will be substituted for the match on each call.
The block receives the match as an argument but the usual regex variables ($1, $2, ...) are also available.
In this specific case, the $1 inside the block is "two" and the array notation extracts the first character of $1 (which is "t" in this case). So, the block returns "t" and sub replaces the "two" in the original string with just "t".
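A quick way to check this outside the koan is to evaluate the expression directly, along with two variations: the block argument form (here the whole match is the same as $1) and gsub, which replaces every match instead of only the first.

```ruby
s = "one two-three"

p s.sub(/(t\w*)/) { $1[0, 1] }    # prints "one t-three": only the first match is replaced
p s.sub(/(t\w*)/) { |m| m[0, 1] } # same result, using the block's argument
p s.gsub(/(t\w*)/) { $1[0, 1] }   # prints "one t-t": gsub replaces "three" as well
```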

Ruby 1.9 prints strings twice

class Test
  def printsomething
    p "lol"
  end
end
teet = Test.new
p "#{teet.printsomething}"
Output for above code is "lol"\n"lol"
why is this happening? I am running ruby 1.9.2 Archlinux x86_64

p calls inspect on its argument and prints the result, so it is not really meant for outputting text. It shows the literal representation of an object (a string appears in double quotes, with special characters escaped) rather than the raw string.
Just replace p with puts
You can see what I mean if you do this:
p "#{teet}"
=> "#<Test:0x00000100850678>"
Notice how it's inside quotes.

The first thing Ruby does when it sees a double-quoted string is replace the #{expr} parts with the result of evaluating expr. For example, "#{2+2}" becomes "4". So, let's see what happens here. Ruby evaluates teet.printsomething. During this evaluation it executes the method, and it prints "lol" in the 3rd line. Note that although the method printsomething doesn't have a return statement, it nevertheless returns a value: the value returned by the last statement of that method. The return value of p object is the object itself (in Ruby 1.9 and later), so the result of the method printsomething is "lol". This result now replaces the #{} part in the string, and the string becomes "lol" instead of "#{teet.printsomething}". Now the p method in the 7th line is executed and outputs "lol" again.
What happens if you replace p with puts? The difference is that the return value of puts is nil. When the result of expr is nil, the whole #{} expression is replaced by an empty string. So the 7th line becomes puts "". As a result, the whole program outputs "lol" followed by an empty line.
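The return values are easy to check directly. The sketch below also shows one way to get "lol" printed only once: have the method return the string and print in a single place. The method name something is a hypothetical replacement for printsomething, since it no longer prints.

```ruby
# p returns its argument (Ruby 1.9+); puts always returns nil
from_p    = p("lol")    # prints "lol" (with quotes)
from_puts = puts("lol") # prints lol (no quotes)

p from_p    # prints "lol"
p from_puts # prints nil

# The method only returns the string; printing happens once, outside it
class Test
  def something
    "lol"
  end
end

puts Test.new.something # prints lol once
```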
