In the book I'm reading to learn Rails (RailsSpace), the author creates two functions (below) to turn all-caps city names like LOS ANGELES into Los Angeles. There's something I don't get about the first function, however.
Namely, where does "word" come from? I understand that "word" is a local/block variable that disappears after the function has completed, but what is being passed into/assigned to "word"? In other words, what is being split?
I would have expected some kind of argument (an array or hash) to be passed into this function, and then "each" to be run over that.
# Capitalize each word (e.g. "LOS ANGELES" becomes "Los Angeles").
def capitalize_each
  space = " "
  split(space).each { |word| word.capitalize! }.join(space)
end
# Capitalize each word in place.
def capitalize_each!
  replace capitalize_each
end
Let's break this up.
split(space)
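turns the string into a list of would-be words. (Because the separator is a single space, Ruby's split treats runs of whitespace as one separator and ignores leading and trailing whitespace, so two spaces in a row won't produce an empty string; they will, however, come back as a single space after the join. That doesn't matter for this purpose.) I assume this is an instance method in String; otherwise, split wouldn't be defined.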
.each { |word| word.capitalize! }
.each takes each thing in the list (returned by split) and runs the following block on it, passing the thing as an argument to the block. The |word| says that this block is going to call the argument "word". So effectively, what this does is capitalize each word in the string (and each lonely bit of punctuation too, but again, that's not important: capitalization doesn't change characters that have no concept of case).
.join(space)
glues the words back together, reinserting the space that was used to separate them before. The string it returns is the return value of the function as well.
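To see the pieces at work, you can run them one at a time in irb. The sketch below (the variable names are just for illustration) shows that capitalize! returns nil when it changes nothing, and that this doesn't matter: each ignores the block's return value and hands back the original array, whose strings have been modified in place.

```ruby
city = "LOS ANGELES"

words = city.split(" ")                          # ["LOS", "ANGELES"]
result = words.each { |word| word.capitalize! }

# each returns the very same array object it was called on...
puts result.equal?(words)                        # prints true
# ...and its elements were changed in place by capitalize!
p words                                          # prints ["Los", "Angeles"]
p words.join(" ")                                # prints "Los Angeles"

# capitalize! returns nil when nothing changes, but each ignores that.
p "Los".capitalize!                              # prints nil

# With a single-space separator, split ignores runs of whitespace.
p "LOS   ANGELES".split(" ")                     # prints ["LOS", "ANGELES"]
```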
At first I thought that the method was incomplete because there is no self at the beginning, but even without it, split is called on the string the method was invoked on (self is implicit), and space is simply the separator. This is how the method could look with an explicit self and a configurable separator:
class String
  def capitalize_each(separator = ' ')
    self.split(separator).each { |word| word.capitalize! }.join(separator)
  end
end
puts "LOS ANGELES".capitalize_each #=> Los Angeles
puts "LOS_ANGELES".capitalize_each('_') #=> Los_Angeles
The string is being split by spaces, i.e. into words.
So the each iterator goes through all the words one by one, each time binding the current word to word, and calls capitalize! on it. Finally it all gets joined back together With Spaces. So The End Result Is Capitalized.
These methods are meant to be defined in the String class, so what is being split is whatever string you are calling the capitalize_each method on.
Some example usage (and a slightly better implementation):
class String
  def capitalize_each
    split(/\s+/).each { |word| word.capitalize! }.join " "
  end

  def capitalize_each!
    replace capitalize_each
  end
end
puts "hi, i'm a sentence".capitalize_each #=> Hi, I'm A Sentence
Think of |word| word.capitalize! as a function that you're passing into the each method. The function has one argument (word) and simply calls .capitalize! on it.
Now what the each method is doing is taking each item in split(space) and evaluating your function on it. So:
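"abcd".each_char { |x| print x }
will evaluate, in order, print "a", print "b", print "c", print "d". (On a string you need each_char; String#each no longer exists in Ruby 1.9 and later.)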
http://www.ruby-doc.org/core/classes/Array.html#M000231
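To make the "function" view concrete, the block can literally be written as a lambda and handed to each with &. This is only an illustration (capitalize_word is a made-up name); it behaves the same as the inline block.

```ruby
# The block body as a stand-alone lambda taking one argument.
capitalize_word = ->(word) { word.capitalize! }

words = "LOS ANGELES".split(" ")
words.each(&capitalize_word)   # & passes the lambda to each as its block
p words                        # prints ["Los", "Angeles"]
```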
To demystify this behavior a bit, it helps to understand exactly what it means to "take each item in __". Basically, any object that responds to each (arrays, hashes, ranges, and any class that includes Enumerable) can be iterated this way.
If you're referring to how each item gets into your block in the first place, it's yielded into the block. #split returns an Array, and its #each method is doing something along the lines of:
for object in stored_objects
  yield object
end
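Here is a minimal sketch of that idea, using a hypothetical WordList class (not part of Ruby) whose each yields its stored objects one by one. Including Enumerable then provides map, select and friends, because they are all built on top of each.

```ruby
# Hypothetical collection class, for illustration only.
class WordList
  include Enumerable

  def initialize(*words)
    @stored_objects = words
  end

  # Hand each stored object to the caller's block, one at a time.
  def each
    return enum_for(:each) unless block_given?
    for object in @stored_objects
      yield object
    end
    self
  end
end

list = WordList.new("los", "angeles")
list.each { |word| puts word }   # prints los, then angeles
p list.map(&:capitalize)         # prints ["Los", "Angeles"]
```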
This works, but if you want to turn one array into another array, it's idiomatically better to use map instead of each, like this:
words.map{|word|word.capitalize}
(Without the trailing !, capitalize makes a new string instead of modifying the old string, and map collects those new strings into a new array. In contrast, each returns the old array.)
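A small sketch of that difference (the word list is made up): with non-bang capitalize, map collects the new strings, while each throws the block's results away and returns the original, unchanged array.

```ruby
words = %w[los angeles]

mapped = words.map { |word| word.capitalize }        # new array of new strings
each_result = words.each { |word| word.capitalize }  # block results discarded

p mapped        # prints ["Los", "Angeles"]
p each_result   # prints ["los", "angeles"] (the original array)
p words         # prints ["los", "angeles"] (untouched)
```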
Or, following gunn's lead:
class String
  def capitalize_each
    self.split(/\s/).map { |word| word.capitalize }.join(' ')
  end
end
"foo bar baz".capitalize_each #=> "Foo Bar Baz"
By default, split splits on runs of whitespace, but when you pass the regular expression /\s/ it matches each individual whitespace character, even if several are in a row, so the original spacing survives the join.
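A quick comparison on a made-up string with two spaces in a row shows the difference between the split variants:

```ruby
s = "foo  bar"   # two spaces between the words

p s.split           # prints ["foo", "bar"]      (default: runs of whitespace)
p s.split(/\s+/)    # prints ["foo", "bar"]
p s.split(/\s/)     # prints ["foo", "", "bar"]  (one split per space)

# The empty string survives capitalize, so join(' ') restores both spaces.
p s.split(/\s/).map(&:capitalize).join(' ')   # prints "Foo  Bar"
```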
Here is the string I want to convert to an array; then I want to reverse each word without reversing the order of the words, join them back together, and output the result.
For instance, I want to change "Hello there, and how are you?" to "olleH ,ereht dna woh era ?uoy"
This is the string:
sentence1="Hello there, and how are you?"
And this is my code, in which I have to use .each (I know it's wrong, but I don't know how to fix it):
def reverse_each_word(sentence1)
  split_array = sentence1.split
  reversed_array = split_array.reverse
  reversed_array.each do |joined_array|
    joined_array.join(' ')
  end
end
and as mentioned, the desired result has to be:
"olleH ,ereht dna woh era ?uoy"
You're calling join on a string: you're iterating over each element of reversed_array, and all of those elements are String objects:
p sentence1.split.first.join(' ')
# undefined method `join' for "Hello":String (NoMethodError)
It works if you store the value from each iteration somewhere: either in a variable declared outside the block or, better, in the array that map builds for you. Then you can just reverse each string and join everything:
def reverse_each_word(sentence1)
  sentence1.split.map do |word|
    word.reverse
  end.join(' ')
end
p reverse_each_word(sentence1) # "olleH ,ereht dna woh era ?uoy"
Note that this can also be written as sentence1.split.map(&:reverse).join(' ').
If you want to use each to solve this problem, you'll need a variable in which to store each modified string as you iterate over the elements:
memo = ''
sentence1.split.each { |word| memo << "#{word.reverse} " }
p memo.rstrip # "olleH ,ereht dna woh era ?uoy"
Here memo starts as an empty string whose only purpose is to be filled with the reversed words: each word is reversed and followed by a space. The last word leaves an extra trailing space, so rstrip removes it.
If you prefer collect, the map approach works unchanged, because collect and map are aliases.
I would be inclined to use String#gsub with a regular expression.
str = "Hello there, and how are you?"
str.gsub(/\S+/) { |s| s.reverse }
#=> "olleH ,ereht dna woh era ?uoy"
The regular expression reads, "match one or more characters other than whitespace characters".
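One practical difference between the approaches, shown on a made-up variant of the sentence with a double space: gsub leaves the original whitespace untouched, whereas split followed by join(' ') collapses it to single spaces.

```ruby
str = "Hello  there, and how are you?"   # note the double space

via_gsub = str.gsub(/\S+/) { |s| s.reverse }
via_split = str.split.map(&:reverse).join(' ')

p via_gsub    # prints "olleH  ,ereht dna woh era ?uoy"
p via_split   # prints "olleH ,ereht dna woh era ?uoy"
```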
I need to take a file name and an integer N, and return the first N unique words in the file given. Let us say that input.txt has this content:
I like pancakes in my breakfast. Also, I like pancakes in my dinner.
The output of running this with N = 13 could be
I
like
pancakes
in
my
breakfast.
Also,
dinner.
I know how to open the file and read it line by line, but beyond that, I don't know how to take the unique words out of the lines.
Let's first create a test file.
str =<<END
We like pancakes for breakfast,
but we know others like waffles.
END
FName = 'temp'
File.write(FName, str)
#=> 65 (characters written)
We need to return an array containing the first nbr_unique unique words from the file named fname, so let's write a method that will do that.
def unique_words(fname, nbr_unique)
  <code needed here>
end
You need to add unique words to an array that will be returned by this method, so let's begin by creating an empty array and then return that array at the end of the method.
def unique_words(fname, nbr_unique)
  arr = []
  <code needed here>
  arr
end
You know how to read a file line by line, so let's do that, using the class method IO::foreach (see footnote 1).
def unique_words(fname, nbr_unique)
  arr = []
  File.foreach(fname) do |line|
    <code needed here to process line>
  end
  arr
end
The block variable line equals "We like pancakes for breakfast,\n" after the first line is read. First, the newline character needs to be removed. Examine the methods of the class String to see if one can be used to do that.
The second line contains the word "we". I assume "We" and "we" are not to be regarded as distinct words. This is usually handled by converting all characters of a string to either all lowercase or all uppercase. You can do this to each line or to each word (after words have been extracted from a line). Again, look for a suitable method in the class String for doing this.
Next you need to extract words from each line. Once again, look for a String method for doing that.
Next we need to determine if, say, "like" (or "LIKE") is to be added to the array arr. Look at the instance methods for the class Array for a suitable method. If it is added we need to see if arr now contains nbr_unique words. If it does we don't need to read any more lines of the file, so we need to break out of foreach's block (perhaps use the keyword break).
There's one more thing we need to take care of. The first line contains "breakfast,", the second, "waffles.". We obviously don't want the words returned to contain punctuation. There are two ways to do that. The first is to remove the punctuation, the second is to accept only letters.
Given a string that contains punctuation (a line or a word) we can create a second string that equals the original string with the punctuation removed. One way to do that is to use the method String#tr. Suppose the string is "breakfast,". Then
"breakfast,".tr(".,?!;:'", "") #=> "breakfast"
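To accept only letters, we could instead delete every character that is not a letter, using any of the following regular expressions (all return "breakfast"; on a whole line you would also need to keep whitespace, e.g. /[^a-zA-Z\s]+/):
"breakfast,".gsub(/[^a-zA-Z]+/, "")
"breakfast,".gsub(/[^a-z]+/i, "")
"breakfast,".gsub(/[^[:alpha:]]+/, "")
"breakfast,".gsub(/\P{L}+/, "")
The first two work with ASCII letters only. The third (a POSIX bracket expression) and the fourth (the \p{} construct; \P{} is its negation) also handle Unicode letters (search for them within the Regexp documentation).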
Note that it is more efficient to remove punctuation from a line before words are extracted.
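Putting those hints together, one possible completion of unique_words looks like this. It is a sketch, not the only answer: it assumes words are compared case-insensitively (so they come back in lowercase), that the punctuation to strip is the illustrative set passed to delete, and it writes the test file to 'temp' as above.

```ruby
def unique_words(fname, nbr_unique)
  arr = []
  File.foreach(fname) do |line|
    # Drop the newline, fold case, strip punctuation, then split into words.
    words = line.chomp.downcase.delete(".,?!;:'").split
    words.each do |word|
      next if arr.include?(word)        # Array#include? skips duplicates
      arr << word
      break if arr.size == nbr_unique   # leaves only the inner each
    end
    break if arr.size == nbr_unique     # stop reading the file early
  end
  arr
end

File.write('temp', "We like pancakes for breakfast,\nbut we know others like waffles.\n")
p unique_words('temp', 5)   # prints ["we", "like", "pancakes", "for", "breakfast"]
```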
Extra credit: use Enumerator#with_object
Whenever you see an object (here arr) initialized to be empty, manipulated and then returned at the end of a method, you should consider using the method Enumerator#with_object or (more commonly) Enumerable#each_with_object. Both of these return the object referred to in the method name.
The method IO::foreach returns an enumerator (an instance of the class Enumerator) when it does not have a block (see doc). We therefore could write
def unique_words(fname, nbr_unique)
  File.foreach(fname).with_object([]) do |line, arr|
    <code needed here to process line>
  end
end
We have eliminated two lines (arr = [] and arr), but have also confined arr's scope to the block. This is not a big deal but is the Ruby way.
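A sketch of the same logic with with_object, under the same assumptions as before (lowercased words, an illustrative punctuation set, a file named 'temp'). One gotcha: break makes the with_object call return the break value, which is nil for a bare break, so the sketch uses break arr to keep the result.

```ruby
def unique_words(fname, nbr_unique)
  File.foreach(fname).with_object([]) do |line, arr|
    line.chomp.downcase.delete(".,?!;:'").split.each do |word|
      arr << word unless arr.include?(word)
      break if arr.size == nbr_unique   # leaves only the inner each
    end
    # A bare break would make the method return nil; pass arr explicitly.
    break arr if arr.size == nbr_unique
  end
end

File.write('temp', "We like pancakes for breakfast,\nbut we know others like waffles.\n")
p unique_words('temp', 3)   # prints ["we", "like", "pancakes"]
```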
More extra credit: use methods of the class Set
Suppose we wrote the following.
require 'set'
def unique_words(fname, nbr_unique)
  File.foreach(fname).with_object(Set.new) do |line, set|
    <code needed here to process line>
  end.to_a
end
When we extract the word "we" from the second line, we need to check whether it should be added to the set. Since a set's elements are unique, we can simply try to add it; here the attempt fails because set already contains that word from the first line of the file. A handy method for doing that is Set#add?:
set.add?("we")
#=> nil
Here the method returns nil, meaning the set already contains that word. It also tells us that we don't need to check if the set now contains nbr_unique words. Had we been able to add the word to the set, set (with the added word) would be returned.
The block returns the value of set (a set). The method Set#to_a converts that set to an array, which is returned by the method.
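A sketch of the Set version, again assuming lowercased words, an illustrative punctuation set and a file named 'temp'. Because add? returns nil for a word that is already present, the size check only runs after a successful add; break set keeps the set as the value that to_a is called on.

```ruby
require 'set'

def unique_words(fname, nbr_unique)
  File.foreach(fname).with_object(Set.new) do |line, set|
    line.chomp.downcase.delete(".,?!;:'").split.each do |word|
      next unless set.add?(word)        # nil means "already in the set"
      break if set.size == nbr_unique   # leaves only the inner each
    end
    break set if set.size == nbr_unique
  end.to_a
end

File.write('temp', "We like pancakes for breakfast,\nbut we know others like waffles.\n")
p unique_words('temp', 7)   # prints ["we", "like", "pancakes", "for", "breakfast", "but", "know"]
```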
1 Notice that I've invoked the class method IO::foreach by writing File.foreach(fname)... above. This is permissible because File is a subclass of IO (File.superclass #=> IO). I could instead have written IO.foreach(fname)..., but it is more common to use File as the receiver.
I can run a search that finds the elements I want and returns the words containing a given letter. But when I start passing the letter in as an argument, it doesn't work. I tried select with include? and it throws an error about a private method. This is my code, which returns what I am expecting:
my_array = ["wants", "need", 3, "the", "wait", "only", "share", 2]
def finding_method(source)
  words_found = source.grep(/t/) # I just picked a random letter
  print words_found
end
puts finding_method(my_array)
# => ["wants", "the", "wait"]
I need to add the second argument, but it breaks:
def finding_method(source, x)
  words_found = source.grep(/x/)
  print words_found
end
puts finding_method(my_array, "t")
This doesn't work (it returns an empty array because none of the words contains an 'x'), so I don't know how to pass an argument. Maybe I'm using the wrong method to do what I'm after. I have to define 'x', but I'm not sure how to do that. Any help would be great.
Regular expression literals support interpolation, just like double-quoted strings.
/x/
looks for the character x.
/#{x}/
will first interpolate the value of the variable and produce /t/, which does what you want. Mostly.
Note that if you are trying to search for any text that might have any meaning in regular expression syntax (like . or *), you should escape it:
/#{Regexp.quote(x)}/
That's the correct answer for any situation where you are including literal strings in a regular expression that you haven't built yourself specifically for the purpose of being a regular expression, i.e. 99% of cases where you're interpolating variables into regexps.
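Putting it together, here is a sketch of the question's method with the pattern built from the argument. Returning the matches instead of printing them is my own design choice, so the caller decides what to do with the result.

```ruby
# Returns the elements of source that contain the text x.
def finding_method(source, x)
  # Regexp.escape (an alias of Regexp.quote) makes "." or "*" match literally.
  source.grep(/#{Regexp.escape(x)}/)
end

my_array = ["wants", "need", 3, "the", "wait", "only", "share", 2]
p finding_method(my_array, "t")         # prints ["wants", "the", "wait"]
p finding_method(["a.b", "axb"], ".")   # prints ["a.b"]
```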
I've created a web framework that uses the following function:
def to_class(text)
  text.capitalize
  text.gsub(/(_|-)/, '')
end
It turns directory names that are snake_cased or hyphen-cased into PascalCased class names for your project.
The problem is that the function only removes _ and - and doesn't capitalize the next letter. Using .capitalize or .upcase, is there a way to turn snake_cased or hyphen-cased names into proper PascalCased class names?
gsub(/(?:^|[_-])([a-z])?/) { $1.upcase unless $1.nil? }
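To see how that one-liner behaves, here it is wrapped in the question's to_class method; the explicit "" branch is my variant of the unless/nil trick that makes the empty replacement obvious. The pattern matches the start of the string or a _ or -, optionally followed by a lowercase letter captured in $1; the whole match is replaced by that letter upcased, so the separator disappears.

```ruby
def to_class(text)
  # $1 is the letter after the start/separator, or nil if there is none.
  text.gsub(/(?:^|[_-])([a-z])?/) { $1 ? $1.upcase : "" }
end

p to_class("snake_cased")         # prints "SnakeCased"
p to_class("hyphen-cased_name")   # prints "HyphenCasedName"
```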
This splits the snake_cased or hyphen-cased string into an array, capitalizes every element and glues the array back into a string:
def to_pascal_case(str)
  str.split(/-|_/).map(&:capitalize).join
end
p to_pascal_case("snake_cased") #=>"SnakeCased"
Your code does not work for two reasons:
The result of the capitalize method is discarded; you should do something like text.capitalize! or text = text.capitalize.
But even then, capitalize only upcases the first letter of the string (and downcases the rest), not the first letter of every word.
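Rails has a similar method called camelize. It capitalizes each part of the string that follows an underscore and removes the underscores (it also turns / into ::). Note that it does not treat hyphens as separators, so "foo-bar".camelize gives "Foo-bar".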
You can probably golf it down to something smaller, but:
txt = 'foo-bar_baz'
txt.gsub(/(?:^|[-_])([a-z])/) { |m| m.upcase }.gsub(/[-_]/, '') # FooBarBaz
I'm going through Beginning Ruby From Novice To Professional, 2nd Edition, and am currently on page 49, where we are learning about regex basics. Each regex snippet in the book has a bit of code trailing it that hasn't been explained:
{ |x| puts x }
In context:
"This is a test".scan(/[a-m]/) { |x| puts x }
Could someone please clue me in?
A method such as scan is an iterator; in this case, each time the passed regex is matched, scan does something programmer-specified. In Ruby, the "something" is expressed as a block, represented by { code } or do code end (with different precedences), which is passed as a special parameter to the method. A block may start with a list of parameters (and local variables), which is the |x| part; scan invokes the block with the string it matched, which is bound to x inside the block. (This syntax comes from Smalltalk.)
So, in this case, scan will invoke its block parameter every time /[a-m]/ matches, which means on every character in the string between a and m.
It prints all letters in the string between a and m: http://ideone.com/lKaoI
{ |x| puts x } is an anonymous function (a "block" in Ruby, as far as I can tell, or a lambda in other languages) that prints its argument.
More information on that can be found in:
Wikipedia - Ruby - Blocks and iterators
Understanding Ruby Blocks, Procs and Lambdas
The output is
h
i
i
a
e
Each character of the string "This is a test" is checked against the regular expression [a-m], which means "exactly one character in the range a..m", and is printed on its own line (via puts) if it matches. The first character, T, does not match; the second one, h, does match, and so on. The last one that does is the e in "test".
In the context of your book's examples, it's included after each expression because it just means "Print out every match."
It is a code block, which runs for each match of the regular expression.
{ } creates the code block.
|x| creates the argument for the code block
puts prints out a string, and x is the string it prints.
The regular expression matches any single character in the character class [a-m]. Therefore, there are five matches, and it prints out:
h
i
i
a
e
The { |x| puts x } defines a new block that takes a single argument named x. When the block is called, it passes its argument x to puts.
Another way to write the same thing would be:
"This is a test".scan(/[a-m]/) do |x|
  puts x
end
The block gets called by the scan function each time the regular expression matches something in the string, so each match will get printed.
There is more information about blocks here:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_containers.html
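For comparison, called without a block, scan simply returns the matches as an array; with a block it yields each one instead. The each_match method below is a hypothetical illustration (not part of Ruby) of how any method hands values to a block with yield.

```ruby
text = "This is a test"

matches = text.scan(/[a-m]/)
p matches   # prints ["h", "i", "i", "a", "e"]

# Hypothetical method: yields every match of pattern in str to the block.
def each_match(str, pattern)
  str.scan(pattern).each { |m| yield m }
end

each_match(text, /[a-m]/) { |x| puts x }   # prints h, i, i, a, e on separate lines
```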