I have the following string:
str = "This is a string"
What I want to do is compare it with this array:
a = ["this", "is", "something"]
The result should be an array with "this" and "is" because both are present in the array and in the given string. "something" is not present in the string so it shouldn't appear. How can I do this?
One way to do this:
str = "This is a string"
a = ["this","is","something"]
str.downcase.split & a
# => ["this", "is"]
This assumes the elements of array a are always lowercase.
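If that assumption does not hold, a minimal sketch along these lines might help. It compares lowercase copies of the words while returning the elements of a in their original case; the mixed-case array and the variable names words_in_str and common are made up for illustration.

```ruby
str = "This is a string"
a = ["This", "IS", "something"]   # mixed case, unlike the original example

# Downcase the string once, then compare each element of a in lowercase.
words_in_str = str.downcase.split
common = a.select { |word| words_in_str.include?(word.downcase) }
common # => ["This", "IS"]
```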
There are always many ways to do this sort of thing:
str = "this is the example string"
words_to_compare = ["dogs", "ducks", "seagulls", "the"]
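# Anchor the pattern so only whole words match; unanchored, "there" in words_to_compare would match "the" from str
pattern = /\A#{Regexp.union(str.split)}\z/
words_to_compare.select { |word| word =~ pattern }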
#=> ["the"]
Your question has an XY problem smell to it. Usually when we want to find what words exist the next thing we want to know is how many times they exist. Frequency counts are all over the internet and Stack Overflow. This is a minor modification to such a thing:
str = "This is a string"
a = ["this", "is", "something"]
a_hash = a.each_with_object({}) { |i, h| h[i] = 0 } # => {"this"=>0, "is"=>0, "something"=>0}
That defined a_hash with the keys being the words to be counted.
str.downcase.split.each{ |k| a_hash[k] += 1 if a_hash.key?(k) }
a_hash # => {"this"=>1, "is"=>1, "something"=>0}
a_hash now contains the counts of the word occurrences. The guard if a_hash.key?(k) is the main difference from a regular word count: it restricts counting to the words in a.
a_hash.keys.select{ |k| a_hash[k] > 0 } # => ["this", "is"]
It's easy to find the words that were in common because the counter is > 0.
This is a very common problem in text processing so it's good knowing how it works and how to bend it to your will.
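For comparison, a regular word count, without the key? guard, might look like the sketch below. The sample string and the name counts are only illustrative; Hash.new(0) gives every missing key a default count of 0.

```ruby
str = "This is a string this is"

# Every word is counted, because there is no guard limiting the keys.
counts = Hash.new(0)
str.downcase.split.each { |word| counts[word] += 1 }
counts # => {"this"=>2, "is"=>2, "a"=>1, "string"=>1}
```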
I have this exercise:
Write a Title class which is initialized with a string.
It has one method -- fix -- which should return a title-cased version of the string:
Title.new("a title of a book").fix =
A Title of a Book
You'll need to use conditional logic - if and else statements - to make this work.
Make sure you read the test specification carefully so you understand the conditional logic to be implemented.
Some methods you'll want to use:
String#downcase
String#capitalize
Array#include?
Also, here is the RSpec spec, which I should have included:
describe "Title" do
  describe "fix" do
    it "capitalizes the first letter of each word" do
      expect( Title.new("the great gatsby").fix ).to eq("The Great Gatsby")
    end
    it "works for words with mixed cases" do
      expect( Title.new("liTTle reD Riding hOOD").fix ).to eq("Little Red Riding Hood")
    end
    it "downcases articles" do
      expect( Title.new("The lord of the rings").fix ).to eq("The Lord of the Rings")
      expect( Title.new("The sword And The stone").fix ).to eq("The Sword and the Stone")
      expect( Title.new("the portrait of a lady").fix ).to eq("The Portrait of a Lady")
    end
    it "works for strings with all uppercase characters" do
      expect( Title.new("THE SWORD AND THE STONE").fix ).to eq("The Sword and the Stone")
    end
  end
end
Thank you @simone, I incorporated your suggestions:
class Title
  attr_accessor :string
  def initialize(string)
    @string = string
  end
  IGNORE = %w(the of a and)
  def fix
    s = string.split(' ')
    s.map do |word|
      words = word.downcase
      if IGNORE.include?(word)
        words
      else
        words.capitalize
      end
    end
    s.join(' ')
  end
end
Although I'm still running into errors when running the code:
expected: "The Great Gatsby"
got: "the great gatsby"
(compared using ==)
exercise_spec.rb:6:in `block (3 levels) in <top (required)>'
From my beginner's perspective, I cannot see what I'm doing wrong?
Final edit: I just wanted to say thanks for all the effort everyone put into assisting me earlier. Here is the final working code I was able to produce:
class Title
  attr_accessor :string
  def initialize(string)
    @string = string
  end
  def fix
    word_list = %w{a of and the}
    a = string.downcase.split(' ')
    b = []
    a.each_with_index do |word, index|
      if index == 0 || !word_list.include?(word)
        b << word.capitalize
      else
        b << word
      end
    end
    b.join(' ')
  end
end
Here's a possible solution.
class Title
  attr_accessor :string
  IGNORES = %w( the of a and )
  def initialize(string)
    @string = string
  end
  def fix
    tokens = string.split(' ')
    tokens.map do |token|
      token = token.downcase
      if IGNORES.include?(token)
        token
      else
        token.capitalize
      end
    end.join(" ")
  end
end
Title.new("a title of a book").fix
Your starting point was good. Here are a few improvements:
The comparison is always done in lower case. This simplifies the if-condition.
The list of ignored items is stored in an array. This simplifies the if-condition because you don't need a separate if for each ignored string (there could be hundreds).
I use map to replace the tokens. It's a common Ruby pattern to use blocks with enumerations to loop over items.
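One detail about map is worth spelling out: it returns a new array and leaves the receiver untouched, so its result has to be used (chained or assigned). The sketch below first shows what happens when the result is discarded, then a variant that also always capitalizes the first word, which the spec in the question requires. The names IGNORED_WORDS and title_case are hypothetical and exist only for this illustration.

```ruby
# map returns a new array; the original array is not modified.
words = %w[the great gatsby]
words.map { |word| word.capitalize }   # result is thrown away here
unchanged = words.join(' ')            # => "the great gatsby"

IGNORED_WORDS = %w[the of a and].freeze   # hypothetical constant name

# Hypothetical helper: title-cases text, always capitalizing the first word.
def title_case(text)
  text.downcase.split.each_with_index.map do |word, index|
    if index.zero? || !IGNORED_WORDS.include?(word)
      word.capitalize
    else
      word
    end
  end.join(' ')
end

title_case("the great gatsby")         # => "The Great Gatsby"
title_case("The sword And The stone")  # => "The Sword and the Stone"
```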
There are two ways you can approach this problem:
break the string into words, possibly modify each word and join the words back together; or
use a regular expression.
I will say something about the latter, but I believe your exercise concerns the former--which is the approach you've taken--so I will concentrate on that.
Split string into words
You use String#split(' ') to split the string into words:
str = "a title of a\t book"
a = str.split(' ')
#=> ["a", "title", "of", "a", "book"]
That's fine, even when there's extra whitespace, but one normally writes that:
str.split
#=> ["a", "title", "of", "a", "book"]
Both give the same result as the following, provided the string has no leading whitespace:
str.split(/\s+/)
#=> ["a", "title", "of", "a", "book"]
Notice that I've used the variable a to signify that an array is returned. Some may feel that is not sufficiently descriptive, but I believe it's better than s, which is a little confusing. :-)
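The three forms of split only differ when the string starts with whitespace. A small sketch, using a made-up padded string, shows the difference:

```ruby
padded = "  a title of a\t book"   # note the leading spaces

no_arg     = padded.split          # => ["a", "title", "of", "a", "book"]
single_sp  = padded.split(' ')     # => ["a", "title", "of", "a", "book"]
with_regex = padded.split(/\s+/)   # => ["", "a", "title", "of", "a", "book"]
```

With no argument or with ' ', leading whitespace is ignored; with the regular expression, the leading whitespace acts as a delimiter and produces an empty first element.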
Create enumerators
Next you send the method Enumerable#each_with_index to create an enumerator:
enum0 = a.each_with_index
# => #<Enumerator: ["a", "title", "of", "a", "book"]:each_with_index>
To see the contents of the enumerator, convert enum0 to an array:
enum0.to_a
#=> [["a", 0], ["title", 1], ["of", 2], ["a", 3], ["book", 4]]
You've used each_with_index because the first word--the one with index 0-- is to be treated differently than the others. That's fine.
So far, so good, but at this point you need to use Enumerable#map to convert each element of enum0 to an appropriate value. For example, the first value, ["a", 0] is to be converted to "A", the next is to be converted to "Title" and the third to "of".
Therefore, you need to send the method Enumerable#map to enum0:
enum1 = enum0.map
#=> #<Enumerator: #<Enumerator: ["a", "title", "of", "a",
"book"]:each_with_index>:map>
enum1.to_a
#=> [["a", 0], ["title", 1], ["of", 2], ["a", 3], ["book", 4]]
As you see, this creates a new enumerator, which you could think of as a "compound" enumerator.
The elements of enum1 will be passed into the block by Enumerator#each, which ultimately relies on Array#each.
Invoke the enumerator and join
You want to capitalize the first word and all other words except articles. We therefore must define some articles:
articles = %w{a of it} # and more
#=> ["a", "of", "it"]
b = enum1.each do |w,i|
case i
when 0 then w.capitalize
else articles.include?(w) ? w.downcase : w.capitalize
end
end
#=> ["A", "Title", "of", "a", "Book"]
and lastly we join the array with one space between each word:
b.join(' ')
#=> "A Title of a Book"
Review details of calculation
Let's go back to the calculation of b. The first element of enum1 is passed into the block and assigned to the block variables:
w, i = ["a", 0] #=> ["a", 0]
w #=> "a"
i #=> 0
so we execute:
case 0
when 0 then "a".capitalize
else articles.include?("a") ? "a".downcase : "a".capitalize
end
which returns "a".capitalize => "A". Similarly, when the next element of enum1 is passed to the block:
w, i = ["title", 1] #=> ["title", 1]
w #=> "title"
i #=> 1
case 1
when 0 then "title".capitalize
else articles.include?("title") ? "title".downcase : "title".capitalize
end
which returns "Title" since articles.include?("title") => false. Next:
w, i = ["of", 2] #=> ["of", 2]
w #=> "of"
i #=> 2
case 2
when 0 then "of".capitalize
else articles.include?("of") ? "of".downcase : "of".capitalize
end
which returns "of" since articles.include?("of") => true.
Chaining operations
Putting this together, we have:
str.split.each_with_index.map do |w,i|
case i
when 0 then w.capitalize
else articles.include?(w) ? w.downcase : w.capitalize
end
end
#=> ["A", "Title", "of", "a", "Book"]
Alternative calculation
Another way to do this, without using each_with_index, is like this:
first_word, *remaining_words = str.split
first_word
#=> "a"
remaining_words
#=> ["title", "of", "a", "book"]
"#{first_word.capitalize} #{ remaining_words.map { |w|
articles.include?(w) ? w.downcase : w.capitalize }.join(' ') }"
#=> "A Title of a Book"
Using a regular expression
str = "a title of a book"
str.gsub(/(^\w+)|(\w+)/) do
$1 ? $1.capitalize :
articles.include?($2) ? $2 : $2.capitalize
end
#=> "A Title of a Book"
The regular expression "captures" [(...)] a word at the beginning of the string [(^\w+)] or [|] a word that is not necessarily at the beginning of string [(\w+)]. The contents of the two capture groups are assigned to the global variables $1 and $2, respectively.
Therefore, stepping through the words of the string, the first word, "a", is captured by capture group #1, so (\w+) is not evaluated. Each subsequent word is not captured by capture group #1 (so $1 => nil), but is captured by capture group #2. Hence, if $1 is not nil, we capitalize the (first) word (of the sentence); else we capitalize $2 if the word is not an article and leave it unchanged if it is an article.
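To see which capture group picks up each word, one option is String#scan, which returns the contents of both groups for every match. This is only an illustration of the capture behaviour described above, not part of the solution; the name captures is arbitrary.

```ruby
str = "a title of a book"

# Each inner array is [group 1, group 2]; the group that did not take part is nil.
captures = str.scan(/(^\w+)|(\w+)/)
captures
# => [["a", nil], [nil, "title"], [nil, "of"], [nil, "a"], [nil, "book"]]
```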
def fix
string.downcase.split(/(\s)/).map.with_index{ |x,i|
( i==0 || x.match(/^(?:a|is|of|the|and)$/).nil? ) ? x.capitalize : x
}.join
end
Meets all conditions:
a, is, of, the, and all lowercase
capitalizes all other words
all first words are capitalized
Explanation
string.downcase calls one operation to make the string you're working with all lower case
.split(/(\s)/) takes the lower case string and splits it on white-space (space, tab, newline, etc) into an array, making each word an element of the array; surrounding the \s (the delimiter) in the parentheses also retains it in the array that's returned, so we don't lose that white-space character when rejoining
.map.with_index{ |x,i| iterates over that returned array, where x is the value and i is the index number; each iteration returns an element of a new array; when the loop is complete you will have a new array
( i==0 || x.match(/^(?:a|is|of|the|and)$/).nil? ) if it's the first element in the array (index of 0), or the word does not match a, is, of, the, or and -- that is, the match is nil -- then x.capitalize (capitalize the word); otherwise (it did match one of the ignore words) just return the word/value, x
.join takes our new array and combines all the words into one string again
Additional
Ordinarily, what is inside parentheses in a regex is considered a capture group, meaning that if the pattern inside is matched, a special variable will retain the value after the regex operations have finished. In some cases, such as the \s, we want to capture that value because we reuse it; in other cases, like our ignore words, we need to match but do not need to capture. To avoid capturing a match you can place ?: at the beginning of the group to tell the regex engine not to retain the value. There are many benefits of this that fall outside the scope of this answer.
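A short sketch of the difference, using a made-up sample string: with split, a capturing group keeps the delimiter in the result while a non-capturing group does not, and with match only a capturing group shows up in the captures.

```ruby
title = "a title of"

plain         = title.split(/\s/)      # => ["a", "title", "of"]
capturing     = title.split(/(\s)/)    # => ["a", " ", "title", " ", "of"]
non_capturing = title.split(/(?:\s)/)  # => ["a", "title", "of"]

kept    = "of".match(/^(a|of)$/).captures    # => ["of"]
dropped = "of".match(/^(?:a|of)$/).captures  # => []
```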
Here is another possible solution to the problem
class Title
  attr_accessor :str
  def initialize(str)
    @str = str
  end
  def fix
    s = str.downcase.split(" ") # downcase the whole string and split it into an array of words
    words_cap = []
    ignore = %w( of a and the ) # list of words to be ignored
    s.each do |item|
      if ignore.include?(item) # if the word is in the ignore list, don't capitalize it
        words_cap << item
      else
        words_cap << item.capitalize
      end
    end
    sentence = words_cap.join(" ") # convert the array of strings back to a sentence
    # Capitalize the first word of the sentence, in case the ignore list left it in lower case
    sentence.slice(0, 1).capitalize + sentence.slice(1..-1)
  end
end
I can't tell what's wrong with my code:
def morse_code(str)
string = []
string.push(str.split(' '))
puts string
puts string[2]
end
What I'm expecting is if I use "what is the dog" for str, I would get the following results:
=> ["what", "is", "the", "dog"]
=> "the"
But what I get instead is nil. If I do string[0], it just gives me the entire string again. Does the .split function not break them up into different elements? If anyone could help, that would be great. Thank you for taking the time to read this.
Your code should be:
def morse_code(str)
string = []
string.push(*str.split(' '))
puts string
p string[2]
end
morse_code("what is the dog" )
# >> what
# >> is
# >> the
# >> dog
# >> "the"
str.split(' ') returns ["what", "is", "the", "dog"], and you are pushing that whole array object onto the array string. As a result, string becomes [["what", "is", "the", "dog"]], an array of size 1, so accessing any index other than 0, such as 1 or 2, returns nil. You can debug this with p (it calls #inspect on the array), but not with puts.
def morse_code(str)
string = []
string.push(str.split(' '))
p string
end
morse_code("what is the dog" )
# >> [["what", "is", "the", "dog"]]
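The difference between pushing the array itself and pushing its elements can be sketched as follows. The variable names are only for illustration; Array#concat is shown as an alternative to the splat.

```ruby
words = "what is the dog".split(' ')

nested = []
nested.push(words)       # pushes the array as a single element
nested.length            # => 1
nested[2]                # => nil

flat = []
flat.push(*words)        # the splat pushes each word separately
flat[2]                  # => "the"

joined = [].concat(words)   # concat appends the elements as well
joined[2]                   # => "the"
```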
With an array, puts works in a completely different way than p. I don't always find the MRI code easy to read, so I sometimes look at the Rubinius code instead. Look at how they defined IO#puts, which behaves the same as in MRI. Now look at the specs for that code:
it "flattens a nested array before writing it" do
  @io.should_receive(:write).with("1")
  @io.should_receive(:write).with("2")
  @io.should_receive(:write).with("3")
  @io.should_receive(:write).with("\n").exactly(3).times
  @io.puts([1, 2, [3]]).should == nil
end
it "writes nothing for an empty array" do
  x = []
  @io.should_receive(:write).exactly(0).times
  @io.puts(x).should == nil
end
it "writes [...] for a recursive array arg" do
  x = []
  x << 2 << x
  @io.should_receive(:write).with("2")
  @io.should_receive(:write).with("[...]")
  @io.should_receive(:write).with("\n").exactly(2).times
  @io.puts(x).should == nil
end
We can now be sure that IO#puts or Kernel#puts treats an array just the way the Rubinius people implemented it. You can also take a look at the MRI code. I just found the MRI test; see below:
def test_puts_recursive_array
  a = ["foo"]
  a << a
  pipe(proc do |w|
    w.puts a
    w.close
  end, proc do |r|
    assert_equal("foo\n[...]\n", r.read)
  end)
end
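The same flattening can be observed without reading the implementation. The sketch below writes to a StringIO from the standard library instead of the terminal, so the output of puts can be inspected as a string and compared with what p would show (the inspect string). The names out, written and inspected are arbitrary.

```ruby
require 'stringio'

nested = [["what", "is", "the", "dog"]]

# puts flattens the nested array and writes one element per line.
out = StringIO.new
out.puts(nested)
written = out.string        # => "what\nis\nthe\ndog\n"

# p prints the inspect string, which keeps the nesting visible.
inspected = nested.inspect  # => "[[\"what\", \"is\", \"the\", \"dog\"]]"
```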
Sorry to ask this but I really need to get this done. I'd like to be able to pass in a string and strip out the stop_words. I have the following:
class Query
  def self.normalize term
    stop_words=["a","big","array"]
    term.downcase!
    legit=[]
    if !stop_words.include?(term)
      legit << term
    end
    return legit
  end
  def self.check_parts term
    term_parts=term.split(' ')
    tmp_part=[]
    term_parts.each do |part|
      t=self.normalize part
      tmp_part << t
    end
    return tmp_part
  end
end
I would think that this would return only terms that are not in the stop_words list but I'm getting back either an empty array or an array of the terms passed in. Like this:
ruby-1.9.2-p290 :146 > Query.check_parts "here Is my Char"
=> [[], [], [], ["char"]]
ruby-1.9.2-p290 :147 >
What am I doing wrong?
thx in advance
If you just want to filter out the terms and get an array of downcased words, it is simple.
module Query
StopWords = %w[a big array]
def self.check_parts string; string.downcase.split(/\s+/) - StopWords end
end
Query.check_parts("here Is my Char") # => ["here", "is", "my", "char"]
I don't know why you want the result as an array, but you could write what you expect like this:
term_parts=term.split(' ')
term_parts.reject { |part| stop_words.include?(part) }
By the way, you get an array of arrays because:
def self.check_parts term
  term_parts=term.split(' ')
  tmp_part=[] # creates an array
  term_parts.each do |part|
    t=self.normalize part # normalize returns either an empty array
                          # or an array with a single element (the term)
    tmp_part << t # so you add an array into the array
  end
  return tmp_part
end
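Putting the reject idea into a complete method might look like the sketch below. The module name StopWordFilter and the constant STOP_WORDS are hypothetical, chosen so as not to clash with the Query class above, and the input string is changed slightly so that it actually contains stop words.

```ruby
module StopWordFilter   # hypothetical name, for illustration only
  STOP_WORDS = %w[a big array].freeze

  # Returns the downcased words of term that are not stop words.
  def self.check_parts(term)
    term.downcase.split.reject { |part| STOP_WORDS.include?(part) }
  end
end

StopWordFilter.check_parts("here Is a big Char")  # => ["here", "is", "char"]
```

Because reject returns a flat array of the surviving words, there is no nesting to undo afterwards.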
I have an array, let's say:
array1 = ["abc", "a", "wxyz", "ab",......]
How do I make sure neither for example "a" (any 1 character), "ab" (any 2 characters), "abc" (any 3 characters), nor words like "that", "this", "what" etc nor any of the foul words are saved in array1?
This removes elements with less than 4 characters and words like this, that, what from array1 (if I got it right):
array1.reject! do |el|
el.length < 4 || ['this', 'that', 'what'].include?(el)
end
This changes array1. If you use reject (without !), it returns the result and leaves array1 unchanged.
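A small sketch of the difference, with a made-up array. Note that reject! returns nil when nothing was removed, so its return value is best not relied on.

```ruby
array1 = ["abc", "a", "wxyz", "ab", "this", "hello"]
banned = %w[this that what]

# reject returns a new array and leaves array1 as it was.
kept = array1.reject { |el| el.length < 4 || banned.include?(el) }
kept           # => ["wxyz", "hello"]
array1.length  # => 6

# reject! removes the elements from array1 itself.
array1.reject! { |el| el.length < 4 || banned.include?(el) }
array1         # => ["wxyz", "hello"]

# A second call has nothing left to remove and returns nil.
second_call = array1.reject! { |el| el.length < 4 || banned.include?(el) }
```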
You can open and add a new interface to the Array class which will disallow certain words. Example:
class Array
  def add(ele)
    unless rejects.include?(ele)
      self.push ele
    end
  end
  def rejects
    ['this', 'that', 'what']
  end
end
arr = []
arr.add "one"
puts arr
arr.add "this"
puts arr
arr.add "aslam"
puts arr
Output would be:
one
one
one
aslam
And notice the word "this" was not added.
You could create a stop list. Using a hash for this would be more efficient than an array, as lookup time is constant on average with a hash, whereas with an array the lookup time is proportional to the number of elements. If you are going to check for stop words a lot, I suggest using a hash that contains all the stop words. Using your code, you could do the following:
badwords_a = ["abc", "a", "wxyz", "ab"] # Your array of bad words
badwords_h = {} # Initialize an empty hash
badwords_a.each { |word| badwords_h[word] = nil } # Fill the hash
goodwords = []
words_to_process = ["abc", "a", "Foo", "Bar"] # A list of words you want to process
words_to_process.each do |word| # Process new words
  goodwords << word unless badwords_h.key?(word) # Add the word if it did not match the bad list
end
puts goodwords.join(", ")
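As an alternative sketch, the standard library's Set gives the same hash-based lookup without having to fill a hash with nil values by hand. This swaps in Set for the hash used above; the names bad_words and good_words are illustrative.

```ruby
require 'set'

bad_words = Set.new(["abc", "a", "wxyz", "ab"])   # membership checks are hash-based
words_to_process = ["abc", "a", "Foo", "Bar"]

# Keep only the words that are not in the set of bad words.
good_words = words_to_process.reject { |word| bad_words.include?(word) }
result = good_words.join(", ")   # => "Foo, Bar"
```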