How do I use regular expression to select from an array? - ruby

I have code that asks the user to type either Cats or Dogs, searches an array for every entry that contains that word, and then prints all the matches.
print "Cats or Dogs? "
userinput = gets.chomp
lines = [["Cats are smarter than dogs"],["Dogs also like meat"], ["Cats are nice"]]
lines.each do |line|
if line =~ /(.*?)#{userinput}(.*)/
puts line
end
end
So if I were to input Cats, I should get two sentences:
Cats are smarter than dogs
Cats are nice
You could even input smarter and I'll get
Cats are smarter than dogs
I'm strictly looking for ways to use a regular expression to search through an array or string and pull out the lines/sentences that match the expression.
If anyone is wondering, the lines array originally came from a file, and I turned each line into an array element.
EDIT:
Wow, how far I came in the coding world.
print "Cats or Dogs? "
userinput = gets.chomp
lines = [["Cats are smarter than dogs"],["Dogs also like meat"], ["Cats are nice"]]
lines.each do |linesInside|
linesInside.each do |line|
if line =~ /(.*?)#{userinput}(.*)/
puts line
end
end
end
Took literally 5 seconds to solve what took me ages to give up on at the time.

Try this
...
lines = ["Cats are smarter than dogs", "Dogs also like meat", "Cats are nice"]
regexp = Regexp.new(userinput)
selected_lines = lines.grep(regexp)
puts selected_lines
How does this work?
grep returns the elements of the array that match the pattern (each element is compared with the pattern using ===).
Notice that I am using an array of strings. Your example code uses an array of single-element arrays; I assume you meant to use an array of strings.
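One caveat with Regexp.new(userinput): if the user types characters that have a special meaning in a regex (such as . or parentheses), they are treated as pattern syntax rather than literal text. A minimal sketch of the same grep approach with the input escaped first; the helper name lines_matching is made up for illustration and is not part of the answer above:

```ruby
lines = ["Cats are smarter than dogs", "Dogs also like meat", "Cats are nice"]

# Hypothetical helper: escape the user's input so that regex metacharacters
# are matched literally, then let grep select the matching lines.
def lines_matching(lines, user_input)
  lines.grep(Regexp.new(Regexp.escape(user_input)))
end

puts lines_matching(lines, "Cats")  # prints the two lines that contain "Cats"
```

If you actually want the user to be able to type a regex, skip the Regexp.escape call.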

You can, of course, do that without a regex.
lines = ["Dogs are smarter than cats", "Cats also like meat", "Dogs are nice"]
print "Cats or Dogs? "
input = gets.chomp.downcase
If input #=> "dogs",
lines.select { |line| line.downcase.split.include?(input) }
#=> ["Dogs are smarter than cats", "Dogs are nice"]
If input #=> "cats",
lines.select { |line| line.downcase.split.include?(input) }
#=> ["Dogs are smarter than cats", "Cats also like meat"]

Since your array is an array of arrays, you could call flatten first :
lines.flatten.grep(/#{userinput}/i)
i is for case insensitive search, so that 'Dogs' matches 'Dogs' and 'dogs'.
If you want whole-word search :
lines.flatten.grep(/\b#{userinput}\b/i)
Finally, if you don't really need an array of arrays, just read an array from your file directly, either with File.readlines(f) or File.foreach(f).
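Putting the last two points together, here is a rough sketch that reads the lines straight from a file and does a whole-word, case-insensitive search. The helper name matching_lines and the temporary file are assumptions made so the example can run on its own; in real code you would pass the path of your own file:

```ruby
require "tempfile"

# Hypothetical helper: return the lines of the file at `path` that contain
# `user_input` as a whole word, ignoring case.
def matching_lines(path, user_input)
  pattern = /\b#{Regexp.escape(user_input)}\b/i
  File.foreach(path).map(&:chomp).grep(pattern)
end

# A temporary file stands in for the real input file.
Tempfile.create("lines") do |file|
  file.write("Cats are smarter than dogs\nDogs also like meat\nCats are nice\n")
  file.flush
  puts matching_lines(file.path, "cats")
end
```

Because each line of the file is already a plain string, no flatten is needed here.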

Related

Can anyone explain what is happening in the 9th line, "words.each do |word|"? Why is it word inside the pipes and not words?

puts "Text please: "
text = gets.chomp
puts "Redacted letter: "
redacted = gets.chomp
words = text.split(" ")
words.each do|word|
if word == redacted
print "REDACTED "
else
print word +" "
end
end
Can anyone explain what is happening in the 9th line, words.each do |word|? Why is it word inside the pipes and not words?
And what do I have to do if I want "REDACTED" printed as output?
Please help.
words is the whole list (of words found by splitting up the input text wherever there's a space). More technically, it's an instance of the class Array, which has a method named each designed to run some block of code repeatedly - once for every element of the array. In this case, that means once per word in the input text.
Your snippet is calling each on words and passing a block of code to it - that's the do...end construct. (Code blocks can also be delimited by curly braces, {...}.)
The first thing inside the code block after the keyword do is a list of parameter names inside pipes. When the block is executed, any arguments passed to it will be given names from that list, in order.
Every time Array#each runs the block, it passes a single element of the array as an argument. In this case, that element will be assigned to a local variable named word within the body of the block.
So everything from do through end happens multiple times - once for every word in the array. The first time, word holds the first word. The second time, it holds the second word. And so on.
words never changes; it always holds the whole array.
You could, if you wanted, use the name words inside the block as well; it would be confusing, and the code in the block would not be able to access the outer words, but that outer words would still be intact at the end of the loop.
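To make that last point concrete, here is a small sketch (the array contents and the seen variable are made up for illustration) in which the block parameter is deliberately given the same name as the outer array:

```ruby
words = ["Now", "is", "the", "time"]
seen = []

# The block parameter is deliberately (and confusingly) also called `words`.
# Inside the block it refers to one element at a time, not to the outer array.
words.each do |words|
  seen << words.upcase   # `words` is a single String here
end

p seen   # => ["NOW", "IS", "THE", "TIME"]
p words  # => ["Now", "is", "the", "time"]
```

The outer array is untouched after the loop, which is why using a distinct singular name such as word is purely a readability choice.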
Sample run:
Code> puts "Text please: "
Output> Text please:
Code> text = gets.chomp
Input> Now is the time for all good men to come to the aid of their party.
Result> text == "Now is the time for all good men to come to the aid of their party."
Code> puts "Redacted letter: "
Output> Redacted letter:
Code> redacted = gets.chomp
Input> good
Result> redacted == "good"
Code> words = text.split(" ")
Result> words == [ "Now", "is", "the", "time", "for", "all", "good",
"men", "to", "come", "to", "the", "aid", "of", "their",
"party." ]
Code> words.each do |word|
Result> word == "Now"
Code> if word == redacted
Result> if "Now" == "good" #=> false
Code> else
Code> print word +" "
Output> "Now "
Code> words.each do |word|
Result> word == "is"
Code> if word == redacted
Result> if "is" == "good" #=> false
Code> else
Code> print word +" "
Output> "is " (Cumulative output: "Now is ")
.... and so on for "the", "time", "for", "all" ...
Code> words.each do |word|
Result> word == "good"
Code> if word == redacted
Result> if "good" == "good" #=> true
Code> print "REDACTED "
Output> "REDACTED "
... and so on for men, to, come, to, the, aid, of, their, party.
Code> end
Cumulative Output> Now is the time for all REDACTED men to come to the aid of their party.
To answer your questions succinctly. You ask three questions...
1) "what is happening in 9th line": In the beginning, you have this array of words, to which you apply .each. What this does is allows you to iterate over each element in the array. Each element is given to the word variable in the lines of code you have right after that do, and before the last end in your code (what you have there between the do and the last end is called a "block" of code, by the way).
2) "why it is not words in the pipe why it's word?" Well, this is Ruby syntax, really - the two things are different. The array you are iterating over is called words, and you want to do something with each one of those words in that array. People tend to use a plural for the name of container, like an array, and a singular when talking about "each" item in the array. So, when you say words.each, you tend to think of each one as a single "word". You could have used any name to represent a single element in the array called words, but word seems best.
3) "if i want to print out "REDACTED" as output": a) Run the code. b) When it asks for Text please:, give it one word (say "ABC" without quotes). c) When it then asks Redacted letter:, give it the same word (again, "ABC" without quotes). You should then be greeted with REDACTED.
Now that your stated questions are answered, if you want to go into learning why this happens, consider the if statement. You are going through each element in the array words, and since there is only one word ("ABC"), this happens only once. Since that "word" is exactly the same as the string currently in the variable redacted, the comparison turns out to be equal ("ABC" == "ABC"), and the command print "REDACTED " is run once and only once. If you had more words in the array, and none of them were "ABC", you'd also see those in your output.
I'm glad you are learning Ruby. It's a great language! Keep up with it. You'll get better over time, and it's very rewarding.
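Once the each version makes sense, the same per-word decision can be written with map, which builds a new array from the block's results. This is only a sketch; the method name redact is an assumption for illustration and does not appear in the question:

```ruby
# Hypothetical helper: replace every word equal to `redacted` with "REDACTED".
def redact(text, redacted)
  text.split(" ").map { |word| word == redacted ? "REDACTED" : word }.join(" ")
end

puts redact("Now is the time for all good men", "good")
# prints: Now is the time for all REDACTED men
```

Here word plays exactly the same role as in the original loop: it names one element of the array per iteration.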

Counting words in Ruby with some exceptions

Say that we want to count the number of words in a document. I know we can do the following:
text.each_line(){ |line| totalWords = totalWords + line.split.size }
Say, that I just want to add some exceptions, such that, I don't want to count the following as words:
(1) numbers
(2) standalone letters
(3) email addresses
How can we do that?
Thanks.
You can wrap this up pretty neatly:
text.each_line do |line|
total_words += line.split.reject do |word|
word.match(/\A(\d+|\w|\S*@\S+\.\S+)\z/)
end.length
end
Roughly speaking that defines an approximate email address.
Remember Ruby strongly encourages the use of variables with names like total_words and not totalWords.
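For completeness, a self-contained sketch of the same idea that also initialises the counter. The names EXCLUDED_WORD and count_words are hypothetical, the pattern assumes the email part is meant to match an @ sign, and sum and match? need Ruby 2.4 or later:

```ruby
# Words that should not be counted: pure numbers, single word characters,
# and a very rough approximation of an email address.
EXCLUDED_WORD = /\A(\d+|\w|\S*@\S+\.\S+)\z/

# Hypothetical helper: count the words in `text`, skipping the excluded ones.
def count_words(text)
  text.each_line.sum do |line|
    line.split.count { |word| !word.match?(EXCLUDED_WORD) }
  end
end

sample = "I have 2 cats\nmail me at bob@example.com today\n"
puts count_words(sample)  # prints 6
```

In the sample, "I", "2" and the email address are skipped, leaving two words on the first line and four on the second.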
Assuming you can represent all the exceptions in a single regular expression regex_variable, you could do:
text.each_line(){ |line| totalWords = totalWords + line.split.count {|wrd| wrd !~ regex_variable } }
Your regular expression could look something like:
regex_variable = /\d.|^[a-z]{1}$|\A([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})\Z/i
I don't claim to be a regex expert, so you may want to double check that, particularly the email validation part.
In addition to the other answers, a little gem hunting came up with this:
WordsCounted Gem
Get the following data from any string or readable file:
Word count
Unique word count
Word density
Character count
Average characters per word
A hash map of words and the number of times they occur
A hash map of words and their lengths
The longest word(s) and its length
The most occurring word(s) and its number of occurrences.
Count individual strings for occurrences.
A flexible way to exclude words (or anything) from the count. You can pass a string, a regexp, an array, or a lambda.
Customisable criteria. Pass your own regexp rules to split strings if you prefer. The default regexp has two features:
Filters special characters but respects hyphens and apostrophes.
Plays nicely with diacritics (UTF and unicode characters): "São Paulo" is treated as ["São", "Paulo"] and not ["S", "", "o", "Paulo"].
Opens and reads files. Pass in a file path or a url instead of a string.
Have you ever started answering a question and found yourself wandering, exploring interesting, but tangential issues, or concepts you didn't fully understand? That's what happened to me here. Perhaps some of the ideas might prove useful in other settings, if not for the problem at hand.
For readability, we might define some helpers in the class String, but to avoid contamination, I'll use Refinements.
Code
module StringHelpers
refine String do
def count_words
remove_punctuation.split.count { |w|
!(w.is_number? || w.size == 1 || w.is_email_address?) }
end
def remove_punctuation
gsub(/[.!?,;:)](?:\s|$)|(?:^|\s)\(|\-|\n/,' ')
end
def is_number?
self =~ /\A-?\d+(?:\.\d+)?\z/
end
def is_email_address?
include?('#') # for testing only
end
end
end
module CountWords
using StringHelpers
def self.count_words_in_file(fname)
IO.foreach(fname).reduce(0) { |t,l| t+l.count_words }
end
end
Note that using must be in a module (possibly a class). It does not work in main, presumably because that would make the methods available in the class self.class #=> Object, which would defeat the purpose of Refinements. (Readers: please correct me if I'm wrong about the reason using must be in a module.)
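A reduced sketch of the scoping behaviour that refinements provide, separate from the word-counting helpers. Shout and Demo are hypothetical names chosen for this illustration:

```ruby
module Shout
  refine String do
    def shout
      upcase + "!"
    end
  end
end

module Demo
  using Shout   # the refinement is active from here to the end of this module body

  def self.run(text)
    text.shout
  end
end

puts Demo.run("hello")            # prints HELLO!
puts "hello".respond_to?(:shout)  # prints false
```

The second line prints false because respond_to? does not take refinements into account, so code outside Demo still sees an unmodified String.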
Example
Let's first informally check that the helpers are working correctly:
module CheckHelpers
using StringHelpers
s = "You can reach my dog, a 10-year-old golden, at fido#dogs.org."
p s = s.remove_punctuation
#=> "You can reach my dog a 10 year old golden at fido#dogs.org."
p words = s.split
#=> ["You", "can", "reach", "my", "dog", "a", "10",
# "year", "old", "golden", "at", "fido#dogs.org."]
p '123'.is_number? #=> 0
p '-123'.is_number? #=> 0
p '1.23'.is_number? #=> 0
p '123.'.is_number? #=> nil
p "fido#dogs.org".is_email_address? #=> true
p "fido(at)dogs.org".is_email_address? #=> false
p s.count_words #=> 9 (`'a'`, `'10'` and "fido#dogs.org" excluded)
s = "My cat, who has 4 lives remaining, is at abbie(at)felines.org."
p s = s.remove_punctuation
p s.count_words
end
All looks OK. Next, I'll put some text in a file:
FName = "pets"
text =<<_
My cat, who has 4 lives remaining, is at abbie(at)felines.org.
You can reach my dog, a 10-year-old golden, at fido#dogs.org.
_
File.write(FName, text)
#=> 125
and confirm the file contents:
File.read(FName)
#=> "My cat, who has 4 lives remaining, is at abbie(at)felines.org.\n
# You can reach my dog, a 10-year-old golden, at fido#dogs.org.\n"
Now, count the words:
CountWords.count_words_in_file(FName)
#=> 18 (9 in each line)
Note that there is at least one problem with the removal of punctuation. It has to do with the hyphen. Any idea what that might be?
Something like...?
def is_countable(word)
  return false if word.size < 2
  return false if word =~ /^[0-9]+$/
  return false if is_an_email_address(word) # you need a gem for this...
  return true
end
# The block must always return the running count, so use a ternary rather than a trailing `if`.
wordCount = text.split().inject(0) {|count,word| is_countable(word) ? count + 1 : count }
Or, since I am jumping to the conclusion that you can just split your entire text into an array with split(), you might need:
wordCount = 0
text.each_line do |line|
line.split.each{|word| wordCount += 1 if is_countable(word) }
end

Searching for single words and combination words in Ruby

I want my output to search and count the frequency of the words "candy" and "gram", but also the combinations of "candy gram" and "gram candy," in a given text (whole_file.)
I am currently using the following code to display the occurrences of "candy" and "gram," but when I aggregate the combinations within the %w, only the word and frequencies of "candy" and "gram" display. Should I try a different way? thanks so much.
myArray = whole_file.split
stop_words= %w{ candy gram 'candy gram' 'gram candy' }
nonstop_words = myArray - stop_words
key_words = myArray - nonstop_words
frequency = Hash.new (0)
key_words.each { |word| frequency[word] +=1 }
key_words = frequency.sort_by {|x,y| x }
key_words.each { |word, frequency| puts word + ' ' + frequency.to_s }
It sounds like you're after n-grams. You could break the text into combinations of consecutive words in the first place, and then count the occurrences in the resulting array of word groupings. Here's an example:
whole_file = "The big fat man loves a candy gram but each gram of candy isn't necessarily gram candy"
[["candy"], ["gram"], ["candy", "gram"], ["gram", "candy"]].each do |term|
terms = whole_file.split(/\s+/).each_cons(term.length).to_a
puts "#{term.join(" ")} #{terms.count(term)}"
end
EDIT: As was pointed out in the comments below, I wasn't paying close enough attention and was splitting the file on each loop, which is obviously not a good idea, especially if it's large. I also hadn't accounted for the fact that the original question may have needed the results sorted by count, although that wasn't explicitly asked.
whole_file = "The big fat man loves a candy gram but each gram of candy isn't necessarily gram candy"
# This is simplistic. You would need to address punctuation and other characters before
# or at this step.
split_file = whole_file.split(/\s+/)
terms_to_count = [["candy"], ["gram"], ["candy", "gram"], ["gram", "candy"]]
counts = []
terms_to_count.each do |term|
terms = split_file.each_cons(term.length).to_a
counts << [term.join(" "), terms.count(term)]
end
# Seemed like you may need to do sorting too, so here that is:
sorted = counts.sort { |a, b| b[1] <=> a[1] }
sorted.each do |count|
puts "#{count[0]} #{count[1]}"
end
Strip punctuation and convert to lower-case
The first thing you probably want to do is remove all punctuation from the string holding the contents of the file and then convert what's left to lower case, the latter so you don't have to worry about 'Cat' and 'cat' being counted as different words. Those two operations can be done in either order.
Changing upper-case letters to lower-case is easy:
text = whole_file.downcase
To remove the punctuation it is probably easier to decide what to keep rather than what to discard. If we only want to keep lower-case letters and whitespace (the whitespace is needed later to split the text into words), you can do this:
text = whole_file.downcase.gsub(/[^a-z\s]/, '')
That is, substitute an empty string for all characters other than (^) lowercase letters and whitespace.1
Determine frequency of individual words
If you want to count the number of times text contains the word 'candy', you can use the method String#scan on the string text and then determine the size of the array that is returned:
text.scan(/\bcandy\b/).size
scan returns an array with every occurrence of the string 'candy'; .size returns the size of that array. Here \b ensures 'candy' has a word "boundary" at each end, which could be whitespace, punctuation, or the beginning or end of a line or the file. That's to prevent 'candycane' from being counted.
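A quick sketch of the difference the word boundaries make, using a made-up sample string:

```ruby
text = "candy gram and candycane, then candy again"

with_boundaries    = text.scan(/\bcandy\b/).size
without_boundaries = text.scan(/candy/).size

puts with_boundaries     # prints 2
puts without_boundaries  # prints 3 ("candycane" is counted as well)
```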
A second way is to convert the string text to an array of words, as you have done2:
myArray = text.split
If you don't mind, I'd like to call this:
words = text.split
as I find that more expressive.3
The most direct way to determine the number of times 'candy' appears is to use the method Array#count, like this:
words.count('candy')
You can also use the array difference method, Array#-, as you noted:
words.size - (words - ['candy']).size
If you wish to know the number of times either 'candy' or 'gram' appears, you could of course do the above for each and sum the two counts. Some other ways are:
words.size - (words - ['candy', 'gram']).size
words.count { |word| word == 'candy' || word == 'gram' }
words.count { |word| ['candy', 'gram'].include?(word) }
Determine the frequency of all words that appear in the text
Your use of a hash with a default value of zero was a good choice:
def frequency_of_all_words(words)
frequency = Hash.new(0)
words.each { |word| frequency[word] +=1 }
frequency
end
I wrote this as a method to emphasize that words.each... does not return frequency. Often you would see this written more compactly using the method Enumerable#each_with_object, which returns the hash ("object"):
def frequency_of_all_words(words)
words.each_with_object(Hash.new(0)) { |word, h| h[word] +=1 }
end
Once you have the hash frequency you can sort it as you did:
frequency.sort_by {|word, freq| freq }
or
frequency.sort_by(&:last)
which you could write:
frequency.sort_by {|_, freq| freq }
since you aren't using the first block variable. If you wanted the most frequent words first:
frequency.sort_by(&:last).reverse
or
frequency.sort_by {|_, freq| -freq }
All of these will give you an array. If you want to convert it back to a hash (with the largest values first, say):
Hash[frequency.sort_by(&:last).reverse]
or in Ruby 2.1+,
frequency.sort_by(&:last).reverse.to_h
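Pulling the counting and sorting steps together, a short sketch with a made-up word list (the variable names are chosen for illustration):

```ruby
words = %w[gram candy gram cane gram candy]

# Count every word, using a hash whose missing keys default to 0.
frequency = words.each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }

# Sort by descending count and turn the result back into a hash.
most_frequent_first = frequency.sort_by { |_, freq| -freq }.to_h

p most_frequent_first  # => {"gram"=>3, "candy"=>2, "cane"=>1}
```

On Ruby 2.7 and later, words.tally should give the same counts as the each_with_object line.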
Count the number of times a substring appears
Now let's count the number of times the string 'candy gram' appears. You might think we could use String#scan on the string holding the entire file, as we did earlier4:
text.scan(/\bcandy gram\b/).size
The first problem is that this won't catch 'candy\ngram'; i.e., when the words are separated by a newline character. We could fix that by changing the regex to /\bcandy\sgram\b/. A second problem is that 'candy gram' might have been 'candy. Gram' in the file, in which case you might not want to count it.
A better way is to use the method Enumerable#each_cons on the array words. The easiest way to show you how that works is by example:
words = %w{ check for candy gram here candy gram again }
#=> ["check", "for", "candy", "gram", "here", "candy", "gram", "again"]
enum = words.each_cons(2)
#=> #<Enumerator: ["check", "for", "candy", "gram", "here", "candy",
# "gram", "again"]:each_cons(2)>
enum.to_a
#=> [["check", "for"], ["for", "candy"], ["candy", "gram"],
# ["gram", "here"], ["here", "candy"], ["candy", "gram"],
# ["gram", "again"]]
each_cons(2) returns an enumerator; I've converted it to an array to display its contents.
So we can write
words.each_cons(2).map { |word_pair| word_pair.join(' ') }
#=> ["check for", "for candy", "candy gram", "gram here",
# "here candy", "candy gram", "gram again"]
and lastly:
words.each_cons(2).map { |word_pair|
word_pair.join(' ') }.count { |s| s == 'candy gram' }
#=> 2
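The same each_cons idea can be wrapped in a small helper that handles phrases of any length. This is a sketch; phrase_count is a hypothetical name, and it assumes words has already been lower-cased and stripped of punctuation as described earlier:

```ruby
# Hypothetical helper: count how often the words of `phrase` appear
# consecutively, and in that order, in the array `words`.
def phrase_count(words, phrase)
  target = phrase.split
  words.each_cons(target.size).count(target)
end

words = %w[check for candy gram here candy gram again]

puts phrase_count(words, "candy gram")  # prints 2
puts phrase_count(words, "gram candy")  # prints 0
puts phrase_count(words, "candy")       # prints 2
```

Comparing arrays of words rather than joined strings avoids building an intermediate string for every pair.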
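1 If you also wanted to keep dashes, for hyphenated words, change the regex to /[^-a-z\s]/ or /[^a-z\s-]/.
2 Note from the String#split documentation that .split is the same as .split(' '); .split(/\s+/) gives the same result except that it returns a leading empty string when the text begins with whitespace.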
3 Also, Ruby's naming convention is to use lower-case letters and underscores ("snake-case") for variables and methods, such as my_array.

How can I stop the lines from repeating?

Look at this code. I got the desired result, which was to scan a person's input to see if it matches an internal array.
sentence = []
compare = []
database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal" ]
def parser_sentence compare
database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]
initial_index = 0
while compare.count > initial_index
compare.each do |item|
if item == database_array[initial_index]
puts "You found the key word, it was #{item}"
else
puts "Sorry the key word was not inside your sentence"
end
end
initial_index = initial_index + 1
end
end
puts "Please enter in your sentences of words and i will parse it for the key word."
sentence = gets.chomp
compare = sentence.split (" ")
Because each loop is telling it to repeat, it does so, but how can I stop this?
In this case, regex will be more efficient and less error prone than splitting the input string, especially since you have a two-word phrase in the keyword list.
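def parser_sentence(sentence)
  # A local variable defined outside the method is not visible inside it,
  # so the keyword list is defined here, as in the question.
  database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]
  matching_words = sentence.scan(Regexp.union(database_array))
  if matching_words.empty?
    puts "Sorry the key word was not inside your sentence"
  else
    puts "You found the key word, it was #{matching_words.join(" ")}"
  end
end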
Slight modifications can make it case insensitive (if you need it), or add word boundaries to the keywords so as to not match partial words.
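One way those modifications might look, as a sketch rather than a definitive version. KEYWORDS and find_keywords are hypothetical names; the source of the union is interpolated (instead of the Regexp object itself) so that the outer i flag applies to the keywords:

```ruby
KEYWORDS = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]

# Hypothetical helper: return every keyword found in `sentence`, matching
# whole words only and ignoring case.
def find_keywords(sentence, keywords = KEYWORDS)
  alternatives = Regexp.union(keywords).source
  sentence.scan(/\b(?:#{alternatives})\b/i)
end

p find_keywords("I watched true blood with a mouse")  # => ["true blood", "mouse"]
p find_keywords("A bloodhound is not a keyword")      # => []
```

scan returns the text as it appears in the sentence, so the matches keep the user's capitalisation.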
One possible solution that doesn't involve looping is to intersect your compare and database_array arrays, like so:
matching_words = compare & database_array
This will compare both arrays and create a new array containing only elements that are common to both. For example:
# If the user input the sentence "The Mouse is Immortal", then...
compare = ["The", "Mouse", "is", "Immortal"]
# matching_words will contain an array with elements ["Mouse", "Immortal"]
matching_words = compare & database_array
You can then check the length of the array and display out your messages. I believe this can replace your entire function like so:
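def parser_sentence compare
  # Defined inside the method because a top-level local variable is not visible here.
  database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]
  matching_words = compare & database_array
  if matching_words.length > 0
    puts "You found the key word, it was #{matching_words.join(" ")}"
  else
    puts "Sorry the key word was not inside your sentence"
  end
end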
A note about join, in case you're unfamiliar with it: it builds a string from the elements of the array, separated by the string passed in, which in my example is a single space. Substitute your own separator, of course, or do whatever else you want with the array.
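A runnable sketch of the intersection approach, which also shows its main limitation. DATABASE and matching_keywords are hypothetical names used only for this example:

```ruby
DATABASE = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]

# Hypothetical helper: split the sentence on whitespace and keep only the
# words that also appear in DATABASE, in the order the user typed them.
def matching_keywords(sentence)
  sentence.split & DATABASE
end

p matching_keywords("The Mouse is Immortal")  # => ["Mouse", "Immortal"]
p matching_keywords("I like True Blood")      # => ["Blood"]
```

Because the sentence is split into single words, the two-word keyword "True Blood" can never be found this way, and the comparison is case sensitive.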

Ruby- find strings that contain letters in an array

I've googled everywhere and can't seem to find an example of what I'm looking for. I'm trying to learn ruby and i'm writing a simple script. The user is prompted to enter letters which are loaded into an array. The script then goes through a file containing a bunch of words and pulls out the words that contain what is in the array. My problem is that it only pulls words out if they are in order of the array. For example...
characterArray = Array.new;
puts "Enter characters that the password contains";
characters = gets.chomp;
puts "Searching words containing #{characters}...";
characterArray = characters.scan(/./);
searchCharacters=characterArray[0..characterArray.size].join;
File.open("dictionary.txt").each { |line|
if line.include?(searchCharacters)
puts line;
end
}
If i was to use this code and enter "dog"
The script would return
dog
doggie
but i need the output to return words even if they're not in the same order. Like...
dog
doggie
rodge
Sorry for the sloppy code. Like i said still learning. Thanks for your help.
PS. I've also tried this...
File.open("dictionary.txt").each { |line|
if line =~ /[characterArray[0..characterArray.size]]/
puts line;
end
}
but this returns all words that contain ANY of the letters the user entered
First of all, you don't need to create characterArray with Array.new beforehand. Assigning the result of a method call to a new variable creates that variable.
In your code characters will be, for example, "asd". characterArray then will be ["a", "s", "d"]. And searchCharacters will be "asd" again. It seems you don't need this conversion.
characterArray[0..characterArray.size] is just equal to characterArray.
You can use each_char iterator to iterate through characters of string. I suggest this:
puts "Enter characters that the password contains";
characters = gets.chomp;
File.open("dictionary.txt").each { |line|
unless characters.each_char.map { |c| line.include?(c) }.include? false
puts line;
end
}
I've checked it works properly. In my code I make an array:
characters.each_char.map { |c| line.include?(c) }
The values of this array indicate whether each character was found in the line: true if it was, false if it was not. The array has one entry per character in characters. We consider a line good if the array contains no false values.
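The "no false values" check can also be expressed directly with all?, which stops at the first character that is missing. A sketch with a made-up word list instead of dictionary.txt; contains_all_chars? is a hypothetical helper name:

```ruby
# Hypothetical helper: true if every character of `characters` occurs
# somewhere in `word`, in any order.
def contains_all_chars?(word, characters)
  characters.each_char.all? { |c| word.include?(c) }
end

words = %w[dog doggie rodge cat good]

p words.select { |word| contains_all_chars?(word, "dog") }
# => ["dog", "doggie", "rodge", "good"]
```

Note that a repeated letter in the input is not required to appear twice in the word; each character is only checked for presence.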
