Look at this code. I got the desired result, which was to scan a person's input to see if it matches an internal array.
sentence = []
compare = []
database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal" ]
def parser_sentence compare
database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]
initial_index = 0
while compare.count > initial_index
compare.each do |item|
if item == database_array[initial_index]
puts "You found the key word, it was #{item}"
else
puts "Sorry the key word was not inside your sentence"
end
end
initial_index = initial_index + 1
end
end
puts "Please enter in your sentences of words and i will parse it for the key word."
sentence = gets.chomp
compare = sentence.split (" ")
Because each loop is telling it to repeat, it does so, but how can I stop this?
In this case, regex will be more efficient and less error prone than splitting the input string, especially since you have a two-word phrase in the keyword list.
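def parser_sentence(sentence)
# A method cannot see top-level local variables, so the keyword list is defined here.
database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]
matching_words = sentence.scan(Regexp.union(database_array))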
if matching_words.empty?
puts "Sorry the key word was not inside your sentence"
else
puts "You found the key word, it was #{matching_words.join(" ")}"
end
end
Slight modifications can make it case insensitive (if you need it), or add word boundaries to the keywords so as not to match partial words.
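As a rough sketch of those two modifications (the names KEYWORDS and find_keywords are made up for illustration, and the keyword list is the one from the question), the union can be wrapped in \b anchors and given the i flag:

```ruby
# Hypothetical constant name; the keyword list itself comes from the question.
KEYWORDS = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"].freeze

def find_keywords(sentence, keywords = KEYWORDS)
  # Put longer keywords first so a keyword that is a prefix of a longer one
  # cannot shadow it in the alternation.
  longest_first = keywords.sort_by { |keyword| -keyword.length }
  # \b adds word boundaries; the trailing i makes the match case insensitive.
  pattern = /\b(?:#{Regexp.union(longest_first).source})\b/i
  sentence.scan(pattern)
end

p find_keywords("I watch true blood with vampires") # => ["true blood", "vampires"]
p find_keywords("A bloodhound is not a keyword")    # => []
```

Note that scan returns the text as it appears in the sentence, so the matches keep the user's capitalisation rather than the capitalisation in the keyword list.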
One possible solution that doesn't involve looping is to intersect your compare and database_array arrays, like so:
matching_words = compare & database_array
This will compare both arrays and create a new array containing only elements that are common to both. For example:
# If the user input the sentence "The Mouse is Immortal", then...
compare = ["The", "Mouse", "is", "Immortal"]
# matching_words will contain an array with elements ["Mouse", "Immortal"]
matching_words = compare & database_array
You can then check the length of the array and display out your messages. I believe this can replace your entire function like so:
def parser_sentence compare
database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]
matching_words = compare & database_array
if matching_words.length > 0
puts "You found the key word, it was #{matching_words.join(" ")}"
else
puts "Sorry the key word was not inside your sentence"
end
end
A note about join, in case you're unfamiliar with it: it builds a string from the elements of the array, separated by the string you pass in, which in my example is a single space. Substitute your own separator, or do whatever else you want with the matches.
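A small sketch of the intersection and of join with a different separator may help. It also shows one limitation to be aware of: because the sentence is split on spaces, a two-word keyword such as "True Blood" can never be found this way. The variable names single_words and two_word_phrase are only for illustration.

```ruby
database_array = ["Mouse", "killer", "Blood", "Vampires", "True Blood", "Immortal"]

# Single-word keywords are found, in the order they appear in the sentence.
single_words = "The Mouse is Immortal".split(" ") & database_array
puts single_words.join(", ") # prints "Mouse, Immortal"

# "True Blood" is split into "True" and "Blood", so only "Blood" matches.
two_word_phrase = "I watch True Blood".split(" ") & database_array
puts two_word_phrase.join(", ") # prints "Blood"
```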
I want to find if the ending of a string overlaps with the beginning of separate string. For example if I have these two strings:
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
How do I find that the "but I" part at the end of string_1 is the same as the beginning of string_2?
I could write a method to loop over the two strings, but I'm hoping for an answer that has a Ruby string method that I missed or a Ruby idiom.
Set MARKER to some string that never appears in string_1 or string_2. There are ways to choose one dynamically, but I assume you can come up with a fixed string for your case. I assume that
MARKER = "###"
is safe for your case; change it to suit your data. Then,
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
(string_1 + MARKER + string_2).match?(/(.+)#{MARKER}\1/) # => true
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but you do nothing every day.'
(string_1 + MARKER + string_2).match?(/(.+)#{MARKER}\1/) # => false
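If the overlapping text itself is wanted rather than just true or false, the same idea can be sketched with match and the first capture group. The method name overlap is hypothetical; Regexp.escape is used in case the chosen marker contains regex metacharacters, and the m flag lets the dot cross newlines.

```ruby
MARKER = "###" # assumed never to occur in either string

def overlap(left, right, marker = MARKER)
  # (.+) must end exactly at the marker and \1 must start right after it,
  # so the capture is a suffix of left that is also a prefix of right.
  match = (left + marker + right).match(/(.+)#{Regexp.escape(marker)}\1/m)
  match && match[1]
end

string_1 = 'People say nothing is impossible, but I'
p overlap(string_1, 'but I do nothing every day.')   # => "but I"
p overlap(string_1, 'but you do nothing every day.') # => nil
```

Because the regex engine reports the leftmost match, the capture is the longest overlap when more than one exists.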
You can use a simple loop and test at the end:
a=string_1.split(/\b/)
idx=0
while (idx<=a.length) do
break if string_2.start_with?(a[idx..-1].join)
idx+=1
end
p a[idx..-1].join if idx<a.length
Since this starts at 0, the longest substring overlap is found.
You can use the same logic in a .detect block on the same array:
> a[(0..a.length).detect { |idx| string_2.start_with?(a[idx..-1].join) }..-1].join
=> "but I"
Or, as pointed out in the comments, you can work on the string directly instead of the array:
string_1[(0..string_1.length).detect { |idx| string_2.start_with?(string_1[idx..-1]) }..-1]
Here's a solution that works by comparing the end of string_1 to the start of string_2—using the greatest common length as a starting point—with at least one matching character. It returns the index (from the end of string_1 or the beginning of string_2) if any matching character(s) are found, which can be used to extract the matching portion.
class String
def oindex(other)
[length, other.length].min.downto(1).detect do |i|
end_with?(other[0, i])
end
end
end
string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'
if (idx = string_1.oindex(string_2))
puts "Last #{idx} characters match: #{string_1[-idx..-1]}"
end
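For readers who would rather not reopen the String class, the same search can be sketched as a plain method. The name overlap_length is made up for this example; the logic is the one shown above.

```ruby
# Returns the length of the longest suffix of left that is also a prefix
# of right, or nil when there is no overlap at all.
def overlap_length(left, right)
  longest_possible = [left.length, right.length].min
  longest_possible.downto(1).find { |n| left.end_with?(right[0, n]) }
end

string_1 = 'People say nothing is impossible, but I'
string_2 = 'but I do nothing every day.'

if (n = overlap_length(string_1, string_2))
  puts "Last #{n} characters match: #{string_1[-n, n]}"
end
```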
Here's an alternative that finds all the indexes of the first character of the other string in the string, and uses those indexes as starting points to check for matches:
class String
def each_index(other)
return enum_for(__callee__, other) unless block_given?
i = -1
yield i while i = index(other, i.succ)
end
def oindex(other)
each_index(other.chr).detect do |i|
other.start_with?(self[i..-1]) and break length - i
end
end
end
This should be more efficient than checking every index, especially on longer strings with shorter matches, but I haven't benchmarked it.
Here are a couple of ways to do that. The first converts the two strings to arrays and then compares sequences from those arrays. The second operates on the two strings directly, comparing substrings.
#1 Convert strings to arrays and compare sequences from those arrays
Here's a simple alternative that requires the strings to be converted to arrays of words. It assumes all pairs of words are separated by one space.
def begins_with_ends?(end_str, begin_str)
end_arr = end_str.split
begin_arr = begin_str.split
!!begin_arr.each_index.find { |i| begin_arr[0,i+1] == end_arr[-1-i..-1] }
end
!!obj converts obj to false when it's "falsy" (nil or false) and to true when it's "truthy" (not "falsy"). For example, !!3 #=> true and !!nil #=> false.
end_str = 'People say nothing is impossible, but I when I'
begin_str = 'but I when I do nothing every day.'
begins_with_ends?(end_str, begin_str)
#=> true
Here the match is on the second "I" in begin_str. Often, however, the last word of end_str matches at most a single word in begin_str.
#2 Compare substrings
I've implemented the following algorithm.
1. Set start_idx to 0.
2. Attempt to match the last word of end_str (the value of target) in begin_str, beginning at offset start_idx. If no match is found, return false; else let idx be the offset in begin_str just past the last character of the matched word.
3. Return true if the string comprising the first idx characters of begin_str equals the string comprising the last idx characters of end_str; else set start_idx = idx + 2 and repeat step 2.
def begins_with_ends?(end_str, begin_str)
target = end_str[/[[:alnum:]]+\z/]
start_idx = 0
loop do
idx = begin_str.index(/\b#{target}\b/, start_idx)
return false if idx.nil?
idx += target.size
return true if end_str[-idx..-1] == begin_str[0, idx]
start_idx = idx + 2
end
end
begins_with_ends?(end_str, begin_str)
#=> true
This approach is sensitive to the number of spaces between words: if the same two words are separated by different numbers of spaces in the two strings, there is no match.
Perhaps something like this would meet your needs?
string_1.split(' ') - string_2.split(' ')
=> ["People", "say", "is", "impossible,"]
Or this is more convoluted, but would give you the exact overlap:
string_2.
chars.
each_with_index.
map { |_, i| string_1.match(string_2[0..i]) }.
select { |s| s }.
max_by { |m| m.to_s.length }.
to_s
=> "but I"
My purpose is to accept a paragraph of text and find the specified phrase I want to REDACT, or replace.
I made a method that accepts an argument as a string of text. I break down that string into individual characters. Those characters are compared, and if they match, I replace those characters with *.
def search_redact(text)
str = ""
print "What is the word you would like to redact?"
redacted_name = gets.chomp
puts "Desired word to be REDACTED #{redacted_name}! "
#splits name to be redacted, and the text argument into char arrays
redact = redacted_name.split("")
words = text.split("")
#takes char arrays, two loops, compares each character, if they match it
#subs that character out for an asterisks
redact.each do |x|
if words.each do |y|
x == y
y.gsub!(x, '*') # sub redact char with astericks if matches words text
end # end loop for words y
end # end if statment
end # end loop for redact x
# this adds char array to a string so more readable
words.each do |z|
str += z
end
# prints it out so we can see, and returns it to method
print str
return str
end
# calling method with test case
search_redact("thisisapassword")
#current issues stands, needs to erase only if those STRING of characters are
# together and not just anywehre in the document
If the word I enter shares characters with other parts of the text, those characters get replaced too. For example, if I call:
search_redact("thisisapassword")
and enter password as the word to redact, I want only the text password to be removed. Instead the output looks like this:
thi*i**********
Please help.
This is a classic windowing problem used to find a substring in a string. There are many ways to solve this, some that are much more efficient than others but I'm going to give you a simple one to look at that uses as much of your original code as possible:
def search_redact(text)
str = ""
print "What is the word you would like to redact?"
redacted_name = gets.chomp
puts "Desired word to be REDACTED #{redacted_name}! "
redacted_name = "password" # hard-coded for the test case below; remove this line to use the input above
#splits name to be redacted, and the text argument into char arrays
redact = redacted_name.split("")
words = text.split("")
words.each_index do |i|
# use windowing to look for exact matches: compare the slice of
# redact.length characters starting at i with the redacted word
if words[i, redact.length] == redact
redact.length.times do |j|
# change the letter to an asterisk
words[i + j] = "*"
end
end
end
words.join
end
# calling method with test case
search_redact("thisisapassword")
The idea here is that we take advantage of Array#==, which lets us say ["a", "b", "c"] == ["a", "b", "c"]. So we just walk the input and ask whether the sub-array starting at each position equals the array of redacted characters. If they match, we loop through each element of that window and replace it with a *.
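If there is no requirement to keep the character arrays, the same result can be sketched with String#gsub, which already finds exact substrings. This is an alternative to the windowing loop, not the method shown above; the name redact and its two-argument signature are assumptions made so the example can run without user input.

```ruby
def redact(text, word)
  return text if word.empty? # nothing to redact
  # Replace every exact occurrence of word with asterisks of the same length.
  text.gsub(word) { |match| "*" * match.length }
end

puts redact("thisisapassword", "password") # prints "thisisa********"
```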
I am working on a Caesar cipher, which is a really simple cipher that shifts each letter in a message to the right in accordance with a given key. For example, with a key of 3, the message "hello" would be encrypted as "khoor".
I have written this program as a series of loops which are... I forgot the term, but it's where you have a loop inside of a loop. The term escapes me at the moment.
Anyway, the way I am doing this is by first converting the message, which might consist of several statements, into an array of words.
Then, I am converting each of those words into an array of letters, so that I can shift them individually.
Finally, I am merging the array of letters into a single word, and I am merging the array of words back into a single message.
The problem I am running into is that whenever I am trying to use the map and map! methods, I cannot get the shifted letters to retain their value. I come from a C/C++ background, and in those languages I wouldn't have a problem with doing this because I understand how pointers and references work, but I don't know how this works in Ruby.
My question is: How can I get the values of an array to be changed inside of a loop, and not reset back to their original values once I exit the loop? The commented code is as follows:
def caesar_cipher(message,key)
#Convert message to array
message = message.split(' ')
#Map each word in the array to the cipher method
message.map! do |word|
puts "message is: #{message} and the current word is: #{word}"
#Split each word into an array of characters
word = word.split('')
puts "after splitting word is: #{word.inspect}"
#Map each letter to cipher function
word.map do |letter|
puts "trying to shift the letter: #{letter.inspect}"
#Based on the value of the key, each letter will be shifted to the right key times
key.times do
#Cases when the letter is at the end of the alphabet
case letter
when "z"
letter = "a"
when "Z"
letter = "A"
#By default, each letter will be shifted to the next letter in the alphabet per each iteration of the loop
else
letter = letter.next!
end
puts "the letter is now: #{letter.inspect}"
end
#Join the array of letters back into a single word
word = word.join('')
puts "after joining word is: #{word.inspect}"
end
end
#Join the array of words back into the shifted message
message.join(' ')
end
Your code was mostly fine. I made just two tiny fixes:
def caesar_cipher(message,key)
message = message.split(' ')
message.map! do |word|
word = word.split('')
word.map! do |letter| # or word = word.map
key.times do
case letter
when "z"
letter = "a"
when "Z"
letter = "A"
else
letter = letter.next!
end
end
letter # return the next letter from the block
end
word.join('')
end
message.join(' ')
end
puts caesar_cipher('hello', 2)
# >> jgnnq
What you were doing wrong
The values were not retaining changes because you didn't save them (map doesn't change the original array; it returns a changed copy).
The value of the block passed to word.map was whatever its last expression evaluated to, and that was never the shifted letter (key.times, for instance, returns the number key, not a letter). You need to make the letter the last expression in the block.
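Both points can be seen in isolation with a three-letter array. This is only an illustrative sketch; the variable names are made up.

```ruby
letters = %w[a b c]

# map returns a new array and leaves the receiver alone.
shifted = letters.map { |letter| letter.next }
p shifted # => ["b", "c", "d"]
p letters # => ["a", "b", "c"]

# map! replaces the elements of the receiver in place.
letters.map! { |letter| letter.next }
p letters # => ["b", "c", "d"]

# The block's value is its last expression. Integer#times returns the
# integer itself, so this block yields numbers, not letters.
counts = %w[a b c].map { |letter| 2.times { letter = letter.next } }
p counts # => [2, 2, 2]
```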
Not a direct answer to the question, but you might find a more functional approach useful.
I try to reduce nested loops and conditional branch logic where possible, as they can be quite painful to follow.
def caesar_cipher(message, key)
key.times do
message = message
.split("")
.map(&:ord) # convert each character to ascii number
.map(&:next) # increment ascii number by 1
.map(&:chr) # convert ascii number back to character
.join
.gsub("{", "a") # fix characters out of range
.gsub("[", "A")
end
message
end
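One caveat with the version above is that it shifts every character, so spaces and punctuation change as well. A possible variation, sketched here under the assumption that only ASCII letters should be shifted, uses modular arithmetic instead of repeating the shift key times. The name caesar_shift is hypothetical.

```ruby
def caesar_shift(message, key)
  # Only ASCII letters are touched; everything else passes through unchanged.
  message.gsub(/[A-Za-z]/) do |char|
    base = char == char.downcase ? "a".ord : "A".ord
    # Move to a 0..25 range, shift with wrap-around, then move back.
    ((char.ord - base + key) % 26 + base).chr
  end
end

puts caesar_shift("hello", 3)         # prints "khoor"
puts caesar_shift("Hello, World!", 2) # prints "Jgnnq, Yqtnf!"
puts caesar_shift("khoor", -3)        # prints "hello"
```

Since Ruby's % returns a non-negative result for a positive modulus, a negative key decrypts.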
I'm making an auditor with ruby which started off fine this morning (using single word, user inputted content to omit) but now that I've tried to implement a wordlist, it puts the string to search through as many times as there are words in the wordlist, only censoring it once or twice. My code is as follows.
#by Nightc||ed, ©2015
puts "Enter string: "
text = gets.chomp
redact = File.read("wordlist.txt").split(" ")
words = text.split(" ")
redact.each do |beep|
words.each do |word|
if word != beep
print word + " "
else
print "[snip] "
end
end
end
sleep
I kind of understand why it doesn't work but I'm not sure how to fix it.
Any help would be appreciated.
There's an easier way than iterating through each array. The Array#include? method can be easily used to see if the word is contained in your redacted list.
Here's some code that should behave how you wanted the original code to behave:
puts "Enter string: "
text = gets.chomp
redact = File.read("wordlist.txt").split(" ")
words = text.split(" ")
words.each do |word|
if redact.include? word
print "[snip] "
else
print word + " "
end
end
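The same idea can also be sketched as a method that returns the censored string instead of printing word by word. The method name censor and passing the word list as an array (rather than reading wordlist.txt) are assumptions made so the example is self-contained; the Set is optional and only speeds up lookups for long word lists.

```ruby
require "set"

def censor(text, banned_words)
  banned = banned_words.to_set
  text.split(" ")
      .map { |word| banned.include?(word) ? "[snip]" : word }
      .join(" ")
end

puts censor("this darn cat is darn loud", %w[darn heck])
# prints "this [snip] cat is [snip] loud"
```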
Scrubbing text gets very tricky. One thing you want to watch out for is word boundaries. Splitting on spaces will let a lot of beep words get through because of punctuation. Compare the first two results of the sample code below.
Next, assembling the split text back into its intended form with punctuation, spacing, etc., gets to be quite challenging. You may want to consider using regex for something presumably as small as user comments. See the third result.
If you're doing this as a learning exercise, great, but if the application is sensitive where you're likely to take heat over failures to bleep words, you may want to look for an existing well-tested library.
#!/usr/bin/env ruby
# Bleeper
scifi_curses = ['friggin', 'gorram', 'fracking', 'dork']
text = "Why splitting spaces won't catch all the friggin bleeps ya gorram, fracking dork."
words = text.split(" ")
words.each do |this_word|
puts "bleep #{this_word}" if scifi_curses.include?(this_word)
end
puts
better_words = text.split(/\b/)
better_words.each do |this_word|
puts "bleep #{this_word}" if scifi_curses.include?(this_word)
end
puts
bleeped_text = text.dup # work on a copy so gsub! leaves the original text intact
scifi_curses.each do |this_curse|
bleeped_text.gsub!(this_curse, '[bleep]')
end
puts bleeped_text
You should get these results:
bleep friggin
bleep fracking
bleep friggin
bleep gorram
bleep fracking
bleep dork
Why splitting spaces won't catch all the [bleep] bleeps ya [bleep], [bleep] [bleep].
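One more refinement worth considering: plain gsub with a string also replaces the curse inside longer words. A sketch that adds word boundaries and case insensitivity, and that leaves the original string untouched, could look like this (the method name bleep is made up for the example):

```ruby
def bleep(text, curses)
  # \b keeps "dork" from matching inside "Dorky"; i ignores case.
  pattern = /\b(?:#{Regexp.union(curses).source})\b/i
  text.gsub(pattern, "[bleep]") # gsub returns a new string
end

scifi_curses = %w[friggin gorram fracking dork]
original = "Ya Gorram, fracking dork. Dorky hats are fine."

puts bleep(original, scifi_curses)
# prints "Ya [bleep], [bleep] [bleep]. Dorky hats are fine."
```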
I want my output to search and count the frequency of the words "candy" and "gram", but also the combinations of "candy gram" and "gram candy," in a given text (whole_file.)
I am currently using the following code to display the occurrences of "candy" and "gram," but when I aggregate the combinations within the %w, only the word and frequencies of "candy" and "gram" display. Should I try a different way? thanks so much.
myArray = whole_file.split
stop_words= %w{ candy gram 'candy gram' 'gram candy' }
nonstop_words = myArray - stop_words
key_words = myArray - nonstop_words
frequency = Hash.new (0)
key_words.each { |word| frequency[word] +=1 }
key_words = frequency.sort_by {|x,y| x }
key_words.each { |word, frequency| puts word + ' ' + frequency.to_s }
It sounds like you're after n-grams. You could break the text into combinations of consecutive words in the first place, and then count the occurrences in the resulting array of word groupings. Here's an example:
whole_file = "The big fat man loves a candy gram but each gram of candy isn't necessarily gram candy"
[["candy"], ["gram"], ["candy", "gram"], ["gram", "candy"]].each do |term|
terms = whole_file.split(/\s+/).each_cons(term.length).to_a
puts "#{term.join(" ")} #{terms.count(term)}"
end
EDIT: As was pointed out in the comments below, I wasn't paying close enough attention and was splitting the file on each loop, which is obviously not a good idea, especially if it's large. I also hadn't accounted for the fact that the original question may have needed to sort by the count, although that wasn't explicitly asked.
whole_file = "The big fat man loves a candy gram but each gram of candy isn't necessarily gram candy"
# This is simplistic. You would need to address punctuation and other characters before
# or at this step.
split_file = whole_file.split(/\s+/)
terms_to_count = [["candy"], ["gram"], ["candy", "gram"], ["gram", "candy"]]
counts = []
terms_to_count.each do |term|
terms = split_file.each_cons(term.length).to_a
counts << [term.join(" "), terms.count(term)]
end
# Seemed like you may need to do sorting too, so here that is:
sorted = counts.sort { |a, b| b[1] <=> a[1] }
sorted.each do |count|
puts "#{count[0]} #{count[1]}"
end
Strip punctuation and convert to lower-case
The first thing you probably want to do is remove all punctuation from the string holding the contents of the file and then convert what's left to lower case, the latter so you don't have to worry about 'Cat' and 'cat' being counted as different words. Those two operations can be done in either order.
Changing upper-case letters to lower-case is easy:
text = whole_file.downcase
To remove the punctuation it is probably easier to decide what to keep rather than what to discard. If we want to keep only lower-case letters and whitespace (the whitespace is needed so the words stay separated), we can do this:
text = whole_file.downcase.gsub(/[^a-z\s]/, '')
That is, substitute an empty string for all characters other than (^) lower-case letters and whitespace. [1]
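A quick sketch on a made-up sample string shows what this kind of clean-up does, assuming whitespace is kept so the words remain separable. It also shows the effect of keeping dashes for hyphenated words.

```ruby
whole_file = "Candy-gram? It's a CANDY gram!" # sample text, not from the question

# Keep only lower-case letters and whitespace.
text = whole_file.downcase.gsub(/[^a-z\s]/, "")
p text # => "candygram its a candy gram"

# Also keep dashes, so hyphenated words are not fused together.
with_hyphens = whole_file.downcase.gsub(/[^-a-z\s]/, "")
p with_hyphens # => "candy-gram its a candy gram"
```

Notice that the apostrophe is dropped too, so "it's" becomes "its"; whether that matters depends on the words being counted.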
Determine frequency of individual words
If you want to count the number of times text contains the word 'candy', you can use the method String#scan on the string text and then determine the size of the array that is returned:
text.scan(/\bcandy\b/).size
scan returns an array with every occurrence of the string 'candy'; .size returns the size of that array. Here \b ensures 'candy' has a word "boundary" at each end, which could be whitespace, punctuation, or the beginning or end of a line or the file. That's to prevent 'candycane' from being counted.
A second way is to convert the string text to an array of words, as you have done [2]:
myArray = text.split
If you don't mind, I'd like to call this:
words = text.split
as I find that more expressive. [3]
The most direct way to determine the number of times 'candy' appears is to use the method Array#count, like this:
words.count('candy')
You can also use the array difference method, Array#-, as you noted:
words.size - (words - ['candy']).size
If you wish to know the number of times either 'candy' or 'gram' appears, you could of course do the above for each and sum the two counts. Some other ways are:
words.size - (words - ['candy', 'gram']).size
words.count { |word| word == 'candy' || word == 'gram' }
words.count { |word| ['candy', 'gram'].include?(word) }
Determine the frequency of all words that appear in the text
Your use of a hash with a default value of zero was a good choice:
def frequency_of_all_words(words)
frequency = Hash.new(0)
words.each { |word| frequency[word] +=1 }
frequency
end
I wrote this as a method to emphasize that words.each... does not return frequency. Often you would see this written more compactly using the method Enumerable#each_with_object, which returns the hash ("object"):
def frequency_of_all_words(words)
words.each_with_object(Hash.new(0)) { |word, h| h[word] +=1 }
end
Once you have the hash frequency you can sort it as you did:
frequency.sort_by {|word, freq| freq }
or
frequency.sort_by(&:last)
which you could write:
frequency.sort_by {|_, freq| freq }
since you aren't using the first block variable. If you wanted the most frequent words first:
frequency.sort_by(&:last).reverse
or
frequency.sort_by {|_, freq| -freq }
All of these will give you an array. If you want to convert it back to a hash (with the largest values first, say):
Hash[frequency.sort_by(&:last).reverse]
or in Ruby 2.1+,
frequency.sort_by(&:last).reverse.to_h
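On Ruby 2.7 or later, Enumerable#tally builds the same frequency hash in one call. A brief sketch, using a made-up word list and breaking ties alphabetically (the tie-break is an added assumption, not something the question asked for):

```ruby
words = %w[candy gram candy cane candy gram] # sample data

frequency = words.tally
p frequency # => {"candy"=>3, "gram"=>2, "cane"=>1}

# Most frequent first; words with equal counts are ordered alphabetically.
sorted = frequency.sort_by { |word, count| [-count, word] }
p sorted      # => [["candy", 3], ["gram", 2], ["cane", 1]]
p sorted.to_h # => {"candy"=>3, "gram"=>2, "cane"=>1}
```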
Count the number of times a substring appears
Now let's count the number of times the string 'candy gram' appears. You might think we could use String#scan on the string holding the entire file, as we did earlier [4]:
text.scan(/\bcandy gram\b/).size
The first problem is that this won't catch 'candy\ngram'; i.e., when the words are separated by a newline character. We could fix that by changing the regex to /\bcandy\sgram\b/. A second problem is that 'candy gram' might have been 'candy. Gram' in the file, in which case you might not want to count it.
A better way is to use the method Enumerable#each_cons on the array words. The easiest way to show you how that works is by example:
words = %w{ check for candy gram here candy gram again }
#=> ["check", "for", "candy", "gram", "here", "candy", "gram", "again"]
enum = words.each_cons(2)
#=> #<Enumerator: ["check", "for", "candy", "gram", "here", "candy",
# "gram", "again"]:each_cons(2)>
enum.to_a
#=> [["check", "for"], ["for", "candy"], ["candy", "gram"],
# ["gram", "here"], ["here", "candy"], ["candy", "gram"],
# ["gram", "again"]]
each_cons(2) returns an enumerator; I've converted it to an array to display its contents.
So we can write
words.each_cons(2).map { |word_pair| word_pair.join(' ') }
#=> ["check for", "for candy", "candy gram", "gram here",
# "here candy", "candy gram", "gram again"]
and lastly:
words.each_cons(2).map { |word_pair|
word_pair.join(' ') }.count { |s| s == 'candy gram' }
#=> 2
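The each_cons approach generalises to phrases of any length, because count can compare each window of words with the phrase split into an array. The following is a sketch only; the method name phrase_count is hypothetical and the sample sentence is the one used in another answer to this question.

```ruby
def phrase_count(words, phrase)
  target = phrase.split
  return 0 if target.empty? # each_cons(0) would raise an ArgumentError
  # Compare every run of target.length consecutive words with the phrase.
  words.each_cons(target.length).count(target)
end

whole_file = "The big fat man loves a candy gram but each gram of candy isn't necessarily gram candy"
words = whole_file.downcase.split

["candy", "gram", "candy gram", "gram candy"].each do |phrase|
  puts "#{phrase} #{phrase_count(words, phrase)}"
end
```

For this sentence the counts are 3 for each single word and 1 for each two-word phrase.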
[1] If you also wanted to keep dashes, for hyphenated words, change the regex to /[^-a-z\s]/ or /[^a-z\s-]/.
[2] Note from the String#split documentation that .split is the same as .split(' '), and differs from .split(/\s+/) only when the string has leading whitespace (the regex form then returns an empty first element).
[3] Also, Ruby's naming convention is to use lower-case letters and underscores ("snake_case") for variables and methods, such as my_array.