Ruby Scan with regex not finding all patterns [closed] - ruby

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to use a regular expression to find all substrings in a word. It is finding some but not all. One such example is 'an' in the word 'banana'.
def substrings str
  pattern = '.'
  subs = []
  while pattern.length < str.length do
    subs << str.scan(/#{pattern}/)
    pattern << '.'
  end
  subs.flatten
end
puts substrings("banana").sort_by{ |s| "banana".index(/#{s}/)}

Regular expression matches will never overlap. If you ask for /../, you will get ["ba", "na", "na"]. You will not get ["ba", "an" ...] because "an" overlaps "ba". The next match search will start from the last match's end, always.
If you want to find overlapping sequences, you need to use lookahead/lookbehind to shorten your match size so the matches themselves don't overlap: /(?=(..))/. Note that you have to introduce a capture group, since the match itself is an empty string in this case.
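As a minimal sketch of the lookahead approach described above: the helper name overlapping_substrings is made up for illustration, and the sketch assumes the input contains no newlines (`.` does not match them by default).

```ruby
# Hypothetical helper: collects the substrings of every length,
# including the ones that overlap each other.
def overlapping_substrings(str)
  (1..str.length).flat_map do |len|
    # The lookahead consumes no characters, so the next search starts one
    # position later; the capture group holds the len-character substring.
    str.scan(/(?=(.{#{len}}))/).flatten
  end
end

p "banana".scan(/../)                # non-overlapping matches only
p "banana".scan(/(?=(..))/).flatten  # overlapping matches, including "an"
p overlapping_substrings("banana").uniq
```

Because scan returns one array per match when the pattern has a capture group, the result is flattened before use.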

def substrings str
  (0...str.length).flat_map{|i| (i...str.length).map{|j| str[i..j]}}.uniq
end
substrings("banana")
Result
[
"b",
"ba",
"ban",
"bana",
"banan",
"banana",
"a",
"an",
"ana",
"anan",
"anana",
"n",
"na",
"nan",
"nana"
]
or
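def substrings str
  # Pair every start index i with every later end index j and take str[i...j].
  # The range must run up to str.length so that j can point past the last character.
  (0..str.length).to_a.combination(2).map{|i, j| str[i...j]}.uniq
end
Result
[
"b",
"ba",
"ban",
"bana",
"banan",
"banana",
"a",
"an",
"ana",
"anan",
"anana",
"n",
"na",
"nan",
"nana"
]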

Here's another way that does not use a regex. I see now how it can be done with a regex, but I don't know why you'd want to, unless it's just an exercise.
def substrings(str)
  arr = str.chars
  # The exclusive range stops at str.size - 1, so the whole word is left out.
  (1...str.size).each_with_object([]) { |i,a|
    a << arr.each_cons(i).to_a.map(&:join) }.flatten
end
substrings("banana")
#=> ["b", "a", "n", "a", "n", "a", "ba", "an", "na", "an", "na", "ban",
# "ana", "nan", "ana", "bana", "anan", "nana", "banan", "anana"]
If you want to include the word "banana", change str.size to str.size+1.

Related

RUBY split word to letters and numbers [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I would like to ask you for help. I have keywords in this form "AB10" and I need to split it into "AB" and "10". What is the best way?
Thank you for your help!
One could use String#scan:
def divide_str(s)
  s.scan(/\d+|\D+/)
end
divide_str 'AB10' #=> ["AB", "10"]
divide_str 'AB10CD20' #=> ["AB", "10", "CD", "20"]
divide_str '10AB20CD' #=> ["10", "AB", "20", "CD"]
The regular expression /\d+|\D+/ reads, "match one or more (+) digits (\d) or one or more non-digits (\D)".
Here is another way, one that does not employ a regular expression.
def divide_str(s)
  digits = '0'..'9'
  s.each_char.slice_when do |x,y|
    digits.cover?(x) ^ digits.cover?(y)
  end.map(&:join)
end
divide_str 'AB10' #=> ["AB", "10"]
divide_str 'AB10CD20' #=> ["AB", "10", "CD", "20"]
divide_str '10AB20CD' #=> ["10", "AB", "20", "CD"]
See Enumerable#slice_when, Range#cover?, TrueClass#^ and FalseClass#^.
Use split like so:
my_str.split(/(\d+)/)
To split any string on the boundary between digits and letters, use either of these 2 methods:
Use split with regex in capturing parentheses to include the delimiter, here a stretch of digits, into the resulting array. Remove empty strings (if any) using a combination of reject and empty?:
strings = ['AB10', 'AB10CD20', '10AB20CD']
strings.each do |str|
  arr = str.split(/(\d+)/).reject(&:empty?)
  puts "'#{str}' => #{arr}"
end
Output:
'AB10' => ["AB", "10"]
'AB10CD20' => ["AB", "10", "CD", "20"]
'10AB20CD' => ["10", "AB", "20", "CD"]
Use split with non-capturing parentheses: (?:PATTERN), positive lookahead (?=PATTERN) and positive lookbehind (?<=PATTERN) regexes to match the letter-digit and digit-letter boundaries:
strings.each do |str|
  arr = str.split(/ (?: (?<=[A-Za-z]) (?=\d) ) | (?: (?<=\d) (?=[A-Za-z]) ) /x)
  puts "'#{str}' => #{arr}"
end
The two methods give the same output for the cases shown.
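The two methods can diverge once a string contains characters that are neither letters nor digits. The sketch below uses made-up inputs ('AB-10', 'AB 10 CD') to show the difference; the variable names are for illustration only.

```ruby
strings = ['AB10', 'AB-10', 'AB 10 CD']

strings.each do |str|
  # Method 1: split on runs of digits and keep them via the capture group.
  by_digits = str.split(/(\d+)/).reject(&:empty?)
  # Method 2: split only where a letter directly touches a digit.
  by_boundary = str.split(/(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])/)
  puts "#{str.inspect}: #{by_digits.inspect} vs #{by_boundary.inspect}"
end
```

With the first method a hyphen or space stays attached to its neighbouring piece, while the second method does not split at all unless a letter and a digit are adjacent.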

Finding dictionary words within a source text, using Ruby

Using Ruby, I need to output a list of words, found in a dictionary, that can be formed by eliminating letters from a source text.
E.g., if I input the source text "crazed" I want to get not only words like "craze" and "razed", whose letters are in the same order AND whose letters are adjacent to each other within the source text, but ALSO words like "rad" and "red", because those words exist and can be found by eliminating select letters from "crazed" AND the output words retain letter order. BUT, words like "dare" or "race" should not be in the output list, because the order of the letters in "dare" or "race" is not the same as the order of those letters in "crazed". (If "raed" or "crae" were words in the dictionary, they WOULD be part of the output.)
My thought was to go through the source text in a binary manner
(for "crazed", we'd get:
000001 = "d";
000010 = "e";
000011 = "ed";
000100 = "z";
000101 = "zd";
000110 = "ze";
000111 = "zed";
001000 = "a";
001001 = "ad"; etc.)
and compare each result with words in a dictionary, though I don't know how to code that, nor whether that is most efficient. This is where I would greatly benefit from your help.
Also, the length of the source text would be variable; it wouldn't necessarily be six letters long (like "crazed"). Inputs would potentially be much larger (20-30 characters, possibly more).
I've searched here and found questions about anagrams and about words that can be in any letter order, but not specifically what I'm looking for. Is this even possible in Ruby? Thank you.
First let's read the words of a dictionary into an array, after chomping, downcasing and removing duplicates (if, for example, the dictionary contains both "A" and "a", as does the dictionary on my Mac that I've used below).
DICTIONARY = File.readlines("/usr/share/dict/words").map { |w| w.chomp.downcase }.uniq
#=> ["a", "aa", "aal", "aalii",..., "zyzomys", "zyzzogeton"]
DICTIONARY.size
#=> 234371
The following method generates all combinations of one or more characters of a given word, respecting order, and for each, joins the characters to form a string, checks to see if the string is in the dictionary, and if it is, saves the string to an array.
To check if a string matches a word in the dictionary I perform a binary search, using the method Array#bsearch. This makes use of the fact that the dictionary is already sorted in alphabetical order.
def subwords(word)
  arr = word.chars
  (1..word.size).each.with_object([]) do |n,a|
    arr.combination(n).each do |comb|
      w = comb.join
      a << w if DICTIONARY.bsearch { |dw| w <=> dw }
    end
  end
end
subwords "crazed"
# => ["c", "r", "a", "z", "e", "d",
# "ca", "ce", "ra", "re", "ae", "ad", "ed",
# "cad", "rad", "red", "zed",
# "raze", "craze", "crazed"]
Yes, that particular dictionary contains all those strings (such as "z") that don't appear to be English words.
Another example.
subwords "importance"
#=> ["i", "m", "p", "o", "r", "t", "a", "n", "c", "e",
# "io", "it", "in", "ie", "mo", "mr", "ma", "me", "po", "pa", "or",
# "on", "oe", "ra", "re", "ta", "te", "an", "ae", "ne", "ce",
# "imp", "ima", "ion", "ira", "ire", "ita", "ian", "ice", "mor", "mot",
# "mon", "moe", "man", "mac", "mae", "pot", "poa", "pon", "poe", "pan",
# "pac", "ort", "ora", "orc", "ore", "one", "ran", "tan", "tae", "ace",
# "iota", "ione", "iran", "mort", "mora", "morn", "more", "mote",
# "moan", "mone", "mane", "mace", "port", "pore", "pote", "pone",
# "pane", "pace", "once", "rane", "race", "tane",
# "impot", "moran", "morne", "porta", "ponce", "rance",
# "import", "impone", "impane", "prance",
# "portance",
# "importance"]
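The combination approach above builds 2**n - 1 candidate strings for an n-letter source, which is roughly a million for 20 letters and a billion for 30, the sizes mentioned in the question. A different technique, not taken from the answer, is to walk the dictionary once and test whether each word is a subsequence of the source. The sketch below assumes that approach; the helper names subsequence? and subwords_by_scan and the tiny sample list are hypothetical.

```ruby
# Hypothetical helper: true when every character of +word+ appears in
# +source+ in the same order (not necessarily adjacent).
def subsequence?(word, source)
  pos = 0
  word.each_char do |ch|
    idx = source.index(ch, pos)   # next occurrence at or after pos
    return false if idx.nil?
    pos = idx + 1
  end
  true
end

# Hypothetical helper: works on any enumerable of dictionary words.
def subwords_by_scan(source, dictionary)
  dictionary.select { |w| !w.empty? && subsequence?(w, source) }
end

sample = %w[craze razed rad red dare race zed]   # stand-in for a real word list
p subwords_by_scan("crazed", sample)
```

The cost grows with the dictionary size times the source length rather than exponentially with the source length, so it should stay practical for longer inputs.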
Below is a broader solution that also finds words formed by using the letters in any order. The catch with using combination alone to find possible subwords is that the permutations of each combination are missed. E.g., drawing from 'importance', the combination 'mpa' will arise at some point. Since it isn't a dictionary word, it is skipped, which costs us its permutation 'map', a dictionary subword of 'importance'. The solution below finds more possible dictionary words. I agree that my method can be optimized for speed.
#steps
#split string at ''
#find combinations for n=2 all the way to n=word.size
#for each combination
#find the permutations of all the arrangements
#then
#join the array
#check to see if word is in dictionary
#and it's not already collected
#if it is, add to collecting array
require 'set'
Dictionary=File.readlines('dictionary.txt').map(&:chomp).to_set
Dictionary.size #39501
def subwords(word)
  #split string at ''
  arr=word.split('')
  #excluding single letter words
  #you can change 2 to 1 in line below to select for single letter words too
  (2..word.size).each_with_object([]) do |n,a|
    #find combinations for n=2 all the way to n=word.size
    arr.combination(n).each do |comb|
      #for each combination
      #find the permutations of all the arrangements
      comb.permutation(n).each do |perm|
        #join the array
        w=perm.join
        #check to see if word is in dictionary and it's not already collected
        if Dictionary.include?(w) && !a.include?(w)
          #if it is, add to collecting array
          a<<w
        end
      end
    end
  end
end
p subwords('crazed')
#["car", "arc", "rec", "ace", "cad", "are", "era", "ear", "rad", "red", "adz", "zed", "czar", "care", "race", "acre", "card", "dace", "raze", "read", "dare", "dear", "adze", "daze", "craze", "cadre", "cedar", "crazed"]
p subwords('battle')
#["bat", "tab", "alb", "lab", "bet", "tat", "ate", "tea", "eat", "eta", "ale", "lea", "let", "bate", "beat", "beta", "abet", "bale", "able", "belt", "teat", "tale", "teal", "late", "bleat", "table", "latte", "battle", "tablet"]

Breaking it down - simple explanation for regular expressions used in ruby exercise [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Problem # 1
In this first problem (line 2) - can someone break it down as to what is happening in terms easy for someone new to Ruby regular expressions to understand?
def encode(string)
  string.scan(/(.)(\1*)/).collect do |char, repeat|
    [1 + repeat.length, char]
  end.join
end
Problem # 2
In this second problem (same answer, but different format to solution) - can someone break it down as to what is happening in terms easy for someone new to Ruby to understand?
def encode(string)
  string.split( // ).sort.join.gsub(/(.)\1{2,}/) {|s| s.length.to_s + s[0,1] }
end
It's easy for you to break these down and figure out what they're doing. Simply use IRB, PRY or Sublime Text 2 with the "Seeing is Believing" plugin to look at the results of each operation. I'm using the last one for this:
def encode(string)
  string # => "foo"
    .scan(/(.)(\1*)/) # => [["f", ""], ["o", "o"]]
    .collect { |char, repeat|
      [
        1 + # => 1, 1 <-- these are the results of two passes through the block
        repeat.length, # => 1, 2 <-- these are the results of two passes through the block
        char # => "f", "o" <-- these are the results of two passes through the block
      ] # => [1, "f"], [2, "o"] <-- these are the results of two passes through the block
    }.join # => "1f2o"
end
encode('foo') # => "1f2o"
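To make the pattern itself easier to read, here is a small sketch on a made-up input, "aaabcc". In /(.)(\1*)/ the first group captures one character and the second group captures zero or more further copies of it (\1 is a backreference to group one), so scan returns one [char, repeat] pair per run.

```ruby
pairs = "aaabcc".scan(/(.)(\1*)/)
p pairs    # => [["a", "aa"], ["b", ""], ["c", "c"]]

# Run length = 1 for the captured character + length of the repeats.
encoded = pairs.map { |char, repeat| [1 + repeat.length, char] }.join
p encoded  # => "3a1b2c"
```

Array#join flattens the nested [count, char] pairs, which is why no explicit flatten is needed.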
And here's the second piece of code:
def encode(string)
  string # => "foobarbaz"
    .split( // ) # => ["f", "o", "o", "b", "a", "r", "b", "a", "z"]
    .sort # => ["a", "a", "b", "b", "f", "o", "o", "r", "z"]
    .join # => "aabbfoorz"
    .gsub(/(.)\1{2,}/) {|s|
      s.length.to_s +
      s[0,1]
    } # => "aabbfoorz"
end
encode('foobarbaz') # => "aabbfoorz"
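In the trace above gsub leaves the string untouched because /(.)\1{2,}/ needs a character followed by at least two more copies of itself, i.e. a run of three or more. A sketch with an input that does contain such a run ('banana', chosen for illustration) shows the substitution taking effect:

```ruby
def encode(string)
  string.split(//).sort.join.gsub(/(.)\1{2,}/) { |s| s.length.to_s + s[0, 1] }
end

p encode('banana')     # sorted: "aaabnn"; only "aaa" is long enough => "3abnn"
p encode('foobarbaz')  # no run of three or more => "aabbfoorz"
```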

Ruby: How do I use regex to return all matches? [duplicate]

This question already has answers here:
Ruby: filter array by regex?
(6 answers)
Closed 8 years ago.
I can intersect two arrays by doing:
keyphrase_matches = words & city.keywords
How can I achieve the same thing using regular expression? I want to test one array against a regular expression and get a new array with the matches.
You can use the Enumerable#grep method:
%w{a b c 1 2 3}.grep /\d/ # => ["1", "2", "3"]
Use array.grep(regex) to return all elements that match the given regex.
See Enumerable#grep.
As I understand, if arr1 and arr2 are two arrays of strings (though you did not say they contain strings), you want to know if a regular expression could be used to produce arr1 & arr2.
First some test data:
arr1 = "Now is the time for all good Rubyists".split
#=> ["Now", "is", "the", "time", "for", "all", "good", "Rubyists"]
arr2 = "to find time to have the good life".split
#=> ["to", "find", "time", "to", "have", "the", "good", "life"]
The result we want:
arr1 & arr2
#=> ["the", "time", "good"]
I can think of two ways you might use Enumerable#grep, as suggested by @meagar and @August:
#1
arr1.select { |e| arr2.grep(/#{e}/).any? }
#=> ["the", "time", "good"]
#2
regex = Regexp.new("#{arr2.join('|')}")
#=> /to|find|time|to|have|the|good|life/
arr1.grep(regex)
#=> ["the", "time", "good"]
Of course, Array#& generally would be preferred, especially in Code Golf.
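One caveat with both grep variants above: the patterns are unanchored, so an element matches whenever any keyword occurs inside it. The sketch below adds a made-up word, "together", to the test data to show the difference, and assumes Regexp.union plus \A and \z anchors as one way to require whole-word matches. The names loose and exact are illustrative.

```ruby
arr1 = %w[Now is the time for all good Rubyists together]
arr2 = %w[to find time to have the good life]

# Unanchored: "together" is kept because it contains "to".
loose = Regexp.new(arr2.join('|'))
p arr1.grep(loose)

# Anchored: Regexp.union escapes each string, and \A...\z forces the
# whole element to equal one of the alternatives.
exact = /\A#{Regexp.union(arr2)}\z/
p arr1.grep(exact)

p arr1 & arr2   # the plain intersection, for comparison
```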

Determining if a prefix exists in a set

Given a set of strings, say:
"Alice"
"Bob"
"C"
"Ca"
"Car"
"Carol"
"Caroling"
"Carousel"
and given a single string, say:
"Carolers"
I would like a function that returns the smallest prefix not already inside the array.
For the above example, the function should return: "Caro". (A subsequent call would return "Carole")
I am very new to Ruby, and although I could probably hack out something ugly (using my C/C++/Objective-C brain), I would like to learn how to properly (elegantly?) code this up.
There's a little known magical module in Ruby called Abbrev.
require 'abbrev'
abbreviations = Abbrev::abbrev([
  "Alice",
  "Bob",
  "C",
  "Ca",
  "Car",
  "Carol",
  "Caroling",
  "Carousel"
])
carolers = Abbrev::abbrev(%w[Carolers])
(carolers.keys - abbreviations.keys).sort.first # => "Caro"
Above I took the first element but this shows what else would be available.
pp (carolers.keys - abbreviations.keys).sort
# >> ["Caro", "Carole", "Caroler", "Carolers"]
Wrap all the above in a function, compute the resulting missing elements, and then iterate over them yielding them to a block, or use an enumerator to return them one-by-one.
This is what is generated for a single word. For an array it is more complex.
require 'pp'
pp Abbrev::abbrev(['cat'])
# >> {"ca"=>"cat", "c"=>"cat", "cat"=>"cat"}
pp Abbrev::abbrev(['cat', 'car', 'cattle', 'carrier'])
# >> {"cattl"=>"cattle",
# >> "catt"=>"cattle",
# >> "cat"=>"cat",
# >> "carrie"=>"carrier",
# >> "carri"=>"carrier",
# >> "carr"=>"carrier",
# >> "car"=>"car",
# >> "cattle"=>"cattle",
# >> "carrier"=>"carrier"}
Your question still doesn't match what you are expecting as a result. It seems that you need prefixes, not the substrings (as "a" would be the shortest substring not already in the array). For searching the prefix, this should suffice:
array = [
  "Alice",
  "Bob",
  "C",
  "Ca",
  "Car",
  "Carol",
  "Caroling",
  "Carousel",
]
str = 'Carolers'
# The exclusive range avoids generating the full string twice.
(0...str.length).map{|i|
  str[0..i]
}.find{|s| !array.member?(s)}
I am not a Ruby expert, but I think you may want to approach this problem by converting your set into a trie. Once you have the trie constructed, your problem can be solved simply by walking down from the root of the trie, following all of the edges for the letters in the word, until you either find a node that is not marked as a word or walk off the trie. In either case, you've found a node that isn't part of any word, and you have the shortest prefix of your word in question that doesn't already exist inside of the set. Moreover, this would let you run any number of prefix checks quickly, since after you've built up the trie the algorithm takes time at most linear in the length of the string.
Hope this helps!
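The trie idea can be sketched roughly as follows. This is a minimal illustration rather than a tuned implementation; the class name Trie, the Node struct, and the method shortest_missing_prefix are all hypothetical.

```ruby
class Trie
  # Each node maps a character to a child node and records whether the
  # path from the root to this node spells a stored word.
  Node = Struct.new(:children, :word)

  def initialize(words = [])
    @root = Node.new({}, false)
    words.each { |w| insert(w) }
  end

  def insert(word)
    node = @root
    word.each_char { |ch| node = (node.children[ch] ||= Node.new({}, false)) }
    node.word = true
  end

  # Shortest prefix of +str+ that is not a stored word,
  # or nil when every prefix is already stored.
  def shortest_missing_prefix(str)
    node = @root
    str.each_char.with_index do |ch, i|
      node = node.children[ch]
      # Either we walked off the trie or reached a node not marked as a word.
      return str[0..i] if node.nil? || !node.word
    end
    nil
  end
end

trie = Trie.new(%w[Alice Bob C Ca Car Carol Caroling Carousel])
first = trie.shortest_missing_prefix("Carolers")
p first
trie.insert(first)   # record it so that a subsequent call moves on
p trie.shortest_missing_prefix("Carolers")
```

Each lookup visits at most one node per character of the query string, which matches the linear bound described above.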
I'm not really sure what you're asking for other than an example of some Ruby code to find common prefixes. I'll assume you want to find the smallest string which is a prefix of the most number of strings in the given set. Here's an example implementation:
class PrefixFinder
  def initialize(words)
    @words = Hash[*words.map{|x|[x,x]}.flatten]
  end
  def next_prefix
    max=0; biggest=nil
    @words.keys.sort.each do |word|
      0.upto(word.size-1) do |len|
        substr=word[0..len]; regex=Regexp.new("^" + substr)
        next if @words[substr]
        count = @words.keys.find_all {|x| x=~regex}.size
        max, biggest = [count, substr] if count > max
        #puts "OK: s=#{substr}, biggest=#{biggest.inspect}"
      end
    end
    @words[biggest] = biggest if biggest
    biggest
  end
end
pf = PrefixFinder.new(%w(C Ca Car Carol Caroled Carolers))
pf.next_prefix # => "Caro"
pf.next_prefix # => "Carole"
pf.next_prefix # => "Caroler"
pf.next_prefix # => nil
No comment on the performance (or correctness) of this code but it does show some Ruby idioms (instance variables, iteration, hashing, etc).
inn = ["Alice","Bob","C","Ca","Car","Carol","Caroling","Carousel"]
y = Array.new
str="Carolers"
Split the given string into an array of characters:
x=str.split('')
# ["C","a","r","o","l","e","r","s"]
Form every leading run of characters:
x.each_index {|i| y << x.take(i+1)}
# [["C"], ["C", "a"], ["C", "a", "r"], ["C", "a", "r", "o"], ["C", "a", "r", "o", "l"], ["C", "a", "r", "o", "l", "e"], ["C", "a", "r", "o", "l", "e", "r"], ["C", "a", "r", "o", "l", "e", "r", "s"]]
Use join to concatenate each run into a prefix:
y = y.map {|s| s.join }
# ["C", "Ca", "Car", "Caro", "Carol", "Carole", "Caroler", "Carolers"]
Select the first item from y that's not in the input array:
y.select {|item| !inn.include? item}.first
You will get "Caro".
Putting it all together:
def find_first_missing_item(src_array, str_to_check)
  y=Array.new
  x=str_to_check.split('')
  x.each_index {|i| y << x.take(i+1)}
  y=y.map {|s| s.join}
  y.select {|item| !src_array.include? item}.first
end
And call
inn = ["Alice","Bob","C","Ca","Car","Carol","Caroling","Carousel"]
str="Carolers"
find_first_missing_item inn,str
Very simple version (but not very Rubyish):
str = 'Carolers'
ar = %w(Alice Bob C Ca Car Carol Caroling Carousel)
substr = str[0, n=1]
substr = str[0, n+=1] while ar.include? substr
puts substr
