Find multiple longest common prefixes from list of string

I'm trying to find all possible prefixes from a list of strings. We can remove "/" or ":" from prefixes to make it more readable.
input = ["item1", "item2", "product1", "product2", "variant:123", "variant:789"]
Expected output
item
product
variant

The key here is to find your delimiters. It looks like yours are digits, ":" and "/". So you should be able to map over the array and use the delimiters in a regex to strip everything but the prefix. You also have the option to check that the result occurs in other entries of the array (so you TRULY know that it's a prefix), but I didn't include that in my answer.
input = ["item1", "item2", "product1", "product2", "variant:123", "variant:789"]
prefixes = input.map {|word| word.gsub(/:?\/?[0-9]/, '')}.uniq
=> ["item", "product", "variant"]
The more delimiters you have, the more you can append to the regex pattern. It is worth doing a little reading about regex wildcards :-)
Hope this answers your question!
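The existence check mentioned above could be sketched as follows. This is only a minimal sketch, not part of the original answer: it assumes a prefix counts only when at least two inputs share it, the method name shared_prefixes is hypothetical, and "solo9" is a made-up entry added to show one being filtered out.

```ruby
# Hypothetical helper: strip an optional ":" or "/" plus the trailing digits,
# then keep only the prefixes shared by two or more inputs.
def shared_prefixes(words)
  candidates = words.map { |word| word.sub(/[:\/]?\d+\z/, '') }
  candidates.group_by(&:itself)
            .select { |_, group| group.size > 1 }
            .keys
end

# "solo9" is an invented entry with no sibling, so "solo" is dropped.
input = ["item1", "item2", "product1", "product2", "variant:123", "variant:789", "solo9"]
shared_prefixes(input)
#=> ["item", "product", "variant"]
```

Anchoring the pattern with \z strips only a trailing run of digits, so digits in the middle of a word are left alone.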

I assume the order of the prefixes that is returned is unimportant. Also, I have disregarded the treatment of the characters "/" and ":" because that is straightforward and a distraction to the central problem.
Let's first create a helper method whose sole argument is an array of words that begin with the same letter.
def longest_prefix(arr)
  a0, *a = arr
  return a0 if a0.size == 1 || arr.size == 1
  n = (1..a0.size-1).find do |i|
    c = a0[i]
    a.any? { |w| w[i] != c }
  end
  n.nil? ? a0 : a0[0,n]
end
For example,
arr = ["dog1", "doggie", "dog2", "doggy"]
longest_prefix arr
#=> "dog"
We now merely need to group the words by their first letters, then map the resulting key-value pairs to the return value of the helper method when its argument equals the value of the key-value pair.
def prefixes(arr)
  arr.group_by { |w| w[0] }.map { |_,a| longest_prefix(a) }
end
Suppose, for example,
arr = ["dog1", "eagles", "eagle", "doggie", "dog2", "catsup",
"cats", "elephant", "cat", "doggy", "caustic"]
Then
prefixes arr
#=> ["dog", "e", "ca"]
Note that
arr.group_by { |w| w[0] }
#=> { "d"=>["dog1", "doggie", "dog2", "doggy"],
# "e"=>["eagles", "eagle", "elephant"],
# "c"=>["catsup", "cats", "cat", "caustic"] }
See Enumerable#group_by.
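As a possible variation on the helper, the longest common prefix of a group equals the common prefix of its lexicographically smallest and largest words, because every other word sorts between those two. The following is a sketch under that observation; common_prefix and prefixes_by_first_letter are hypothetical names, not methods from the answer above.

```ruby
# Hypothetical alternative to longest_prefix: compare only the
# lexicographically smallest and largest words of the group.
def common_prefix(words)
  lo, hi = words.minmax
  i = 0
  # Advance while both words agree; lo can never be longer than a
  # hi that it fully matches, so hi[i] is safe to read here.
  i += 1 while i < lo.size && lo[i] == hi[i]
  lo[0, i]
end

# Same grouping step as in the answer, using the alternative helper.
def prefixes_by_first_letter(words)
  words.group_by { |w| w[0] }.map { |_, group| common_prefix(group) }
end

arr = ["dog1", "eagles", "eagle", "doggie", "dog2", "catsup",
       "cats", "elephant", "cat", "doggy", "caustic"]
prefixes_by_first_letter(arr)
#=> ["dog", "e", "ca"]
```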

Related

Why does ruby's IO readlines method behave differently when followed by a filter

I'm building a little Wordle inspired project for fun and am gathering the words from my local dictionary. Originally I was doing this:
word_list = File.readlines("/usr/share/dict/words", chomp: true)
word_list.filter { |word| word.length == 5 }.map(&:upcase)
The first line takes absolutely ages. However when doing this:
word_list = File.readlines("/usr/share/dict/words", chomp: true).filter { |word| word.length == 5 }.map(&:upcase)
it completes in a matter of seconds. I can't work out how the filter block is being applied to the lines being read before they're assigned memory (which I'm assuming is what is causing the slow read time), clearly each method isn't being fully applied before the next is called but that is how I thought method chaining works.

Let's create a file.
File.write('t', "dog horse\npig porcupine\nowl zebra\n") #=> 34
then
a = File.readlines("t", chomp:true)
#=> ["dog horse", "pig porcupine", "owl zebra"]
so your block variable word holds a string of two words. That's obviously not what you want.
You could use IO::read to "gulp" the file into a string.
s = File.read("t")
#=> "dog horse\npig porcupine\nowl zebra\n"
then
a = s.scan(/\w+/)
#=> ["dog", "horse", "pig", "porcupine", "owl", "zebra"]
b = a.select { |word| word.size == 5 }
#=> ["horse", "zebra"]
c = b.map(&:upcase)
#=> ["HORSE", "ZEBRA"]
We could of course chain these operations:
File.read("t").scan(/\w+/).select { |word| word.size == 5 }.map(&:upcase)
#=> ["HORSE", "ZEBRA"]
scan(/\w+/) matches each string of word characters (letters, digits and underscores). To match only letters change that to scan(/[a-zA-Z]+/).
You could instead use IO::readlines, which reads the lines into an array, then extract the words from each line, keep those having 5 characters, and add them, after upcasing, to an initially empty array.
File.readlines('t').each_with_object([]) do |line,arr|
  line.scan(/\w+/)
      .select { |word| word.size == 5 }
      .map(&:upcase)
      .each { |word| arr << word }
end
#=> ["HORSE", "ZEBRA"]
You could add the optional parameter chomp: true to readlines' arguments, but there is no reason to do so.
Better would be to use IO::foreach which, without a block, returns an enumerator that can be chained, avoiding the temporary array created by readlines.
File.foreach('t').with_object([]) do |line,arr|
  line.scan(/\w+/)
      .select { |word| word.size == 5 }
      .map(&:upcase)
      .each { |word| arr << word }
end
#=> ["HORSE", "ZEBRA"]
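For a file with one word per line, as in the question, the enumerator returned by foreach can also be made lazy so that filtering and upcasing happen line by line. This is only a sketch: the temporary file and its contents are invented stand-ins for /usr/share/dict/words, and the variable names are hypothetical.

```ruby
require 'tempfile'

# Invented stand-in for a dictionary file with one word per line.
file = Tempfile.new('words')
file.write("apple\npear\ngrape\nfig\nmelon\n")
file.close

# foreach without a block returns an enumerator; lazy defers the
# select and map steps until to_a pulls lines through the chain.
five_letter_words = File.foreach(file.path, chomp: true)
                        .lazy
                        .select { |word| word.length == 5 }
                        .map(&:upcase)
                        .to_a
#=> ["APPLE", "GRAPE", "MELON"]

file.unlink
```

Here chomp: true is useful because each line is a whole word, so the newline would otherwise count toward its length.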

Unscrambling a string given the number of splits and words that the sentence can be comprised of

I'm working on a problem in which I'm given a string that has been scrambled. The scrambling works like this.
An original string is chopped into substrings at random positions and a random number of times.
Each substring is then moved around randomly to form a new string.
I'm also given a dictionary of words that are possible words in the string.
Finally, I'm given the number of splits in the string that were made.
The example I was given is this:
dictionary = ["world", "hello"]
scrambled_string = "rldhello wo"
splits = 1
The expected output of my program would be the original string, in this case:
"hello world"

Suppose the initial string
"hello my name is Sean"
with
splits = 2
yields
["hel", "lo my name ", "is Sean"]
and those three pieces are shuffled to form the following array:
["lo my name ", "hel", "is Sean"]
and then the elements of this array are joined to form:
scrambled = "lo my name helis Sean"
Also suppose:
dictionary = ["hello", "Sean", "the", "name", "of", "my", "cat", "is", "Sugar"]
First convert dictionary to a set to speed lookups.
require 'set'
dict_set = dictionary.to_set
#=> #<Set: {"hello", "Sean", "the", "name", "of", "my", "cat", "is", "Sugar"}>
Next I will create a helper method.
def indices_to_ranges(indices, last_index)
  [-1, *indices, last_index].each_cons(2).map { |i,j| i+1..j }
end
Suppose we split scrambled twice (because splits #=> 2), specifically after the 'y' and the 'h':
indices = [scrambled.index('y'), scrambled.index('h')]
#=> [4, 11]
Inside indices_to_ranges, the array [-1, *indices, last_index] always begins with -1 and ends with last_index, which here is scrambled.size-1.
We may then use indices_to_ranges to convert these indices to ranges of indices of characters in scrambled:
ranges = indices_to_ranges(indices, scrambled.size-1)
#=> [0..4, 5..11, 12..20]
a = ranges.map { |r| scrambled[r] }
#=> ["lo my", " name h", "elis Sean"]
We could of course combine these two steps:
a = indices_to_ranges(indices, scrambled.size-1).map { |r| scrambled[r] }
#=> ["lo my", " name h", "elis Sean"]
Next I will permute the values of a. For each permutation I will join the elements to form a string, then split the string on single spaces to form an array of words. If all of those words are in the dictionary we may claim success and are finished. Otherwise, a different array indices will be constructed and we try again, continuing until success is realized or all possible arrays indices have been considered. We can put all this in the following method.
def unscramble(scrambled, dict_set, splits)
  last_index = scrambled.size-1
  (0..scrambled.size-2).to_a.combination(splits).each do |indices|
    indices_to_ranges(indices, last_index).
      map { |r| scrambled[r] }.
      permutation.each do |arr|
        next if arr[0][0] == ' ' || arr[-1][-1] == ' '
        words = arr.join.split(' ')
        return words if words.all? { |word| dict_set.include?(word) }
      end
  end
end
Let's try it.
original string: "hello my name is Sean"
scrambled = "lo my name helis Sean"
splits = 4
unscramble(scrambled, dict_set, splits)
#=> ["my", "name", "hello", "is", "Sean"]
See Array#combination and Array#permutation.
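For completeness, here is how the same two methods might be applied to the example from the question. The block repeats the definitions above unchanged so that it runs on its own; only the input values differ.

```ruby
require 'set'

# Convert split positions into ranges of character indices.
def indices_to_ranges(indices, last_index)
  [-1, *indices, last_index].each_cons(2).map { |i, j| i + 1..j }
end

# Try every choice of split positions and every ordering of the pieces;
# return the words of the first ordering made up only of dictionary words.
def unscramble(scrambled, dict_set, splits)
  last_index = scrambled.size - 1
  (0..scrambled.size - 2).to_a.combination(splits).each do |indices|
    indices_to_ranges(indices, last_index)
      .map { |r| scrambled[r] }
      .permutation.each do |arr|
        next if arr[0][0] == ' ' || arr[-1][-1] == ' '
        words = arr.join.split(' ')
        return words if words.all? { |word| dict_set.include?(word) }
      end
  end
end

dict_set = ["world", "hello"].to_set
unscramble("rldhello wo", dict_set, 1).join(' ')
#=> "hello world"
```

Note that when no arrangement succeeds, the method as written returns the array of index combinations' receiver rather than nil, so a caller may want to check the result before joining it.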

bonkers answer (not quite perfect yet ... trouble with single chars):
#
# spaces appear to be important!
@check = {}
@ordered = []
def previous_words(word)
  @check.select{|y,z| z[:previous] == word}.map do |nw,z|
    @ordered << nw
    previous_words(nw)
  end
end
def in_word(dictionary, string)
  # check each word in the dictionary to see if the string is contained in one of them
  dictionary.each do |word|
    if word.include?(string)
      return word
    end
  end
  return nil
end
letters=scrambled.split("")
previous=nil
substr=""
letters.each do |l|
  if in_word(dictionary, substr+l)
    substr+= l
  elsif (l==" ")
    word=in_word(dictionary, substr)
    @check[word]={found: 1}
    @check[word][:previous] = previous if previous
    substr=""
    previous=word
  else
    word=in_word(dictionary, substr)
    @check[word]={found: 1}
    @check[word][:previous] = previous if previous
    substr=l
    previous=nil
  end
end
word=in_word(dictionary, substr)
@check[word]={found: 1}
@check[word][:previous] = previous if previous
@check.select{|y,z| z[:previous].nil?}.map do |w,z|
  @ordered << w
  previous_words(w)
end
pp @ordered
output:
dictionary = ["world", "hello"]
scrambled = "rldhello wo"
... my code here ...
2.5.8 :817 > @ordered
=> ["hello", "world"]
dictionary = ["hello", "my", "name", "is", "Sean"]
scrambled = "me is Shelleano my na"
... my code here ...
2.5.8 :879 > @ordered
=> ["Sean", "hello", "my", "name", "is"]

How can I compare the ending of a word with a hash in Ruby?

I'm trying to do something like a string analyzer and I need to retrieve the ending of a word and compare it with the keys of a hash
word = "Test"
ending_hash = {"e" => 1, "st" => 2}
output = 2
I need the output to be 2 in this case, but I won't know in advance whether the ending is 1 or 2 characters long. Is it possible to do?

Initially, assume that you know that word ends with (at least) one of the keys of ending_hash. You can then write:
word = "Test"
ending_hash = {"e" => 1, "st" => 2}
ending_hash.find { |k,v| word.end_with?(k) }.last
#=> 2
See Enumerable#find, String#end_with? and Array#last.
The intermediate calculation is as follows:
ending_hash.find { |k,v| word.end_with?(k) }
#=> ["st", 2]
If you are unsure if any of the keys may match the end of the string, write:
ending_hash = {"e" => 1, "f" => 2}
arr = ending_hash.find { |k,v| word.end_with?(k) }
#=> nil
arr.nil? ? nil : arr.last
#=> nil
or better:
ending_hash.find { |k,v| word.end_with?(k) }&.last
#=> nil
Here &. is the safe navigation operator. In a nutshell, if the expression preceding &. returns nil, the operator immediately returns nil for the entire expression, without executing last.
Even if word must end with one of the keys, you may want to write it this way so that you can check the return value and raise an exception if it is nil.
You could alternatively write:
ending_hash.find { |k,v| word.match? /#{k}\z/ }&.last
The regular expression reads, "match the value of k (#{k}) at the end of the string (the anchor \z)".
Note the following:
{"t"=>1, "st"=>2}.find { |k,v| word.end_with?(k) }&.last
#=> 1
{"st"=>1, "t"=>2}.find { |k,v| word.end_with?(k) }&.last
#=> 1
so the order of the keys may matter.
Lastly, as the block variable v is not used in the block calculation, the block variables would often be written |k,_| or |k,_v|, mainly to signal to the reader that only k is used in the block calculation.
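Since the order of the keys may matter, one way to make the result independent of it is to try the longest endings first. The following is a minimal sketch under the assumption that the longest matching ending should win; value_for_ending is a hypothetical name.

```ruby
# Hypothetical helper: check endings from longest to shortest so that
# "st" is preferred over "t" regardless of the hash's insertion order.
def value_for_ending(word, ending_hash)
  pair = ending_hash
         .sort_by { |ending, _| -ending.length }
         .find { |ending, _| word.end_with?(ending) }
  pair&.last # nil when no ending matches
end

value_for_ending("Test", { "t" => 1, "st" => 2 })
#=> 2
value_for_ending("Test", { "st" => 1, "t" => 2 })
#=> 1
value_for_ending("Test", { "e" => 1, "f" => 2 })
#=> nil
```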

If you know there will be only a small number of lengths of endings, it is much faster to check for all possible lengths, than to check for all endings. (It also makes sense to check them from longest to shortest, in case they overlap, otherwise the shorter will never be matched.)
The lazy one-liner:
(-2..-1).lazy.map { |cut| ending_hash[word[cut..]] }.find(&:itself)
The functional version:
(-2..-1).inject(nil) { |a, e| a || ending_hash[word[e..]] }
The blunt but moist version:
ending_hash[word[-2..]] || ending_hash[word[-1..]]
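If the ending lengths are not hard-coded, they could be derived from the keys themselves. This is a sketch only, assuming that nil never appears as a stored value; lookup_by_length is a hypothetical name.

```ruby
# Hypothetical helper: derive the distinct key lengths, then do one hash
# lookup per length, longest first, instead of testing every key.
def lookup_by_length(word, ending_hash)
  lengths = ending_hash.keys.map(&:length).uniq.sort.reverse
  lengths.each do |len|
    next if len > word.length
    value = ending_hash[word[-len, len]]
    return value unless value.nil?
  end
  nil
end

lookup_by_length("Test", { "e" => 1, "st" => 2 })
#=> 2
```

With many keys but only a few distinct lengths, this performs a handful of hash lookups rather than one end_with? test per key.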

Check whether a string contains all the characters of another string in Ruby

Let's say I have a string, like string= "aasmflathesorcerersnstonedksaottersapldrrysaahf". If you haven't noticed, you can find the phrase "harry potter and the sorcerers stone" in there (minus the space).
I need to check whether string contains all the characters of another string.
string.include? ("sorcerer") #=> true
string.include? ("harrypotterandtheasorcerersstone") #=> false, even though it contains all the letters to spell harrypotterandthesorcerersstone
include? does not work when the characters are shuffled.
How can I check if a string contains all the elements of another string?

Sets and array intersection don't account for repeated chars, but a histogram / frequency counter does:
require 'facets'
s1 = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
s2 = "harrypotterandtheasorcerersstone"
freq1 = s1.chars.frequency
freq2 = s2.chars.frequency
freq2.all? { |char2, count2| freq1[char2] >= count2 }
#=> true
Write your own Array#frequency if you don't want the facets dependency.
class Array
  def frequency
    Hash.new(0).tap { |counts| each { |v| counts[v] += 1 } }
  end
end

I presume that if the string to be checked is "sorcerer", string must include, for example, three "r"'s. If so you could use the method Array#difference, which I've proposed be added to the Ruby core.
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
target = "sorcerer"
target.chars.difference(str.chars).empty?
#=> true
target = "harrypotterandtheasorcerersstone"
target.chars.difference(str.chars).empty?
#=> true
If the characters of target must not only be in str, but must be in the same order, we could write:
target = "sorcerer"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /s.*o.*r.*c.*e.*r.*e.*r/
str =~ r
#=> 2 (truthy)
(or !!(str =~ r) #=> true)
target = "harrypotterandtheasorcerersstone"
r = Regexp.new "#{ target.chars.join "\.*" }"
#=> /h.*a.*r.*r.*y.* ... o.*n.*e/
str =~ r
#=> nil
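The in-order check could also be sketched without building a regular expression, which sidesteps the need to escape characters that are special in a regex. This is an alternative sketch, not the method above; subsequence? is a hypothetical name.

```ruby
# Hypothetical helper: true if the characters of target occur in str
# in the same order, not necessarily adjacent to each other.
def subsequence?(str, target)
  pos = 0
  target.each_char.all? do |ch|
    idx = str.index(ch, pos)   # search from just after the previous match
    idx && (pos = idx + 1)     # nil (falsy) stops all? at the first miss
  end
end

str = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
subsequence?(str, "sorcerer")
#=> true
subsequence?(str, "harrypotterandtheasorcerersstone")
#=> false
```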

A different albeit not necessarily better solution using sorted character arrays and sub-strings:
Given your two strings...
subject = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
search = "harrypotterandthesorcerersstone"
You can sort your subject string using .chars.sort.join...
subject = subject.chars.sort.join # => "aaaaaaacddeeeeeffhhkllmnnoooprrrrrrssssssstttty"
And then produce a list of substrings to search for:
search = search.chars.group_by(&:itself).values.map(&:join)
# => ["hh", "aa", "rrrrrr", "y", "p", "ooo", "tttt", "eeeee", "nn", "d", "sss", "c"]
You could alternatively produce the same set of substrings using this method
search = search.chars.sort.join.scan(/((.)\2*)/).map(&:first)
And then simply check whether every search sub-string appears within the sorted subject string:
search.all? { |c| subject[c] }

Create a 2 dimensional array out of your string letter bank, to associate the count of letters to each letter.
Create a 2 dimensional array out of the harry potter string in the same way.
Loop through both and do comparisons.
I have no experience in Ruby but this is how I would start to tackle it in the language I know most, which is Java.
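The steps above could be sketched in Ruby roughly as follows. This is only an illustration of the described idea; letter_counts and spellable? are hypothetical names, and the two-dimensional arrays are arrays of [letter, count] pairs.

```ruby
# Build a two-dimensional array of [letter, count] pairs for a string.
def letter_counts(str)
  str.chars.group_by(&:itself).map { |letter, group| [letter, group.size] }
end

# Loop through the phrase's pairs and compare each count with the bank's.
def spellable?(letter_bank, phrase)
  bank = letter_counts(letter_bank)
  letter_counts(phrase).all? do |letter, needed|
    _, available = bank.assoc(letter) # nil when the letter is missing
    !available.nil? && available >= needed
  end
end

bank = "aasmflathesorcerersnstonedksaottersapldrrysaahf"
spellable?(bank, "harrypotterandthesorcerersstone")
#=> true
spellable?(bank, "wizard")
#=> false
```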

Ruby gsub match concatenation

Given a string of digits, I am trying to insert '-' between odd numbers and '*' between even numbers. The solution below:
def DashInsertII(num)
  num = num.chars.map(&:to_i)
  groups = num.slice_when {|x,y| x.odd? && y.even? || x.even? && y.odd?}.to_a
  puts groups.to_s
  groups.map! do |array|
    if array[0].odd?
      array.join(" ").gsub(" ", "-")
    else
      array.join(" ").gsub(" ", "*")
    end
  end
  d = %w{- *}
  puts groups.join.chars.to_s
  groups = groups.join.chars
  # Have to account for 0 because Coderbyte thinks 0 is neither even nor odd, which is false.
  groups.each_with_index do |char,index|
    if d.include? char
      if (groups[index-1] == "0" || groups[index+1] == "0")
        groups.delete_at(index)
      end
    end
  end
  groups.join
end
is very convoluted, and I was wondering if I could do something like this:
"99946".gsub(/[13579][13579]/) {|s,x| s+"-"+x}
where s is the first odd, x the second. Usually when I substitute, I replace the matched term, but here I want to keep the matched term and insert a character between the pattern. This would make this problem much simpler.

This will work for you:
"99946".gsub(/[13579]+/) {|s| s.split("").join("-") }
# => "9-9-946"
It's roughly similar to what you tried. It captures multiple consecutive odd digits, and uses the gsub block to split and then join them separated by the "-".
This will include both solutions working together:
"99946".gsub(/[13579]+/) {|s| s.split("").join("-") }.gsub(/[02468]+/) {|s| s.split("").join("*") }
# => "9-9-94*6"

The accepted answer illustrates well the logic required to solve the problem. However, I'd like to suggest that in production code it be simplified somewhat so that it is easier to read and understand.
In particular, we are doing the same thing twice with different arguments, so it would be helpful to the reader to make that obvious by writing a method or lambda that both uses call. For example:
do_pair = ->(string, regex, delimiter) do
  string.gsub(regex) { |s| s.chars.join(delimiter) }
end
Then, one can call it like this:
do_pair.(do_pair.('999434432', /[13579]+/, '-'), /[02468]+/, '*')
This could be simplified even further:
do_pair = ->(string, odd_or_even) do
  regex = (odd_or_even == :odd) ? /[13579]+/ : /[02468]+/
  delimiter = (odd_or_even == :odd) ? '-' : '*'
  string.gsub(regex) { |s| s.chars.join(delimiter) }
end
One advantage to this approach is that it makes obvious both the fact that we are processing two cases, odd and even, and the values we are using for those two cases. It can then be called like this:
do_pair.(do_pair.('999434432', :odd), :even)
This could also be done in a method, of course, and that would be fine. The reason I suggested a lambda is that it's pretty minimal logic and it is used in only one (albeit compound) expression in a single method.
This is admittedly more verbose, but breaks down the logic for the reader into more easily digestible chunks, reducing the cognitive cost of understanding it.
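As mentioned, the same idea could be written as a method. One possible sketch keeps the two cases in a small table and folds the string through them; dash_insert and the rules hash are hypothetical names, and 0 is treated as even here.

```ruby
# Hypothetical method version: each rule maps a pattern for a run of
# digits to the delimiter inserted between the digits of that run.
def dash_insert(digits)
  rules = { /[13579]+/ => '-', /[02468]+/ => '*' }
  rules.reduce(digits) do |str, (regex, delimiter)|
    str.gsub(regex) { |run| run.chars.join(delimiter) }
  end
end

dash_insert('99946')
#=> "9-9-94*6"
dash_insert('999434432')
#=> "9-9-9434*432"
```

Keeping the patterns and delimiters side by side in one hash makes both cases visible at a glance, which is the same goal the lambda version aims for.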

The ordinary way to do that is:
"99946"
.gsub(/(?<=[13579])(?=[13579])/, "-")
.gsub(/(?<=[2468])(?=[2468])/, "*")
# => "9-9-94*6"
or
"99946".gsub(/(?<=[13579])()(?=[13579])|(?<=[2468])()(?=[2468])/){$1 ? "-" : "*"}
# => "9-9-94*6"

"2899946".each_char.chunk { |c| c.to_i.even? }.map { |even, arr|
arr.join(even ? '*' : '-') }.join
#=> "2*89-9-94*6"
The steps:
enum0 = "2899946".each_char
#=> #<Enumerator: "2899946":each_char>
We can convert enum0 to an array to see the elements it will generate:
enum0.to_a
#=> ["2", "8", "9", "9", "9", "4", "6"]
Continuing,
enum1 = enum0.chunk { |c| c.to_i.even? }
#=> #<Enumerator: #<Enumerator::Generator:0x007fa733024b58>:each>
enum1.to_a
#=> [[true, ["2", "8"]], [false, ["9", "9", "9"]], [true, ["4", "6"]]]
a = enum1.map { |even, arr| arr.join(even ? '*' : '-') }
#=> ["2*8", "9-9-9", "4*6"]
a.join
#=> "2*89-9-94*6"
