Determining if a prefix exists in a set - ruby

Given a set of strings, say:
"Alice"
"Bob"
"C"
"Ca"
"Car"
"Carol"
"Caroling"
"Carousel"
and given a single string, say:
"Carolers"
I would like a function that returns the smallest prefix not already inside the array.
For the above example, the function should return: "Caro". (A subsequent call would return "Carole")
I am very new to Ruby, and although I could probably hack out something ugly (using my C/C++/Objective-C brain), I would like to learn how to properly (elegantly?) code this up.

There's a little-known, rather magical module in Ruby called Abbrev.
require 'abbrev'
abbreviations = Abbrev::abbrev([
"Alice",
"Bob",
"C",
"Ca",
"Car",
"Carol",
"Caroling",
"Carousel"
])
carolers = Abbrev::abbrev(%w[Carolers])
(carolers.keys - abbreviations.keys).sort.first # => "Caro"
Above I took the first element, but this shows what else would be available:
pp((carolers.keys - abbreviations.keys).sort)
# >> ["Caro", "Carole", "Caroler", "Carolers"]
Wrap all the above in a function, compute the resulting missing elements, and then iterate over them yielding them to a block, or use an enumerator to return them one-by-one.
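A minimal sketch of that wrapper follows. It assumes `require 'abbrev'` succeeds in your Ruby version, and the method name `missing_prefixes` is hypothetical. Passing the block straight through to `Array#each` gives both behaviors: with a block the prefixes are yielded, and without one `each` returns an Enumerator you can pull from one value at a time.

```ruby
require 'abbrev'

# Hypothetical helper: yields, shortest first, each prefix of +word+ that
# is not a key in the Abbrev table built from +words+.
# Without a block, Array#each returns an Enumerator instead.
def missing_prefixes(words, word, &block)
  taken = Abbrev.abbrev(words).keys
  candidates = (Abbrev.abbrev([word]).keys - taken).sort
  candidates.each(&block)
end

words = %w[Alice Bob C Ca Car Carol Caroling Carousel]

# Enumerator form: pull values one at a time
enum = missing_prefixes(words, "Carolers")
enum.next # => "Caro"
enum.next # => "Carole"

# Block form
missing_prefixes(words, "Carolers") { |prefix| puts prefix }
```

Sorting works as "shortest first" here because every candidate is a prefix of the same word, and a proper prefix always sorts before its extensions.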
This is what is generated for a single word. For an array it is more complex.
require 'pp'
pp Abbrev::abbrev(['cat'])
# >> {"ca"=>"cat", "c"=>"cat", "cat"=>"cat"}
pp Abbrev::abbrev(['cat', 'car', 'cattle', 'carrier'])
# >> {"cattl"=>"cattle",
# >> "catt"=>"cattle",
# >> "cat"=>"cat",
# >> "carrie"=>"carrier",
# >> "carri"=>"carrier",
# >> "carr"=>"carrier",
# >> "car"=>"car",
# >> "cattle"=>"cattle",
# >> "carrier"=>"carrier"}
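One caveat worth checking before relying on this: as the output above shows, Abbrev's table holds every unambiguous abbreviation, not just the words themselves, so a prefix that merely abbreviates a longer word counts as "taken". A small comparison against a plain membership test, using made-up input `["Carolina"]`:

```ruby
require 'abbrev'

words = ["Carolina"]
word  = "Carolers"

# Abbrev approach: "C", "Ca", ..., "Carol" are all unambiguous abbreviations
# of "Carolina", so they are subtracted from the candidates.
via_abbrev = (Abbrev.abbrev([word]).keys - Abbrev.abbrev(words).keys).sort.first
# => "Carole"

# Membership approach: "C" itself is not an element of the set.
via_member = (1..word.length).map { |i| word[0, i] }.find { |s| !words.include?(s) }
# => "C"
```

Which result is right depends on whether "already inside the array" means an exact element or any abbreviation of one; for the question's own set both approaches return "Caro".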

Your question still doesn't match what you are expecting as a result. It seems that you need prefixes, not the substrings (as "a" would be the shortest substring not already in the array). For searching the prefix, this should suffice:
array = [
"Alice",
"Bob",
"C",
"Ca",
"Car",
"Carol",
"Caroling",
"Carousel",
]
str = 'Carolers'
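(1..str.length).map { |i|
  str[0, i]   # each prefix of str, shortest first (stopping at str.length avoids repeating the whole string)
}.find { |s| !array.member?(s) }
# => "Caro" (nil if every prefix is already in the array)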
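If you also want the "subsequent call" behavior from the question, one option (a sketch; the method name `next_missing_prefix` is hypothetical) is to keep the strings in a `Set`, which makes each membership check fast, and add every prefix you hand out so the next call moves on:

```ruby
require 'set'

# Hypothetical helper: returns the shortest prefix of +str+ not in +taken+
# and records it, so the next call returns a longer one. Returns nil once
# every prefix (including +str+ itself) is taken.
def next_missing_prefix(taken, str)
  prefix = (1..str.length).map { |i| str[0, i] }.find { |s| !taken.include?(s) }
  taken << prefix if prefix
  prefix
end

taken = Set.new(%w[Alice Bob C Ca Car Carol Caroling Carousel])
next_missing_prefix(taken, "Carolers") # => "Caro"
next_missing_prefix(taken, "Carolers") # => "Carole"
```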

I am not a Ruby expert, but I think you may want to approach this problem by converting your set into a trie. Once you have the trie constructed, your problem can be solved simply by walking down from the root of the trie, following the edges for the letters in the word, until you either reach a node that is not marked as a word or walk off the trie. In either case, the letters consumed so far form a prefix of your word that is not itself in the set, and because you stop at the first such node, it is the shortest one. Moreover, this lets you run any number of prefix checks quickly, since once the trie is built each check takes time at most linear in the length of the string.
Hope this helps!
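A minimal Ruby sketch of that idea follows. The `Trie` and `Node` classes and the `shortest_missing_prefix` name are hypothetical, not a library API. Each node keeps a Hash of children keyed by character plus a flag saying whether a set member ends there:

```ruby
class Trie
  # One node per character; +word+ is true when a set member ends here.
  class Node
    attr_reader :children
    attr_accessor :word

    def initialize
      @children = {}
      @word = false
    end
  end

  def initialize(words = [])
    @root = Node.new
    words.each { |w| insert(w) }
  end

  # Follow (creating as needed) one edge per character, then mark the end.
  def insert(str)
    node = str.each_char.reduce(@root) { |n, ch| n.children[ch] ||= Node.new }
    node.word = true
    self
  end

  # Walk down from the root one character at a time. The first prefix whose
  # node is missing (walked off the trie) or not marked as a word is the
  # answer, so the cost is linear in str.length.
  def shortest_missing_prefix(str)
    node = @root
    str.each_char.with_index(1) do |ch, len|
      node = node.children[ch]
      return str[0, len] if node.nil? || !node.word
    end
    nil # every prefix of str is already a member
  end
end

trie = Trie.new(%w[Alice Bob C Ca Car Carol Caroling Carousel])
trie.shortest_missing_prefix("Carolers") # => "Caro"
trie.insert("Caro")
trie.shortest_missing_prefix("Carolers") # => "Carole"
```

Inserting each returned prefix back into the trie reproduces the "subsequent call" behavior from the question.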

I'm not really sure what you're asking for other than an example of some Ruby code to find common prefixes. I'll assume you want to find the smallest string, not already in the set, which is a prefix of the greatest number of strings in the given set. Here's an example implementation:
class PrefixFinder
  def initialize(words)
    # a Hash used as a set: each word maps to itself
    @words = Hash[*words.map { |x| [x, x] }.flatten]
  end

  def next_prefix
    max = 0; biggest = nil
    @words.keys.sort.each do |word|
      0.upto(word.size - 1) do |len|
        substr = word[0..len]
        next if @words[substr]   # already in the set
        # count the words starting with substr (start_with? avoids treating substr as a regex)
        count = @words.keys.count { |x| x.start_with?(substr) }
        max, biggest = [count, substr] if count > max
      end
    end
    @words[biggest] = biggest if biggest   # remember it so the next call moves on
    biggest
  end
end
pf = PrefixFinder.new(%w(C Ca Car Carol Caroled Carolers))
pf.next_prefix # => "Caro"
pf.next_prefix # => "Carole"
pf.next_prefix # => "Caroler"
pf.next_prefix # => nil
No comment on the performance (or correctness) of this code, but it does show some Ruby idioms (instance variables, iteration, hashing, etc.).

inn = ["Alice","Bob","C","Ca","Car","Carol","Caroling","Carousel"]
y = Array.new
str = "Carolers"
Split the given string into an array:
x = str.split('')
# ["C","a","r","o","l","e","r","s"]
Form all the leading combinations (the prefixes):
x.each_index { |i| y << x.take(i+1) }
# y is now [["C"], ["C", "a"], ["C", "a", "r"], ["C", "a", "r", "o"], ["C", "a", "r", "o", "l"], ["C", "a", "r", "o", "l", "e"], ["C", "a", "r", "o", "l", "e", "r"], ["C", "a", "r", "o", "l", "e", "r", "s"]]
Use join to concatenate the characters of each prefix:
y = y.map { |s| s.join }
# ["C", "Ca", "Car", "Caro", "Carol", "Carole", "Caroler", "Carolers"]
Select the first item from y that's not in the input array:
y.select { |item| !inn.include? item }.first
You will get "Caro".
Putting it all together:
def find_first_missing_item(src_array, str_to_check)
  y = Array.new
  x = str_to_check.split('')
  x.each_index { |i| y << x.take(i+1) }
  y = y.map { |s| s.join }
  # find returns the first matching item, like select { ... }.first without building a new array
  y.find { |item| !src_array.include?(item) }
end
And call it:
inn = ["Alice","Bob","C","Ca","Car","Carol","Caroling","Carousel"]
str = "Carolers"
find_first_missing_item(inn, str) # => "Caro"

Very simple version (but not very Rubyish):
str = 'Carolers'
ar = %w(Alice Bob C Ca Car Carol Caroling Carousel)
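substr = str[0, n=1]
# stop once n passes str.length; without this guard the loop never ends when every prefix is taken
substr = str[0, n+=1] while ar.include?(substr) && n <= str.length
substr = nil if ar.include?(substr)  # every prefix, including str itself, is taken
puts substr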


Ruby: inconsistent results with scanned string's length

I may not be seeing the whole picture here, but I am getting inconsistent results from a calculation. I am trying to solve the run-length encoding problem: for an input string like "AAABBAAACCCAA" the encoding should be "3A2B3A3C2A". My function is:
def encode(input)
res = ""
input.scan(/(.)\1*/i) do |match|
res << input[/(?<bes>#{match}+)/, "bes"].length.to_s << match[0].to_s
end
res
end
The results I am getting are:
irb(main):049:0> input = "AAABBBCCCDDD"
=> "AAABBBCCCDDD"
irb(main):050:0> encode(input)
(a) => "3A3B3C3D"
irb(main):051:0> input = "AAABBBCCCAAA"
=> "AAABBBCCCAAA"
irb(main):052:0> encode(input)
(b) => "3A3B3C3A"
irb(main):053:0> input = "AAABBBCCAAA"
=> "AAABBBCCAAA"
irb(main):054:0> encode(input)
(c) => "3A3B2C3A"
irb(main):055:0> input = "AAABBBCCAAAA"
=> "AAABBBCCAAAA"
irb(main):056:0> encode(input)
(d) => "3A3B2C3A"
irb(main):057:0> input = 'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB'
=> "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"
irb(main):058:0> encode(input)
(e) => "12W1B12W1B12W1B"
As you can see, results (a) through (c) are correct, but results (d) and (e) undercount some repetitions (for example, (d) ends in 3A instead of 4A). Can you give me a hint as to where to check, please? (I am learning to use 'pry' right now.)
Regular expressions are great, but they're not the golden hammer for every problem.
str = "AAABBAAACCCAA"
str.chars.chunk_while { |i, j| i == j }.map { |a| "#{a.size}#{a.first}" }.join
Breaking down what it does:
str = "AAABBAAACCCAA"
str.chars # => ["A", "A", "A", "B", "B", "A", "A", "A", "C", "C", "C", "A", "A"]
.chunk_while { |i, j| i == j } # => #<Enumerator: #<Enumerator::Generator:0x007fc1998ac020>:each>
.to_a # => [["A", "A", "A"], ["B", "B"], ["A", "A", "A"], ["C", "C", "C"], ["A", "A"]]
.map { |a| "#{a.size}#{a.first}" } # => ["3A", "2B", "3A", "3C", "2A"]
.join # => "3A2B3A3C2A"
to_a is there for illustration, but isn't necessary:
str = "AAABBAAACCCAA"
str.chars
.chunk_while { |i, j| i == j }
.map { |a| "#{a.size}#{a.first}" }
.join # => "3A2B3A3C2A"
How do you get to know methods such as Array#chunk_while? I am using Ruby 2.3.1 but cannot find it in the API docs. I mean, where is the complete list of all the methods available? Certainly not here: ruby-doc.org/core-2.3.1/Array.html
Well, this is off-topic to the question but it's useful information to know:
Remember that Array includes the Enumerable module, which contains chunk_while. Use the search functionality of http://ruby-doc.org to find where things live. Also, get familiar with using ri at the command line, and try running gem server at the command-line to get the help for all the gems you've installed.
If you look at the Array documentation page, on the left you can see that Array has a parent class of Object, so it gets the methods from Object, and that it also includes the Enumerable module, so it picks up whatever is implemented in Enumerable.
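You can also ask Ruby itself where a method lives. `Module#instance_method` returns an UnboundMethod whose `owner` is the class or module that defines it, and `Module#include?` reports whether a module is mixed in:

```ruby
Array.include?(Enumerable)                # => true
Array.instance_method(:chunk_while).owner # => Enumerable
Array.instance_method(:map).owner         # => Array (Array defines its own map)
```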
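You only get the count of the first run of each matched character: input[/(?<bes>#{match}+)/, "bes"] searches the whole input from the beginning, so it always measures the earliest run of that character rather than the run currently being scanned. Instead, perform the replacement with gsub and pass each match to a block, where you can do the necessary manipulations: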
def encode(input)
input.gsub(/(.)\1*/) { |m| m.length.to_s << m[0] }
end
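The undercount in the question is easy to reproduce on its own. The snippet below checks it and, as an alternative to the gsub version (not the asker's code), uses `String#scan` with a nested group so that each complete run is captured directly:

```ruby
input = "AAABBBCCAAAA"

# String#[] with a Regexp returns the leftmost match in the whole string,
# so the final run "AAAA" is never the one measured.
input[/A+/]  # => "AAA"

# Group 1 captures the complete run, group 2 its character; \2 repeats it.
encoded = input.scan(/((.)\2*)/).map { |run, ch| "#{run.length}#{ch}" }.join
encoded      # => "3A3B2C4A"
```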
Results:
"AAABBBCCCDDD" => 3A3B3C3D
"AAABBBCCCAAA" => 3A3B3C3A
"AAABBBCCAAA" => 3A3B2C3A
"AAABBBCCAAAA" => 3A3B2C4A
"WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB" => 12W1B12W3B24W1B

How to extract each individual combination from a flat_map?

I'm fairly new to Ruby and this is my first question here on Stack Overflow, so pardon me if I'm being a complete noob.
The code I am working with contains this line:
puts (6..6).flat_map{|n| ('a'..'z').to_a.combination(n).map(&:join)}
The code starts printing each of the combinations, beginning with "abcdef", and continues to the end (which I have never seen, as it has 26^6 combinations).
Of course, an array of that size (26^6) is unimaginable, so I was wondering if there is any way I can get the next combination in a variable, work with it, and then continue on to the next combination?
For example, I calculate the first combination as "abcdef", store it in a variable 'combo', and use that variable somewhere; then the next combination is calculated, "abcdeg" is stored in 'combo', and the loop continues?
Thanks
(6..6).flat_map { |n| ... } doesn't do much. Your code is equivalent to:
puts ('a'..'z').to_a.combination(6).map(&:join)
To process the values one by one, you can pass a block to combination:
('a'..'z').to_a.combination(6) do |combo|
puts combo.join
end
If no block is given, combination returns an Enumerator that can be iterated by calling next:
enum = ('a'..'z').to_a.combination(6)
#=> #<Enumerator: ["a", "b", "c", ..., "w", "x", "y", "z"]:combination(6)>
enum.next
#=> ["a", "b", "c", "d", "e", "f"]
enum.next
#=> ["a", "b", "c", "d", "e", "g"]
enum.next
#=> ["a", "b", "c", "d", "e", "h"]
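Calling `next` after the last combination raises `StopIteration`. `Kernel#loop` rescues that exception and exits, so a `loop` around `next` is a common way to consume such an enumerator one value at a time. Shown here with a smaller alphabet so it finishes instantly:

```ruby
enum = ('a'..'e').to_a.combination(3)
combos = []

# Kernel#loop stops quietly when enum.next raises StopIteration.
loop do
  combo = enum.next.join # work with one combination at a time here
  combos << combo
end

combos.size  # => 10
combos.first # => "abc"
combos.last  # => "cde"
```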
Note that ('a'..'z').to_a.combination(6) will "only" yield 230,230 combinations:
('a'..'z').to_a.combination(6).size
#=> 230230
As opposed to 26 ^ 6 = 308,915,776. You are probably looking for repeated_permutation:
('a'..'z').to_a.repeated_permutation(6).size
#=> 308915776
Another way to iterate from "aaaaaa" to "zzzzzz" is a simple range:
('aaaaaa'..'zzzzzz').each do |combo|
puts combo
end
Or manually by calling String#succ: (this is what Range#each does under the hood)
'aaaaaa'.succ #=> "aaaaab"
'aaaaab'.succ #=> "aaaaac"
'aaaaaz'.succ #=> "aaaaba"

How to grab all values in a hash without specifying individual values in Ruby?

This is an add-on to a question I asked yesterday, but I felt it warranted a new question.
I am taking a JSON response and want to extract all the values per iteration and put them into an array
@response = { "0"=>{"forename_1"=>"John", "surname_1"=>"Smith", "forename_2"=>"Josephine", "surname_2"=>"Bradley", "middle_1"=>""},
"1"=>{"forename_1"=>"Chris", "surname_1"=>"Jenkins", "forename_2"=>"Christine", "surname_2"=>"Sugar", "middle_1"=>""},
"2"=>{"forename_1"=>"Billy", "surname_1"=>"Bob", "forename_2"=>"Brenda", "surname_2"=>"Goodyear", "middle_1"=>""},
"Status" => 100
}
At present this method takes the specific values that I want and puts them into the array I want.
col = @response.values.grep(Hash).map { |h| "#{h['forename_1']} #{h['surname_1']} #{h['forename_2']} #{h['surname_2']} #{h['middle_1']}" }
Is there a way however to say grab ALL the values and place them into an array (I have a response where over 25 key/value pairs are returned).
At the moment if middle_1 has no value then a " " gets put into the array, ideally I would like to remove these.
Ideally I would like my newly formed array to look like
["John Smith Josephine Bradley", "Chris Jenkins Christine Sugar", "Billy Bob Brenda Goodyear"]
Even though no middle_1 is supplied, there are no double spaces in the array. I would like to learn how to tackle this.
Maybe an example of "cracking" the hash and extracting what you need will help:
h = {a1: "a", b2: "b", c3: "", d4: nil, e5: "e"}
values = h.values.map(&:to_s).reject(&:empty?)
# => ["a", "b", "e"]
values.join(" ")
# => "a b e"
Let's consider the h.values.map(&:to_s).reject(&:empty?):
values = h.values
# => ["a", "b", "", nil, "e"]
values = values.map(&:to_s)
# => ["a", "b", "", "", "e"]
values = values.reject(&:empty?)
# => ["a", "b", "e"]
Hope that gives you some idea how you can proceed.
Good luck!
UPDATE
For provided hash you can quite easily reuse what I have described above like:
col = @response.values
.grep(Hash)
.map { |h| h.values.map(&:to_s).reject(&:empty?).join(" ") }
p col
# => ["John Smith Josephine Bradley", "Chris Jenkins Christine Sugar", "Billy Bob Brenda Goodyear"]

Format data in string to array?

I need to convert data from a string to an array. The string looks like this:
{a,b,c{1,2,3},d,e,f{11,22,33},g}
The array that I want to receive should look like this:
[a, b, c1, c2, c3, d, e, f11, f22, f33, g]
I tried to use the split method but it works poorly.
arr = str.split(' ');
keys = arr[0][2..-2]
keys = keys.split(',')
Do you have any ideas how it could be implemented?
Here's what I'd use:
string = '{a,b,c{1,2,3},d,e,f{11,22,33},g}'
array = string.scan(/[a-z](?:{.+?})?/).flat_map{ |s|
if s['{']
prefix = s[0]
values = s.scan(/\d+/)
([prefix] * values.size).zip(values).map(&:join)
else
s
end
}
array # => ["a", "b", "c1", "c2", "c3", "d", "e", "f11", "f22", "f33", "g"]
Here's how it works:
string.scan(/[a-z](?:{.+?})?/) # => ["a", "b", "c{1,2,3}", "d", "e", "f{11,22,33}", "g"]
returns the string broken into chunks, looking for a single letter followed by an optional string of { with some text then }.
values = s.scan(/\d+/) # => ["1", "2", "3"], ["11", "22", "33"]
As it's running in flat_map, if { is found, the numbers are scanned out.
([prefix] * values.size).zip(values).map(&:join) # => ["c1", "c2", "c3"], ["f11", "f22", "f33"]
And then an array repeating the prefix, with as many elements as there are values, is created and zipped together with the values, resulting in:
[["c", "1"], ["c", "2"], ["c", "3"]], [["f", "11"], ["f", "22"], ["f", "33"]]
The join glues those sub-arrays together. And flat_map flattens any subarrays created so the resulting output is a single array.
You need to use arr = str.split(',') in the first step, because there is no whitespace between the values.
Also keep in mind you have {} to handle too.
This worked for me with simple regex and gsubing (though Tin Man's solution is better Ruby):
def my_string_to_array(input_string)
  input_string = input_string.dup  # work on a copy so gsub! does not mutate the caller's string
  groups = input_string.scan(/\w+\{.*?\}/)
  groups.each do |group|
    modified = group.gsub(',', ",#{group.match(/\w+/)[0]}").delete("{}")
    input_string.gsub!(group, modified)
  end
  input_string.delete("{}").split(',')
end
string = '{a,b,c{1,2,3},d,e,f{11,22,33},g}'
my_string_to_array(string)
=> ["a", "b", "c1", "c2", "c3", "d", "e", "f11", "f22", "f33", "g"]
The way it works is that it first finds the groups consisting of letters followed by braces and digits (like c{1,2,3}).
For each such group, it modifies it by gsubing ',' with ',<letter>' and removing the braces.
Next, it replaces these groups with the modified ones in the input string.
Finally, it removes the starting and ending braces and converts the string into an array.

Enumerator `Array#each` 's {block} can't always change array values?

Ok maybe this is simple but...
given this:
arr = ("a".."z").to_a
arr
=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
..and that I'm trying to change all "arr" values to "bad"
why isn't this working ?
arr.each { |v| v = "bad" }
arr
=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
Answers suggested that "v" is a local variable to the block (a "copy" of the array value) and I fully understand that (and never puzzled me before) but then
.. so why does it work when the array elements are objects?
class Person
def initialize
@age = 0
end
attr_accessor :age
end
kid = Person.new
man = Person.new
arr = [kid, man]
arr.each { |p| p.age = 50 }
arr[0]
=> #<Person:0xf98298 @age=50>
Isn't "p" still local to the block here?
But then it really affects the objects. How come?
I'll expand upon @pst's comment:
why isn't this working ?
arr.each { |v| v = "bad" }
Because each iterates through the array and passes each item into the block you've given as the local variable v, and v is not a reference into the array arr.
new_arr = arr.each { |v| v = "bad" }
each does not give back a new array (it returns arr itself, unchanged); for that you would use map (see @benjaminbenben's answer). Therefore assigning its result does not "work".
arr.each { |v| arr[arr.index v] = "bad" }
Here you put each item in arr into the local variable v, but you've also referred to the array itself in the block, hence you are able to assign to the array and use the local variable v to find an index that corresponds to the contents of v (but you may find this wouldn't work as you expect when the items are not all unique).
arr.each { |p| p.age = 50 }
kid.age #-> 50
Here, again you've filled the local variable p with each item/object in arr, but then you've accessed each item via a method, so you are able to change that item - you are not changing the array. It's different because the reference is to the contents of the local variable, which you've mixed up with being a reference to the array. They are separate things.
In response to the comment below:
arr[0]
# => #<Person:0xf98298 @age=50>
It's all about who's referring to whom when.
Try this:
v = Person.new
# => #<Person:0x000001008de248 @age=0>
w = Person.new
# => #<Person:0x000001008d8050 @age=0>
x = v
# => #<Person:0x000001008de248 @age=0>
v = Person.new
# => #<Person:0x00000100877e80 @age=0>
arr = [v,w,x]
# => [#<Person:0x00000100877e80 @age=0>, #<Person:0x000001008d8050 @age=0>, #<Person:0x000001008de248 @age=0>]
v referred to 2 different objects there. v is not a fixed thing, it's a name. At first it refers to #<Person:0x000001008de248 @age=0>, then it refers to #<Person:0x00000100877e80 @age=0>.
Now try this:
arr.each { |v| v = "bad" }
# => [#<Person:0x00000100877e80 @age=0>, #<Person:0x000001008d8050 @age=0>, #<Person:0x000001008de248 @age=0>]
They are all objects, but nothing was updated or "worked". Why? Because when the block is first entered, v refers to the item in the array that was yielded (given). So on the first iteration v is #<Person:0x00000100877e80 @age=0>.
But, we then assign "bad" to v. We are not assigning "bad" to the first index of the array because we aren't referencing the array at all. arr is the reference to the array. Put arr inside the block and you can alter it:
arr.each { |v|
arr[0] = "bad" # yes, a bad idea!
}
Why then does arr.each { |p| p.age = 50 } update the items in the array? Because p refers to the objects that also happen to be in the array. On first iteration p refers to the object also known as kid, and kid has an age= method and you stick 50 in it. kid is also the first item in the array, but you're talking about kid not the array. You could do this:
arr.each { |p| p = "bad"; p.age }
NoMethodError: undefined method `age' for "bad":String
At first, p referred to the object that also happened to be in the array (that's where it was yielded from), but then p was made to refer to "bad".
each iterates over the array and yields a value on each iteration. You only get the value not the array. If you want to update an array you either do:
new_arr = arr.map{|v| v = "bad" }
new_arr = arr.map{|v| "bad" } # same thing
or
arr.map!{|v| v = "bad"}
arr.map!{|v| "bad"} # same thing
as map returns an array filled with the return values of the block, and map! replaces the contents of the array you called it on with those return values. Generally, it's a bad idea to update an object while iterating over it anyway. I find it's always better to think of it as creating a new array, and then you can use the ! methods as a shortcut.
In the example
arr.each { |v| v = "bad" }
"v" is just a reference to a string; when you do v = "bad", you reassign the local variable. To make everything "bad", you can do this:
arr.each { |v| v.replace "bad" }
Next time you can play with Object#object_id
puts arr[0].object_id # will be the same as the object_id printed in the first iteration below
arr.each { |v| puts v.object_id }
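Following that suggestion, a short experiment makes the difference between rebinding v and mutating the object it refers to visible:

```ruby
arr = ("a".."c").to_a

# The block variable is bound to the very same objects the array holds.
ids = arr.map(&:object_id)
seen = []
arr.each { |v| seen << v.object_id }
seen == ids                        # => true

# Assigning to v only rebinds the block-local name; the array is untouched.
arr.each { |v| v = "bad" }
arr                                # => ["a", "b", "c"]

# String#replace mutates the object v refers to, which the array also holds.
arr.each { |v| v.replace("bad") }
arr                                # => ["bad", "bad", "bad"]
```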
You might be looking for .map, which returns a new array with the return value of the block for each element.
arr.map { "bad" }
=> ["bad", "bad", "bad", "bad", …]
Using .map! will alter the contents of the original array rather than return a new one.
How about this:
arr = Array.new(arr.length, "bad")
This creates a new array with as many elements as arr, each set to "bad".
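Given the rest of this thread, two details matter here: `Array.new(size, obj)` stores the same object in every slot, and it builds a new array rather than changing the one you started from. If you want distinct strings, the block form of `Array.new` creates one per element; if you want to change the array in place, `Array#fill` does that (again with one shared object):

```ruby
arr = ("a".."z").to_a

shared = Array.new(arr.length, "bad")
shared[0].equal?(shared[1])       # => true: one String, referenced 26 times

distinct = Array.new(arr.length) { String.new("bad") }
distinct[0].equal?(distinct[1])   # => false: a new String per element

arr.fill("bad")                   # modifies arr itself
arr.length                        # => 26
arr.uniq                          # => ["bad"]
```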
