Ruby: Split string on boundaries

I have a string of arbitrary characters, some of which are digits. I would like to break the string into fields of consecutive digits and consecutive non-digits. For example, if my string has the value 'abc34d-f9', I would like to get the array
['abc','34','d-f','9']
I'm nearly there, using look-behind and look-ahead expressions:
s.split(/( (?<=\D)(?=\d) | (?<=\d)(?=\D) )/x)
This splits on the boundaries between digits and non-digits (digit->nondigit and vice versa). However, I also get empty elements, i.e. this returns
['abc','','34','','d-f','','9']
Of course it is trivial to filter the empty strings out of the array. I just wonder: why do I get them, and how can I do it better?

Use String#scan to return an array of the matched strings.
> 'abc34d-f9'.scan(/\D+|\d+/)
=> ["abc", "34", "d-f", "9"]
\D+ matches one or more non-digit characters, while \d+ matches one or more digit characters.
Your regex also works fine if you remove the capturing group. When the pattern contains a capturing group, split also adds the captured delimiter text to the result. Here the delimiter is a zero-width boundary, so each capture is an empty string.
> 'abc34d-f9'.split(/(?<=\D)(?=\d)|(?<=\d)(?=\D)/)
=> ["abc", "34", "d-f", "9"]
> 'abc34d-f9'.split(/ (?<=\D)(?=\d) | (?<=\d)(?=\D) /x)
=> ["abc", "34", "d-f", "9"]

Though I prefer @AvinashRaj's solution, it's always fun (and sometimes instructive) to try to find other ways:
str = 'abc34d-f9'
a = str.split(/(\d+)/) #=> ["abc", "34", "d-f", "9"]
a.shift if a.first.empty? #=> nil
a #=> ["abc", "34", "d-f", "9"]
a = str.split(/(\D+)/) #=> ["", "abc", "34", "d-f", "9"]
a.shift if a.first.empty? #=> ""
a #=> ["abc", "34", "d-f", "9"]
str.each_char.chunk { |c| !!(c =~ /\d/) }.map { |_,a| a.join }
#=> ["abc", "34", "d-f", "9"]
str[1..-1].each_char.with_object([str[0]]) { |c,a|
  ((c + a.last[0]) =~ /\d{2}|\D{2}/) ? a.last << c : a << c }
#=> ["abc", "34", "d-f", "9"]
(Ruby 2.2)
str.each_char.slice_when { |a,b| (a+b) =~ /\d\D|\D\d/ }.map(&:join)
#=> ["abc", "34", "d-f", "9"]

Related

Ruby hash with multiple comma separated values to array of hashes with same keys

What is the most efficient and pretty way to map this:
{name:"cheese,test", uid:"1,2"}
to this:
[ {name:"cheese", uid:"1"}, {name:"test", uid:"2"} ]
It should work dynamically, for example with { name:"cheese,test,third", uid:"1,2,3" } or {name:"cheese,test,third,fourth", uid:"1,2,3,4", age:"9,8,7,6" }.
Finally I made this:
hash = {name:"cheese,test", uid:"1,2"}
results = []
length = hash.values.first.split(',').length
length.times do |i|
  results << hash.map {|k,v| [k, v.split(',')[i]]}
end
results.map{|e| e.to_h}
It works, but I am not pleased with it; there has to be a cleaner and more Ruby-like way to do this.
def splithash(h)
  # Transform each element in the Hash...
  h.map do |k, v|
    # ...by splitting the values on commas...
    v.split(',').map do |vv|
      # ...and turning these into individual { k => v } entries.
      { k => vv }
    end
  end.inject do |a,b|
    # Then combine these by "zip" combining each list A to each list B...
    a.zip(b)
    # ...which will require a subsequent .flatten to eliminate nesting
    # [ [ 1, 2 ], 3 ] -> [ 1, 2, 3 ]
  end.map(&:flatten).map do |s|
    # Then combine all of these { k => v } hashes into one containing
    # all the keys with associated values.
    s.inject(&:merge)
  end
end
Which can be used like this:
splithash(name:"cheese,test", uid:"1,2", example:"a,b")
# => [{:name=>"cheese", :uid=>"1", :example=>"a"}, {:name=>"test", :uid=>"2", :example=>"b"}]
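It looks a lot more convoluted at first glance, but this handles any number of keys (as long as there are at least two: with a single key the inject block never runs, so the later steps do not receive the nested lists they expect).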
I would likely use transpose and zip like so:
hash = {name:"cheese,test,third,fourth", uid:"1,2,3,4", age:"9,8,7,6" }
hash.values.map{|x| x.split(",")}.transpose.map{|v| hash.keys.zip(v).to_h}
#=> [{:name=>"cheese", :uid=>"1", :age=>"9"}, {:name=>"test", :uid=>"2", :age=>"8"}, {:name=>"third", :uid=>"3", :age=>"7"}, {:name=>"fourth", :uid=>"4", :age=>"6"}]
To break it down a bit (code slightly modified for operational clarity):
hash.values
#=> ["cheese,test,third,fourth", "1,2,3,4", "9,8,7,6"]
.map{|x| x.split(",")}
#=> [["cheese", "test", "third", "fourth"], ["1", "2", "3", "4"], ["9", "8", "7", "6"]]
.transpose
#=> [["cheese", "1", "9"], ["test", "2", "8"], ["third", "3", "7"], ["fourth", "4", "6"]]
.map do |v|
hash.keys #=> [[:name, :uid, :age], [:name, :uid, :age], [:name, :uid, :age], [:name, :uid, :age]]
.zip(v) #=> [[[:name, "cheese"], [:uid, "1"], [:age, "9"]], [[:name, "test"], [:uid, "2"], [:age, "8"]], [[:name, "third"], [:uid, "3"], [:age, "7"]], [[:name, "fourth"], [:uid, "4"], [:age, "6"]]]
.to_h #=> [{:name=>"cheese", :uid=>"1", :age=>"9"}, {:name=>"test", :uid=>"2", :age=>"8"}, {:name=>"third", :uid=>"3", :age=>"7"}, {:name=>"fourth", :uid=>"4", :age=>"6"}]
end
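A caveat worth keeping in mind: Array#transpose raises an IndexError when the split lists have different lengths. If that can happen in your data, one possible variation uses Array#zip, which pads missing values with nil and follows the length of the first list. This is only a sketch; the helper name split_hash_padded is made up here, and it assumes a non-empty hash whose values are comma-separated strings:

```ruby
# Hypothetical helper (not from the answers above).
# Assumes a non-empty hash whose values are comma-separated strings.
def split_hash_padded(hash)
  keys = hash.keys
  first, *rest = hash.values.map { |v| v.split(',') }
  # zip pads shorter lists with nil, where transpose would raise IndexError.
  first.zip(*rest).map { |row| keys.zip(row).to_h }
end

even = split_hash_padded(name: "cheese,test", uid: "1,2")
# => [{:name=>"cheese", :uid=>"1"}, {:name=>"test", :uid=>"2"}]
ragged = split_hash_padded(name: "cheese,test", uid: "1")
# => [{:name=>"cheese", :uid=>"1"}, {:name=>"test", :uid=>nil}]
```

Whether padding with nil is the right behavior depends on the data; raising an error, as transpose does, may be preferable when unequal lengths indicate a bug.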
Input
hash={name:"cheese,test,third,fourth", uid:"1,2,3,4", age:"9,8,7,6" }
Code
p hash
  .transform_values { |v| v.split(',') }
  .map { |k, v_arr| v_arr.map { |v| [k, v] } }
  .transpose
  .map { |array| array.to_h }
Output
[{:name=>"cheese", :uid=>"1", :age=>"9"}, {:name=>"test", :uid=>"2", :age=>"8"}, {:name=>"third", :uid=>"3", :age=>"7"}, {:name=>"fourth", :uid=>"4", :age=>"6"}]
We are given
h = { name: "cheese,test", uid: "1,2" }
Here are two ways to create the desired array. Neither constructs arrays that are then converted to hashes.
#1
First compute
g = h.transform_values { |s| s.split(',') }
#=> {:name=>["cheese", "test"], :uid=>["1", "2"]}
then compute
g.first.last.size.times.map { |i| g.transform_values { |v| v[i] } }
#=> [{:name=>"cheese", :uid=>"1"}, {:name=>"test", :uid=>"2"}]
Note
a = g.first
#=> [:name, ["cheese", "test"]]
b = a.last
#=> ["cheese", "test"]
b.size
#=> 2
#2
This approach does not convert the values of the hash to arrays.
(h.first.last.count(',')+1).times.map do |i|
  h.transform_values { |s| s[/(?:\w+,){#{i}}\K\w+/] }
end
#=> [{:name=>"cheese", :uid=>"1"}, {:name=>"test", :uid=>"2"}]
We have
a = h.first
#=> [:name, "cheese,test"]
s = a.last
#=> "cheese,test"
s.count(',')+1
#=> 2
We can express the regular expression in free-spacing mode to make it self-documenting.
/
(?: # begin a non-capture group
\w+, # match one or more word characters followed by a comma
) # end the non-capture group
{#{i}} # execute the preceding non-capture group i times
\K # discard all matches so far and reset the start of the match
\w+ # match one or more word characters
/x # invoke free-spacing regex definition mode

Convert range to pattern

I have several ranges of numbers and I wonder if there are any algorithms to convert these ranges to patterns, like this:
Range: 5710000-5716999
Patterns: 5710*, 5711*, 5712*, 5713*, 5714*, 5715*, 5716*
Range: 5003070-5003089
Patterns: 500307*, 500308*
Range: 7238908-7238909
Patterns: 7238908*, 7238909*
I'm using Ruby, if it matters.
UPDATE 1:
More examples:
Range: 1668659-1668671
Patterns: 1668659*, 166866*, 1668670*, 1668671*
Range: 9505334305-9505334472
Patterns: 9505334305*, 9505334306*, 9505334307*, 9505334308*, 9505334309*, 950533431*, 950533432*, ..., 950533446*, 9505334470*, 9505334471*, 9505334472*
def doit(range)
  b, e = range.begin.to_s, range.end.to_s
  idx = b.chars.zip(e.chars).index { |a,b| a!=b }
  return "#{b}*" if idx.nil?
  (b[idx]..e[idx]).map { |c| b[0,idx] + c + '*' }
end
doit(5710000..5716999)
#=> ["5710*", "5711*", "5712*", "5713*", "5714*", "5715*", "5716*"]
doit(5003070..5003089)
#=> ["500307*", "500308*"]
doit(7238908..7238909)
#=> ["7238908*", "7238909*"]
doit(123..123)
#=> "123*"
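The steps are as follows. Note that this approach looks only at the first digit position where the two endpoints differ, so it fits ranges like the first three examples, where the remaining digits run from all 0s to all 9s. For a range such as 1668659-1668671 it would return 166865*, 166866*, 166867*, which covers too much.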
range = 5003070..5003089
b, e = range.begin.to_s, range.end.to_s
#=> ["5003070", "5003089"]
b #=> "5003070"
e #=> "5003089"
ab = b.chars
#=> ["5", "0", "0", "3", "0", "7", "0"]
ae = e.chars
#=> ["5", "0", "0", "3", "0", "8", "9"]
c = ab.zip(ae)
#=> [["5", "5"], ["0", "0"], ["0", "0"], ["3", "3"],
# ["0", "0"], ["7", "8"], ["0", "9"]]
idx = c.index { |a,b| a!=b }
#=> 5
return "#{b}*" if idx.nil?
#=> return "5003070*" if 5.nil?
r = b[idx]..e[idx]
#=> "7".."8"
r.map { |c| b[0,idx] + c + '*' }
#=> ["500307*", "500308*"]
It seems I have figured out how to do this conversion by using group_by on the full range. I suppose this algorithm is slow and inefficient, since it expands the whole range into an array, but it is very straightforward and works fine, so I'll stick with it.
def range_to_pattern(range)
  values = range.to_a
  while values.group_by{ |x| x.to_s[0...-1] }.any? { |_, v| v.size == 10 }
    patterns = []
    values.group_by{ |x| x.to_s[0...-1] }.each_pair{ |k, v| v.size == 10 ? patterns << k.to_i : patterns += v }
    values = patterns
  end
  values
end
Results:
irb(main):072:0> range_to_pattern(5710000..5716999)
=> [5710, 5711, 5712, 5713, 5714, 5715, 5716]
irb(main):073:0> range_to_pattern(5003070..5003089)
=> [500307, 500308]
irb(main):074:0> range_to_pattern(7238908..7238909)
=> [7238908, 7238909]
irb(main):075:0> range_to_pattern(1668659..1668671)
=> [1668659, 166866, 1668670, 1668671]
irb(main):076:0> range_to_pattern(9505334305..9505334472)
=> [9505334305, 9505334306, 9505334307, 9505334308, 9505334309, 950533431, 950533432, 950533433, 950533434, 950533435, 950533436, 950533437, 950533438, 950533439, 950533440, 950533441, 950533442, 950533443, 950533444, 950533445, 950533446, 9505334470, 9505334471, 9505334472]
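For large ranges, expanding every number may be costly. As an alternative, here is a minimal sketch of a greedy approach that works on the endpoints only. It assumes an inclusive range of positive integers, and the method name range_to_patterns is made up for this illustration. Starting from the lower bound, it repeatedly takes the largest power-of-ten block that is aligned at the current number and still fits inside the range:

```ruby
# Hypothetical helper; assumes an inclusive range of positive integers.
def range_to_patterns(range)
  lo = range.begin
  hi = range.end
  patterns = []
  while lo <= hi
    block = 1
    # Grow the block tenfold while lo is aligned to the bigger block
    # and the bigger block still ends inside the range.
    while (lo % (block * 10)).zero? && lo + block * 10 - 1 <= hi
      block *= 10
    end
    # Dropping the digits covered by the block leaves the prefix.
    patterns << "#{lo / block}*"
    lo += block
  end
  patterns
end

range_to_patterns(5710000..5716999)
# => ["5710*", "5711*", "5712*", "5713*", "5714*", "5715*", "5716*"]
range_to_patterns(1668659..1668671)
# => ["1668659*", "166866*", "1668670*", "1668671*"]
```

The number of loop iterations depends on the number of patterns produced rather than on the size of the range.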

Find few adjacent identical numbers

Could you help me?
I need a regex that splits strings like
"11231114"
to
['11', '2', '3', '111', '4']
You could use String#scan as follows:
"11231114".scan(/((\d)\2*)/).map(&:first)
#=> ["11", "2", "3", "111", "4"]
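To see why .map(&:first) is needed, it may help to look at what scan returns when the pattern contains groups: one array per match, holding the captures rather than the whole match. The sketch below (variable names are illustrative) also shows String#gsub called without a replacement or block, which returns an Enumerator over the whole matches and so needs only one group:

```ruby
# With groups in the pattern, scan returns the captures for each match:
# the outer group (the whole run) and the inner group (the single digit).
pairs = "11231114".scan(/((\d)\2*)/)
# => [["11", "1"], ["2", "2"], ["3", "3"], ["111", "1"], ["4", "4"]]

# Keeping only the first capture of each match gives the runs.
runs = pairs.map(&:first)
# => ["11", "2", "3", "111", "4"]

# gsub without a replacement or block enumerates the full matches.
runs_via_gsub = "11231114".gsub(/(\d)\1*/).to_a
# => ["11", "2", "3", "111", "4"]
```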
You could pass a block to String#scan pushing the match group to an array.
matches = []
"11231114".scan(/((\d)\2*)/) do |n,r| matches << n end
In Javascript you can do:
var m = "11231114".match(/(\d)\1*/g)
//=> ["11", "2", "3", "111", "4"]
You can use a similar approach in whatever language or tool you're using.
The approach is to capture a digit using (\d) and then match any repetitions of that same digit using the back-reference \1*.
You could do something like this,
> str = "11231114"
=> "11231114"
> str1 = str.gsub(/(?<=(\d))(?!\1)/, "*")
=> "11*2*3*111*4*"
> str1.split('*')
=> ["11", "2", "3", "111", "4"]
There is slice_when in Ruby 2.2:
"11231114".chars.slice_when { |x, y| x != y }.map(&:join)
#=> ["11", "2", "3", "111", "4"]

Ruby search for word in string

Given input = "helloworld"
The output should be output = ["hello", "world"]
Given that I have a method called is_in_dict?, which returns true if the given string is a word in the dictionary.
So far I have tried:
ar = []
input.split("").each do |f|
  ar << f if is_in_dict? f
  # here I need to check the given char
end
How to achieve it in Ruby?
Instead of splitting the input into characters, you have to inspect all combinations, i.e. "h", "he", "hel", ... "helloworld", "e", "el" , "ell", ... "elloworld" and so on.
Something like this should work:
(0..input.size).to_a.combination(2).each do |a, b|
  word = input[a...b]
  ar << word if is_in_dict?(word)
end
#=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ar
#=> ["hello", "world"]
Or, using each_with_object, which returns the array:
(0..input.size).to_a.combination(2).each_with_object([]) do |(a, b), array|
  word = input[a...b]
  array << word if is_in_dict?(word)
end
#=> ["hello", "world"]
Another approach is to build a custom Enumerator:
class String
  def each_combination
    return to_enum(:each_combination) unless block_given?
    (0..size).to_a.combination(2).each do |a, b|
      yield self[a...b]
    end
  end
end
String#each_combination yields all combinations (instead of just the indices):
input.each_combination.to_a
#=> ["h", "he", "hel", "hell", "hello", "hellow", "hellowo", "hellowor", "helloworl", "helloworld", "e", "el", "ell", "ello", "ellow", "ellowo", "ellowor", "elloworl", "elloworld", "l", "ll", "llo", "llow", "llowo", "llowor", "lloworl", "lloworld", "l", "lo", "low", "lowo", "lowor", "loworl", "loworld", "o", "ow", "owo", "owor", "oworl", "oworld", "w", "wo", "wor", "worl", "world", "o", "or", "orl", "orld", "r", "rl", "rld", "l", "ld", "d"]
It can be used with select to easily filter specific words:
input.each_combination.select { |word| is_in_dict?(word) }
#=> ["hello", "world"]
This seems to be a task for recursion. In short, you want to take letters one by one until you get a word that is in the dictionary. This alone, however, does not guarantee a correct result, as the remaining letters may not form words ('hell' + 'oworld'?). This is what I would do:
def split_words(string)
  return [[]] if string == ''
  chars = string.chars
  word = ''
  (1..string.length).map do
    word += chars.shift
    next unless is_in_dict?(word)
    other_splits = split_words(chars.join)
    next if other_splits.empty?
    other_splits.map {|split| [word] + split }
  end.compact.inject([], :+)
end
split_words('helloworld') #=> [['hello', 'world']] No hell!
It will also give you all possible splits, so pages with URLs like penisland can be avoided:
split_words('penisland') #=> [['pen', 'island'], [<the_other_solution>]]
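The recursion above may solve the same suffix many times on longer inputs. A possible variation caches the result per suffix. This is a sketch under assumptions: the word list inside is_in_dict? is a stand-in for a real dictionary, and split_words_memo is a made-up name:

```ruby
# Stand-in dictionary for illustration only; a real is_in_dict? would
# consult an actual word list.
def is_in_dict?(word)
  %w[hell hello world pen island].include?(word)
end

# Returns every way to split `string` into dictionary words.
# `memo` caches the splits of each suffix so it is computed only once.
def split_words_memo(string, memo = {})
  return [[]] if string.empty?
  memo[string] ||= (1..string.length).flat_map do |len|
    word = string[0, len]
    next [] unless is_in_dict?(word)
    # Prepend the current word to every split of the remaining text.
    split_words_memo(string[len..-1], memo).map { |rest| [word] + rest }
  end
end

split_words_memo('helloworld') # => [["hello", "world"]]
split_words_memo('penisland')  # => [["pen", "island"]]
```

With this small word list only one split exists for each input; a fuller dictionary would return every valid split, as the original method does.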

How to combination/permutation in ruby?

I have a familiar question that looks like a permutation/combination problem from the math world.
How can I achieve the following in Ruby?
badges = "1-2-3"
badge_cascade = []
badges.split("-").each do |b|
  badge_cascade << b
end
Gives: => ["1", "2", "3"]
But what I want is:
=> ["1", "2", "3",
"1-2", "2-3", "3-1", "2-1", "3-2", "1-3",
"1-2-3", "2-3-1", "3-1-2"]
Functional approach:
bs = "1-2-3".split("-")
strings = 1.upto(bs.size).flat_map do |n|
bs.permutation(n).map { |vs| vs.join("-") }
end
#=> ["1", "2", "3", "1-2", "1-3", "2-1", "2-3", "3-1", "3-2", "1-2-3", "1-3-2", "2-1-3", "2-3-1", "3-1-2", "3-2-1"]
You need to use the Array#permutation method to get all permutations:
arr = "1-2-3".split '-' # => ["1", "2", "3"]
res = (1..arr.length).reduce([]) { |res, length|
  res += arr.permutation(length).to_a
}.map {|arr| arr.join('-')}
puts res.inspect
# => ["1", "2", "3", "1-2", "1-3", "2-1", "2-3", "3-1", "3-2", "1-2-3", "1-3-2", "2-1-3", "2-3-1", "3-1-2", "3-2-1"]
Let me explain the code:
First, split the string into an array by passing the separator '-' to String#split.
You need all permutations of lengths 1, 2 and 3. The range 1..arr.length represents these lengths.
Collect an array of all permutations using Enumerable#reduce.
This gives you an array of arrays:
[["1"], ["2"], ["3"], ["1", "2"], ["1", "3"], ["2", "1"], ["2", "3"], ["3", "1"], ["3", "2"], ["1", "2", "3"], ["1", "3", "2"], ["2", "1", "3"], ["2", "3", "1"], ["3", "1", "2"], ["3", "2", "1"]]
Finally, transform each subarray into a string by calling Array#join with the '-' separator inside Enumerable#map.
Array#permutation(n) will give you all the permutations of length n as an Array of Arrays so you can call this with each length between 1 and the number of digits in badges. The final step is to map these all back into strings delimited with -.
badges = "1-2-3"
badges_split = badges.split('-')
permutations = []
(1..badges_split.size).each do |n|
  permutations += badges_split.permutation(n).to_a
end
result = permutations.map { |permutation| permutation.join('-') }
Update: I think Alex's use of reduce is a more elegant approach but I'll leave this answer here for now in case it is useful.
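If the order of the badges should not matter (so that "1-2" and "2-1" count as the same entry), Array#combination can be swapped in for Array#permutation. This is a sketch of that variation for comparison, not what the question's expected output shows:

```ruby
bs = "1-2-3".split("-")

# Order-insensitive selections of every length from 1 to bs.size.
combos = (1..bs.size).flat_map do |n|
  bs.combination(n).map { |vs| vs.join("-") }
end
# => ["1", "2", "3", "1-2", "1-3", "2-3", "1-2-3"]
```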
