How to "sum" enumerables in Ruby

Is it possible to "sum" different enumerables when they are in string mode?
For example, like this? (Well, I know this doesn't work.)
(('a'..'z') + ('A'..'Z')).to_a
Note:
I am asking about getting a single array of one-character strings from a to z and from A to Z, all together.
About string mode: I mean that the chars will appear like ["a", "b", ..... , "Y", "Z"]

You can use the splat operator:
[*('A'..'Z'), *( 'a'..'z')]
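For what it's worth, here is a small sketch of what the splat does in this situation: inside an array literal, `*` expands each range into its elements, so the result is one flat array. The variable names are only for illustration.

```ruby
lower = ('a'..'z')
upper = ('A'..'Z')

# Splatting each range inside the literal yields one flat array of strings.
letters = [*lower, *upper]

letters.size   # 52
letters.first  # "a"
letters.last   # "Z"

# Same result as converting each range to an array and concatenating.
same = (letters == lower.to_a + upper.to_a)  # true
```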

Like this?
[('a'..'z'), ('A'..'Z')].map(&:to_a).flatten
Or this?
('a'..'z').to_a + ('A'..'Z').to_a

Not an answer, but a benchmark of the answers:
require 'benchmark'
n = 100000
Benchmark.bm do |x|
x.report("flat_map : ") { n.times do ; [('A'..'Z'), ('a'..'z')].flat_map(&:to_a) ; end }
x.report("map.flatten: ") { n.times do ; [('A'..'Z'), ('a'..'z')].map(&:to_a).flatten ; end }
x.report("splat : ") { n.times do ; [*('A'..'Z'), *( 'a'..'z')] ; end }
x.report("concat arr : ") { n.times do ; ('A'..'Z').to_a + ('a'..'z').to_a ; end }
end
Result:
#=> user system total real
#=> flat_map : 0.858000 0.000000 0.858000 ( 0.883630)
#=> map.flatten: 1.170000 0.016000 1.186000 ( 1.200421)
#=> splat : 0.858000 0.000000 0.858000 ( 0.857728)
#=> concat arr : 0.812000 0.000000 0.812000 ( 0.822861)

Since you want the elements from the first Range to be at the end of the output Array and the elements of the last Range to be at the beginning of the output Array, but still keep the same order within each Range, I would do it like this (which also generalizes nicely to more than two Enumerables):
def backwards_concat(*enums)
enums.reverse.map(&:to_a).inject([], &:concat)
end
backwards_concat('A'..'Z', 'a'..'z')

['a'..'z'].concat(['A'..'Z'])
# => ["a".."z", "A".."Z"] (an array holding two Range objects, not the individual characters)
This is probably the quickest way to do this.

About string mode: I mean that the chars will appear like ["a", "b", ..... , "Y", "Z"]
To answer the above:
Array('a'..'z').concat Array('A'..'Z')

Related

How can I improve the performance of this small Ruby function?

I am currently doing a Ruby challenge and get the error "Terminated due to timeout"
for some test cases where the string input is very long (10,000+ characters).
How can I improve my code?
Ruby challenge description
You are given a string containing characters A and B only. Your task is to change it into a string such that there are no matching adjacent characters. To do this, you are allowed to delete zero or more characters in the string.
Your task is to find the minimum number of required deletions.
For example, given the string s = AABAAB, remove an A at positions 0 and 3 to make s = ABAB in 2 deletions.
My function
def alternatingCharacters(s)
  counter = 0
  s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
  return counter
end
Thank you!
This could be faster; it returns the count directly:
str.size - str.chars.chunk_while{ |a, b| a == b }.to_a.size
The second part uses the String#chars method together with Enumerable#chunk_while,
which groups runs of equal adjacent characters into subarrays:
'aababbabbaab'.chars.chunk_while{ |a, b| a == b}.to_a
#=> [["a", "a"], ["b"], ["a"], ["b", "b"], ["a"], ["b", "b"], ["a", "a"], ["b"]]
Trivial if you can use squeeze:
str.length - str.squeeze.length
Otherwise, you could try a regular expression that matches those A (or B) that are preceded by another A (or B):
str.enum_for(:scan, /(?<=A)A|(?<=B)B/).count
Using enum_for avoids the creation of the intermediate array.
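As a quick sanity check (a sketch, not part of the original answers), the counting approaches suggested so far can be compared on the example string from the challenge description. The variable names are made up for illustration.

```ruby
# Example string taken from the challenge description.
s = 'AABAAB'

# Number of characters minus number of runs of equal characters.
by_chunks  = s.size - s.chars.chunk_while { |a, b| a == b }.to_a.size

# squeeze collapses each run to a single character.
by_squeeze = s.length - s.squeeze.length

# Count every A (or B) that is directly preceded by the same letter.
by_regex   = s.enum_for(:scan, /(?<=A)A|(?<=B)B/).count

p [by_chunks, by_squeeze, by_regex]  # prints [2, 2, 2]
```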
The main issue with:
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
is the fact that you don't save chars into a variable. s.chars splits the string into an array of characters. The first s.chars call outside the block is fine. However, there is no reason to do this again for each character in s. This means that for a string of 10,000 characters, you'll instantiate 10,001 arrays of size 10,000.
Re-using the characters array will give you a huge performance boost:
require 'benchmark'
s = ''
options = %w[A B]
10_000.times { s << options.sample }
Benchmark.bm do |x|
x.report do
counter = 0
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
# create a character array for each iteration ^
end
x.report do
counter = 0
chars = s.chars # <- only create a character array once
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
end
user system total real
8.279767 0.000001 8.279768 ( 8.279655)
0.002188 0.000003 0.002191 ( 0.002191)
You could also use enumerator methods like each_cons and count to simplify the code. This doesn't change performance much (it is slightly slower in the benchmark below), but it makes the code a lot more readable.
Benchmark.bm do |x|
x.report do
counter = 0
chars = s.chars
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
x.report do
s.each_char.each_cons(2).count { |a, b| a == b }
# ^ using each_char instead of chars to avoid
# instantiating a character array
end
end
user system total real
0.002923 0.000000 0.002923 ( 0.002920)
0.003995 0.000000 0.003995 ( 0.003994)
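To make the each_cons version a little more concrete, here is a minimal sketch on the example string from the challenge; it only shows which pairs are produced and is not meant as a benchmark.

```ruby
s = 'AABAAB'

# each_cons(2) yields every pair of neighbouring characters.
pairs = s.each_char.each_cons(2).to_a
# [["A", "A"], ["A", "B"], ["B", "A"], ["A", "A"], ["A", "B"]]

# Each pair of equal neighbours corresponds to one required deletion.
deletions = pairs.count { |a, b| a == b }  # 2
```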

Sorting array of string by numbers

I want to sort an array like the following:
["10a","10b","9a","9b","8a","8b"]
When I call,
a = a.sort {|a,b| a <=> b}
it will sort like the following:
["10a","10b","8a","8b","9a","9b"]
The 10 is a string and is not handled as a number. When I first sort by integer and then by string, it will just do the same. Does anybody know how I can handle the 10 as a 10 without making it into an integer? That would mess up the letters a, b etc.
When I first sort by integer and then by string, it will just do the same.
That would have been my first instinct, and it seems to work perfectly:
%w[10a 10b 9a 9b 8a 8b].sort_by {|el| [el.to_i, el] }
# => ['8a', '8b', '9a', '9b', '10a', '10b']
I'd do something like this:
ary = ["10a","10b","9a","9b","8a","8b"]
sorted_ary = ary.sort_by{ |e|
/(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]
}
ary # => ["10a", "10b", "9a", "9b", "8a", "8b"]
sorted_ary # => ["8a", "8b", "9a", "9b", "10a", "10b"]
sort_by is going to be faster than sort for this sort of problem. Because the value being sorted isn't a direct comparison and we need to dig into it to get the values to use for collation, a normal sort would have to do it multiple times for each element. Instead, using sort_by caches the computed value, and then sorts based on it.
/(?<digit>\d+)(?<alpha>\D+)/ =~ e isn't what you'll normally see for a regular expression. The named-captures ?<digit> and ?<alpha> define the names of local variables that can be accessed immediately, when used in that form.
[digit.to_i, alpha] returns an array consisting of the leading number converted to an integer, followed by the character. That array is then used for comparison by sort_by.
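A small sketch of the two mechanisms this relies on, using "10a" as sample input: the named captures becoming local variables, and arrays being compared element by element.

```ruby
e = '10a'

# With the regex literal on the left-hand side of =~, the named groups
# are assigned to local variables of the same name.
/(?<digit>\d+)(?<alpha>\D+)/ =~ e
key = [digit.to_i, alpha]  # [10, "a"]

# Arrays compare element by element, so the integer decides first
# and the letter only breaks ties.
number_first = ([9, 'b'] <=> [10, 'a'])   # -1, because 9 < 10
letter_ties  = ([10, 'a'] <=> [10, 'b'])  # -1, because "a" < "b"
```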
Benchmarking sort vs. sort_by using Fruity: I added some length to the array being sorted to push the routines a bit harder for better time resolution.
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 1000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test 2 times. Test will take about 1 second.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 19.999999999999996% ± 1.0%
# >> Running each test once. Test will take about 1 second.
# >> jorge_sort_by is faster than jorg_sort by 10.000000000000009% ± 1.0%
Ruby's sort_by uses a Schwartzian Transform, which can make a major difference in sort speed when dealing with objects where we have to compute the value to be sorted.
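One way to see the caching effect is to count how often the sort key gets computed. The following is only an illustrative sketch (the counter and the lambda are made up for this example), not a benchmark.

```ruby
words = %w[10a 10b 9a 9b 8a 8b]

key_calls = 0
key = lambda do |s|
  key_calls += 1
  [s.to_i, s]
end

# sort_by computes the key once per element ...
by_sort_by = words.sort_by { |s| key.call(s) }
sort_by_calls = key_calls

# ... whereas a sort block recomputes it for both sides of every comparison.
key_calls = 0
by_sort = words.sort { |a, b| key.call(a) <=> key.call(b) }
sort_calls = key_calls

puts "sort_by: #{sort_by_calls} key computations, sort: #{sort_calls}"
```

With six elements, sort_by needs exactly six key computations, while the sort block needs two per comparison, so its count is necessarily higher.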
Could you run your benchmark for 100_000 instead of 1_000 in the definition of ARY?
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 100_000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test once. Test will take about 10 seconds.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 2x ± 1.0
# >> Running each test once. Test will take about 26 seconds.
# >> jorg_sort is similar to jorge_sort_by
The Wikipedia article has a good efficiency analysis and example that explains why sort_by is preferred for costly comparisons.
Ruby's sort_by documentation also covers this well.
I don't think the size of the array will make much difference. If anything, as the array size grows, if the calculation for the intermediate value is costly, sort_by will still be faster because of its caching. Remember, sort_by is all compiled code, whereas using a Ruby-script-based transform is subject to slower execution as the array is transformed, handed off to sort and then the original object is plucked from the sub-arrays. A larger array means it just has to be done more times.
▶ a = ["10a","10b","9a","9b","8a","8b"]
▶ a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
#=> [
# [0] "8a",
# [1] "8b",
# [2] "9a",
# [3] "9b",
# [4] "10a",
# [5] "10b"
#]
Hope it helps.
Two ways that don't use String#to_i (but rely on the assumption that each string consists of one or more digits followed by one lower case letter).
ary = ["10a","10b","9a","9b","8a","8b","100z", "96b"]
#1
mx = ary.map(&:size).max
ary.sort_by { |s| s.rjust(mx) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
#2
ary.sort_by { |s| s.to_i(36) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
Hmmm, I wonder if:
ary.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
or
ary.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
would be faster.
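For readers wondering why the rjust and to_i(36) keys order the strings correctly, here is a short sketch under the same assumption as above (digits followed by exactly one lower case letter). The variable names are illustrative.

```ruby
# rjust pads on the left with spaces, and a space sorts before any digit,
# so shorter numbers come first in a plain string comparison.
padded = '9a'.rjust(3)           # " 9a"
padded_first = padded < '10a'    # true

# to_i(36) reads the whole string as one base-36 number, in which the
# letters a..z stand for the digit values 10..35.
nine_a = '9a'.to_i(36)           # 9 * 36 + 10 = 334
ten_a  = '10a'.to_i(36)          # 1 * 36**2 + 0 * 36 + 10 = 1306
```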
["10a","10b","9a","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9a", "9b", "10a", "10b"]
["10a3","10a4", "9", "9aa","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9", "9aa", "9b", "10a3", "10a4"]
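A brief sketch of how the sort key is built in this approach, using "10a3" as the sample element:

```ruby
# Because the pattern contains a capture group, split keeps the separators,
# so digit runs land at even indices and non-digit runs at odd indices.
parts = '10a3'.split(/(\D+)/)  # ["10", "a", "3"]

key = parts.map.with_index { |part, i| i.odd? ? part : part.to_i }
# [10, "a", 3]

# A shorter key that is a prefix of a longer one sorts first,
# which is why "9" precedes "9aa" in the output above.
prefix_first = ([9] <=> [9, 'aa'])  # -1
```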
Gentlemen, start your engines!
I decided to benchmark the various solutions that have been offered. One of the things I was curious about was the effect of converting sort_by solutions to sort solutions. For example, I compared my method
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
to
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
This always involves mapping the original array to the transformed values within the sort_by block, sorting that array, then mapping the results back to the elements in the original array (when that can be done).
I tried this sort_by-to-sort conversion with some of the methods that use sort_by. Not surprisingly, the conversion to sort was generally faster, though the amount of improvement varied quite a bit.
Methods compared
module Methods
def mudasobwa(a)
a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
end
def jorg(a)
a.sort_by {|el| [el.to_i, el] }
end
def jorg_sort(a)
a.map {|el| [el.to_i, el] }.sort.map(&:last)
end
def the(a)
a.sort_by {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha] }
end
def the_sort(a)
a.map {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]}.sort.map {|d,a| d.to_s+a }
end
def engineer(a) a.sort_by { |s|
s.scan(/(\d+)(\D+)/).flatten.tap{ |a| a[0] = a[0].to_i } }
end
def sawa(a) a.sort_by { |s|
s.split(/(\D+)/).map.with_index { |s, i| i.odd? ? s : s.to_i } }
end
def cary_rjust(a)
mx = a.map(&:size).max
a.sort_by {|s| s.rjust(mx)}
end
def cary_rjust_sort(a)
mx = a.map(&:size).max
a.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
end
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
end
include Methods
methods = Methods.instance_methods(false)
#=> [:mudasobwa, :jorg, :jorg_sort, :the, :the_sort,
# :cary_rjust, :cary_rjust_sort, :cary_to_i, :cary_to_i_sort]
Test data and helper
def test_data(n)
a = 10_000.times.to_a.map(&:to_s)
b = [*'a'..'z']
n.times.map { a.sample + b.sample }
end
def compute(m,a)
send(m,a)
end
Confirm methods return the same values
a = test_data(1000)
puts "All methods correct: #{methods.map { |m| compute(m,a) }.uniq.size == 1}"
Benchmark code
require 'benchmark'
indent = methods.map { |m| m.to_s.size }.max
n = 500_000
a = test_data(n)
puts "\nSort random array of size #{n}"
Benchmark.bm(indent) do |bm|
methods.each do |m|
bm.report m.to_s do
compute(m,a)
end
end
end
Test
Sort random array of size 500000
user system total real
mudasobwa 4.760000 0.000000 4.760000 ( 4.765170)
jorg 2.870000 0.020000 2.890000 ( 2.892359)
jorg_sort 2.980000 0.020000 3.000000 ( 3.010344)
the 9.040000 0.100000 9.140000 ( 9.160944)
the_sort 4.570000 0.090000 4.660000 ( 4.668146)
engineer 10.110000 0.070000 10.180000 ( 10.198117)
sawa 27.310000 0.160000 27.470000 ( 27.504958)
cary_rjust 1.080000 0.010000 1.090000 ( 1.087788)
cary_rjust_sort 0.740000 0.000000 0.740000 ( 0.746132)
cary_to_i 0.570000 0.000000 0.570000 ( 0.576570)
cary_to_i_sort 0.460000 0.020000 0.480000 ( 0.477372)
Addendum
@theTinMan demonstrated that the comparisons between the sort_by and sort methods are sensitive to the choice of test data. Using the data he used:
def test_data(n)
(%w[10a 10b 9a 9b 8a 8b] * (n/6)).shuffle
end
I got these results:
Sort random array of size 500000
user system total real
mudasobwa 0.620000 0.000000 0.620000 ( 0.622566)
jorg 0.620000 0.010000 0.630000 ( 0.636018)
jorg_sort 0.640000 0.010000 0.650000 ( 0.638493)
the 8.790000 0.090000 8.880000 ( 8.886725)
the_sort 2.670000 0.070000 2.740000 ( 2.743085)
engineer 3.150000 0.040000 3.190000 ( 3.184534)
sawa 3.460000 0.040000 3.500000 ( 3.506875)
cary_rjust 0.360000 0.010000 0.370000 ( 0.367094)
cary_rjust_sort 0.480000 0.010000 0.490000 ( 0.499956)
cary_to_i 0.190000 0.010000 0.200000 ( 0.187136)
cary_to_i_sort 0.200000 0.000000 0.200000 ( 0.203509)
Notice that the absolute times are also affected.
Can anyone explain the reason for the difference in the benchmarks?

How to tokenize a simple mixed string into either ints or symbols?

Trying to convert a simple math string "1 2 3 * + 4 5 - /" to an array of integers and/or symbols like [1, 2, 3, :*, :+, 4, 5, :-, :/].
Is there a more elegant (and extendable) solution than this?
def tokenize s
  arr = s.split(/ /)
  symbols = %w{ + - / * }
  arr.collect! do |c|
    if symbols.include?(c)
      c.to_sym
    else
      c.to_i
    end
  end
end
def tokenize(str)
str.split.map! { |t| t[/\d/] ? t.to_i : t.to_sym }
end
Instead of checking if the token is in a set of operations, you can just check if it contains a numeral digit (for your use case). So it's either an integer, or an operation.
Also note that Array#collect! and Array#map! are identical, and String#split by default splits on white space.
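If a stricter check is wanted, one possible variant (a sketch, assuming Ruby 2.6 or later for the `exception:` keyword of `Integer()`) lets the integer parser decide instead of looking for a digit. The method name `tokenize_strict` is hypothetical.

```ruby
# Hypothetical variant of tokenize: a token becomes an Integer when it parses
# as a base-10 integer, and a Symbol otherwise.
def tokenize_strict(str)
  str.split.map { |t| Integer(t, 10, exception: false) || t.to_sym }
end

tokens = tokenize_strict('1 2 3 * + 4 5 - /')
# [1, 2, 3, :*, :+, 4, 5, :-, :/]
```

The difference only shows up for tokens that mix letters and digits: with the digit check, a token such as "x2" goes through to_i and becomes 0, whereas here it becomes the symbol :x2.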
How about...
h = Hash[*%w{+ - / *}.map { |x| [x, x.to_sym] }.flatten]
And then,
arr.map { |c| h[c] || c.to_i }
So together:
def tokenize s
h = Hash[*%w{+ - / *}.map { |x| [x, x.to_sym] }.flatten]
s.split.map { |c| h[c] || c.to_i }
end
def tokenize s
s.scan(/(\d+)|(\S+)/)
.map{|num, sym| num && num.to_i || sym.to_sym}
end
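To illustrate what the scan-based version works with, here is a small sketch on a shorter input; scan returns one [num, sym] pair per token, with nil in the group that did not match.

```ruby
pairs = '12 * 3'.scan(/(\d+)|(\S+)/)
# [["12", nil], [nil, "*"], ["3", nil]]

# Whichever group matched is non-nil, so the num && ... || ... chain picks
# the integer branch for digit runs and the symbol branch for everything else.
tokens = pairs.map { |num, sym| num && num.to_i || sym.to_sym }
# [12, :*, 3]
```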

Insert Something Every X Number of Characters Without Regex

In this question, the asker requests a solution that would insert a space every x number of characters. The answers both involve using a regular expression. How might you achieve this without a regex?
Here's what I came up with, but it's a bit of a mouthful. Any more concise solutions?
string = "12345678123456781234567812345678"
new_string = string.each_char.map.with_index {|c,i| if (i+1) % 8 == 0; "#{c} "; else c; end}.join.strip
=> "12345678 12345678 12345678 12345678"
class String
def in_groups_of(n)
chars.each_slice(n).map(&:join).join(' ')
end
end
'12345678123456781234567812345678'.in_groups_of(8)
# => '12345678 12345678 12345678 12345678'
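If reopening String is not desired, the same idea can be written as a plain method. This is only a sketch; `group_chars` and its `separator` parameter are hypothetical names, not part of the answer above.

```ruby
# Hypothetical helper; takes the separator as an optional argument.
def group_chars(string, n, separator = ' ')
  # each_slice(n) yields arrays of up to n characters; the last one may be shorter.
  string.each_char.each_slice(n).map(&:join).join(separator)
end

uneven = group_chars('1234567890', 3)         # "123 456 789 0"
dashed = group_chars('12345678' * 2, 8, '-')  # "12345678-12345678"
```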
class Array
# This method is from
# The Poignant Guide to Ruby:
def /(n)
r = []
each_with_index do |x, i|
r << [] if i % n == 0
r.last << x
end
r
end
end
s = '1234567890'
n = 3
join_str = ' '
(s.split('') / n).map {|x| x.join('') }.join(join_str)
#=> "123 456 789 0"
This is slightly shorter but requires two lines:
new_string = ""
s.split(//).each_slice(8) { |a| new_string += a.join + " " }

Find most common string in an array

I have this array, for example (the size is variable):
x = ["1.111", "1.122", "1.250", "1.111"]
and I need to find the most common value ("1.111" in this case).
Is there an easy way to do that?
Tks in advance!
EDIT #1: Thank you all for the answers!
EDIT #2: I've changed my accepted answer based on Z.E.D.'s information. Thank you all again!
Ruby < 2.2
#!/usr/bin/ruby1.8
def most_common_value(a)
a.group_by do |e|
e
end.values.max_by(&:size).first
end
x = ["1.111", "1.122", "1.250", "1.111"]
p most_common_value(x) # => "1.111"
Note: Enumerable#max_by is new with Ruby 1.9, but it has been backported to 1.8.7
Ruby >= 2.2
Ruby 2.2 introduces the Object#itself method, with which we can make the code more concise:
def most_common_value(a)
a.group_by(&:itself).values.max_by(&:size).first
end
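On newer Rubies there is a further shortcut. The following sketch assumes Ruby 2.7 or later, where Enumerable#tally is available; the method name is made up to keep it apart from the versions above.

```ruby
# Assumes Ruby 2.7 or later for Enumerable#tally.
def most_common_value_tally(a)
  # tally builds a hash of element => count, e.g. {"1.111"=>2, "1.122"=>1, "1.250"=>1}
  a.tally.max_by { |_value, count| count }.first
end

x = ["1.111", "1.122", "1.250", "1.111"]
result = most_common_value_tally(x)  # "1.111"
```

Like the group_by versions, this raises NoMethodError on an empty array, because max_by returns nil there.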
As a monkey patch
Or as Enumerable#mode:
Enumerable.class_eval do
def mode
group_by do |e|
e
end.values.max_by(&:size).first
end
end
["1.111", "1.122", "1.250", "1.111"].mode
# => "1.111"
One pass through the array to accumulate the counts in a hash. Use .max() to find the hash entry with the largest value.
#!/usr/bin/ruby
a = Hash.new(0)
["1.111", "1.122", "1.250", "1.111"].each { |num|
a[num] += 1
}
a.max{ |a,b| a[1] <=> b[1] } # => ["1.111", 2]
or, roll it all into one line:
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] } # => ["1.111", 2]
If you only want the item back add .first():
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first # => "1.111"
The first sample I used is how it would be done in Perl usually. The second is more Ruby-ish. Both work with older versions of Ruby. I wanted to compare them, plus see how Wayne's solution would speed things up so I tested with benchmark:
#!/usr/bin/env ruby
require 'benchmark'
ary = ["1.111", "1.122", "1.250", "1.111"] * 1000
def most_common_value(a)
a.group_by { |e| e }.values.max_by { |values| values.size }.first
end
n = 1000
Benchmark.bm(20) do |x|
x.report("Hash.new(0)") do
n.times do
a = Hash.new(0)
ary.each { |num| a[num] += 1 }
a.max{ |a,b| a[1] <=> b[1] }.first
end
end
x.report("inject:") do
n.times do
ary.inject(Hash.new(0)){ |h,i| h[i] += 1; h }.max{ |a,b| a[1] <=> b[1] }.first
end
end
x.report("most_common_value():") do
n.times do
most_common_value(ary)
end
end
end
Here's the results:
user system total real
Hash.new(0) 2.150000 0.000000 2.150000 ( 2.164180)
inject: 2.440000 0.010000 2.450000 ( 2.451466)
most_common_value(): 1.080000 0.000000 1.080000 ( 1.089784)
You could sort the array and then loop over it once. In the loop just keep track of the current item and the number of times it is seen. Once the list ends or the item changes, set max_count = count if count > max_count. And of course keep track of which item has the max_count.
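A minimal sketch of this sort-then-scan idea might look as follows. The method and variable names are hypothetical, and it updates the maximum on every step rather than only when the item changes, which gives the same result.

```ruby
# Hypothetical helper: sort first, then count runs of equal neighbours.
def most_common_by_sorting(values)
  best = nil
  best_count = 0
  current = nil
  count = 0

  values.sort.each do |v|
    if v == current
      count += 1          # still inside the same run
    else
      current = v         # a new run starts
      count = 1
    end
    # Remember the longest run seen so far.
    if count > best_count
      best = current
      best_count = count
    end
  end

  best
end

x = ["1.111", "1.122", "1.250", "1.111"]
winner = most_common_by_sorting(x)  # "1.111"
```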
You could create a hashmap that stores the array items as keys with their values being the number of times that element appears in the array.
Roughly, in Ruby:
counts = {}
["1.111", "1.122", "1.250", "1.111"].each { |num|
  count = counts[num]   # nil if num has not been seen yet
  if count.nil?
    counts[num] = 1
  else
    counts[num] = count + 1
  end
}
As already mentioned, sorting might be faster.
Using the default value feature of hashes:
>> x = ["1.111", "1.122", "1.250", "1.111"]
>> h = Hash.new(0)
>> x.each{|i| h[i] += 1 }
>> h.max{|a,b| a[1] <=> b[1] }
["1.111", 2]
This returns the most popular value in the array:
x.group_by{|a| a }.sort_by{|_, v| -v.size }.first[0]
E.g.:
x = ["1.111", "1.122", "1.250", "1.111"]
# Most popular
x.group_by{|a| a }.sort_by{|_, v| -v.size }.first[0]
#=> "1.111"
# How many times
x.group_by{|a| a }.sort_by{|_, v| -v.size }.first[1].size
#=> 2
