Why is `inject` very slow? - ruby

Here is the benchmark
require 'benchmark'
# create random array
arr = 40000.times.map { rand(100000).to_s }
r1 = ''
r2 = ''
r3 = ''
Benchmark.bm do |x|
x.report {
r1 = (arr.map { |s|
"[#{s}]"
}).join
}
x.report {
r2 = arr.inject('') { |memo, s|
memo + "[#{s}]"
}
}
x.report {
r3 = ''
arr.each { |s|
r3 << "[#{s}]"
}
}
end
# confirm result is same
puts r1 == r2
puts r2 == r3
Here is the result
user system total real
0.047000 0.000000 0.047000 ( 0.046875)
5.031000 0.844000 5.875000 ( 5.875000)
0.031000 0.000000 0.031000 ( 0.031250)
true
true
Is there any way to make inject faster?

Here's my guess: unlike the other two methods, the approach with inject keeps creating bigger and bigger strings. All of them (except the last) are temporary and will have to be garbage-collected. That's wasted memory and CPU right there. This is also a good example of Shlemiel the Painter's algorithm.
... The inefficiency to which Spolsky was drawing an analogy was the poor programming practice of repeated concatenation of C-style null-terminated character arrays (that is, strings) in which the position of the destination string has to be recomputed from the beginning of the string each time because it is not carried over from a previous concatenation. ...
The approach with map creates many small strings, so, at least, it doesn't spend as much time allocating memory.
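To put a rough number on that guess, here is a small sketch that counts how many characters end up being written when a string is built with String#+. The names pieces and copied are made up for this illustration, and the count only models the copying, not the garbage collector's work.

```ruby
# Count how many characters get written when a string is built with String#+.
pieces = Array.new(1_000) { |i| "[#{i}]" }

copied = 0
memo = ''
pieces.each do |piece|
  memo = memo + piece   # allocates a new String holding everything so far
  copied += memo.size   # every character of that new String had to be written
end

puts "final length:      #{memo.size}"
puts "characters copied: #{copied}"
```

The final string is only a few thousand characters long, yet the total copied grows roughly with the square of the number of pieces, which is the Shlemiel pattern described above.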
Update
As pointed out by Yevgeniy Anfilofyev in the comments, you can avoid creation of many big strings by not creating any. Just keep appending to the memo.
r2 = arr.inject('') { |memo, s|
memo << "[#{s}]"
}
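This works because String#<< appends to the receiver in place and returns that same string, so the block's return value is still the accumulated memo. String#+, by contrast, allocates a new string on every call.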
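A quick way to see the difference between the two operators is to check object identity. This is only a sketch; seed and the other variable names are mine, and String.new is used so the example still works if string literals are frozen.

```ruby
arr = %w[1 2 3]

# String#+ leaves the receiver alone and returns a brand-new String.
a = String.new('x')
b = a + 'y'
puts b.equal?(a)            # false: b is a different object

# String#<< appends in place and returns the receiver itself.
c = String.new('x')
d = c << 'y'
puts d.equal?(c)            # true: same object, now "xy"

# So inject with << keeps handing the same String back as the memo.
seed = String.new
result = arr.inject(seed) { |memo, s| memo << "[#{s}]" }
puts result                 # [1][2][3]
puts result.equal?(seed)    # true

# each_with_object expresses the same idea without relying on the
# block's return value.
alt = arr.each_with_object(String.new) { |s, memo| memo << "[#{s}]" }
puts alt == result          # true
```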

Related

How can I improve the performance of this small Ruby function?

I am currently doing a Ruby challenge and get the error Terminated due to timeout
for some test cases where the string input is very long (10,000+ characters).
How can I improve my code?
Ruby challenge description
You are given a string containing characters A and B only. Your task is to change it into a string such that there are no matching adjacent characters. To do this, you are allowed to delete zero or more characters in the string.
Your task is to find the minimum number of required deletions.
For example, given the string s = AABAAB, remove an A at positions 0 and 3 to make s = ABAB in 2 deletions.
My function
def alternatingCharacters(s)
counter = 0
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
return counter
end
Thank you!
This could be a faster way to return the count:
str.size - str.chars.chunk_while{ |a, b| a == b }.to_a.size
The second part uses String#chars method in conjunction with Enumerable#chunk_while.
This way the second part groups in subarrays:
'aababbabbaab'.chars.chunk_while{ |a, b| a == b}.to_a
#=> [["a", "a"], ["b"], ["a"], ["b", "b"], ["a"], ["b", "b"], ["a", "a"], ["b"]]
Trivial if you can use squeeze:
str.length - str.squeeze.length
Otherwise, you could try a regular expression that matches those A (or B) that are preceded by another A (or B):
str.enum_for(:scan, /(?<=A)A|(?<=B)B/).count
Using enum_for avoids the creation of the intermediate array.
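As a sanity check, the three one-liners above can be wrapped in methods and compared on the example from the challenge. The method names are hypothetical, chosen just for this sketch.

```ruby
# Three ways to count the deletions, wrapped so they can be compared.
def deletions_by_chunks(str)
  str.size - str.chars.chunk_while { |a, b| a == b }.to_a.size
end

def deletions_by_squeeze(str)
  str.length - str.squeeze.length
end

def deletions_by_scan(str)
  str.enum_for(:scan, /(?<=A)A|(?<=B)B/).count
end

%w[AABAAB AAAA ABABABAB].each do |s|
  puts [s, deletions_by_chunks(s), deletions_by_squeeze(s), deletions_by_scan(s)].join(' ')
end
```

All three should agree: every run of identical characters keeps one character and loses the rest.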
The main issue with:
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
is the fact that you don't save chars into a variable. s.chars will rip apart the string into an array of characters. The first s.chars call outside the loop is fine. However, there is no reason to do this again for each character in s. This means that if you have a string of 10,000 characters, you'll instantiate 10,001 arrays of size 10,000.
Re-using the characters array will give you a huge performance boost:
require 'benchmark'
s = ''
options = %w[A B]
10_000.times { s << options.sample }
Benchmark.bm do |x|
x.report do
counter = 0
s.chars.each_with_index { |char, idx| counter += 1 if s.chars[idx + 1] == char }
# create a character array for each iteration ^
end
x.report do
counter = 0
chars = s.chars # <- only create a character array once
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
end
user system total real
8.279767 0.000001 8.279768 ( 8.279655)
0.002188 0.000003 0.002191 ( 0.002191)
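The reason the first version is so much slower can be checked directly: every call to s.chars builds a fresh array. A tiny sketch (variable names are mine):

```ruby
s = 'AABAAB'

first  = s.chars
second = s.chars

puts first == second        # true: same contents
puts first.equal?(second)   # false: two separate Array objects
```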
You could also use enumerator methods like each_cons and count to simplify the code. This costs a little performance, but makes the code a lot more readable.
Benchmark.bm do |x|
x.report do
counter = 0
chars = s.chars
chars.each_with_index { |char, idx| counter += 1 if chars[idx + 1] == char }
end
x.report do
s.each_char.each_cons(2).count { |a, b| a == b }
# ^ using each_char instead of chars to avoid
# instantiating a character array
end
end
user system total real
0.002923 0.000000 0.002923 ( 0.002920)
0.003995 0.000000 0.003995 ( 0.003994)
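For anyone unfamiliar with each_cons, this sketch shows the pairs it produces for the example string from the challenge, and how count picks out the matching neighbours:

```ruby
s = 'AABAAB'

# each_cons(2) slides a window of two characters over the string.
pairs = s.each_char.each_cons(2).to_a
p pairs
# [["A", "A"], ["A", "B"], ["B", "A"], ["A", "A"], ["A", "B"]]

# Each pair of equal neighbours corresponds to one required deletion.
adjacent_matches = pairs.count { |a, b| a == b }
puts adjacent_matches   # 2
```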

Returning a boolean if an element occurs more than three times in an array

I need to take an array and return true if it contains an element three times or more. If the array contains an element that occurs less than three times, it needs to return false. I am guessing I need to use the count method, but I am stuck as to what to put in my block. I have written the following code:
def got_three?(array)
array.count {|x|}
end
got_three? ([1, 2, 2, 2, 4, 5, 4, 6, 7])
Can anybody help?
You can use #count, but you'll also need another primitive, since you want to check for each element. In this case, #any? will suffice.
def got_three?(array)
array.any? { |y| array.count(y) >= 3 }
end
This returns whether any element which appears in the array appears at least three times.
Silvio's answer will work, but as Myst pointed out, it will iterate over the array multiple times with the same arguments.
This should be a little better, without the complexity of Myst's solution:
def has_three?(array)
array.uniq.any? { |x| array.count(x) >= 3 }
end
This slight difference means that you will only iterate over the array once for each unique element, rather than running an additional check for every duplicate element.
Although I love Silvio's answer, it does, potentially, iterate over the array multiple times.
Depending on the array's size and how often this method is used, it could hurt performance - especially if used for a server application, such as within Rails.
The following might be less readable, but should perform better:
def has_three?(array)
h = (Thread.current[:_has_three_buffer] ||= Hash.new(0)).clear
array.each do |i|
h[i] += 1
return true if h[i] >= 3
end
false
end
Edit
A quick explanation about the optimizations in the method.
Speed:
The method uses a Hash to map how many times each element is repeated. This allows for fewer iterations.
The method iterates over the array no more than once, stopping the moment a true response is available.
This method reads at least 3 elements from the array. At most, it iterates once through the whole array.
Memory and object allocation (overhead):
Creating a Hash to map the array is an expensive overhead that can (and should) be minimized.
The overhead is caused by the performance price related to the object creation and initialization (including the initial memory allocation).
Benchmarking shows that the overhead is well worth the price when the triple object is deeper within the array (i.e., located after the third position).
This method creates a single Hash per thread, which is reused whenever the method is called. This is a better approach than using a Mutex, which might provide thread safety but would prevent parallel execution (e.g., when using JRuby).
The Hash is stored in the current thread using Thread.current's memory store.
The Hash: Thread.current[:_has_three_buffer] ||= Hash.new(0) will remain in the thread's memory until the thread exits, so the main bulk of the overhead is avoided and replaced with the smaller overhead of accessing the Hash and clearing its memory using clear.
The Hash.new(0) means that hash will return 0 instead of nil for missing elements.
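If the thread-local buffer feels like too much, the counting idea can be sketched with a plain Hash allocated per call (the variant that appears commented out in the benchmark code). The method name has_three_simple? is made up for this illustration; it trades one Hash allocation per call for simpler code.

```ruby
# Count occurrences in a single pass and stop as soon as any element
# has been seen three times.
def has_three_simple?(array)
  counts = Hash.new(0)          # missing keys read as 0 instead of nil
  array.each do |item|
    counts[item] += 1
    return true if counts[item] >= 3
  end
  false
end

puts has_three_simple?([1, 2, 2, 2, 4, 5, 4, 6, 7])   # true
puts has_three_simple?([1, 1, 2, 2, 3])               # false
```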
Updated Benchmarks
Benchmarking the different solutions surprised me in a positive way. Silvio's solution wasn't as slow as I feared and Jon's solution was slower than I thought...
I was also amazed by the memory used by the solutions. My approach used more memory than I thought, but it was still optimized when compared to Wand's approach.
Here are the measurements for 100,000 iterations using some sample Arrays.
The first two benchmarks are a 10 items long array with the triplet (the object occurring three times) in either the beginning (where my answer is at a disadvantage due to its overhead) or the end (where some answers are at a disadvantage due to iterations).
-- Array (10 items): [1, 1, 1, 2, 3, 4, 5, 6, 7, 8]
silvio 0.040000 0.000000 0.040000 ( 0.043519)
myst 0.080000 0.000000 0.080000 ( 0.074734)
jon 0.240000 0.010000 0.250000 ( 0.252569)
wand 0.260000 0.030000 0.290000 ( 0.289101)
-- Array (10 items): [8, 7, 6, 5, 4, 3, 2, 1, 1, 1]
silvio 0.310000 0.000000 0.310000 ( 0.313565)
myst 0.300000 0.000000 0.300000 ( 0.298405)
jon 0.530000 0.010000 0.540000 ( 0.534697)
wand 0.310000 0.030000 0.340000 ( 0.350704)
Next I measured a 10 items long array with the triplet at the approximate middle (starting on item number 5).
I also reviewed approximate memory usage for the 10 item long array review for 100K iterations. The memory review includes the memory used to benchmark and build the report and it isn't accurate in any way... but it probably shows something...
-- Array (10 items): [1, 2, 3, 4, 5, 5, 5, 8, 9, 10]
Memory used (by printing the Array): 4KB - showing this isn't accurate
silvio 0.200000 0.000000 0.200000 ( 0.200012)
Memory used (silvio): 4KB
myst 0.210000 0.010000 0.220000 ( 0.214936)
Memory used (myst): 5,944KB
jon 0.420000 0.000000 0.420000 ( 0.422354)
Memory used (jon): 18,364KB
wand 0.300000 0.030000 0.330000 ( 0.342656)
Memory used (wand): 118,496KB (~118MB!)
As you can see, when reviewing short arrays (10 items), the differences are very small and might not be worth the memory used for optimizing the method for longer arrays.
Next I did the same (triplet in the middle) with a 50 items long array... this was when the difference in iterations really showed up. i.e., while my answer used ~1.5 seconds, Silvio's approach took ~7.5 seconds - about 5 times slower.
-- Array (50 items): [1...47, 50, 50, 50]
silvio 7.560000 0.010000 7.570000 ( 7.570565)
myst 1.480000 0.010000 1.490000 ( 1.487640)
jon 8.290000 0.020000 8.310000 ( 8.316714)
wand 1.550000 0.150000 1.700000 ( 1.700455)
However, Jon pointed out that the benchmarks were very sterile, with no duplicate objects except the triplet, so next I benchmarked an array with 100 items, where each item was duplicated twice (except the triplet). The triplet was in the middle.
This is where Jon's overhead (using unique) shows its advantage. While Jon's approach took ~8.6 seconds, Silvio's approach climbed to ~14 seconds. Both Wand and I remained in the single digits, with my answer being the fastest - ~1.2 seconds.
I also reviewed the approximate memory growth caused by both the iterations and the Benchmarking itself...
Some of the solutions used a LOT of memory. i.e., Wand's beautiful solution built up to ~0.5GB(!) for the 100 items long array (100K iterations), while Jon's solution used ~50MB, my solution built up to ~5MB and Silvio's solution seems to have used no memory at all (which is probably a glitch that shows how inaccurate the memory testing is).
Again, memory usage isn't accurate.
-- Array (100 items): [1..23,1..23,24, 25,25,25, 26..50,26..50]
Memory used (by printing the Array): 4KB - showing this isn't accurate
silvio 14.090000 0.010000 14.100000 ( 14.100915)
Memory used (silvio): 0KB
myst 1.170000 0.010000 1.180000 ( 1.182665)
Memory used (myst): 5,992KB
jon 8.570000 0.020000 8.590000 ( 8.592982)
Memory used (jon): 51720KB
wand 1.940000 0.160000 2.100000 ( 2.093121)
Memory used (wand): 569,264KB (~569MB!)
Code for benchmark:
require 'benchmark'
def silvio_got_three?(array)
array.any? { |y| array.count(y) >= 3 }
end
def myst_has_three?(array)
h = (Thread.current[:_has_three_buffer] ||= Hash.new(0)).clear
# h = Hash.new(0)
array.each do |i|
h[i] += 1
return true if h[i] >= 3
end
false
end
def jon_has_three?(array)
array.uniq.any? { |x| array.count(x) >= 3 }
end
def wand_has_three?(array)
array.group_by {|i| i }.any? {|_, v| v.size >= 3}
end
def get_total_memory_used
`ps -o rss= -p #{Process.pid}`.to_i
end
# warm-up
myst_has_three? [0]
Benchmark.bm do |bm|
a = [1,1,1] + (2..8).to_a
repeat = 100_000
GC.disable
puts "Array (#{a.length} items): #{a.to_s}"
bm.report('silvio') { repeat.times { silvio_got_three?(a) } }
bm.report('myst') { repeat.times { myst_has_three?(a) } }
bm.report('jon') { repeat.times { jon_has_three?(a) } }
bm.report('wand') { repeat.times { wand_has_three?(a) } }
a.reverse!
puts "Array (#{a.length} items): #{a.to_s}"
bm.report('silvio') { repeat.times { silvio_got_three?(a) } }
bm.report('myst') { repeat.times { myst_has_three?(a) } }
bm.report('jon') { repeat.times { jon_has_three?(a) } }
bm.report('wand') { repeat.times { wand_has_three?(a) } }
a = (1..4).to_a + [5,5,5] + (8..10).to_a
mem = get_total_memory_used
puts "Array (#{a.length} items): #{a.to_s}"
puts "Memory used (by printing the Array): #{get_total_memory_used - mem}KB - showing this isn't accurate"
mem = get_total_memory_used
bm.report('silvio') { repeat.times { silvio_got_three?(a) } }
puts "Memory used (silvio): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
bm.report('myst') { repeat.times { myst_has_three?(a) } }
puts "Memory used (myst): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
bm.report('jon') { repeat.times { jon_has_three?(a) } }
puts "Memory used (jon): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
bm.report('wand') { repeat.times { wand_has_three?(a) } }
puts "Memory used (wand): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
a = (1..47).to_a + [50,50,50]
puts "Array (#{a.length} items): #{a.to_s}"
bm.report('silvio') { repeat.times { silvio_got_three?(a) } }
bm.report('myst') { repeat.times { myst_has_three?(a) } }
bm.report('jon') { repeat.times { jon_has_three?(a) } }
bm.report('wand') { repeat.times { wand_has_three?(a) } }
a = (1..23).to_a*2 + [24,25,25,25] + (26..50).to_a*2
mem = get_total_memory_used
puts "Array (#{a.length} items): [1..23,1..23,24, 25,25,25, 26..50,26..50]"
puts "Memory used (by printing the Array): #{get_total_memory_used - mem}KB - showing this isn't accurate"
mem = get_total_memory_used
bm.report('silvio') { repeat.times { silvio_got_three?(a) } }
puts "Memory used (silvio): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
bm.report('myst') { repeat.times { myst_has_three?(a) } }
puts "Memory used (myst): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
bm.report('jon') { repeat.times { jon_has_three?(a) } }
puts "Memory used (jon): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
bm.report('wand') { repeat.times { wand_has_three?(a) } }
puts "Memory used (wand): #{get_total_memory_used - mem}KB"
mem = get_total_memory_used
GC.start
end; nil
You could do something like this as well:
[1, 2, 2, 2, 4, 5, 4, 6, 7].group_by {|i| i }.any? {|_, v| v.size >= 3}
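To see what group_by actually builds here (and, plausibly, why it allocates more than a plain counter), this sketch prints the intermediate Hash. The tally line is an alternative I am adding, assuming Ruby 2.7 or newer, where Enumerable#tally is available; it stores counts instead of arrays of the elements.

```ruby
array = [1, 2, 2, 2, 4, 4]

# group_by keeps every element, grouped into one Array per distinct value.
grouped = array.group_by { |i| i }
p grouped                                   # {1=>[1], 2=>[2, 2, 2], 4=>[4, 4]}
puts grouped.any? { |_, v| v.size >= 3 }    # true

# tally (Ruby 2.7+) keeps only a count per distinct value.
counts = array.tally
p counts                                    # {1=>1, 2=>3, 4=>2}
puts counts.any? { |_, n| n >= 3 }          # true
```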

Sorting array of string by numbers

I want to sort an array like the following:
["10a","10b","9a","9b","8a","8b"]
When I call,
a = a.sort {|a,b| a <=> b}
it will sort like the following:
["10a","10b","8a","8b","9a","9b"]
The 10 is a string and is not handled as a number. When I first sort by integer and then by string, it will just do the same. Does anybody know how I can handle the 10 as a 10 without making it into an integer? That would mess up the letters a, b etc.
When I first sort by integer and then by string, it will just do the same.
That would have been my first instinct, and it seems to work perfectly:
%w[10a 10b 9a 9b 8a 8b].sort_by {|el| [el.to_i, el] }
# => ['8a', '8b', '9a', '9b', '10a', '10b']
I'd do something like this:
ary = ["10a","10b","9a","9b","8a","8b"]
sorted_ary = ary.sort_by{ |e|
/(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]
}
ary # => ["10a", "10b", "9a", "9b", "8a", "8b"]
sorted_ary # => ["8a", "8b", "9a", "9b", "10a", "10b"]
sort_by is going to be faster than sort for this sort of problem. Because the value being sorted isn't a direct comparison and we need to dig into it to get the values to use for collation, a normal sort would have to do it multiple times for each element. Instead, using sort_by caches the computed value, and then sorts based on it.
/(?<digit>\d+)(?<alpha>\D+)/ =~ e isn't what you'll normally see for a regular expression. The named-captures ?<digit> and ?<alpha> define the names of local variables that can be accessed immediately, when used in that form.
[digit.to_i, alpha] returns an array consisting of the leading number converted to an integer, followed by the character. That array is then used for comparison by sort_by.
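A short sketch of the named-capture behaviour described above. The local variables are only created when a regular expression literal is on the left-hand side of =~; the second half shows the more common MatchData form for comparison.

```ruby
e = '10a'

# Regexp literal on the left of =~ : named groups become local variables.
if /(?<digit>\d+)(?<alpha>\D+)/ =~ e
  key = [digit.to_i, alpha]
end
p key        # [10, "a"]

# The same information through MatchData, without the implicit locals.
m = e.match(/(?<digit>\d+)(?<alpha>\D+)/)
same_key = [m[:digit].to_i, m[:alpha]]
p same_key   # [10, "a"]
```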
Benchmarking sort vs. sort_by using Fruity: I added some length to the array being sorted to push the routines a bit harder for better time resolution.
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 1000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test 2 times. Test will take about 1 second.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 19.999999999999996% ± 1.0%
# >> Running each test once. Test will take about 1 second.
# >> jorge_sort_by is faster than jorg_sort by 10.000000000000009% ± 1.0%
Ruby's sort_by uses a Schwartzian Transform, which can make a major difference in sort speed when dealing with objects where we have to compute the value to be sorted.
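To make the caching visible, this sketch counts how often the sort key is computed by sort with a block, by sort_by, and by a hand-written decorate/sort/undecorate pass. The counter and lambda are my own scaffolding, not part of anyone's answer.

```ruby
ary = %w[10a 10b 9a 9b 8a 8b]

key_calls = 0
key = ->(s) { key_calls += 1; [s.to_i, s] }

# sort with a block recomputes the key for both sides of every comparison.
by_sort = ary.sort { |a, b| key.call(a) <=> key.call(b) }
sort_calls = key_calls

# sort_by computes the key exactly once per element.
key_calls = 0
by_sort_by = ary.sort_by { |s| key.call(s) }
sort_by_calls = key_calls

# The same idea written out by hand: decorate, sort, undecorate.
manual = ary.map { |s| [key.call(s), s] }.sort.map(&:last)

puts "sort:    #{sort_calls} key computations"
puts "sort_by: #{sort_by_calls} key computations"
p by_sort_by   # ["8a", "8b", "9a", "9b", "10a", "10b"]
```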
Could you run your benchmark for 100_000 instead of 1_000 in the definition of ARY?
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 100_000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test once. Test will take about 10 seconds.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 2x ± 1.0
# >> Running each test once. Test will take about 26 seconds.
# >> jorg_sort is similar to jorge_sort_by
The Wikipedia article has a good efficiency analysis and example that explains why sort_by is preferred for costly comparisons.
Ruby's sort_by documentation also covers this well.
I don't think the size of the array will make much difference. If anything, as the array size grows, if the calculation for the intermediate value is costly, sort_by will still be faster because of its caching. Remember, sort_by is all compiled code, whereas using a Ruby-script-based transform is subject to slower execution as the array is transformed, handed off to sort and then the original object is plucked from the sub-arrays. A larger array means it just has to be done more times.
▶ a = ["10a","10b","9a","9b","8a","8b"]
▶ a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
#=> [
# [0] "8a",
# [1] "8b",
# [2] "9a",
# [3] "9b",
# [4] "10a",
# [5] "10b"
#]
Hope it helps.
Two ways that don't use String#to_i (but rely on the assumption that each string consists of one or more digits followed by one lower case letter).
ary = ["10a","10b","9a","9b","8a","8b","100z", "96b"]
#1
mx = ary.map(&:size).max
ary.sort_by { |s| s.rjust(mx) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
#2
ary.sort_by { |s| s.to_i(36) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
Hmmm, I wonder if:
ary.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
or
ary.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
would be faster.
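For readers wondering why those two keys sort correctly, here is a small sketch of what they evaluate to. It relies on the same assumption stated above (digits followed by one lower case letter). The last line is a caveat I am adding: the to_s(36) round trip would not preserve leading zeros.

```ruby
# Base 36 treats 0-9 and a-z as digits, so the whole string is one number.
puts '9a'.to_i(36)     # 9*36 + 10          = 334
puts '10a'.to_i(36)    # 1*1296 + 0*36 + 10 = 1306
puts 1306.to_s(36)     # 10a

# rjust pads shorter strings with spaces, and a space sorts before any digit.
p '9a'.rjust(3)        # " 9a"
puts(' 9a' <=> '10a')  # -1

# Caveat: a leading zero does not survive the base-36 round trip.
puts '08a'.to_i(36).to_s(36)   # 8a
```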
["10a","10b","9a","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9a", "9b", "10a", "10b"]
["10a3","10a4", "9", "9aa","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9", "9aa", "9b", "10a3", "10a4"]
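It may help to look at the keys this builds. In the sketch below the block from above is pulled out into a helper (the name natural_key is made up): split with a capture group keeps the non-digit separators, so even positions are numbers and odd positions are text.

```ruby
def natural_key(str)
  str.split(/(\D+)/).map.with_index { |part, i| i.odd? ? part : part.to_i }
end

p natural_key('10a3')   # [10, "a", 3]
p natural_key('9aa')    # [9, "aa"]
p natural_key('9')      # [9]

# Arrays compare element by element, and a shorter prefix sorts first.
puts(natural_key('9') <=> natural_key('9aa'))     # -1
puts(natural_key('9b') <=> natural_key('10a3'))   # -1
```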
Gentlemen, start your engines!
I decided to benchmark the various solutions that have been offered. One of the things I was curious about was the effect of converting sort_by solutions to sort solutions. For example, I compared my method
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
to
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
This always involves mapping the original array to the transformed values within the sort_by block, sorting that array, then mapping the results back to the elements in the original array (when that can be done).
I tried this sort_by-to-sort conversion with some of the methods that use sort_by. Not surprisingly, the conversion to sort was generally faster, though the amount of improvement varied quite a bit.
Methods compared
module Methods
def mudasobwa(a)
a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
end
def jorg(a)
a.sort_by {|el| [el.to_i, el] }
end
def jorg_sort(a)
a.map {|el| [el.to_i, el] }.sort.map(&:last)
end
def the(a)
a.sort_by {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha] }
end
def the_sort(a)
a.map {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]}.sort.map {|d,a| d.to_s+a }
end
def engineer(a) a.sort_by { |s|
s.scan(/(\d+)(\D+)/).flatten.tap{ |a| a[0] = a[0].to_i } }
end
def sawa(a) a.sort_by { |s|
s.split(/(\D+)/).map.with_index { |s, i| i.odd? ? s : s.to_i } }
end
def cary_rjust(a)
mx = a.map(&:size).max
a.sort_by {|s| s.rjust(mx)}
end
def cary_rjust_sort(a)
mx = a.map(&:size).max
a.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
end
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
end
include Methods
methods = Methods.instance_methods(false)
#=> [:mudasobwa, :jorg, :jorg_sort, :the, :the_sort, :engineer, :sawa,
# :cary_rjust, :cary_rjust_sort, :cary_to_i, :cary_to_i_sort]
Test data and helper
def test_data(n)
a = 10_000.times.to_a.map(&:to_s)
b = [*'a'..'z']
n.times.map { a.sample + b.sample }
end
def compute(m,a)
send(m,a)
end
Confirm methods return the same values
a = test_data(1000)
puts "All methods correct: #{methods.map { |m| compute(m,a) }.uniq.size == 1}"
Benchmark code
require 'benchmark'
indent = methods.map { |m| m.to_s.size }.max
n = 500_000
a = test_data(n)
puts "\nSort random array of size #{n}"
Benchmark.bm(indent) do |bm|
methods.each do |m|
bm.report m.to_s do
compute(m,a)
end
end
end
Test
Sort random array of size 500000
user system total real
mudasobwa 4.760000 0.000000 4.760000 ( 4.765170)
jorg 2.870000 0.020000 2.890000 ( 2.892359)
jorg_sort 2.980000 0.020000 3.000000 ( 3.010344)
the 9.040000 0.100000 9.140000 ( 9.160944)
the_sort 4.570000 0.090000 4.660000 ( 4.668146)
engineer 10.110000 0.070000 10.180000 ( 10.198117)
sawa 27.310000 0.160000 27.470000 ( 27.504958)
cary_rjust 1.080000 0.010000 1.090000 ( 1.087788)
cary_rjust_sort 0.740000 0.000000 0.740000 ( 0.746132)
cary_to_i 0.570000 0.000000 0.570000 ( 0.576570)
cary_to_i_sort 0.460000 0.020000 0.480000 ( 0.477372)
Addendum
@theTinMan demonstrated that the comparisons between the sort_by and sort methods are sensitive to the choice of test data. Using the data he used:
def test_data(n)
(%w[10a 10b 9a 9b 8a 8b] * (n/6)).shuffle
end
I got these results:
Sort random array of size 500000
user system total real
mudasobwa 0.620000 0.000000 0.620000 ( 0.622566)
jorg 0.620000 0.010000 0.630000 ( 0.636018)
jorg_sort 0.640000 0.010000 0.650000 ( 0.638493)
the 8.790000 0.090000 8.880000 ( 8.886725)
the_sort 2.670000 0.070000 2.740000 ( 2.743085)
engineer 3.150000 0.040000 3.190000 ( 3.184534)
sawa 3.460000 0.040000 3.500000 ( 3.506875)
cary_rjust 0.360000 0.010000 0.370000 ( 0.367094)
cary_rjust_sort 0.480000 0.010000 0.490000 ( 0.499956)
cary_to_i 0.190000 0.010000 0.200000 ( 0.187136)
cary_to_i_sort 0.200000 0.000000 0.200000 ( 0.203509)
Notice that the absolute times are also affected.
Can anyone explain the reason for the difference in the benchmarks?
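I can't say for certain what drives the difference, but one thing that can be checked is how different the two data sets are. This sketch reuses the two test_data definitions from above under new names (random_data and repeated_data are my labels) and counts distinct values; it is offered as an observation, not a verified explanation.

```ruby
# Random data, as in the first benchmark.
def random_data(n)
  a = 10_000.times.to_a.map(&:to_s)
  b = [*'a'..'z']
  n.times.map { a.sample + b.sample }
end

# Repeated data, as in the addendum.
def repeated_data(n)
  (%w[10a 10b 9a 9b 8a 8b] * (n / 6)).shuffle
end

n = 6_000
random   = random_data(n)
repeated = repeated_data(n)

puts "distinct values (random):   #{random.uniq.size}"
puts "distinct values (repeated): #{repeated.uniq.size}"   # 6
```

With only six distinct strings, most comparisons in the second data set are between equal elements, whereas the random data set is made up almost entirely of distinct values of varying length.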

How to "sum" enumerables in Ruby

Is it possible to "sum" diverse enumerables when they are in string mode?
For example, like this? (Well, I know this doesn't work.)
(( 'a'..'z') + ('A'..'Z')).to_a
note:
I am asking about getting an array of string chars from a to z and from A to Z all together.
By string mode I mean that the chars will appear like ["a", "b", ..... , "Y", "Z"]
You can use the splat operator:
[*('A'..'Z'), *( 'a'..'z')]
Like this?
[('a'..'z'), ('A'..'Z')].map(&:to_a).flatten
Or this?
('a'..'z').to_a + ('A'..'Z').to_a
Not an answer, but a benchmark of the answers:
require 'benchmark'
n = 100000
Benchmark.bm do |x|
x.report("flat_map : ") { n.times do ; [('A'..'Z'), ('a'..'z')].flat_map(&:to_a) ; end }
x.report("map.flatten: ") { n.times do ; [('A'..'Z'), ('a'..'z')].map(&:to_a).flatten ; end }
x.report("splat : ") { n.times do ; [*('A'..'Z'), *( 'a'..'z')] ; end }
x.report("concat arr : ") { n.times do ; ('A'..'Z').to_a + ('a'..'z').to_a ; end }
end
Result:
#=> user system total real
#=> flat_map : 0.858000 0.000000 0.858000 ( 0.883630)
#=> map.flatten: 1.170000 0.016000 1.186000 ( 1.200421)
#=> splat : 0.858000 0.000000 0.858000 ( 0.857728)
#=> concat arr : 0.812000 0.000000 0.812000 ( 0.822861)
Since you want the elements from the first Range to be at the end of the output Array and the elements of the last Range to be at the beginning of the output Array, but still keep the same order within each Range, I would do it like this (which also generalizes nicely to more than two Enumerables):
def backwards_concat(*enums)
enums.reverse.map(&:to_a).inject([], &:concat)
end
backwards_concat('A'..'Z', 'a'..'z')
['a'..'z'].concat(['A'..'Z']) # => ["a".."z", "A".."Z"], an array of two Ranges rather than of characters
This is probably the quickest way to do this.
By string mode I mean that the chars will appear like ["a", "b", ..... , "Y", "Z"]
To answer the above:
Array('a'..'z').concat Array('A'..'Z')
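For what it's worth, newer Rubies come close to the literal "sum" from the question. Assuming Ruby 2.6 or later, Enumerable#chain and Enumerator#+ both build a lazy chain over the two ranges; this is a sketch of that option, not something taken from the answers above.

```ruby
# Enumerable#chain (Ruby 2.6+) walks the first range, then the second.
letters = ('a'..'z').chain('A'..'Z').to_a

# Enumerator#+ (also 2.6+) is the operator form; note the .each to get
# an Enumerator, since Range itself does not define +.
summed = (('a'..'z').each + ('A'..'Z')).to_a

puts letters.size        # 52
puts letters.first       # a
puts letters.last        # Z
puts letters == summed   # true
```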

In Ruby how do I generate a long string of repeated text?

What is the best way to generate a long string quickly in ruby? This works, but is very slow:
str = ""
length = 100000
(1..length).each {|i| str += "0"}
I've also noticed that creating a string of a decent length and then appending that to an existing string up to the desired length works much faster:
str = ""
incrementor = ""
length = 100000
(1..1000).each {|i| incrementor += "0"}
(1..100).each {|i| str += incrementor}
Any other suggestions?
str = "0" * 999999
Another relatively quick option is
str = '%0999999d' % 0
Though benchmarking
require 'benchmark'
Benchmark.bm(9) do |x|
x.report('format :') { '%099999999d' % 0 }
x.report('multiply:') { '0' * 99999999 }
end
Shows that multiplication is still faster
user system total real
format : 0.300000 0.080000 0.380000 ( 0.405345)
multiply: 0.080000 0.080000 0.160000 ( 0.172504)
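As a quick check that the options discussed here really produce the same string, this sketch builds it three ways at the length used in the question. The in-place append loop is my addition for comparison with the original += loop; it avoids creating a new string on every step.

```ruby
length = 100_000

by_multiply = '0' * length
by_format   = format("%0#{length}d", 0)

# Appending in place with << instead of reassigning with +=.
by_append = String.new
length.times { by_append << '0' }

puts by_multiply.size           # 100000
puts by_multiply == by_format   # true
puts by_multiply == by_append   # true
```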

Resources