Why is this array building method so slow? - ruby

This method is taking over 7 seconds with 50 markets and 2,500 flows (~250,000 iterations). Why so slow?
def matrix
[:origin, :destination].collect do |location|
markets.collect do |market|
network.flows.collect { |flow| flow[location] == market ? 1 : 0 }
end
end.flatten
end
I know that the slowness comes from the comparison of one market to another market based on benchmarks that I've run.
Here are the relevant parts of the class that's being compared.
module FreightFlow
class Market
include ActiveAttr::Model
attribute :coordinates
def ==(value)
coordinates == value.coordinates
end
end
end
What's the best way to make this faster?

You are constructing 100 intermediate collections (2*50) comprising of a total of 250,000 (2*50*2500) elements, and then flattening it at the end. I would try constructing the whole data structure in one pass. Make sure that markets and network.flows are stored in a hash or set. Maybe something like:
def matrix
network.flows.collect do |flow|
(markets.h­as_key? flow[:origin] or
marke­ts.has_key­? flow[:destination]) ? 1 : 0
end
end

This is a simple thing but it can help...
In your innermost loop you're doing:
network.flows.collect { |flow| flow[location] == market ? 1 : 0 }
Instead of using the ternary statement to convert to 1 or 0, use true and false Booleans instead:
network.flows.collect { |flow| flow[location] == market }
This isn't a big difference in speed, but over the course of that many nested loops it adds up.
In addition, it allows you to simplify your tests using the matrix being generated. Instead of having to compare to 1 or 0, you can simplify your conditional tests to if flow[location], if !flow[location] or unless flow[location], again speeding up your application a little bit for each test. If those are deeply nested in loops, which is very likely, that little bit can add up again.
Something that is important to do, when speed is important, is use Ruby's Benchmark class to test various ways of doing the same task. Then, instead of guessing, you KNOW what works. You'll find lots of questions on Stack Overflow where I've supplied an answer that consists of a benchmark showing the speed differences between various ways of doing something. Sometimes the differences are very big. For instance:
require 'benchmark'
puts `ruby -v`
def test1()
true
end
def test2(p1)
true
end
def test3(p1, p2)
true
end
N = 10_000_000
Benchmark.bm(5) do |b|
b.report('?:') { N.times { (1 == 1) ? 1 : 0 } }
b.report('==') { N.times { (1 == 1) } }
b.report('if') {
N.times {
if (1 == 1)
1
else
0
end
}
}
end
Benchmark.bm(5) do |b|
b.report('test1') { N.times { test1() } }
b.report('test2') { N.times { test2('foo') } }
b.report('test3') { N.times { test3('foo', 'bar') } }
b.report('test4') { N.times { true } }
end
And the results:
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-darwin10.8.0]
user system total real
?: 1.880000 0.000000 1.880000 ( 1.878676)
== 1.780000 0.000000 1.780000 ( 1.785718)
if 1.920000 0.000000 1.920000 ( 1.914225)
user system total real
test1 2.760000 0.000000 2.760000 ( 2.760861)
test2 4.800000 0.000000 4.800000 ( 4.808184)
test3 6.920000 0.000000 6.920000 ( 6.915318)
test4 1.640000 0.000000 1.640000 ( 1.637506)
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]
user system total real
?: 2.280000 0.000000 2.280000 ( 2.285408)
== 2.090000 0.010000 2.100000 ( 2.087504)
if 2.350000 0.000000 2.350000 ( 2.363972)
user system total real
test1 2.900000 0.010000 2.910000 ( 2.899922)
test2 7.070000 0.010000 7.080000 ( 7.092513)
test3 11.010000 0.030000 11.040000 ( 11.033432)
test4 1.660000 0.000000 1.660000 ( 1.667247)
There are two different sets of tests. The first is looking to see what the differences are with simple conditional tests vs. using == without a ternary to get just the Booleans. The second is to test the effect of calling a method, a method with a single parameter, and with two parameters, vs. "inline-code" to find out the cost of the setup and tear-down when calling a method.
Modern C compilers do some amazing things when they analyze the code prior to emitting the assembly language to be compiled. We can fine-tune them to write for size or speed. When we go for speed, the program grows as the compiler looks for loops it can unroll and places it can "inline" code, to avoid making the CPU jump around and throwing away stuff that's in the cache.
Ruby is much higher up the language chain, but some of the same ideas still apply. We can write in a very DRY manner, and avoid repetition and use methods and classes to abstract our data, but the cost is reduced processing speed. The answer is to write your code intelligently and don't waste CPU time and unroll/inline where necessary to gain speed and other times be DRY to make your code more maintainable.
It's all a balancing act, and there's a time for writing both ways.

Memoizing the indexes of the markets within the flows was way faster than any other solution. Time reduced from ~30 seconds when the question was asked to 0.6 seconds.
First, I added a flow_index in the Network class. It stores the indexes of the flows that contain the markets.
def flow_index
#flow_index ||= begin
flow_index = {}
[:origin, :destination].each do |location|
flow_index[location] = {}
flows.each { |flow| flow_index[location][flow[location]] = [] }
flows.each_with_index { |flow, i| flow_index[location][flow[location]] << i }
end
flow_index
end
end
Then, I refactored the matrix method to use the flow index.
def matrix
base_row = network.flows.count.times.collect { 0 }
[:origin, :destination].collect do |location|
markets.collect do |market|
row = base_row.dup
network.flow_index[location][market].each do |i|
row[i] = 1
end
row
end
end.flatten
end
The base_row is created with all 0s and you just replace with 1s at the locations from the flow_index for that market.

Related

Why is `str.reverse == str` faster than `str[0] != str[-1]`?

I ran some benchmarks and was wondering why reversing a string and comparing it to itself seems to be faster than comparing individual characters.
Reversing a string is worst case O(n) and comparison O(n), resulting in O(n), unless comparing the same object, which should be O(1). But
str = "test"
str.reverse.object_id == str.object_id # => false
Is character comparison worst case O(1)? What am I missing?
Edit
I extracted and simplified for the question but here's the code I was running.
def reverse_compare(str)
str.reverse == str
end
def iterate_compare(str)
# can test with just str[0] != str[-1]
(str.length/2).times do |i|
return false if str[i] != str[-i-1]
end
end
require "benchmark"
n = 2000000
Benchmark.bm do |x|
str = "cabbbbbba" # best case single comparison
x.report("reverse_compare") { n.times do reverse_compare(str) ; a = "1"; end }
x.report("iterate_compare") { n.times do iterate_compare(str) ; a = "1"; end }
end
user system total real
reverse_compare 0.760000 0.000000 0.760000 ( 0.769463)
iterate_compare 1.840000 0.010000 1.850000 ( 1.855031)
There are two factors in favour of the reverse method:
Both String#reverse and String#== are written in pure C instead of ruby. Their inner loop already uses the fact that the length of the string is known, so there are no unnecessarry boundary checks.
String#[] however needs to check the string boundaries at every call. Also the main loop is written in ruby thereby being a tad bit slower as well. Also it will always create a new (one character long) string object as well to return which needs to be handled, GCd, etc.
It looks like these two factors have a bigger performance gain than what you get by a better algorithm, but which is done in ruby.
Also note that in your test you are not testing a random string, but a specific one, which is really short as well. If you woud try a larger one, then it is possible that the ruby implementaion will be quicker.
To proof some thoughts of #SztupY. If you change a bit your code, like this:
def reverse_compare(str)
str.reverse == str
end
def iterate_compare(a, b)
a != b
end
require "benchmark"
n = 2_000_000
Benchmark.bm do |x|
str = "cabbbbbba" # best case single comparison
a = str[0]; b = str[-1]
x.report("reverse_compare") { n.times do reverse_compare(str) ; end }
x.report("iterate_compare") { n.times do iterate_compare(a, b) ; end }
end
You will get a bit different result:
#> user system total real
#> reverse_compare 0.359000 0.000000 0.359000 ( 0.361493)
#> iterate_compare 0.187000 0.000000 0.187000 ( 0.201590)
So, you could guess now that it takes some time to create 2 string objects from String.

How to create a method that checks if string1 can be rearranged to equal string2?

I've taken a stab at writing a method, but when my code isn't running and I'm not sure why.
str1 = "cored"
str2 = "coder"
def StringScramble(str1,str2)
numCombos = str1.length.downto(1).inject(:*)
arr = []
until arr.length == numCombos
shuffled = str1.split('').join
unless arr.include?(shuffled)
arr << shuffled
end
end
if arr.include?(str1)
return true
else
return false
end
end
Update: As #eugen pointed out in the comment, there's a much more efficient way:
str1.chars.sort == str2.chars.sort # => true
Original answer:
str1.chars.permutation.include?(str2.chars) # => true
Most efficient method?
Comparing sorted strings is certainly the easiest way, but you can one do better if efficiency is paramount? Last month #raph posted a comment that suggested an approach that sounded pretty good to me. I intended to benchmark it against the standard test, but never got around to it. The purpose of my answer is to benchmark the suggested approach against the standard one.
The challenger
The idea is create a counting hash h for the characters in one of the strings, so that h['c'] equals the number of times 'c' appears in the string. One then goes through the characters of the second string. Suppose 'c' is one of those characters. Then false is returned by the method if h.key?('c') => false or h['c'] == 0 (which can also be written h['c'].to_i == 0, as nil.to_i => 0); otherwise, the next character of the second string is checked against the hash. Assuming the strings are of equal length, they are anagrams of each other if and only if false has not been returned after all the characters of the second string have been checked. Creating the hash for the shorter of the two strings probably offers a further improvement. Here is my code for the method:
def hcompare(s1,s2)
return false unless s1.size == s2.size
# set `ss` to the shorter string, `sl` to the other.
ss, sl = (s1.size < s2.size) ? [s1, s2] : [s2, s1]
# create hash `h` with letter counts for the shorter string:
h = ss.chars.each_with_object(Hash.new(0)) { |c,h| h[c] += 1}
#decrement counts in `h` for characters in `sl`
sl.each_char { |c| return false if h[c].to_i == 0; h[c] -= 1 }
true
end
The incumbent
def scompare(s1,s2)
s1.chars.sort == s2.chars.sort
end
Helpers
methods = [:scompare, :hcompare]
def compute(m,s1,s2)
send(m,s1,s2)
end
def shuffle_chars(s)
s.chars.shuffle.join
end
Test data
reps = 20
ch = [*'b'..'z']
The benchmark
require 'benchmark'
[50000, 100000, 500000].each do |n|
t1 = Array.new(reps) { (Array.new(n) {ch.sample(1) }).join}
test_strings = { true=>t1.zip(t1.map {|s| shuffle_chars(s)})}
test_strings[false] = t1.zip(t1.map {|t| shuffle_chars((t[1..-1] << 'a'))})
puts "\nString length #{n}, #{reps} repetitions"
[true, false].each do |same|
puts "\nReturn #{same} "
Benchmark.bm(10) do |bm|
methods.each do |m|
bm.report m.to_s do
test_strings[same].each { |s1,s2| compute(m,s1,s2) }
end
end
end
end
end
Comparisons performed
I compared the two methods, scompare (uses sort) and hcompare (uses hash), performing the benchmark for three string lengths: 50,000, 100,000 and 500,000 characters. For each string length I created the first of two strings by selecting each character randomly from [*('b'..'z')]. I then created two strings to be compared with the first. One was merely a shuffling of the characters of the first string, so the methods would return true when those two strings are compared. In the second case I did the same, except I replaced a randomly selected character with 'a', so the methods would return false. These two cases are labelled true and false below.
Results
String length 50000, 20 repetitions
Return true
user system total real
scompare 0.620000 0.010000 0.630000 ( 0.625711)
hcompare 0.840000 0.010000 0.850000 ( 0.845548)
Return false
user system total real
scompare 0.530000 0.000000 0.530000 ( 0.532666)
hcompare 1.370000 0.000000 1.370000 ( 1.366293)
String length 100000, 20 repetitions
Return true
user system total real
scompare 1.420000 0.100000 1.520000 ( 1.516580)
hcompare 2.280000 0.010000 2.290000 ( 2.284189)
Return false
user system total real
scompare 1.020000 0.010000 1.030000 ( 1.034887)
hcompare 1.960000 0.000000 1.960000 ( 1.962655)
String length 500000, 20 repetitions
Return true
user system total real
scompare 10.310000 0.540000 10.850000 ( 10.850988)
hcompare 9.960000 0.180000 10.140000 ( 10.153366)
Return false
user system total real
scompare 8.120000 0.570000 8.690000 ( 8.687847)
hcompare 9.160000 0.030000 9.190000 ( 9.189997)
Conclusions
As you see, the method using the counting hash was superior to using sort in only one true case, when n => 500,000. Even there, the margin of victory was pretty small, much smaller than the relative differences in most of the other benchmark comparisons, where the standard method cruised to victory. While the hash counting method might have fared better with different tests, it seems that the conventional sorting method is hard to beat.
Was this answer of interest? I'm not sure, but since I had already done most of the work before seeing the results (which I expected would favour the counting hash), I decided to go ahead and put it out.

Ruby convention for accessing first/last element in array [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
This is a question about conventions. The two sets of commands below return identical results.
a = [1, 2, 3]
a.first # => 1
a[0] # => 1
a.last # => 3
a[-1] # => 3
Which of these is preferred in Ruby, the explicit index or the functions? Assuming, of course, that this is in code which always accesses the first or last element.
Note: I've been thinking about the cycles each would take. Because first and last accept parameters, they will have a little more overhead, but I don't know if that affects what the community prefers.
Thanks!
EDIT
If you read the comments on this post, there was a big debate about my last paragraph. While I failed to remember that [x] is equivalent to .[](x), I was correct in my conclusion that first and last have a bit more overhead. Considering the nature of both, I believe that this is due to the argument check for first/last. These need to check if there are arguments whereas [] can assume that they exist.
CODE
require 'benchmark'
a = [1..1000]
MAX = 1000000
Benchmark.bm(15) do |b|
b.report("small first") { MAX.times do; a.first; end }
b.report("small [0]") { MAX.times do; a[0]; end }
b.report("small last") { MAX.times do; a.last; end }
b.report("small [-1]") { MAX.times do; a[-1]; end }
end
a = [1..100000000000]
Benchmark.bm(15) do |b|
b.report("large first") { MAX.times do; a.first; end }
b.report("large [0]") { MAX.times do; a[0]; end }
b.report("large last") { MAX.times do; a.last; end }
b.report("large [-1]") { MAX.times do; a[-1]; end }
end
RESULTS
user system total real
small first 0.350000 0.000000 0.350000 ( 0.901497)
small [0] 0.330000 0.010000 0.340000 ( 0.857786)
small last 0.370000 0.000000 0.370000 ( 1.054216)
small [-1] 0.370000 0.000000 0.370000 ( 1.137655)
user system total real
large first 0.340000 0.010000 0.350000 ( 0.897581)
large [0] 0.320000 0.010000 0.330000 ( 0.889725)
large last 0.350000 0.000000 0.350000 ( 1.071135)
large [-1] 0.380000 0.000000 0.380000 ( 1.119587)
Code is read more than it is written, and first and last take less effort to understand, especially for a less experienced Ruby programmer or someone from a language with different indexing semantics.
While most programmers will immediately know that these are the same:
a.first
a[0]
the first still reads more easily. There isn't a marked difference in how hard it is to read, but it's there.
last is another issue. Accessing the index 0 will get you the first element of an array in almost any language. But negative indexing is only available in some languages. If a C programmer with minimal Ruby experience is trying to read my code, which will they understand faster?:
a.last
a[-1]
The negative index will probably force them to do a Google search.
Since Matz designed Ruby after a few other languages, I think the conventions come from those other languages.
In Lisp, one of Ruby's inspirational parents, you would use something close to the last and first methods so I'll say last and first is convention.
I only really use first and last. I see many programs out there that use those methods but ultimately it is your choice. That's the beauty of Ruby ;)
From the point of view of the speed, for larger arrays, first and last are faster than []. For smaller arrays it is the other way around.
Large array:
array = (0..100000000).to_a
t = Time.now
10.times{array[0]}
puts Time.now - t
# => 0.000225356
t = Time.now
10.times{array.first}
puts Time.now - t
# => 2.9736e-05
t = Time.now
10.times{array[-1]}
puts Time.now - t
# => 7.847e-06
t = Time.now
10.times{array.last}
puts Time.now - t
# => 6.174e-06
Small array:
array = (0..100).to_a
t = Time.now
10.times{array[0]}
puts Time.now - t
# => 4.403e-06
t = Time.now
10.times{array.first}
puts Time.now - t
# => 5.933e-06
t = Time.now
10.times{array[-1]}
puts Time.now - t
# => 4.982e-06
t = Time.now
10.times{array.last}
puts Time.now - t
# => 5.411e-06
For ease of writing/reading, first and last can be used without arguments, unlike [], so they are simpler.
Sometimes, using first and last makes things easier, while it is difficult with []: e.g., array.each_slice(3).map(&:first).

XML data scanner with Ruby: case statement or dynamic dispatch in term of performance?

I'm writing a XML data scanner which read XML text using some XML parser library like nokogiri or such,
and generate a tree of nodes. I need to create an object per a XML element.
So, I need a method which creates an object according to given element name and attributes, like this,
regardless of kind of the parser library options (either SAX or DOM) I'm using:
create_node(name, attributes_hash)
This method need to branch according to the name. Implementation possibilities are:
Case statement
Method dispatch and pre-defined methods
Since this method possibly become a bottleneck, I wrote a benchmark script to check how Ruby perform. (The benchmark script attached at last part of this question. I don't like some part of the script -- particularly how to create case statement --, so comments to how I can improve this is also welcome, but please provide it as comments not an answer... I probably need to create a question for that too..).
The script measures following four cases, in two range sizes:
method dispatch with constant name
method dispatch with name concatenate with #{}
method dispatch with name concatenate with +
using case statement, call the same methods
Results:
user system total real
a to z: method_calls (with const name) 0.090000 0.000000 0.090000 ( 0.092516)
a to z: method_calls (with dynamic name) 1 0.180000 0.000000 0.180000 ( 0.181793)
a to z: method_calls (with dynamic name) 2 0.200000 0.000000 0.200000 ( 0.202818)
a to z: switch_calls 0.130000 0.000000 0.130000 ( 0.132633)
user system total real
a to zz: method_calls (with const name) 2.900000 0.000000 2.900000 ( 2.894273)
a to zz: method_calls (with dynamic name) 1 6.500000 0.010000 6.510000 ( 6.507099)
a to zz: method_calls (with dynamic name) 2 6.980000 0.000000 6.980000 ( 6.987534)
a to zz: switch_calls 4.750000 0.000000 4.750000 ( 4.742448)
I observe const name based method dispatch is faster than using case statement, however, if string operation is involved when determine the method name, the costs to determine the method name costs more than actual method call costs, effectively make these options(2 and 3) slower than option 4. Also, the difference between option 2 and 3 are negligible.
To make the scanner secure, I prefer to have some prefix to the methods, since without that, it is possible to craft a XML to invoke some methods, which I don't want to happen. But the cost to determine the method name is not negligible.
How do you write these scanner? I want to know an answer to following questions:
Is there any good scheme other than above?
If not, which (case-when or method dispatch) scheme you choose?
If I don't compute method name, it is faster. Is there any good way to do method dispatch securely? (by limiting node name to be called, for example.)
The benchmark script
# Benchmark to measure the difference of
# use of case statement and message passing
require 'benchmark'
def bench(title, tobj, count)
Benchmark.bmbm do |b|
b.report "#{title}: method_calls (with const name)" do
(1..count).each do |c|
tobj.run_send_using_const
end
end
b.report "#{title}: method_calls (with dynamic name) 1" do
(1..count).each do |c|
tobj.run_send_using_dynamic_1
end
end
b.report "#{title}: method_calls (with dynamic name) 2" do
(1..count).each do |c|
tobj.run_send_using_dynamic_2
end
end
b.report "#{title}: switch_calls" do
(1..count).each do |c|
tobj.run_switch
end
end
end
end
class Switcher
def initialize(names)
#method_names = { }
#names = names
names.each do |n|
#method_names[n] = "dynamic_#{n}"
##n = n
class << self
mname = "dynamic_#{##n}"
define_method(mname) do
mname
end
end
end
swst = ""
names.each do |n|
swst << "when \"#{n}\" then dynamic_#{n}\n"
end
st = "
def run_switch_each(n)
case n
#{swst}
end
end
"
eval(st)
end
def run_send_using_const
#method_names.each_value do |n|
self.send n
end
end
def run_send_using_dynamic_1
#names.each do |n|
self.send "dynamic_#{n}"
end
end
def run_send_using_dynamic_2
#names.each do |n|
self.send "dynamic_" + n
end
end
def run_switch
#names.each do |n|
run_switch_each(n)
end
end
end
sw1 = Switcher.new('a'..'z')
sw2 = Switcher.new('a'..'zz')
bench("a to z", sw1, 10000)
bench("a to zz", sw2, 10000)
I believe this is a case of premature optimization.
But the cost to determine the method name is not negligible.
Non-negligible compared to what? The approaches here have different performance numbers, but will the time taken to dispatch one node be comparable to what it takes to parse the node (with Nokogiri or etc), to construct the specialized node object, and do whatever you need with it?
I believe it won't. I don't have a benchmark to prove that statement (you need actual code for that), but the fact that string concatenation vs string interpolation actually makes a noticeable difference in the results (dynamic1 vs dynamic2) is a good indicator that you're measuring something trivial.
Or that adding one string concatenation per dispatch increases the resulting time 2-2.5 times (const vs dynamic2).

Ruby: Comparing two Arrays of Hashes

I'm definitely a newbie to ruby (and using 1.9.1), so any help is appreciated. Everything I've learned about Ruby has been from using google. I'm trying to compare two arrays of hashes and due to the sizes, it's taking way to long and flirts with running out of memory. Any help would be appreciated.
I have a Class (ParseCSV) with multiple methods (initialize, open, compare, strip, output).
The way I have it working right now is as follows (and this does pass the tests I've written, just using a much smaller data set):
file1 = ParseCSV.new(“some_file”)
file2 = ParseCSV.new(“some_other_file”)
file1.open #this reads the file contents into an Array of Hash’s through the CSV library
file1.strip #This is just removing extra hash’s from each array index. So normally there are fifty hash’s in each array index, this is just done to help reduce memory consumption.
file2.open
file2.compare(“file1.storage”) ##storage is The array of hash’s from the open method
file2.output
Now what I’m struggling with is the compare method. Working on smaller data sets it’s not a big deal at all, works fast enough. However in this case I’m comparing about 400,000 records (all read into the array of hashes) against one that has about 450,000 records. I’m trying to speed this up. Also I can’t run the strip method on file2. Here is how I’m doing it now:
def compare(x)
#obviously just a verbose message
puts "Comparing and leaving behind non matching entries"
x.each do |row|
##storage is the array of hashes
#storage.each_index do |y|
if row[#opts[:field]] == #storage[y][#opts[:field]]
#storage.delete_at(y)
end
end
end
end
Hopefully that makes sense. I know it’s going to be a slow process just because it has to iterate 400,000 rows 440,000 times each. But do you have any other ideas on how to speed it up and possibly reduce memory consumption?
Yikes, that'll be O(n^2) runtime. Nasty.
A better bet would be to use the built in Set class.
Code would look something like:
require 'set'
file1_content = load_file_content_into_array_here("some_file")
file2_content = load_file_content_into_array_here("some_other_file")
file1_set = Set[file1_content]
unique_elements = file1_set - file2_content
That assumes that the files themselves have unique content. Should work in the generic case, but may have quirks depending on what your data looks like and how you parse it, but as long as the lines can be compared with == it should help you out.
Using a set will be MUCH faster than doing a nested loop to iterate over the file content.
(and yes, I have actually done this to process files with ~2 million lines, so it should be able to handle your case - eventually. If you're doing heavy data munging, Ruby may not be the best choice of tool though)
Here's a script comparing two ways of doing it: Your original compare() and a new_compare(). The new_compare uses more of the built in Enumerable methods. Since they are implemented in C, they'll be faster.
I created a constant called Test::SIZE to try out the benchmarks with different hash sizes. Results at the bottom. The difference is huge.
require 'benchmark'
class Test
SIZE = 20000
attr_accessor :storage
def initialize
file1 = []
SIZE.times { |x| file1 << { :field => x, :foo => x } }
#storage = file1
#opts = {}
#opts[:field] = :field
end
def compare(x)
x.each do |row|
#storage.each_index do |y|
if row[#opts[:field]] == #storage[y][#opts[:field]]
#storage.delete_at(y)
end
end
end
end
def new_compare(other)
other_keys = other.map { |x| x[#opts[:field]] }
#storage.reject! { |s| other_keys.include? s[#opts[:field]] }
end
end
storage2 = []
# We'll make 10 of them match
10.times { |x| storage2 << { :field => x, :foo => x } }
# And the rest wont
(Test::SIZE-10).times { |x| storage2 << { :field => x+100000000, :foo => x} }
Benchmark.bm do |b|
b.report("original compare") do
t1 = Test.new
t1.compare(storage2)
end
end
Benchmark.bm do |b|
b.report("new compare") do
t1 = Test.new
t1.new_compare(storage2)
end
end
Results:
Test::SIZE = 500
user system total real
original compare 0.280000 0.000000 0.280000 ( 0.285366)
user system total real
new compare 0.020000 0.000000 0.020000 ( 0.020458)
Test::SIZE = 1000
user system total real
original compare 28.140000 0.110000 28.250000 ( 28.618907)
user system total real
new compare 1.930000 0.010000 1.940000 ( 1.956868)
Test::SIZE = 5000
ruby test.rb
user system total real
original compare113.100000 0.440000 113.540000 (115.041267)
user system total real
new compare 7.680000 0.020000 7.700000 ( 7.739120)
Test::SIZE = 10000
user system total real
original compare453.320000 1.760000 455.080000 (460.549246)
user system total real
new compare 30.840000 0.110000 30.950000 ( 31.226218)

Resources