Ruby convention for accessing first/last element in array [closed] - ruby

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
This is a question about conventions. The two sets of commands below return identical results.
a = [1, 2, 3]
a.first # => 1
a[0] # => 1
a.last # => 3
a[-1] # => 3
Which of these is preferred in Ruby, the explicit index or the methods? Assume, of course, that this is in code which only ever accesses the first or last element.
Note: I've been thinking about the cycles each would take. Because first and last accept parameters, they will have a little more overhead, but I don't know if that affects what the community prefers.
Thanks!
EDIT
If you read the comments on this post, there was a big debate about my last paragraph. While I failed to remember that [x] is equivalent to .[](x), I was correct in my conclusion that first and last have a bit more overhead. Considering the nature of both, I believe that this is due to the argument check for first/last. These need to check if there are arguments whereas [] can assume that they exist.
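A short sketch of the equivalence mentioned in the edit: a[0] is ordinary method-call syntax for the method named [], so a[0] and a.[](0) are the same call, and both agree with a.first. Only core Array methods are used here.

```ruby
a = [1, 2, 3]

# a[0] is syntactic sugar for calling the method named [] with one argument.
bracket  = a[0]
explicit = a.[](0)
named    = a.first

# The same holds at the other end of the array.
last_bracket  = a[-1]
last_explicit = a.[](-1)
last_named    = a.last

p [bracket, explicit, named]
p [last_bracket, last_explicit, last_named]
```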
CODE
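require 'benchmark'
# Note: [1..1000] is a one-element array whose only element is the Range
# 1..1000; (1..1000).to_a would build 1,000 integers. Since first, last
# and [] are constant-time, array size should not change these timings.
a = [1..1000]
MAX = 1000000
Benchmark.bm(15) do |b|
  b.report("small first") { MAX.times do; a.first; end }
  b.report("small [0]") { MAX.times do; a[0]; end }
  b.report("small last") { MAX.times do; a.last; end }
  b.report("small [-1]") { MAX.times do; a[-1]; end }
end
# Also a one-element array; (1..100000000000).to_a would not fit in memory.
a = [1..100000000000]
Benchmark.bm(15) do |b|
  b.report("large first") { MAX.times do; a.first; end }
  b.report("large [0]") { MAX.times do; a[0]; end }
  b.report("large last") { MAX.times do; a.last; end }
  b.report("large [-1]") { MAX.times do; a[-1]; end }
end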
RESULTS
user system total real
small first 0.350000 0.000000 0.350000 ( 0.901497)
small [0] 0.330000 0.010000 0.340000 ( 0.857786)
small last 0.370000 0.000000 0.370000 ( 1.054216)
small [-1] 0.370000 0.000000 0.370000 ( 1.137655)
user system total real
large first 0.340000 0.010000 0.350000 ( 0.897581)
large [0] 0.320000 0.010000 0.330000 ( 0.889725)
large last 0.350000 0.000000 0.350000 ( 1.071135)
large [-1] 0.380000 0.000000 0.380000 ( 1.119587)

Code is read more than it is written, and first and last take less effort to understand, especially for a less experienced Ruby programmer or someone from a language with different indexing semantics.
While most programmers will immediately know that these are the same:
a.first
a[0]
a.first still reads more easily. The difference in readability isn't large, but it's there.
last is another issue. Index 0 gets you the first element of an array in almost any language, but negative indexing is only available in some languages. If a C programmer with minimal Ruby experience is trying to read my code, which will they understand faster?
a.last
a[-1]
The negative index will probably force them to do a Google search.
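For readers coming from C, a minimal sketch of what a negative index means: a[-k] counts back from the end, so it refers to the same slot as a[a.length - k]. An out-of-range index returns nil rather than raising, and first and last also return nil on an empty array.

```ruby
a = [10, 20, 30, 40]

# a[-k] addresses the same element as a[a.length - k].
(1..a.length).each do |k|
  raise "mismatch at -#{k}" unless a[-k] == a[a.length - k]
end

p a[-1]      # last element
p a[-2]      # second-to-last
p a[-5]      # past the front: nil, no exception
p [].last    # nil on an empty array
p [].first   # nil as well
```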

Since Matz modeled Ruby on a few other languages, I think the conventions come from those languages.
In Lisp, one of Ruby's inspirations, you would use something close to the first and last methods, so I'd say first and last are the convention.
I only really use first and last. I see many programs out there that use those methods but ultimately it is your choice. That's the beauty of Ruby ;)

In terms of speed, first and last are faster than [] for larger arrays. For smaller arrays it is the other way around.
Large array:
array = (0..100000000).to_a
t = Time.now
10.times{array[0]}
puts Time.now - t
# => 0.000225356
t = Time.now
10.times{array.first}
puts Time.now - t
# => 2.9736e-05
t = Time.now
10.times{array[-1]}
puts Time.now - t
# => 7.847e-06
t = Time.now
10.times{array.last}
puts Time.now - t
# => 6.174e-06
Small array:
array = (0..100).to_a
t = Time.now
10.times{array[0]}
puts Time.now - t
# => 4.403e-06
t = Time.now
10.times{array.first}
puts Time.now - t
# => 5.933e-06
t = Time.now
10.times{array[-1]}
puts Time.now - t
# => 4.982e-06
t = Time.now
10.times{array.last}
puts Time.now - t
# => 5.411e-06
For ease of writing and reading, first and last are simpler, because they can be used without arguments, unlike [].
Sometimes, using first and last makes things easier, while it is difficult with []: e.g., array.each_slice(3).map(&:first).
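To make the last point concrete: first and last also accept a count, and because they need no argument they can be passed to map as a symbol. A short sketch using only core Array and Enumerable methods, with the bracket equivalents shown for comparison.

```ruby
a = [1, 2, 3, 4, 5, 6, 7]

p a.first(2)   # first two elements as a new array
p a[0, 2]      # bracket form: start index and length
p a.last(2)    # last two elements
p a[-2, 2]     # bracket form starting two from the end

# Symbol#to_proc works because #first takes no required argument;
# the bracket version needs an explicit block.
p a.each_slice(3).map(&:first)
p a.each_slice(3).map { |slice| slice[0] }
```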

Related

How to create a method that checks if string1 can be rearranged to equal string2?

I've taken a stab at writing a method, but my code isn't working and I'm not sure why.
str1 = "cored"
str2 = "coder"
def StringScramble(str1,str2)
numCombos = str1.length.downto(1).inject(:*)
arr = []
until arr.length == numCombos
shuffled = str1.split('').join
unless arr.include?(shuffled)
arr << shuffled
end
end
if arr.include?(str1)
return true
else
return false
end
end
Update: As @eugen pointed out in the comment, there's a much more efficient way:
str1.chars.sort == str2.chars.sort # => true
Original answer:
str1.chars.permutation.include?(str2.chars) # => true
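A sketch wrapping both approaches in methods, plus a counting variant. The counting version assumes Ruby 2.7+ for Enumerable#tally, and the method names are hypothetical. The permutation version is only practical for very short strings, since a string of length n has up to n! permutations (a 10-character word already has 3,628,800).

```ruby
# Sort both character lists: O(n log n).
def anagram_by_sort?(s1, s2)
  s1.chars.sort == s2.chars.sort
end

# Count characters with Enumerable#tally (Ruby 2.7+): O(n).
# Hash equality ignores insertion order, so only the counts matter.
def anagram_by_tally?(s1, s2)
  s1.chars.tally == s2.chars.tally
end

# Enumerate permutations: O(n!), only viable for tiny inputs.
def anagram_by_permutation?(s1, s2)
  return false unless s1.size == s2.size
  s1.chars.permutation.include?(s2.chars)
end

p anagram_by_sort?("cored", "coder")
p anagram_by_tally?("cored", "coder")
p anagram_by_permutation?("cored", "coder")
p anagram_by_sort?("cored", "codes")
```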
Most efficient method?
Comparing sorted strings is certainly the easiest way, but can one do better if efficiency is paramount? Last month @raph posted a comment that suggested an approach that sounded pretty good to me. I intended to benchmark it against the standard test, but never got around to it. The purpose of my answer is to benchmark the suggested approach against the standard one.
The challenger
The idea is to create a counting hash h for the characters in one of the strings, so that h['c'] equals the number of times 'c' appears in that string. One then goes through the characters of the second string. Suppose 'c' is one of those characters. The method returns false if h.key?('c') is false or h['c'] == 0 (which can also be written h['c'].to_i == 0, since nil.to_i => 0); otherwise, h['c'] is decremented and the next character of the second string is checked against the hash. Assuming the strings are of equal length, they are anagrams of each other if and only if false has not been returned after all the characters of the second string have been checked. Creating the hash for the shorter of the two strings probably offers a further improvement. Here is my code for the method:
def hcompare(s1, s2)
  return false unless s1.size == s2.size
  # set `ss` to the shorter string, `sl` to the other.
  ss, sl = (s1.size < s2.size) ? [s1, s2] : [s2, s1]
  # create hash `h` with letter counts for the shorter string:
  h = ss.chars.each_with_object(Hash.new(0)) { |c, counts| counts[c] += 1 }
  # decrement counts in `h` for characters in `sl`
  sl.each_char { |c| return false if h[c].to_i == 0; h[c] -= 1 }
  true
end
The incumbent
def scompare(s1, s2)
  s1.chars.sort == s2.chars.sort
end
Helpers
methods = [:scompare, :hcompare]
def compute(m, s1, s2)
  send(m, s1, s2)
end
def shuffle_chars(s)
  s.chars.shuffle.join
end
Test data
reps = 20
ch = [*'b'..'z']
The benchmark
require 'benchmark'
[50000, 100000, 500000].each do |n|
  t1 = Array.new(reps) { (Array.new(n) { ch.sample(1) }).join }
  test_strings = { true => t1.zip(t1.map { |s| shuffle_chars(s) }) }
  test_strings[false] = t1.zip(t1.map { |t| shuffle_chars((t[1..-1] << 'a')) })
  puts "\nString length #{n}, #{reps} repetitions"
  [true, false].each do |same|
    puts "\nReturn #{same} "
    Benchmark.bm(10) do |bm|
      methods.each do |m|
        bm.report m.to_s do
          test_strings[same].each { |s1, s2| compute(m, s1, s2) }
        end
      end
    end
  end
end
Comparisons performed
I compared the two methods, scompare (uses sort) and hcompare (uses hash), performing the benchmark for three string lengths: 50,000, 100,000 and 500,000 characters. For each string length I created the first of two strings by selecting each character randomly from [*('b'..'z')]. I then created two strings to be compared with the first. One was merely a shuffling of the characters of the first string, so the methods would return true when those two strings are compared. In the second case I did the same, except I replaced a randomly selected character with 'a', so the methods would return false. These two cases are labelled true and false below.
Results
String length 50000, 20 repetitions
Return true
user system total real
scompare 0.620000 0.010000 0.630000 ( 0.625711)
hcompare 0.840000 0.010000 0.850000 ( 0.845548)
Return false
user system total real
scompare 0.530000 0.000000 0.530000 ( 0.532666)
hcompare 1.370000 0.000000 1.370000 ( 1.366293)
String length 100000, 20 repetitions
Return true
user system total real
scompare 1.420000 0.100000 1.520000 ( 1.516580)
hcompare 2.280000 0.010000 2.290000 ( 2.284189)
Return false
user system total real
scompare 1.020000 0.010000 1.030000 ( 1.034887)
hcompare 1.960000 0.000000 1.960000 ( 1.962655)
String length 500000, 20 repetitions
Return true
user system total real
scompare 10.310000 0.540000 10.850000 ( 10.850988)
hcompare 9.960000 0.180000 10.140000 ( 10.153366)
Return false
user system total real
scompare 8.120000 0.570000 8.690000 ( 8.687847)
hcompare 9.160000 0.030000 9.190000 ( 9.189997)
Conclusions
As you can see, the method using the counting hash beat the sort-based method in only one true case, when n = 500,000. Even there, the margin of victory was pretty small, much smaller than the relative differences in most of the other benchmark comparisons, where the standard method cruised to victory. While the hash counting method might have fared better with different tests, it seems that the conventional sorting method is hard to beat.
Was this answer of interest? I'm not sure, but since I had already done most of the work before seeing the results (which I expected would favour the counting hash), I decided to go ahead and put it out.

Is there any case to prefer for instead of each in Ruby?

I know that for i in arr differs slightly from arr.each in scoping, and everyone keeps saying that iterators are preferable, but I wonder if there is any case where the for loop is preferable, and why (since iterators are more idiomatic)?
TL;DR
Use a for loop for performance in Ruby 1.8
Use a for loop to match the conventions of existing projects
Use an each loop to minimize side effects
Prefer each loop.
each minimizes side effects
The primary difference between for and each is scoping.
The each method takes a block. Blocks create a new local scope, which means that any variable first assigned inside the block is no longer available after the block.
[1, 2, 3].each do |i|
  a = i
end
puts a
# => NameError: undefined local variable or method `a' for main:Object
Whereas:
for i in [1, 2, 3]
  a = i
end
puts a
# => 3
Thus, using the each syntax minimizes the risk of side effects.
Determining the exit point?
That said, there are special instances where the for loop may be helpful, specifically when finding out where a loop exited.
for i in 1..3
  a = i
  break if i % 2 == 0
end
puts a
# => 2
There is a better way to do this, though:
a = (1..3).each do |i|
  break i if i % 2 == 0
end
# a => 2
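A related idiom, offered as an alternative rather than as part of the answer: when the goal is only to learn which element stopped the loop, Enumerable#find (alias detect) returns that element, or nil if nothing matched. The each version above, by contrast, returns the receiver itself (the range) when no break happens.

```ruby
# find returns the first element for which the block is truthy.
a = (1..3).find { |i| i % 2 == 0 }
p a

# With no match, find returns nil ...
none = (1..3).find { |i| i > 10 }
p none

# ... whereas each-with-break falls through and returns the range itself.
fallthrough = (1..3).each { |i| break i if i > 10 }
p fallthrough
```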
Each is faster (in Ruby 2.0)
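require 'benchmark'
# The original snippet did not define max; 100_000 is a placeholder value.
max = 100_000
Benchmark.bm(8) do |x|
  x.report "For" do
    max.times do
      for i in 1..100
        1 + 1
      end
    end
  end
  x.report "Each" do
    max.times do
      (1..100).each do |t|
        1 + 1
      end
    end
  end
end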
Ruby 2.0
user system total real
For 6.420000 0.000000 6.420000 ( 6.419870)
Each 5.830000 0.000000 5.830000 ( 5.829911)
Ruby 1.8.6 (Slower Machine)
user system total real
For 17.360000 0.000000 17.360000 ( 17.409992)
Each 21.130000 0.000000 21.130000 ( 21.250754)
Benchmarks 2
If you read the comment trail, there is a discussion about the speed of creating objects in for vs. each. The linked page had the following benchmarks (although I have cleaned up the formatting and fixed the syntax errors).
require 'benchmark'
b = 1..10e5
Benchmark.bmbm(10) do |x|
  x.report "each {}" do
    b.each { |r| r + 1 }
  end
  x.report "each do end" do
    b.each do |r|
      r + 1
    end
  end
  x.report "for do end" do
    for r in b do
      r + 1
    end
  end
end
Ruby 2.0
user system total real
each {} 0.150000 0.000000 0.150000 ( 0.144643)
each do end 0.140000 0.000000 0.140000 ( 0.143244)
for do end 0.150000 0.000000 0.150000 ( 0.147112)
Ruby 1.8.6
user system total real
each {} 0.840000 0.000000 0.840000 ( 0.851634)
each do end 0.730000 0.000000 0.730000 ( 0.732737)
for do end 0.650000 0.000000 0.650000 ( 0.647186)
I have written a few plugins for SketchUp's Ruby API and I found that when iterating large collections (of geometry entities) I would get better performance with a for in loop over an each block.
I believe this is because the for in loop doesn't create its own local scope, and objects are reused instead of being created on every iteration as they would be in the each loop.
EDIT: The speed gain depends on the Ruby version. Using the test snippet from this article, http://blog.shingara.fr/each-vs-for.html:
Ruby 1.8.6:
user system total real
For 14.742000 0.000000 14.742000 ( 14.777000)
Each 18.190000 0.000000 18.190000 ( 18.194000)
Ruby 2.0.0
user system total real
For 5.975000 0.000000 5.975000 ( 5.990000)
Each 5.444000 0.000000 5.444000 ( 5.438000)
Things have greatly improved since the old 1.8.6. (Though SketchUp extension developers still need to optimize against this version.)

Why is this array building method so slow?

This method is taking over 7 seconds with 50 markets and 2,500 flows (~250,000 iterations). Why so slow?
def matrix
  [:origin, :destination].collect do |location|
    markets.collect do |market|
      network.flows.collect { |flow| flow[location] == market ? 1 : 0 }
    end
  end.flatten
end
I know that the slowness comes from the comparison of one market to another market based on benchmarks that I've run.
Here are the relevant parts of the class that's being compared.
module FreightFlow
  class Market
    include ActiveAttr::Model
    attribute :coordinates
    def ==(value)
      coordinates == value.coordinates
    end
  end
end
What's the best way to make this faster?
You are constructing 100 intermediate collections (2*50) comprising a total of 250,000 (2*50*2500) elements, and then flattening them at the end. I would try constructing the whole data structure in one pass. Make sure that markets and network.flows are stored in a hash or set. Maybe something like:
def matrix
  network.flows.collect do |flow|
    (markets.has_key? flow[:origin] or
     markets.has_key? flow[:destination]) ? 1 : 0
  end
end
This is a simple thing but it can help...
In your innermost loop you're doing:
network.flows.collect { |flow| flow[location] == market ? 1 : 0 }
Instead of using the ternary expression to convert to 1 or 0, use the true and false Booleans directly:
network.flows.collect { |flow| flow[location] == market }
This isn't a big difference in speed, but over the course of that many nested loops it adds up.
In addition, it allows you to simplify your tests using the matrix being generated. Instead of having to compare to 1 or 0, you can simplify your conditional tests to if flow[location], if !flow[location] or unless flow[location], again speeding up your application a little bit for each test. If those are deeply nested in loops, which is very likely, that little bit can add up again.
Something that is important to do, when speed is important, is use Ruby's Benchmark class to test various ways of doing the same task. Then, instead of guessing, you KNOW what works. You'll find lots of questions on Stack Overflow where I've supplied an answer that consists of a benchmark showing the speed differences between various ways of doing something. Sometimes the differences are very big. For instance:
require 'benchmark'
puts `ruby -v`
def test1()
  true
end
def test2(p1)
  true
end
def test3(p1, p2)
  true
end
N = 10_000_000
Benchmark.bm(5) do |b|
  b.report('?:') { N.times { (1 == 1) ? 1 : 0 } }
  b.report('==') { N.times { (1 == 1) } }
  b.report('if') {
    N.times {
      if (1 == 1)
        1
      else
        0
      end
    }
  }
end
Benchmark.bm(5) do |b|
  b.report('test1') { N.times { test1() } }
  b.report('test2') { N.times { test2('foo') } }
  b.report('test3') { N.times { test3('foo', 'bar') } }
  b.report('test4') { N.times { true } }
end
And the results:
ruby 1.9.3p392 (2013-02-22 revision 39386) [x86_64-darwin10.8.0]
user system total real
?: 1.880000 0.000000 1.880000 ( 1.878676)
== 1.780000 0.000000 1.780000 ( 1.785718)
if 1.920000 0.000000 1.920000 ( 1.914225)
user system total real
test1 2.760000 0.000000 2.760000 ( 2.760861)
test2 4.800000 0.000000 4.800000 ( 4.808184)
test3 6.920000 0.000000 6.920000 ( 6.915318)
test4 1.640000 0.000000 1.640000 ( 1.637506)
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]
user system total real
?: 2.280000 0.000000 2.280000 ( 2.285408)
== 2.090000 0.010000 2.100000 ( 2.087504)
if 2.350000 0.000000 2.350000 ( 2.363972)
user system total real
test1 2.900000 0.010000 2.910000 ( 2.899922)
test2 7.070000 0.010000 7.080000 ( 7.092513)
test3 11.010000 0.030000 11.040000 ( 11.033432)
test4 1.660000 0.000000 1.660000 ( 1.667247)
There are two different sets of tests. The first is looking to see what the differences are with simple conditional tests vs. using == without a ternary to get just the Booleans. The second is to test the effect of calling a method, a method with a single parameter, and with two parameters, vs. "inline-code" to find out the cost of the setup and tear-down when calling a method.
Modern C compilers do some amazing things when they analyze the code prior to emitting the assembly language to be assembled. We can tell them to optimize for size or speed. When we go for speed, the program grows as the compiler looks for loops it can unroll and places where it can "inline" code, to avoid making the CPU jump around and throwing away what's in the cache.
Ruby is much higher up the language chain, but some of the same ideas still apply. We can write in a very DRY manner, avoid repetition, and use methods and classes to abstract our data, but the cost is reduced processing speed. The answer is to write your code intelligently: don't waste CPU time, unroll or inline where necessary to gain speed, and at other times be DRY to keep your code maintainable.
It's all a balancing act, and there's a time for writing both ways.
Memoizing the indexes of the markets within the flows was way faster than any other solution. Time reduced from ~30 seconds when the question was asked to 0.6 seconds.
First, I added a flow_index in the Network class. It stores the indexes of the flows that contain the markets.
def flow_index
  @flow_index ||= begin
    flow_index = {}
    [:origin, :destination].each do |location|
      flow_index[location] = {}
      flows.each { |flow| flow_index[location][flow[location]] = [] }
      flows.each_with_index { |flow, i| flow_index[location][flow[location]] << i }
    end
    flow_index
  end
end
Then, I refactored the matrix method to use the flow index.
def matrix
  base_row = network.flows.count.times.collect { 0 }
  [:origin, :destination].collect do |location|
    markets.collect do |market|
      row = base_row.dup
      network.flow_index[location][market].each do |i|
        row[i] = 1
      end
      row
    end
  end.flatten
end
The base_row is created with all 0s, and you just replace them with 1s at the positions listed in the flow_index for that market.
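One caveat worth checking (an observation about Ruby's Hash, not something the answer states): Hash looks keys up with #hash and #eql?, not #==. The Market class in the question only overrides ==, so flow_index[location][market] finds an entry only when market is the very same object stored in the flow. If markets can be equal but distinct objects, defining eql? and hash from coordinates fixes that, and Hash#fetch with a default avoids calling each on nil for a market that appears in no flow. The sketch uses a hypothetical PlainMarket class in place of the ActiveAttr-based one.

```ruby
# Hypothetical stand-in for FreightFlow::Market without ActiveAttr.
class PlainMarket
  attr_reader :coordinates

  def initialize(coordinates)
    @coordinates = coordinates
  end

  # Value equality, as in the question.
  def ==(other)
    other.is_a?(PlainMarket) && coordinates == other.coordinates
  end

  # Hash lookup needs eql? and hash to agree with ==.
  alias_method :eql?, :==

  def hash
    coordinates.hash
  end
end

index = { PlainMarket.new([1, 2]) => [0, 3] }

# A different object with the same coordinates now finds the entry.
p index[PlainMarket.new([1, 2])]
p index[PlainMarket.new([9, 9])]

# Default to an empty list for markets that appear in no flow.
p index.fetch(PlainMarket.new([9, 9]), [])
```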

Why is the shovel operator (<<) preferred over plus-equals (+=) when building a string in Ruby?

I am working through Ruby Koans.
The test_the_shovel_operator_modifies_the_original_string Koan in about_strings.rb includes the following comment:
Ruby programmers tend to favor the shovel operator (<<) over the plus
equals operator (+=) when building up strings. Why?
My guess is it involves speed, but I don't understand the action under the hood that would cause the shovel operator to be faster.
Would someone be able to please explain the details behind this preference?
Proof:
a = 'foo'
a.object_id #=> 2154889340
a << 'bar'
a.object_id #=> 2154889340
a += 'quux'
a.object_id #=> 2154742560
So << alters the original string rather than creating a new one. The reason for this is that in Ruby a += b is syntactic shorthand for a = a + b (the same goes for the other <op>= operators), which is an assignment. On the other hand, << is an alias of concat(), which alters the receiver in place.
Performance proof:
#!/usr/bin/env ruby
require 'benchmark'
Benchmark.bmbm do |x|
  x.report('+= :') do
    s = ""
    10000.times { s += "something " }
  end
  x.report('<< :') do
    s = ""
    10000.times { s << "something " }
  end
end
# Rehearsal ----------------------------------------
# += : 0.450000 0.010000 0.460000 ( 0.465936)
# << : 0.010000 0.000000 0.010000 ( 0.009451)
# ------------------------------- total: 0.470000sec
#
# user system total real
# += : 0.270000 0.010000 0.280000 ( 0.277945)
# << : 0.000000 0.000000 0.000000 ( 0.003043)
A friend who is learning Ruby as his first programming language asked me this same question while going through Strings in Ruby on the Ruby Koans series. I explained it to him using the following analogy:
You have a glass of water that is half full and you need to refill your glass.
The first way, you take a new glass, fill it halfway with water from the tap, and then use this second half-full glass to refill your drinking glass. You do this every time you need to refill your glass.
The second way, you take your half-full glass and just refill it with water straight from the tap.
At the end of the day, you would have more glasses to clean if you chose to pick a new glass every time you needed to refill your glass.
The same applies to the shovel operator and the plus-equals operator. The plus-equals operator picks a new 'glass' every time it needs to refill its glass, while the shovel operator just takes the same glass and refills it. At the end of the day, there are more 'glasses' to collect (garbage collection) for the plus-equals operator.
This is an old question, but I just ran across it and I'm not fully satisfied with the existing answers. There are lots of good points about the shovel << being faster than concatenation +=, but there is also a semantic consideration.
The accepted answer from @noodl shows that << modifies the existing object in place, whereas += creates a new object. So you need to consider if you want all references to the string to reflect the new value, or do you want to leave the existing references alone and create a new string value to use locally. If you need all references to reflect the updated value, then you need to use <<. If you want to leave other references alone, then you need to use +=.
A very common case is that there is only a single reference to the string. In this case, the semantic difference does not matter and it is natural to prefer << because of its speed.
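When you want the speed of << but must not touch a string the caller still holds, one common pattern is to copy once and then append in place. A minimal sketch; the method name exclaim is hypothetical.

```ruby
# Copy the argument once with dup, then build in place with <<.
def exclaim(text, times)
  result = text.dup          # new object; the caller's string is untouched
  times.times { result << "!" }
  result
end

greeting = "hello"
loud = exclaim(greeting, 3)

p loud                     # "hello!!!"
p greeting                 # "hello"
p loud.equal?(greeting)    # false: different objects
```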
Because it's faster: it does not create a copy of the string, so the garbage collector has less work to do.
While a majority of answers cover += is slower because it creates a new copy, it's important to keep in mind that += and << are not interchangeable! You want to use each in different cases.
Using << will also alter any other variables that point to the same string as b. Here we also mutate a when we may not want to.
2.3.1 :001 > a = "hello"
=> "hello"
2.3.1 :002 > b = a
=> "hello"
2.3.1 :003 > b << " world"
=> "hello world"
2.3.1 :004 > a
=> "hello world"
Because += makes a new copy, it leaves any other variables pointing to the original string unchanged.
2.3.1 :001 > a = "hello"
=> "hello"
2.3.1 :002 > b = a
=> "hello"
2.3.1 :003 > b += " world"
=> "hello world"
2.3.1 :004 > a
=> "hello"
Understanding this distinction can save you a lot of headaches when you're dealing with loops!
While not a direct answer to your question, why's The Fully Upturned Bin has always been one of my favorite Ruby articles. It also contains some info on strings with regard to garbage collection.

Ruby: Comparing two Arrays of Hashes

I'm definitely a newbie to Ruby (and using 1.9.1), so any help is appreciated. Everything I've learned about Ruby has been from using Google. I'm trying to compare two arrays of hashes, and due to their sizes it's taking way too long and flirts with running out of memory. Any help would be appreciated.
I have a Class (ParseCSV) with multiple methods (initialize, open, compare, strip, output).
The way I have it working right now is as follows (and this does pass the tests I've written, just using a much smaller data set):
file1 = ParseCSV.new("some_file")
file2 = ParseCSV.new("some_other_file")
file1.open # this reads the file contents into an Array of Hashes through the CSV library
file1.strip # This just removes extra entries from the hash at each array index. Normally there are fifty entries in each hash; this is done to help reduce memory consumption.
file2.open
file2.compare(file1.storage) # @storage is the array of hashes from the open method
file2.output
Now what I'm struggling with is the compare method. On smaller data sets it's not a big deal at all; it works fast enough. However, in this case I'm comparing about 400,000 records (all read into the array of hashes) against one that has about 450,000 records. I'm trying to speed this up. Also, I can't run the strip method on file2. Here is how I'm doing it now:
def compare(x)
  # obviously just a verbose message
  puts "Comparing and leaving behind non matching entries"
  x.each do |row|
    # @storage is the array of hashes
    @storage.each_index do |y|
      if row[@opts[:field]] == @storage[y][@opts[:field]]
        @storage.delete_at(y)
      end
    end
  end
end
Hopefully that makes sense. I know it’s going to be a slow process just because it has to iterate 400,000 rows 440,000 times each. But do you have any other ideas on how to speed it up and possibly reduce memory consumption?
Yikes, that'll be O(n^2) runtime. Nasty.
A better bet would be to use the built-in Set class.
Code would look something like:
require 'set'
file1_content = load_file_content_into_array_here("some_file")
file2_content = load_file_content_into_array_here("some_other_file")
# Set.new(array) builds a set of the array's elements; Set[array] would
# build a one-element set containing the whole array.
file1_set = Set.new(file1_content)
unique_elements = file1_set - file2_content
That assumes that the files themselves have unique content. Should work in the generic case, but may have quirks depending on what your data looks like and how you parse it, but as long as the lines can be compared with == it should help you out.
Using a set will be MUCH faster than doing a nested loop to iterate over the file content.
(and yes, I have actually done this to process files with ~2 million lines, so it should be able to handle your case - eventually. If you're doing heavy data munging, Ruby may not be the best choice of tool though)
Here's a script comparing two ways of doing it: Your original compare() and a new_compare(). The new_compare uses more of the built in Enumerable methods. Since they are implemented in C, they'll be faster.
I created a constant called Test::SIZE to try out the benchmarks with different hash sizes. Results at the bottom. The difference is huge.
require 'benchmark'
class Test
  SIZE = 20000
  attr_accessor :storage
  def initialize
    file1 = []
    SIZE.times { |x| file1 << { :field => x, :foo => x } }
    @storage = file1
    @opts = {}
    @opts[:field] = :field
  end
  def compare(x)
    x.each do |row|
      @storage.each_index do |y|
        if row[@opts[:field]] == @storage[y][@opts[:field]]
          @storage.delete_at(y)
        end
      end
    end
  end
  def new_compare(other)
    other_keys = other.map { |x| x[@opts[:field]] }
    @storage.reject! { |s| other_keys.include? s[@opts[:field]] }
  end
end
storage2 = []
# We'll make 10 of them match
10.times { |x| storage2 << { :field => x, :foo => x } }
# And the rest won't
(Test::SIZE-10).times { |x| storage2 << { :field => x+100000000, :foo => x } }
Benchmark.bm do |b|
  b.report("original compare") do
    t1 = Test.new
    t1.compare(storage2)
  end
end
Benchmark.bm do |b|
  b.report("new compare") do
    t1 = Test.new
    t1.new_compare(storage2)
  end
end
Results:
Test::SIZE = 500
user system total real
original compare 0.280000 0.000000 0.280000 ( 0.285366)
user system total real
new compare 0.020000 0.000000 0.020000 ( 0.020458)
Test::SIZE = 1000
user system total real
original compare 28.140000 0.110000 28.250000 ( 28.618907)
user system total real
new compare 1.930000 0.010000 1.940000 ( 1.956868)
Test::SIZE = 5000
ruby test.rb
user system total real
original compare 113.100000 0.440000 113.540000 (115.041267)
user system total real
new compare 7.680000 0.020000 7.700000 ( 7.739120)
Test::SIZE = 10000
user system total real
original compare 453.320000 1.760000 455.080000 (460.549246)
user system total real
new compare 30.840000 0.110000 30.950000 ( 31.226218)
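new_compare still calls Array#include?, which scans other_keys linearly, so the work remains proportional to the product of the two sizes, just with a smaller constant. Putting the keys in a Set makes each lookup constant-time on average. A sketch assuming the same layout as the Test class above (rows are hashes, compared on one field), written as a standalone function with a hypothetical name:

```ruby
require 'set'

# Remove from `storage` every row whose `field` value appears in `other`.
# Building the Set is O(m); each membership test is O(1) on average.
def reject_matching!(storage, other, field)
  other_keys = other.map { |row| row[field] }.to_set
  storage.reject! { |row| other_keys.include?(row[field]) }
  storage # reject! returns nil when nothing is removed, so return storage
end

storage = Array.new(5) { |x| { field: x, foo: x } }
other   = [{ field: 1, foo: 1 }, { field: 3, foo: 3 }, { field: 99, foo: 99 }]

reject_matching!(storage, other, :field)
p storage.map { |row| row[:field] }
```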

Resources