I have a very common refactoring situation, and after going through a few blogs I still haven't found a satisfactory answer, so I'm asking here.
h = {
a: 'a',
b: 'b'
}
new_hash = {}
new_hash[:a] = h[:a].upcase if h[:a].present?
According to my friend, this code can be refactored in the following way to improve performance.
a = h[:a]
new_hash[:a] = a.upcase if a.present?
At first glance it looks slightly more optimized. But will it make any real difference, or is it over-optimization? And which style should be preferred?
Looking for expert advice :)
UPDATE with Benchmark n = 1000
user system total real
hash lookup 0.000000 0.000000 0.000000 ( 0.000014)
new var 0.000000 0.000000 0.000000 ( 0.000005)
AND op 0.000000 0.000000 0.000000 ( 0.000018)
try 0.000000 0.000000 0.000000 ( 0.000046)
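The script that produced these timings isn't shown; a minimal sketch that would produce output in this shape for the first two variants is below (the present? shim is hand-rolled so the snippet runs outside Rails; the "AND op" and "try" variants are omitted since their exact implementations aren't given).

```ruby
require 'benchmark'

h = { a: 'a', b: 'b' }

# Minimal stand-in for ActiveSupport's Object#present? so this runs
# outside Rails; a real app would get this from activesupport.
class Object
  def blank?
    respond_to?(:empty?) ? !!empty? : !self
  end

  def present?
    !blank?
  end
end

n = 1_000
Benchmark.bm(12) do |x|
  x.report('hash lookup') do
    n.times { nh = {}; nh[:a] = h[:a].upcase if h[:a].present? }
  end
  x.report('new var') do
    n.times { nh = {}; a = h[:a]; nh[:a] = a.upcase if a.present? }
  end
end
```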
UPDATE with Memory Benchmark using gem benchmark-memory
Calculating -------------------------------------
hash lookup 40.000 memsize ( 40.000 retained)
1.000 objects ( 1.000 retained)
1.000 strings ( 1.000 retained)
new var 0.000 memsize ( 0.000 retained)
0.000 objects ( 0.000 retained)
0.000 strings ( 0.000 retained)
AND op 40.000 memsize ( 40.000 retained)
1.000 objects ( 1.000 retained)
1.000 strings ( 1.000 retained)
try 200.000 memsize ( 40.000 retained)
5.000 objects ( 1.000 retained)
1.000 strings ( 1.000 retained)
Depending on your circumstances, Rails methods like present? can be dirty and can definitely impact performance. If you are only concerned about a nil check, and not about things like an empty Array or a blank String, then pure Ruby methods will be "much" faster (the quotes emphasize that performance is completely inconsequential in this basic example).
Since we are benchmarking things.
Setup
h = {
a: 'a',
b: 'b'
}
class Object
def present?
!blank?
end
def blank?
respond_to?(:empty?) ? !!empty? : !self
end
end
def hash_lookup(h)
new_hash = {}
new_hash[:a] = h[:a].upcase if h[:a].present?
new_hash
end
def new_var(h)
new_hash = {}
a = h[:a]
new_hash[:a] = a.upcase if a.present?
new_hash
end
def hash_lookup_w_safe_nav(h)
new_hash = {}
new_hash[:a] = h[:a]&.upcase
new_hash
end
def hash_lookup_wo_rails(h)
new_hash = {}
new_hash[:a] = h[:a].upcase if h[:a]
new_hash
end
def new_var_wo_rails(h)
new_hash = {}
a = h[:a]
new_hash[:a] = a.upcase if a
new_hash
end
Benchmarks
N = [1_000,10_000,100_000]
require 'benchmark'
N.each do |n|
puts "OVER #{n} ITERATIONS"
Benchmark.bm do |x|
x.report(:new_var) { n.times {new_var(h)}}
x.report(:hash_lookup) { n.times {hash_lookup(h)}}
x.report(:hash_lookup_w_safe_nav) { n.times {hash_lookup_w_safe_nav(h)}}
x.report(:hash_lookup_wo_rails) { n.times {hash_lookup_wo_rails(h)}}
x.report(:new_var_wo_rails) { n.times {new_var_wo_rails(h)}}
end
end
Output
OVER 1000 ITERATIONS
user system total real
new_var 0.001075 0.000159 0.001234 ( 0.001231)
hash_lookup 0.002441 0.000000 0.002441 ( 0.002505)
hash_lookup_w_safe_nav 0.001077 0.000000 0.001077 ( 0.001077)
hash_lookup_wo_rails 0.001100 0.000000 0.001100 ( 0.001145)
new_var_wo_rails 0.001015 0.000000 0.001015 ( 0.001016)
OVER 10000 ITERATIONS
user system total real
new_var 0.010321 0.000000 0.010321 ( 0.010329)
hash_lookup 0.010104 0.000015 0.010119 ( 0.010123)
hash_lookup_w_safe_nav 0.007211 0.000000 0.007211 ( 0.007213)
hash_lookup_wo_rails 0.007508 0.000000 0.007508 ( 0.017302)
new_var_wo_rails 0.008186 0.000026 0.008212 ( 0.016679)
OVER 100000 ITERATIONS
user system total real
new_var 0.099400 0.000249 0.099649 ( 0.192481)
hash_lookup 0.101419 0.000009 0.101428 ( 0.199788)
hash_lookup_w_safe_nav 0.078156 0.000010 0.078166 ( 0.140796)
hash_lookup_wo_rails 0.078743 0.000000 0.078743 ( 0.166815)
new_var_wo_rails 0.073271 0.000000 0.073271 ( 0.125869)
Optimization wears different shoes: there's memory optimization, there's performance optimization, and there's readability and how the code is structured.
Performance: There's nearly no effect on speed whatsoever, because the hash is accessed in O(1). Try using Benchmark to see for yourself that there's almost no difference.
You can check this article about hash lookups and why they're so fast.
Memory: Perhaps surprisingly, your friend's code is no worse here: a = h[:a] only binds a local variable to the string that already exists in the hash, so it allocates no new object.
Readability and Style: At first glance your friend's code has shorter, more descriptive lines. But keep in mind that you may need to do this for every key/value in the hash, so you'd end up with a, b, and so on as the hash grows (when it gets to that point, it's better to iterate over the hash, of course). Not much else to weigh here, honestly.
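When the same treatment applies to every key, iterating beats naming one temp variable per key. A plain-Ruby sketch (using v && !v.empty? as a stand-in for present? on strings):

```ruby
h = { a: 'a', b: 'b', c: nil }

# Build the upcased hash in one pass instead of one temp variable per key.
new_hash = h.each_with_object({}) do |(k, v), acc|
  acc[k] = v.upcase if v && !v.empty?
end
# new_hash => { a: 'A', b: 'B' }
```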
I ran a benchmark to see whether memoizing attributes was faster than reading from a configuration hash. The code below is an example, and the results surprised me. Can anybody explain them?
The Test
require 'benchmark'
class MyClassWithStuff
DEFAULT_VALS = { one: '1', two: 2, three: 3, four: 4 }
def memoized_fetch
@value ||= DEFAULT_VALS[:one]
end
def straight_fetch
DEFAULT_VALS[:one]
end
end
TIMES = 10000
CALL_TIMES = 1000
Benchmark.bmbm do |test|
test.report("Memoized") do
TIMES.times do
instance = MyClassWithStuff.new
CALL_TIMES.times { |i| instance.memoized_fetch }
end
end
test.report("Fetched") do
TIMES.times do
instance = MyClassWithStuff.new
CALL_TIMES.times { |i| instance.straight_fetch }
end
end
end
Results
Rehearsal --------------------------------------------
Memoized 1.500000 0.010000 1.510000 ( 1.510230)
Fetched 1.330000 0.000000 1.330000 ( 1.342800)
----------------------------------- total: 2.840000sec
user system total real
Memoized 1.440000 0.000000 1.440000 ( 1.456937)
Fetched 1.260000 0.000000 1.260000 ( 1.269904)
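One way to read these numbers: @value ||= ... still performs an instance-variable read and a truthiness check on every call, which costs about as much as the single hash lookup it caches, so there is nothing to win. Memoization pays off when the cached computation is genuinely expensive. A sketch (ExpensiveConfig and slow_lookup are made-up names for illustration):

```ruby
class ExpensiveConfig
  attr_reader :calls  # how many times the slow path actually ran

  def value
    @value ||= slow_lookup  # first call computes; later calls reuse @value
  end

  private

  # Stand-in for genuinely expensive work (I/O, parsing, heavy math).
  def slow_lookup
    @calls = (@calls || 0) + 1
    (1..100_000).reduce(:+)
  end
end

config = ExpensiveConfig.new
3.times { config.value }
config.calls # => 1 -- the expensive computation ran only once
```

One caveat: ||= re-computes whenever the cached value is nil or false, so for those cases guard with defined?(@value) instead.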
I recently installed Ruby 2.0.0 and found that it now has a lazy method for the Enumerable mixin. From previous experience in functional languages, I know that this makes for more efficient code.
I did a benchmark (not sure if it is moot) of lazy versus eager and found that lazy was continually faster. Why is this? What makes lazy evaluation better for large input?
Benchmark code:
#!/usr/bin/env ruby
require 'benchmark'
num = 1000
arr = (1..50000).to_a
Benchmark.bm do |rep|
rep.report('lazy') { num.times do ; arr.lazy.map { |x| x * 2 }; end }
rep.report('eager') { num.times do ; arr.map { |x| x * 2}; end }
end
Benchmark report sample:
user system total real
lazy 0.000000 0.000000 0.000000 ( 0.009502)
eager 5.550000 0.480000 6.030000 ( 6.231269)
It's so lazy it's not even doing the work, probably because you're not actually using the results of the operation: a lazy enumerator evaluates nothing until it is forced (with force, to_a, first, and so on). Put a sleep() in there to confirm:
> Benchmark.bm do |rep|
rep.report('lazy') { num.times do ; arr.lazy.map { |x| sleep(5) }; end }
rep.report('notlazy') { 1.times do ; [0,1].map { |x| sleep(5) } ; end }
end
user system total real
lazy 0.010000 0.000000 0.010000 ( 0.007130)
notlazy 0.000000 0.000000 0.000000 ( 10.001788)
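The same effect can be shown without sleep by counting block invocations: an Enumerator::Lazy runs its blocks only when results are actually demanded.

```ruby
calls = 0
arr = (1..50_000).to_a

# Building the lazy chain runs the block zero times...
doubled = arr.lazy.map { |x| calls += 1; x * 2 }
calls # => 0

# ...and asking for five results runs it exactly five times.
doubled.first(5) # => [2, 4, 6, 8, 10]
calls # => 5
```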
I have learned two array sorting methods in Ruby:
array = ["one", "two", "three"]
array.sort.reverse!
or:
array = ["one", "two", "three"]
array.sort { |x,y| y<=>x }
And I am not able to tell the two apart. Which method is better, and how exactly do they differ in execution?
Both lines do the same thing (create a new array, which is reverse-sorted). The main argument is about readability and performance. array.sort.reverse! is more readable than array.sort{|x,y| y<=>x}; I think we can agree on that.
For the performance part, I created a quick benchmark script, which gives the following on my system (ruby 1.9.3p392 [x86_64-linux]):
user system total real
array.sort.reverse 1.330000 0.000000 1.330000 ( 1.334667)
array.sort.reverse! 1.200000 0.000000 1.200000 ( 1.198232)
array.sort!.reverse! 1.200000 0.000000 1.200000 ( 1.199296)
array.sort{|x,y| y<=>x} 5.220000 0.000000 5.220000 ( 5.239487)
Run times are pretty constant for multiple executions of the benchmark script.
array.sort.reverse (with or without !) is way faster than array.sort{|x,y| y<=>x}. Thus, I recommend that.
Here is the script as a Reference:
#!/usr/bin/env ruby
require 'benchmark'
Benchmark.bm do|b|
master = (1..1_000_000).map(&:to_s).shuffle
a = master.dup
b.report("array.sort.reverse ") do
a.sort.reverse
end
a = master.dup
b.report("array.sort.reverse! ") do
a.sort.reverse!
end
a = master.dup
b.report("array.sort!.reverse! ") do
a.sort!.reverse!
end
a = master.dup
b.report("array.sort{|x,y| y<=>x} ") do
a.sort{|x,y| y<=>x}
end
end
There really is no difference here. Both methods return a new array.
For the purposes of this example, simpler is better. I would recommend array.sort.reverse because it is much more readable than the alternative. Passing blocks to methods like sort should be saved for arrays of more complex data structures and user-defined classes.
Edit: While destructive methods (anything ending in a !) are good for performance games, be aware that they aren't all required to return the updated object: methods like Array#uniq! and String#sub! return nil when no change was made. (Array#reverse! itself always returns its receiver, but relying on bang methods' return values is a habit that eventually bites.) If you wish to use a destructive method on a newly generated array, prefer calling .reverse! on a separate line instead of a one-liner.
Example:
array = array.sort
array.reverse!
should be preferred to
array = array.sort.reverse!
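For a concrete example of the nil trap with bang methods (Array#reverse! itself always returns its receiver, but many destructive methods do not):

```ruby
s = "hello"
s.sub!(/z/, "y")    # => nil -- no match, so nothing was replaced

[1, 2, 3].uniq!     # => nil -- already unique, nothing to remove

# Chaining after a bang method is where this bites:
# "hello".sub!(/z/, "y").upcase would raise NoMethodError on nil.

[3, 1, 2].sort.reverse!  # => [3, 2, 1] -- safe here, but only because
                         # Array#reverse! always returns self
```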
Reverse! is Faster
There's often no substitute for benchmarking. While it probably makes no difference in shorter scripts, the #reverse! method is significantly faster than sorting using the "spaceship" operator. For example, on MRI Ruby 2.0, and given the following benchmark code:
require 'benchmark'
array = ["one", "two", "three"]
loops = 1_000_000
Benchmark.bmbm do |bm|
bm.report('reverse!') { loops.times {array.sort.reverse!} }
bm.report('spaceship') { loops.times {array.sort {|x,y| y<=>x} }}
end
the system reports that #reverse! is almost twice as fast as using the combined comparison operator.
user system total real
reverse! 0.340000 0.000000 0.340000 ( 0.344198)
spaceship 0.590000 0.010000 0.600000 ( 0.595747)
My advice: use whichever is more semantically meaningful in a given context, unless you're running in a tight loop.
With a comparison as simple as your example there is not much difference, but as the comparison formula gets complicated it is better to avoid using <=> with a block, because the block is evaluated once per comparison rather than once per element, causing redundant work. Consider this:
array.sort{|x, y| some_expensive_method(x) <=> some_expensive_method(y)}
In this case, some_expensive_method is evaluated for every pair of elements that gets compared: roughly O(n log n) comparisons, two calls each.
In your particular case, use of a block with <=> can be avoided with reverse.
array.sort_by{|x| some_expensive_method(x)}.reverse
This is called a Schwartzian transform (sort_by computes the key just once per element internally).
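The call-count difference is easy to demonstrate (expensive_key here is a made-up stand-in for a costly computation):

```ruby
calls = 0
# Made-up stand-in for a costly key computation; it counts its own calls.
expensive_key = ->(x) { calls += 1; x.to_s.reverse }

data = %w[ruby sorts with blocks quickly]

data.sort { |a, b| expensive_key.(a) <=> expensive_key.(b) }
calls_with_block = calls   # two calls per comparison, O(n log n) comparisons

calls = 0
data.sort_by { |x| expensive_key.(x) }
calls_with_sort_by = calls # => 5 -- exactly one call per element
```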
In playing with tessi's benchmarks on my machine, I've gotten some interesting results. I'm running ruby 2.0.0p195 [x86_64-darwin12.3.0], i.e., latest release of Ruby 2 on an OS X system. I used bmbm rather than bm from the Benchmark module. My timings are:
Rehearsal -------------------------------------------------------------
array.sort.reverse: 1.010000 0.000000 1.010000 ( 1.020397)
array.sort.reverse!: 0.810000 0.000000 0.810000 ( 0.808368)
array.sort!.reverse!: 0.800000 0.010000 0.810000 ( 0.809666)
array.sort{|x,y| y<=>x}: 0.300000 0.000000 0.300000 ( 0.291002)
array.sort!{|x,y| y<=>x}: 0.100000 0.000000 0.100000 ( 0.105345)
---------------------------------------------------- total: 3.030000sec
user system total real
array.sort.reverse: 0.210000 0.000000 0.210000 ( 0.208378)
array.sort.reverse!: 0.030000 0.000000 0.030000 ( 0.027746)
array.sort!.reverse!: 0.020000 0.000000 0.020000 ( 0.020082)
array.sort{|x,y| y<=>x}: 0.110000 0.000000 0.110000 ( 0.107065)
array.sort!{|x,y| y<=>x}: 0.110000 0.000000 0.110000 ( 0.105359)
First, note that in the Rehearsal phase that sort! using a comparison block comes in as the clear winner. Matz must have tuned the heck out of it in Ruby 2!
The other thing that I found exceedingly weird was how much improvement array.sort.reverse! and array.sort!.reverse! exhibited in the production pass. It was so extreme it made me wonder whether I had somehow screwed up and passed these already-sorted data, so I added explicit checks for sorted or reverse-sorted data prior to performing each benchmark. As it turns out, those checks can't catch the problem: the a = master.dup lines and the checks run only once, while the bmbm block is being built, and every report block closes over the same variable a. The in-place sort! calls during the rehearsal therefore mutate a, and the production pass ends up timing data that is already sorted.
My variant of tessi's script follows:
#!/usr/bin/env ruby
require 'benchmark'
class Array
def sorted?
(1...length).each {|i| return false if self[i] < self[i-1] }
true
end
def reversed?
(1...length).each {|i| return false if self[i] > self[i-1] }
true
end
end
master = (1..1_000_000).map(&:to_s).shuffle
Benchmark.bmbm(25) do|b|
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort.reverse:") { a.sort.reverse }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort.reverse!:") { a.sort.reverse! }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort!.reverse!:") { a.sort!.reverse! }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort{|x,y| y<=>x}:") { a.sort{|x,y| y<=>x} }
a = master.dup
puts "uh-oh!" if a.sorted?
puts "oh-uh!" if a.reversed?
b.report("array.sort!{|x,y| y<=>x}:") { a.sort!{|x,y| y<=>x} }
end
For example, I have this number: 5032
I want to get this: 5.0.32
How can I do this with Ruby string manipulation?
I'd be curious to hear the pros and cons of the different solutions. Which is the fastest? Which is the clearest? Are regular expressions expensive?
Here's yet another solution:
sprintf("%s.%s.%s%s", *5032.to_s.split(""))
Here are our results. Mine is the slowest:
require 'benchmark'
n = 500000
Benchmark.bm do |x|
x.report { n.times {"5032".sub(/^(.)(.)/,"\\1.\\2.")}}
x.report { n.times {"5032".insert(2, ".").insert(1, ".")}}
x.report { n.times {sprintf("%s.%s.%s%s", *5032.to_s.split("")) }}
end
user system total real
0.610000 0.000000 0.610000 ( 0.607663)
0.320000 0.000000 0.320000 ( 0.325050)
3.030000 0.000000 3.030000 ( 3.029342)
Your question is a little vague, but you can do this:
number = "5032"
number = "5032".insert(2, ".").insert(1, ".")
puts number
See the API doc for insert here.
> 5032.to_s.sub(/^(.)(.)/,"\\1.\\2.")
=> "5.0.32"
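A quick sanity check (assuming the goal is always d.d.dd for a four-digit input) that the three approaches in this thread agree:

```ruby
a = "5032".sub(/^(.)(.)/, "\\1.\\2.")           # regex substitution
b = "5032".insert(2, ".").insert(1, ".")        # String#insert
c = sprintf("%s.%s.%s%s", *5032.to_s.split("")) # sprintf over the digits
[a, b, c] # => ["5.0.32", "5.0.32", "5.0.32"]
```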