Optimize this Ruby code: switch arrays to sets/hashes?

I need to optimize this code. Any suggestions to make it faster would be appreciated; I don't have a specific speedup target, but I do want to keep the complexity below O(n^2).
I'm wondering whether converting the array I'm using into a set or hash would help, since those are quicker, right? How much faster, in terms of complexity, might that make it?
I suspect the main problem is my use of Ruby's combination method, which runs pretty slowly. Does anyone know the exact complexity of this method? Is there a faster alternative?
The point of this code is to find the single point with the shortest combined distance to all the other points, i.e. the friend's house that is most convenient for everyone to go to. There is a little extra code here with some debugging/printing functions.
class Point
  attr_accessor :x, :y, :distance, :done, :count
  def initialize(x, y)
    @x = x
    @y = y
    @distance = 0
    @closestPoint = []
    @done = false
    @count = 0
  end
end

class Edge
  attr_accessor :edge1, :edge2, :weight
  def initialize(edge1, edge2, weight)
    @edge1 = edge1
    @edge2 = edge2
    @weight = weight
  end
end

class AdjacencyList
  attr_accessor :name, :minSumList, :current
  def initialize(name)
    @name = name
    @minSumList = []
    @current = nil
    @vList = []
    @edgeList = []
  end

  def addVertex(vertex)
    @vList.push(vertex)
  end

  def generateEdges2
    minSumNode = nil
    current = nil
    last = nil
    @vList.combination(2) { |vertex1, vertex2|
      distance = distance2points(vertex1, vertex2)
      edge = Edge.new(vertex1, vertex2, distance)
      if (current == nil)
        current = vertex1
        minSumNode = vertex1
      end
      vertex1.distance += distance
      vertex2.distance += distance
      vertex1.count += 1
      vertex2.count += 1
      if (vertex1.count == @vList.length - 1)
        vertex1.done = true
      elsif (vertex2.count == @vList.length - 1)
        vertex2.done = true
      end
      if ((vertex1.distance < minSumNode.distance) && (vertex1.done == true))
        minSumNode = vertex1
      end
      # @edgeList.push(edge)
    }
    return minSumNode.distance
  end

  def generateEdges
    @vList.combination(2) { |vertex1, vertex2|
      distance = distance2points(vertex1, vertex2)
      @edgeList.push(Edge.new(vertex1, vertex2, distance))
    }
  end

  def printEdges
    @edgeList.each { |edge| puts "(#{edge.edge1.x},#{edge.edge1.y}) <=> (#{edge.edge2.x},#{edge.edge2.y}) weight: #{edge.weight}" }
  end

  def printDistances
    @vList.each { |v| puts "(#{v.x},#{v.y} distance = #{v.distance})" }
  end
end

def distance2points(point1, point2)
  xdistance = (point1.x - point2.x).abs
  ydistance = (point1.y - point2.y).abs
  # dx + dy - min(dx, dy) is just max(dx, dy): the Chebyshev distance
  [xdistance, ydistance].max
end
#pointtest1 = Point.new(0,1)
#pointtest2 = Point.new(2,5)
#pointtest3 = Point.new(3,1)
#pointtest4 = Point.new(4,0)
graph = AdjacencyList.new("graph1")
gets # discard the first input line (presumably the number of points)
while (line = gets)
  graph.addVertex(Point.new(line.split[0].to_i, line.split[1].to_i))
end
#graph.addVertex(pointtest1)
#graph.addVertex(pointtest2)
#graph.addVertex(pointtest3)
#graph.addVertex(pointtest4)
puts graph.generateEdges2
#graph.printEdges
#graph.printDistances

Try this, and then post some more code:
ruby -rprofile your_script your_args
This will run the script under the profiler and generate a nice table of results. If you post that here, you're more likely to get good help. Plus, you will have a more exact idea of what's consuming your CPU cycles.
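If you don't want a full profile, a rough check with the stdlib's Benchmark will also tell you how the runtime grows. A quick sketch (the harness and point counts are my own illustration, using the classes from the question):

require 'benchmark'

[250, 500, 1000].each do |n|
  graph = AdjacencyList.new("bench")
  n.times { graph.addVertex(Point.new(rand(1000), rand(1000))) }
  # Roughly 4x the time per doubling of n indicates O(n^2) growth.
  puts "n=#{n}: #{Benchmark.realtime { graph.generateEdges2 }}s"
end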

Sets are basically hashes, and the advantage of hashes over arrays is O(1) find operations. Since you are simply iterating over the entire array, hashes will not offer any speed improvement if you just swap them in for the arrays.
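A toy illustration of that O(n) versus O(1) difference (my own sketch, using the stdlib's Benchmark and Set):

require 'benchmark'
require 'set'

arr = (1..100_000).to_a
set = arr.to_set

puts Benchmark.realtime { 1_000.times { arr.include?(99_999) } } # O(n): scans the array each time
puts Benchmark.realtime { 1_000.times { set.include?(99_999) } } # O(1): hash lookup each time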
Your real problem is that the running time of your algorithm is O(n^2): given n points, you match every point against every other point, which is n(n-1)/2 pairs.
This can be somewhat improved by using hashes to cache values. For example, say you want the distance between point "a" and point "b". You could keep a hash @distances which stores @distances["a,b"] = 52 (of course, you'll have to be smart about what to use as the key). Basically, just try to remove redundant operations wherever you can.
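A minimal sketch of that caching idea (my own illustration, reusing distance2points from the question; note that combination(2) already visits each pair only once, so the cache only pays off if distances get looked up again elsewhere):

@distances = {}

def cached_distance(p1, p2)
  # Order the key so (a, b) and (b, a) share one cache entry.
  key = [[p1.x, p1.y], [p2.x, p2.y]].sort
  @distances[key] ||= distance2points(p1, p2)
end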
That said, the largest speed boost would be from a smarter algorithm, but I can't think of something applicable off the top of my head right now.

There's something not many people know about, and it won't cost you anything.
While you're trying to guess how to make the code faster, or scouring the internet for some kind of profiler, just run the program under the debugger and interrupt it while it's being slow.
Do it several times, and each time take careful note of what it's doing and why.
Here's an example in Python.
The slower it is, the more obvious the problem will be.


Ruby - Create a hash, where Keys are newly initialized Array objects

Please bear with me... I need basic concepts; I am not aware of advanced programming concepts yet.
I have a class called Circle which is initialized with a radius and calculates its area:
class Circle
  def initialize(radius)
    @radius = radius
  end
  def area
    3.14 * @radius * @radius
  end
end
I want to take user input and create that many Circle instances, each with its own radius.
p "How many Circles"
i = gets.to_i
j = 1
while j != i+1
p "Enter radius of Circle #{j}"
$s << Circle.new(gets.to_i)
j = j +1
end
$s now holds the array of objects I created.
Now I want to do something like
area_array = {}
area_array[Circle1] = Circle1.area
area_array[Circle2] = Circle2.area
and so on, where Circle1 and Circle2 are the objects I created earlier in my while loop.
Can someone tell me how I can put each of the created objects into another collection and associate an area value with it?
Do you need another array because you will modify or destroy the properties of the Circles in the first array? If so, and you can rely on the Circles' order in the array remaining the same, then just use the index value to correlate the values:
circle_area_hash = $s.reduce({}) { |a, c| a[c.object_id] = c.area; a }
Also, consider that for your analyses you may care more about the values than the objects per se. In that case you could create:
circle_area_hash = $s.reduce({}) do |a, c|
  a[c.area] = a[c.area].nil? ? [c] : a[c.area] << c
  a
end
This makes the hash keys the area values, and the hash values are each an array of the objects that have that area.
Then, to get the area shared by the most circles, you can:
circle_area_hash.max_by { |k, v| v.count }
Also, as a thought:
puts "How many Circles"
$s = []
(1..gets.to_i).each do |j|
  puts "Enter radius of Circle #{j}"
  $s << Circle.new(gets.to_i)
end
$s[3].area
To create a new array of areas:
area_array = $s.map{ |circle| circle.area }
area_array = $s.map( &:area ) # Same thing, but shorter
To create one big hash:
areas = Hash[ $s.map{ |circle| [ circle, circle.area ] } ]
This creates an array of arrays like:
[
  [ #<Circle @radius=3>, 28.27 ],
  [ #<Circle @radius=4>, 50.27 ],
  …
]
…and then uses the Hash.[] method to convert that into a Hash.
Another technique is:
areas = $s.inject({}){ |hash,circle| hash.merge(circle=>circle.area) }
For more details, read up on Array#map and Enumerable#inject.
However, why would you want to create this hash? It seems like you're perhaps wanting to only calculate the area once each. Although it's not needed for this simple calculation, you can memoize a method's return value with a pattern like this:
class Circle
  def initialize(radius)
    @radius = radius
  end
  def area
    @area ||= Math::PI * @radius * @radius
  end
end
This will calculate the area the first time it's needed, and store it in an instance variable; thereafter it will just use the value of that variable as the return value of the method, without needing to recalculate it.
This is very straightforward. You should just iterate over $s, using each element as a hash key and the result of its area as the corresponding value.
Another few points that should be useful to you:
You can use Math::PI instead of 3.14
You should only use p for debugging. It prints the result of the inspect method of its parameter, which is rarely what you want for tidy output. Use print if you want to make your newlines explicit in the string, or puts to append a newline if there isn't one already
It is rarely appropriate to use while in Ruby. In this instance you just want i.times { ... }
class Circle
  def initialize(radius)
    @radius = radius
  end
  def area
    Math::PI * @radius * @radius
  end
end

print 'How many Circles: '
i = gets.to_i
shapes = []
i.times do |n|
  print "Enter radius of Circle #{n+1}? "
  shapes << Circle.new(gets.to_i)
end

area_hash = {}
shapes.each do |shape|
  area_hash[shape] = shape.area
end
However it seems more appropriate to memoize the area method here, writing it as
def area
  @area = Math::PI * @radius * @radius unless @area
  @area
end
Then you can use the method repeatedly and the calculation will be done only once.
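For example:

c = Circle.new(3)
c.area # computed and stored in @area on the first call
c.area # subsequent calls return the stored value without recalculating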
After reading your comment on NewAlexandria's answer, perhaps something like this would work for you:
p "How many Circles"
(1..gets.to_i) do |j|
c = Circle.new
p "Enter radius of Circle #{j}"
s[c] = c.area(gets.to_i)}
end
where s is a pre-defined hash that may contain keys for instances of other circles, rectangles, etc.
This only makes sense, however, if you plan to add additional constants or methods to your shape classes that you will want to reference with the keys of s.
You should edit your question to incorporate your comment above.

Fibers vs. explicit enumerators

I am toying around with Ruby to learn the language. Currently I'm trying to wrap my head around the concept of fibers. According to this answer, they are fairly often used for creating (infinite) external enumerators. On the other hand, this seems to overlap with the concept of so-called explicit enumerators.
Say I want to write a code snippet that emits consecutive prime numbers (yes, the following algorithm has a runtime of O(scary)). I can implement it using fibers:
prime_fiber = Fiber.new do
  primes = [2]
  Fiber.yield 2
  current = 1
  loop do
    current += 2
    unless primes.find { |value| (current % value) == 0 }
      Fiber.yield current
      primes << current
    end
  end
end

ARGV[0].to_i.times { print "#{prime_fiber.resume}, " }
This does not emit an enumerator object by itself, although it is not difficult to create one out of it; for instance, a minimal sketch of wrapping the fiber (my own illustration):
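prime_enum_from_fiber = Enumerator.new do |yielder|
  # Pull each value out of the fiber and re-yield it through the enumerator.
  loop { yielder << prime_fiber.resume }
end

In contrast, I can also utilize an explicitly defined enumerator, which has the added benefit of already being an enumerator object: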
prime_enum = Enumerator.new do |yielder|
  primes = [2]
  yielder.yield 2
  current = 1
  loop do
    current += 2
    unless primes.find { |value| (current % value) == 0 }
      yielder.yield current
      primes << current
    end
  end
end

ARGV[0].to_i.times { print "#{prime_enum.next}, " }
# I could also write:
# p prime_enum.first(ARGV[0].to_i)
Both methods let me implement a kind of coroutine, and they seem interchangeable to me. So when should I prefer one over the other? Is there a commonly agreed practice? I find it difficult to keep all these idioms in my head, so I apologize in advance if this is considered a dumb question.
I would use Enumerator; it gives you take, take_while, and even each if your sequence is finite, while Fiber is designed for lightweight concurrency and is pretty limited as an enumerator.
prime_enum.take(ARGV[0].to_i).each { |x| puts x }
or
prime_enum.take_while { |x| x < ARGV[0].to_i }.each { |x| puts x }
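On Ruby 2.0+, an Enumerator also composes with lazy chains, which a Fiber won't give you directly. For example, using the prime_enum from the question:

# Chain transformations without forcing the infinite sequence.
p prime_enum.lazy.select { |x| x % 10 == 3 }.first(5) #=> [3, 13, 23, 43, 53]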

Lychrel numbers

First of all, for those of you who don't know (or forgot) about Lychrel numbers, here is an entry from Wikipedia: http://en.wikipedia.org/wiki/Lychrel_number.
I want to implement a Lychrel number detector in the range from 0 to 10_000. Here is my solution:
class Integer
  # Return a reversed integer, e.g.:
  #
  #   1632.reverse #=> 2361
  #
  def reverse
    self.to_s.reverse.to_i
  end

  # Check whether the given number
  # is a Lychrel number.
  def lychrel?(depth=30)
    if depth == 0
      return true
    elsif self == self.reverse and depth != 30 # [1]
      return false
    end
    # In case both tests fail, try the
    # recursive "reverse and add" again.
    (self + self.reverse).lychrel?(depth-1)
  end
end

puts (0..10000).find_all(&:lychrel?)
The issue with this code is the depth value [1]. Basically, depth defines how many "reverse and add" iterations we try before concluding that the current number really is a Lychrel number. The default of 30 iterations is fine for a range as small as mine, but to cover all natural numbers I need to be more flexible.
Because of the recursion in Integer#lychrel?, I can't be: if I pass a custom depth to lychrel?, the hardcoded depth != 30 check at [1] no longer does what it should.
So my question is: how do I refactor my method so that it accepts the depth parameter correctly?
What you currently have is known as tail recursion. This can usually be re-written as a loop to get rid of the recursive call and eliminate the risk of running out of stack space. Try something more like this:
def lychrel?(depth=30)
  val = self
  first_iteration = true
  while depth > 0 do
    # Return false if the number has become a palindrome,
    # but allow a palindrome as input
    if first_iteration
      first_iteration = false
    else
      return false if val == val.reverse
    end
    # Perform next iteration
    val = val + val.reverse
    depth = depth - 1
  end
  return true
end
I don't have Ruby installed on this machine so I can't verify whether that's 100% correct, but you get the idea. Also, I'm assuming that the purpose of the and depth != 30 bit is to allow a palindrome to be provided as input without immediately returning false.
By looping, you can use a state variable like first_iteration to keep track of whether or not you need to do the val == val.reverse check. With the recursive solution, scoping limitations prevent you from tracking this easily (you'd have to add another function parameter and pass the state variable to each recursive call in turn).
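For what it's worth, a sketch of what that recursive state-threading would look like (my own illustration, reusing the thread's Integer#reverse):

class Integer
  # Hypothetical variant that threads the first-iteration flag
  # through an extra parameter instead of a local variable.
  def lychrel?(depth = 30, first_iteration = true)
    return true if depth == 0
    return false if !first_iteration && self == reverse
    (self + reverse).lychrel?(depth - 1, false)
  end
end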
A cleaner and more Ruby-like solution:
class Integer
  def reverse
    self.to_s.reverse.to_i
  end

  def lychrel?(depth=50)
    n = self
    depth.times do |i|
      r = n.reverse
      return false if i > 0 and n == r
      n += r
    end
    true
  end
end

puts (0...10000).find_all(&:lychrel?) #=> 249 numbers
bta's solution with some corrections:
class Integer
  def reverse
    self.to_s.reverse.to_i
  end

  def lychrel?(depth=30)
    this = self
    first_iteration = true
    begin
      if first_iteration
        first_iteration = false
      elsif this == this.reverse
        return false
      end
      this += this.reverse
      depth -= 1
    end while depth > 0
    return true
  end
end

puts (1..10000).find_all { |num| num.lychrel?(255) }
Not so fast, but it works:
code/practice/ruby% time ruby lychrel.rb > /dev/null
ruby lychrel.rb > /dev/null 1.14s user 0.00s system 99% cpu 1.150 total

Efficient Ruby LRU cache

What's the most efficient way to build a cache with arbitrary Ruby objects as keys that is expired based on a least-recently-used algorithm? It should use Ruby's normal hashing semantics (not equal?).
I know it's a few years late, but I just implemented what I believe is the fastest LRU cache out there for Ruby.
It is also tested and optionally safe to use in multi-threaded environments.
https://github.com/SamSaffron/lru_redux
Note: in Ruby 1.9 Hash is ordered, so you can cheat and build the fastest LRU cache in a few lines of code
class LruRedux::Cache19
  def initialize(max_size)
    @max_size = max_size
    @data = {}
  end

  def max_size=(size)
    raise ArgumentError.new(:max_size) if size < 1
    @max_size = size
    if @max_size < @data.size
      @data.keys[0..@max_size-@data.size].each do |k|
        @data.delete(k)
      end
    end
  end

  def [](key)
    found = true
    value = @data.delete(key) { found = false }
    if found
      @data[key] = value
    else
      nil
    end
  end

  def []=(key, val)
    @data.delete(key)
    @data[key] = val
    if @data.length > @max_size
      @data.delete(@data.first[0])
    end
    val
  end

  def each
    @data.to_a.reverse.each do |pair|
      yield pair
    end
  end

  # used further up the chain, non thread safe each
  alias_method :each_unsafe, :each

  def to_a
    @data.to_a.reverse
  end

  def delete(k)
    @data.delete(k)
  end

  def clear
    @data.clear
  end

  def count
    @data.count
  end

  # for cache validation only, ensures all is sound
  def valid?
    true
  end
end
This pushes the boundaries of my understanding of how Ruby uses memory, but I suspect that the most efficient implementation would be a doubly-linked list where every access moves the key to the front of the list, and every insert drops an item if the maximum size has been reached.
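A minimal sketch of that doubly-linked-list idea (my own illustration, not a tuned implementation; the LRU and Node names are made up): a Hash maps each key to its list node, so lookups stay O(1), while the list order tracks recency.

class LRU
  Node = Struct.new(:key, :value, :prev, :next)

  def initialize(max_size)
    @max_size = max_size
    @nodes = {}  # key => Node, for O(1) lookup
    @head = nil  # most recently used
    @tail = nil  # least recently used
  end

  def [](key)
    node = @nodes[key]
    return nil unless node
    move_to_front(node)
    node.value
  end

  def []=(key, value)
    if (node = @nodes[key])
      node.value = value
      move_to_front(node)
    else
      node = Node.new(key, value)
      @nodes[key] = node
      push_front(node)
      evict_tail if @nodes.size > @max_size
    end
    value
  end

  private

  def push_front(node)
    node.next = @head
    @head.prev = node if @head
    @head = node
    @tail ||= node
  end

  def unlink(node)
    node.prev.next = node.next if node.prev
    node.next.prev = node.prev if node.next
    @head = node.next if @head.equal?(node)
    @tail = node.prev if @tail.equal?(node)
    node.prev = node.next = nil
  end

  def move_to_front(node)
    return if @head.equal?(node)
    unlink(node)
    push_front(node)
  end

  def evict_tail
    node = @tail
    unlink(node)
    @nodes.delete(node.key)
  end
end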
However, assuming Ruby's Hash class is already very efficient, I'd bet that the somewhat naive solution of simply adding age data to a Hash would be fairly good. Here's a quick toy example that does this:
class Cache
  attr_accessor :max_size

  def initialize(max_size = 4)
    @data = {}
    @max_size = max_size
  end

  def store(key, value)
    @data.store key, [0, value]
    age_keys
    prune
  end

  def read(key)
    if value = @data[key]
      renew(key)
      age_keys
    end
    value
  end

  private # -------------------------------

  def renew(key)
    @data[key][0] = 0
  end

  def delete_oldest
    m = @data.values.map { |v| v[0] }.max
    @data.reject! { |k, v| v[0] == m }
  end

  def age_keys
    @data.each { |k, v| @data[k][0] += 1 }
  end

  def prune
    delete_oldest if @data.size > @max_size
  end
end
There's probably a faster way of finding the oldest item, and this is not thoroughly tested, but I'd be curious to know how anyone thinks this compares to a more sophisticated design, linked list or otherwise.
Ramaze has a reasonably well tested LRU cache: see http://github.com/manveru/ramaze/blob/master/lib/ramaze/snippets/ramaze/lru_hash.rb
And there is also the hashery gem by rubyworks, which should be more efficient than the Ramaze one for large caches.
The rufus-lru gem is another option. Instead of a count, it just keeps a sorted array of keys, from oldest to newest.
I threw together a new gem, lrucache, which you may find useful. It may be faster than Alex's approach for collections with a significant number of elements.
Here's a very simple and fast LRU cache that I use in our HTTP backend: https://github.com/grosser/i18n-backend-http/blob/master/lib/i18n/backend/http/lru_cache.rb
gem install ruby-cache
--> http://www.nongnu.org/pupa/ruby-cache-README.html

How to return a Ruby array intersection with duplicate elements? (problem with bigrams in Dice Coefficient)

I'm trying to script Dice's Coefficient, but I'm having a bit of a problem with the array intersection.
def bigram(string)
  string.downcase!
  bgarray = []
  bgstring = "%" + string + "#"
  bgslength = bgstring.length
  0.upto(bgslength - 2) do |i|
    bgarray << bgstring[i, 2]
  end
  return bgarray
end

def approx_string_match(teststring, refstring)
  test_bigram = bigram(teststring) #.uniq
  ref_bigram = bigram(refstring) #.uniq
  bigram_overlay = test_bigram & ref_bigram
  result = (2 * bigram_overlay.length.to_f) / (test_bigram.length.to_f + ref_bigram.length.to_f) * 100
  return result
end
The problem is that, since & removes duplicates, I get results like this:
string1 = "Almirante Almeida Almada"
string2 = "Almirante Almeida Almada"
puts approx_string_match(string1, string2) #=> 76.0
It should return 100.
The uniq method nails it, but there is information loss, which may bring unwanted matches in the particular dataset I'm working with.
How can I get an intersection with all duplicates included?
As Yuval F said, you should use a multiset. However, there is no multiset in the Ruby standard library; take a look here and here.
If performance is not that critical for your application, you can still do it using Array with a little bit of code:
def intersect(a, b)
  a.inject([]) do |intersect, s|
    index = b.index(s)
    unless index.nil?
      intersect << s
      b.delete_at(index)
    end
    intersect
  end
end

a = ["al", "al", "lc", "lc", "ld"]
b = ["al", "al", "lc", "ef"]
puts intersect(a, b).inspect #=> ["al", "al", "lc"]
From this link, I believe you should not use Ruby's sets but rather multisets, so that every bigram gets counted the number of times it appears. Maybe you can use this gem for multisets. This should give correct behavior for recurring bigrams.
I toyed with this for a while, based on the answer from @pierr, and ended up with this:
a = ["al","al","lc","lc","lc","lc","ld"]
b = ["al","al","al","al","al","lc","ef"]
result=[]
h1,h2=Hash.new(0),Hash.new(0)
a.each{|x| h1[x]+=1}
b.each{|x| h2[x]+=1}
h1.each_pair{|key,val| result<<[key]*[val,h2[key]].min if h2[key]!=0}
result.flatten
=> ["al", "al", "lc"]
This could be a kind of multiset intersect of a and b, but don't take my word for it because I haven't tested it enough to be sure.
