Efficient Ruby LRU cache

What's the most efficient way to build a cache with arbitrary Ruby objects as keys, where entries expire based on a least recently used algorithm? It should use Ruby's normal hashing semantics (hash and eql?, not equal?).

I know it's a few years late, but I just implemented what I believe is the fastest LRU cache out there for Ruby.
It is also tested and optionally safe to use in multithreaded environments.
https://github.com/SamSaffron/lru_redux
Note: in Ruby 1.9 Hash preserves insertion order, so you can cheat and build the fastest LRU cache in a few lines of code:
module LruRedux; end # namespace only; the gem itself defines this module
class LruRedux::Cache19
  def initialize(max_size)
    @max_size = max_size
    @data = {}
  end
  def max_size=(size)
    raise ArgumentError.new(:max_size) if size < 1
    @max_size = size
    if @max_size < @data.size
      # the oldest entries sit at the front of the Hash, so drop the surplus from there
      @data.keys.first(@data.size - @max_size).each do |k|
        @data.delete(k)
      end
    end
  end
  def [](key)
    found = true
    value = @data.delete(key){ found = false }
    if found
      # re-inserting moves the key to the end, marking it most recently used
      @data[key] = value
    else
      nil
    end
  end
  def []=(key,val)
    @data.delete(key)
    @data[key] = val
    if @data.length > @max_size
      # evict the least recently used (first) entry
      @data.delete(@data.first[0])
    end
    val
  end
  def each
    # newest first; Hash has no #reverse, so use Enumerable#reverse_each
    @data.reverse_each do |pair|
      yield pair
    end
  end
  # used further up the chain, non thread safe each
  alias_method :each_unsafe, :each
  def to_a
    @data.to_a.reverse
  end
  def delete(k)
    @data.delete(k)
  end
  def clear
    @data.clear
  end
  def count
    @data.count
  end
  # for cache validation only, ensures all is sound
  def valid?
    true
  end
end

This pushes the boundaries of my understanding of how Ruby uses memory, but I suspect that the most efficient implementation would be a doubly-linked list where every access moves the key to the front of the list, and every insert drops an item if the maximum size has been reached.
However, assuming Ruby's Hash class is already very efficient, I'd bet that the somewhat naive solution of simply adding age data to a Hash would be fairly good. Here's a quick toy example that does this:
class Cache
  attr_accessor :max_size
  def initialize(max_size = 4)
    @data = {}
    @max_size = max_size
  end
  def store(key, value)
    @data.store key, [0, value]
    age_keys
    prune
  end
  def read(key)
    if (entry = @data[key])
      renew(key)
      age_keys
      entry[1] # return the stored value, not the [age, value] pair
    end
  end
  private # -------------------------------
  def renew(key)
    @data[key][0] = 0
  end
  def delete_oldest
    m = @data.values.map{ |v| v[0] }.max
    @data.reject!{ |k,v| v[0] == m }
  end
  def age_keys
    @data.each{ |k,v| @data[k][0] += 1 }
  end
  def prune
    delete_oldest if @data.size > @max_size
  end
end
There's probably a faster way of finding the oldest item, and this is not thoroughly tested, but I'd be curious to know how anyone thinks this compares to a more sophisticated design, linked list or otherwise.
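For comparison with the age-counting approach, here is a minimal sketch of the doubly linked list design suggested at the start of this answer. It is paired with a Hash so that lookups stay O(1). The Hash is an assumption; the answer only mentions the list. LinkedLRU and its method names are made up for illustration, and the class is not thread-safe.

```ruby
# Sketch: LRU cache = Hash (key => node) + doubly linked list of nodes.
# The list runs from the newest entry (just after @head) to the oldest
# (just before @tail); both ends are sentinel nodes to avoid nil checks.
class LinkedLRU
  Node = Struct.new(:key, :value, :newer, :older)

  def initialize(max_size)
    raise ArgumentError, "max_size must be at least 1" if max_size < 1
    @max_size = max_size
    @map = {}               # Hash lookup, so keys follow hash/eql? semantics
    @head = Node.new        # sentinel on the newest side
    @tail = Node.new        # sentinel on the oldest side
    @head.older = @tail
    @tail.newer = @head
  end

  # Returns the value (or nil) and marks the key as most recently used.
  def [](key)
    node = @map[key]
    return nil unless node
    unlink(node)
    push_front(node)
    node.value
  end

  # Inserts or updates a key; evicts the least recently used entry when full.
  def []=(key, value)
    node = @map[key]
    if node
      node.value = value
      unlink(node)
    else
      node = Node.new(key, value)
      @map[key] = node
    end
    push_front(node)
    evict_oldest if @map.size > @max_size
    value
  end

  # Keys from most to least recently used.
  def keys
    result = []
    node = @head.older
    until node.equal?(@tail)
      result << node.key
      node = node.older
    end
    result
  end

  private

  def unlink(node)
    node.newer.older = node.older
    node.older.newer = node.newer
  end

  def push_front(node)
    node.newer = @head
    node.older = @head.older
    @head.older.newer = node
    @head.older = node
  end

  def evict_oldest
    oldest = @tail.newer
    unlink(oldest)
    @map.delete(oldest.key)
  end
end
```

Reads and writes stay O(1) on average. Compared with the Hash-ordering trick in the first answer, the cost is one extra Struct per entry.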

Ramaze has a reasonably well-tested LRU cache; see http://github.com/manveru/ramaze/blob/master/lib/ramaze/snippets/ramaze/lru_hash.rb
There is also the hashery gem by rubyworks, which should be more efficient than the Ramaze one for large caches.

The rufus-lru gem is another option.
Instead of a count, it just keeps a sorted array of keys from oldest to newest.
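Below is a rough sketch of that idea: a Hash for the values plus an Array of keys kept in oldest-to-newest order. It assumes this is roughly what rufus-lru does, but it is not the gem's actual code, and ArrayOrderedLRU is a hypothetical name. Moving a key to the newest end costs O(n) because the Array has to be searched. Keys are matched with eql?, so the Array agrees with how the Hash compares keys.

```ruby
# Sketch: a Hash for the values plus an Array of keys, oldest first.
class ArrayOrderedLRU
  def initialize(max_size)
    @max_size = max_size
    @values = {}
    @order = []   # least recently used key at index 0
  end

  def [](key)
    return nil unless @values.key?(key)
    touch(key)
    @values[key]
  end

  def []=(key, value)
    @values[key] = value
    touch(key)
    # Over capacity: the first key in @order is the least recently used.
    @values.delete(@order.shift) if @order.size > @max_size
    value
  end

  def keys
    @order.dup
  end

  private

  # Moves key to the newest end. The linear scan is the O(n) part; matching
  # with eql? keeps the Array consistent with how the Hash compares keys.
  def touch(key)
    index = @order.index { |k| k.eql?(key) }
    @order.delete_at(index) if index
    @order.push(key)
  end
end
```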

I threw together a new gem lrucache which you may find useful. It may be faster than Alex's approach for collections with a significant number of elements.

Here is a very simple and fast LRU cache I use in our HTTP backend: https://github.com/grosser/i18n-backend-http/blob/master/lib/i18n/backend/http/lru_cache.rb

gem install ruby-cache
--> http://www.nongnu.org/pupa/ruby-cache-README.html

Related

How to finetune map function and avoid using flatten

I have the following code to list all possible permutations of a given string. But because of my awkward list (Ruby array) manipulation and limited knowledge of functional programming, I have to use flatten to get the result array. It is pretty much a hack. How can I refactor the code and avoid using (abusing) flatten?
class String
  def remove_char_at(i)
    if i==0
      self[1..-1]
    else
      self[0..i-1] + self[i+1..-1]
    end
  end
end
def permute(str,prefix="")
  if str.size==0
    prefix
  else
    str.chars.each_with_index.map do |s,i|
      permute(str.remove_char_at(i),prefix+s)
    end.flatten
  end
end
You can find interesting things about functional programming in the first chapters of SICP.
def permute2(str,prefix="")
  if str.size==0
    [prefix] # wrap in an array so it can be concatenated onto memo
  else
    str.chars.each_with_index.inject([]) do |memo, ary|
      s = ary[0]
      i = ary[1]
      memo += permute2(str.remove_char_at(i),prefix+s) # accumulate the results
    end
  end
end
Ruby has done much of the hard work for you. To get all permutations for a string, myString, do the following:
myString.split('').permutation.map(&:join).uniq
This splits the string into an array of characters, gets all permutations of that array, joins each permutation back into a string, and weeds out duplicates.
class String
  def remove_char_at(i)
    if i==0
      self[1..-1]
    else
      self[0..i-1] + self[i+1..-1]
    end
  end
end
can be refactored as follows by using ... (an exclusive range) instead of .. (an inclusive range):
class String
  def remove_char_at(i)
    self[0...i] + self[i+1..-1]
  end
end
I'm specifically answering the "How can I refactor the code and avoid using (abusing) flatten?" part:
Instead of map followed by flatten, you can use flat_map, which was introduced in Ruby 1.9.2.
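Applied to the permute method from the question, a sketch might look like the one below. It inlines the remove_char_at logic using the exclusive-range form so the snippet runs on its own. Because flat_map flattens exactly one level, the base case returns a one-element array.

```ruby
# map + flatten replaced by flat_map, which flattens exactly one level.
def permute(str, prefix = "")
  return [prefix] if str.empty?   # one complete permutation
  str.chars.each_with_index.flat_map do |ch, i|
    rest = str[0...i] + str[i + 1..-1]   # str without the character at i
    permute(rest, prefix + ch)
  end
end

p permute("abc")  # prints ["abc", "acb", "bac", "bca", "cab", "cba"]
```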

Explanation of Ruby code for building Trie data structures

So I have this Ruby code that I grabbed from Wikipedia and modified a bit:
@trie = Hash.new()
def build(str)
  node = @trie
  str.each_char { |ch|
    cur = ch
    prev_node = node
    node = node[cur]
    if node == nil
      prev_node[cur] = Hash.new()
      node = prev_node[cur]
    end
  }
end
build('dogs')
puts @trie.inspect
I first ran this in the irb console, and each time I output node, it just kept giving me an empty hash, {}. But when I actually invoke build with the string 'dogs', it does work and outputs {"d"=>{"o"=>{"g"=>{"s"=>{}}}}}, which is totally correct.
This is probably more of a Ruby question than a question about how the algorithm works. I guess I don't have adequate Ruby knowledge to decipher what is going on there.
You're probably getting lost inside that mess of code which takes an approach that seems a better fit for C++ than for Ruby. Here's the same thing in a more concise format that uses a special case Hash for storage:
class Trie < Hash
  def initialize
    # Ensure that this is not a special Hash by disallowing
    # initialization options.
    super
  end
  def build(string)
    string.chars.inject(self) do |h, char|
      h[char] ||= { }
    end
  end
end
It works exactly the same but doesn't have nearly the same mess with pointers and such:
trie = Trie.new
trie.build('dogs')
puts trie.inspect
Ruby's Enumerable module is full of amazingly useful methods like inject, which is precisely what you want for a situation like this.
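To see what inject is doing: it starts with the root hash and, for each character, passes the next nested hash to the following iteration. The same walk works for lookups. The sketch below is self-contained and uses plain nested Hashes. trie_insert and trie_prefix? are hypothetical helper names. This trie does not mark where words end, so it can only answer prefix questions.

```ruby
# Insert: inject threads the current node through the loop, creating
# a child hash for each character that is not there yet.
def trie_insert(trie, word)
  word.chars.inject(trie) { |node, char| node[char] ||= {} }
  trie
end

# Lookup: follow the same path; `node && node[char]` stays nil once a
# character is missing, so a dead end does not raise an error.
def trie_prefix?(trie, prefix)
  last = prefix.chars.inject(trie) { |node, char| node && node[char] }
  !last.nil?
end

trie = {}
%w[dogs dot].each { |word| trie_insert(trie, word) }
trie_prefix?(trie, "do")   # => true
trie_prefix?(trie, "cat")  # => false
```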
I think you are just using irb incorrectly. You should type the whole function in, then run it, and see if you get correct results. If it doesn't work, post your entire irb session here.
Also here is a simplified version of your code:
def build(str)
  node = @trie
  str.each_char do |ch|
    node = (node[ch] ||= {})
  end
  # not sure what the return value should be
end

optimize this ruby code, switch arrays to sets/hash?

I need to optimize this code. If you have any suggestions to make it go faster, please tell me. I don't have a specific speedup in mind; any suggestion would be helpful. In terms of complexity, I want to keep it below O(n^2).
I'm wondering whether I should convert the array I'm using into a set or hash, because that is quicker, right? How much faster, in terms of complexity, might this allow me to run?
I think the main problem might be my use of Ruby's combination method, which runs pretty slowly. Does anyone know the exact complexity of this method? Is there a faster alternative?
The point of this code is to find the single point with the shortest combined distance to all the other points, i.e., the friend's house that is most convenient for everyone to go to. There is a little extra code here with some debugging/printing functions.
class Point
  attr_accessor :x, :y, :distance, :done, :count
  def initialize(x,y)
    @x = x
    @y = y
    @distance = 0
    @closestPoint = []
    @done = false
    @count = 0
  end
end
class Edge
  attr_accessor :edge1, :edge2, :weight
  def initialize(edge1,edge2,weight)
    @edge1 = edge1
    @edge2 = edge2
    @weight = weight
  end
end
class AdjacencyList
  attr_accessor :name, :minSumList, :current
  def initialize(name)
    @name = name
    @minSumList = []
    @current = nil
    @vList = []
    @edgeList = []
  end
  def addVertex(vertex)
    @vList.push(vertex)
  end
  def generateEdges2
    minSumNode = nil
    current = nil
    last = nil
    @vList.combination(2) { |vertex1, vertex2|
      distance = distance2points(vertex1,vertex2)
      edge = Edge.new(vertex1,vertex2,distance)
      if (current == nil)
        current = vertex1
        minSumNode = vertex1
      end
      vertex1.distance += distance
      vertex2.distance += distance
      vertex1.count += 1
      vertex2.count += 1
      if (vertex1.count == @vList.length-1)
        vertex1.done = true
      elsif (vertex2.count == @vList.length-1)
        vertex2.done = true
      end
      if ((vertex1.distance < minSumNode.distance) && (vertex1.done == true))
        minSumNode = vertex1
      end
      #@edgeList.push(edge)
    }
    return minSumNode.distance
  end
  def generateEdges
    @vList.combination(2) { |vertex1, vertex2|
      distance = distance2points(vertex1,vertex2)
      @edgeList.push(Edge.new(vertex1,vertex2,distance))
    }
  end
  def printEdges
    @edgeList.each {|edge| puts "(#{edge.edge1.x},#{edge.edge1.y}) <=> (#{edge.edge2.x},#{edge.edge2.y}) weight: #{edge.weight}"}
  end
  def printDistances
    @vList.each {|v| puts "(#{v.x},#{v.y} distance = #{v.distance})"}
  end
end
def distance2points(point1,point2)
  xdistance = (point1.x - point2.x).abs
  ydistance = (point1.y - point2.y).abs
  total_raw = xdistance + ydistance
  return totaldistance = total_raw - [xdistance,ydistance].min
end
#pointtest1 = Point.new(0,1)
#pointtest2 = Point.new(2,5)
#pointtest3 = Point.new(3,1)
#pointtest4 = Point.new(4,0)
graph = AdjacencyList.new("graph1")
gets
while (line = gets)
  graph.addVertex(Point.new(line.split[0].to_i,line.split[1].to_i))
end
#graph.addVertex(pointtest1)
#graph.addVertex(pointtest2)
#graph.addVertex(pointtest3)
#graph.addVertex(pointtest4)
puts graph.generateEdges2
#graph.printEdges
#graph.printDistances
Try to do this, and then post some more code:
ruby -rprofile your_script your_args
This will run the script under the profiler and generate a nice table of results. If you post that here, you're more likely to get better help. Plus, you will have a more exact idea of what's consuming your CPU cycles.
Sets are basically hashes, and the advantage of hashes over arrays is O(1) find operations. Since you are simply iterating over the entire array, hashes will not offer any speed improvements if you simply replace the arrays with hashes.
Your real problem is that the running time of your algorithm is O(n^2): given a set of n points, combination(2) produces n(n-1)/2 pairs, because you're matching every point with every other possible point.
This can be somewhat improved by using hashes to cache values. For example, let's say you want the distance between point "a" and point "b". You could have a hash @distances that stores @distances["a,b"] = 52 (of course, you'll have to be smart about what to use as the key). Basically, just try to remove redundant operations wherever you can.
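Here is a sketch of such a cache. It assumes points are plain [x, y] arrays rather than the question's Point objects; Point would need its own hash and eql? to work as a key. Sorting the pair gives one key for both orders, so d(a, b) and d(b, a) share an entry. The question's combination(2) already visits each pair only once, so a memo like this only pays off if the same distances are requested again later. DistanceCache and chebyshev are hypothetical names.

```ruby
# Chebyshev distance, the same quantity distance2points computes:
# (|dx| + |dy|) - min(|dx|, |dy|) == max(|dx|, |dy|).
def chebyshev(p, q)
  [(p[0] - q[0]).abs, (p[1] - q[1]).abs].max
end

# Distance memo keyed on the sorted pair of [x, y] points.
class DistanceCache
  def initialize
    # The default proc computes and stores a distance on the first request.
    @memo = Hash.new { |memo, pair| memo[pair] = chebyshev(*pair) }
  end

  def distance(p, q)
    @memo[[p, q].sort]
  end

  def size
    @memo.size
  end
end
```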
That said, the largest speed boost would be from a smarter algorithm, but I can't think of something applicable off the top of my head right now.
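One smarter approach, offered here as a sketch rather than something from this thread, works as follows. distance2points computes (|dx| + |dy|) - min(|dx|, |dy|), which is max(|dx|, |dy|), the Chebyshev distance. With u = x + y and v = x - y, that equals (|du| + |dv|) / 2. The total distance from each point therefore splits into two one-dimensional sums of absolute differences. Each sum can be computed for all points at once after sorting, which brings the whole thing down to O(n log n). Points are assumed to be [x, y] integer arrays, and the method names are hypothetical.

```ruby
# For every i, the sum of |values[i] - values[j]| over all j,
# using a sort plus a running prefix sum: O(n log n) overall.
def abs_diff_sums(values)
  n = values.size
  order = (0...n).sort_by { |i| values[i] }
  total = values.sum
  sums = Array.new(n, 0)
  prefix = 0 # sum of the values sorted before position k
  order.each_with_index do |i, k|
    w = values[i]
    below = w * k - prefix                          # distance to smaller values
    above = (total - prefix - w) - w * (n - k - 1)  # distance to larger values
    sums[i] = below + above
    prefix += w
  end
  sums
end

# Smallest total Chebyshev distance from one point to all the others.
def min_total_chebyshev(points)
  su = abs_diff_sums(points.map { |x, y| x + y })
  sv = abs_diff_sums(points.map { |x, y| x - y })
  points.each_index.map { |i| (su[i] + sv[i]) / 2 }.min
end

# The question's commented-out test points:
min_total_chebyshev([[0, 1], [2, 5], [3, 1], [4, 0]])  # => 8, for (3,1)
```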
There's something many people know, and it won't cost you anything.
While you're trying to guess how to make the code faster, or scouring the internet for some kind of profiler, just run the program under the debugger and interrupt it while it's being slow.
Do it several times, and each time take careful note of what it's doing and why.
Here's an example in Python.
The slower it is, the more obvious the problem will be.

When is the Enumerator::Yielder#yield method useful?

The question "Meaning of the word yield" mentions the Enumerator::Yielder#yield method. I haven't used it before, and wonder under what circumstances it would be useful.
Is it mainly useful when you want to create an infinite list of items, such as the Sieve of Eratosthenes, and when you need to use an external iterator?
"How to create an infinite enumerable of Times?" talks about constructing lazy iterators, but my favorite usage is wrapping an existing Enumerable with additional functionality (any enumerable, without needing to know what it really is, whether it's infinite or not, etc.).
A trivial example would be implementing the each_with_index method (or, more generally, the with_index method):
module Enumerable
  def my_with_index
    Enumerator.new do |yielder|
      i = 0
      self.each do |e|
        yielder.yield e, i
        i += 1
      end
    end
  end
  def my_each_with_index
    self.my_with_index.each do |e, i|
      yield e, i
    end
  end
end
[:foo, :bar, :baz].my_each_with_index do |e,i|
  puts "#{i}: #{e}"
end
#=>0: foo
#=>1: bar
#=>2: baz
Extending to something not already implemented in the core library, such as cyclically assigning values from a given array to each enumerable element (say, for coloring table rows):
module Enumerable
  def with_cycle values
    Enumerator.new do |yielder|
      # index into values instead of shift/push, so the caller's array is not rotated
      self.each_with_index do |e, i|
        yielder.yield e, values[i % values.size]
      end
    end
  end
end
p (1..10).with_cycle([:red, :green, :blue]).to_a # works with any Enumerable, such as Range
#=>[[1, :red], [2, :green], [3, :blue], [4, :red], [5, :green], [6, :blue], [7, :red], [8, :green], [9, :blue], [10, :red]]
The whole point is that these methods return an Enumerator, which you then combine with the usual Enumerable methods, such as select, map, inject etc.
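The infinite case from the question works the same way. Below is a sketch of an infinite Fibonacci generator built with the yielder (y << a is shorthand for y.yield(a)). take stops pulling values once it has enough, and lazy lets select run over an endless sequence.

```ruby
# An endless Fibonacci sequence; the block only runs as far as callers pull.
fib = Enumerator.new do |y|
  a, b = 0, 1
  loop do
    y << a            # same as y.yield(a)
    a, b = b, a + b
  end
end

fib.take(10)                       # => [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
fib.lazy.select(&:even?).first(5)  # => [0, 2, 8, 34, 144]
```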
For example, you can use it to construct Rack response bodies inline, without creating classes. Rack only requires the body to respond to each, so an Enumerator can be passed in directly: when Rack calls Enumerator#each, the generator block runs, and every value given to the yielder is handed on in sequence. Here is a Rack response body that returns a sequence of numbers:
run ->(env) {
  body = Enumerator.new do |y|
    9.times { |i| y.yield(i.to_s) }
  end
  [200, {'Content-Length' => '9'}, body]
}
Since Mladen mentioned getting other answers, I thought I would give an example of something I did earlier today while writing an application that receives data from multiple physical devices, analyzes the data, and connects related data (that we see from multiple devices). This is a long-running application, and if I never threw away data (say, anything at least a day old with no updates), it would grow without bound.
In the past, I would have done something like this:
delete_old_stuff if rand(300) == 0
This relies on random numbers, so it is not deterministic: I know it will run approximately once every 300 evaluations (i.e., seconds), but not exactly once every 300 times.
What I wrote up earlier looks like this:
counter = Enumerator.new do |y|
  a = (0...300) # exclusive range: exactly 300 values per cycle
  loop do
    a.each do |b|
      y.yield b
    end
    delete_old_stuff
  end
end
Now I can replace delete_old_stuff if rand(300) == 0 with counter.next.
Now, I'm sure there is a more efficient or pre-made way of doing this, but being sparked to play with Enumerator::Yielder#yield by your question and the linked question, this is what I came up with.
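Below is a self-contained sketch of the same idea. The housekeeping step is passed in as a block so the cadence can be checked. every is a hypothetical helper name, and the counting block stands in for delete_old_stuff. With period.times, the step runs once per 300 calls to next. By contrast, a range like (0..300) hands out 301 values per cycle. The step fires on the call that starts each new cycle (calls 301, 601, ...).

```ruby
# Returns an Enumerator whose #next hands out tick numbers and runs the
# given block once every `period` calls (on the call that starts a new cycle).
def every(period, &housekeeping)
  Enumerator.new do |y|
    loop do
      period.times { |tick| y << tick }
      housekeeping.call
    end
  end
end

runs = 0
counter = every(300) { runs += 1 }  # the block stands in for delete_old_stuff
900.times { counter.next }
runs  # => 2 (the step ran on calls 301 and 601)
```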
It seems to be useful when you have multiple objects you want to enumerate over, but flat_map isn't suitable, and you want to chain the enumeration with another action:
module Enumerable
  def count_by
    items_grouped_by_criteria = group_by {|object| yield object}
    counts = items_grouped_by_criteria.map{|key, array| [key, array.length]}
    Hash[counts]
  end
end
def calculate_letter_frequencies
  each_letter.count_by {|letter| letter}
end
def each_letter
  filenames = ["doc/Quickstart", "doc/Coding style"]
  # Joining the text of each file into a single string would be memory-intensive
  enumerator = Enumerator.new do |yielder|
    filenames.each do |filename|
      text = File.read(filename)
      text.chars.each {|letter| yielder.yield(letter)}
    end
  end
  enumerator
end
calculate_letter_frequencies
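On Ruby 2.7 and later, Enumerable#tally already does what count_by { |letter| letter } does. It also keeps only the counts, whereas group_by collects every letter into arrays first. The sketch below uses in-memory strings in place of the files, and each_char_of is a hypothetical name.

```ruby
# Yields the characters of several strings one at a time, without
# concatenating them into one big string first.
def each_char_of(texts)
  Enumerator.new do |yielder|
    texts.each { |text| text.each_char { |ch| yielder << ch } }
  end
end

frequencies = each_char_of(["abca", "bc"]).tally
# frequencies == {"a" => 2, "b" => 2, "c" => 2}
# Before Ruby 2.7: each_char_of(...).group_by(&:itself).transform_values(&:size)
```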

please help with my "shuffle" code in ruby

This is the question:
Shuffle. Now that you’ve finished your new sorting algorithm, how about the opposite? Write a shuffle method that takes an array and returns a totally shuffled version. As always, you’ll want to test it, but testing this one is trickier: How can you test to make sure you are getting a perfect shuffle? What would you even say a perfect shuffle would be? Now test for it.
This is my code answer:
def shuffle arr
  x = arr.length
  while x != 0
    new_arr = []
    rand_arr = (rand(x))
    x--
    new_arr.push rand_arr
    arr.pop rand_arr
  end
  new_arr
end
puts (shuffle ([1,2,3]))
What are my mistakes? Why doesn't this code work?
Here's a far more Rubyish version:
class Array
  def shuffle!
    size.downto(1) { |n| push delete_at(rand(n)) }
    self
  end
end
puts [1,2,3].shuffle!
Here's a more concise way of writing it:
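def shuffle(arr)
  new_arr = []
  # empty? rather than any?, which is false for arrays holding only nil/false
  # note: this empties arr as it goes
  until arr.empty?
    new_arr << arr.delete_at(rand(arr.length))
  end
  new_arr
end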
And some tests:
5.times do
  puts shuffle((1..5).to_a).join(',')
end
>> 4,2,1,3,5
>> 3,2,1,4,5
>> 4,2,5,1,3
>> 5,2,1,4,3
>> 4,3,1,5,2
Besides other minor errors, you seem not to understand what pop and push do (they remove or add items at the end of the array).
You are probably trying to write something like below.
def shuffle arr
  x = arr.length
  new_arr = []
  while x != 0
    randpos = rand(x)
    x = x-1
    item = arr[randpos]
    new_arr.push item
    # move the last element into the hole, then drop the last slot
    arr[randpos] = arr[x]
    arr.pop
  end
  new_arr
end
puts (shuffle ([1,2,3]))
You're getting your indexes mixed up with your values. When you do new_arr.push rand_arr, you're putting whatever random index you came up with as a value on the end of new_arr. What you meant to do is new_arr.push arr[rand_arr], where arr[rand_arr] is the value at the index rand_arr in arr.
Ruby 1.8.7 and later (including 1.9.2) have a built-in Array#shuffle method.
A variant of Mark Thomas's answer. His algorithm can be quite slow with a large array, because each delete_at from the middle of the array shifts all the elements after it.
class Array
  def shuffle!
    size.downto(1) do |n|
      index = rand(n)
      # swap the element at index with the last slot of the unshuffled part (n - 1)
      self[index], self[n-1] = self[n-1], self[index]
    end
    self
  end
end
puts [1,2,3].shuffle!
This algorithm is O(size), while Mark's algorithm is O(size^2). On my machine, Mark's answer takes 400 seconds to shuffle an array of 1,000,000 elements, versus 0.5 seconds with my method.
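The swap-based version is the Fisher-Yates (Knuth) shuffle. Below is a non-destructive sketch, plus one rough way to approach the question's "perfect shuffle" test: a uniform shuffle of [1, 2, 3] should produce each of the 6 permutations about equally often. The 60,000 trials and the 10% tolerance are arbitrary choices, and fisher_yates and roughly_uniform? are hypothetical names.

```ruby
# Fisher-Yates shuffle: returns a new array and leaves the input untouched.
def fisher_yates(arr, rng = Random.new)
  a = arr.dup
  (a.size - 1).downto(1) do |i|
    j = rng.rand(i + 1)          # 0 <= j <= i
    a[i], a[j] = a[j], a[i]
  end
  a
end

# Rough uniformity check: every permutation of [1, 2, 3] should appear
# about trials / 6 times, within the given relative tolerance.
def roughly_uniform?(trials = 60_000, tolerance = 0.10)
  counts = Hash.new(0)
  trials.times { counts[fisher_yates([1, 2, 3])] += 1 }
  expected = trials / 6.0
  counts.size == 6 &&
    counts.values.all? { |c| (c - expected).abs <= tolerance * expected }
end
```

Passing a seeded generator, for example fisher_yates(list, Random.new(42)), makes a given shuffle reproducible in tests.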
